A better understanding of how humans perceive videos can benefit multi-modal tasks such as recommendation and retrieval. While visual sentiment has been studied for images, no comparable model exists for the visual sentiment of videos. On platforms such as social media, a video's sentiment can strongly influence its popularity and its similarity to other videos. In this research, we propose a framework that models the sentiment of social media videos by first analyzing the sentiment of their associated user comments. From this analysis, we derive an emotion annotation for each video in our base video dataset. Using these annotations, we train a model to predict emotion from audio-visual features. A preliminary study shows promising performance in predicting the annotations. In future work, we plan to incorporate sentiment-based audio and visual models as well as textual analysis of video subtitles, and to investigate how sentiment differs across communities.
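The abstract does not specify how per-comment sentiment is aggregated into a single video-level emotion annotation. A minimal sketch of one plausible aggregation step, assuming each comment has already been assigned an emotion label by some text classifier and that labels are combined by simple majority vote (both are our assumptions, not details from the work itself):

```python
from collections import Counter

def annotate_video(comment_emotions):
    """Derive one emotion annotation for a video by majority vote
    over per-comment emotion labels (hypothetical aggregation;
    the labels would come from a text sentiment/emotion model)."""
    if not comment_emotions:
        return None  # no comments, no annotation
    counts = Counter(comment_emotions)
    # Most frequent emotion wins; ties break by insertion order.
    label, _ = counts.most_common(1)[0]
    return label

# Example: emotion labels predicted for the comments of one video
comments = ["joy", "joy", "surprise", "joy", "anger"]
print(annotate_video(comments))  # → joy
```

The resulting per-video labels would then serve as training targets for the audio-visual emotion prediction model described above.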
Type: Poster at MIRU Symposium (画像の認識・理解シンポジウム)
Publication date: August 2020