Affective video content analysis, such as estimating the emotions an input video evokes in its viewers, has been attracting increasing attention because it plays an important role in attractiveness computing and can support video creation and recommendation. In this field, researchers combine information from different modalities, such as audio and vision, to obtain more accurate predictions. However, annotation is typically the hardest part of this kind of research, and the difficulty is intensified by the subjectivity and ambiguity of emotion: annotators rarely agree on a ground truth, even when they are familiar with emotions. Therefore, in this study, we use viewers’ textual comments to estimate evoked emotions and thereby annotate social media videos automatically. To deal with subjectivity, we introduce label distribution learning, which aims to predict the probabilities of all classes rather than only the dominant class, into multimodal evoked emotion recognition. The experimental results demonstrate the validity of this idea.
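The abstract does not specify the model or loss, so the following is only a minimal NumPy sketch of the general label distribution learning idea, under stated assumptions: the emotion classes and per-class counts are hypothetical (imagined as the result of classifying one video's viewer comments), and the softmax output trained with a KL-divergence loss is a common LDL choice, not necessarily the one used in this work. It shows how a soft target preserves minority emotions that a single dominant-class label would discard.

```python
import numpy as np

# Hypothetical emotion classes; the actual label set is not given in the abstract.
CLASSES = ["joy", "surprise", "sadness", "anger"]


def counts_to_distribution(counts):
    """Normalise per-class counts (e.g. emotions detected in comments) into a label distribution."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum()


def softmax(logits):
    """Numerically stable softmax, so the model outputs a probability for every class."""
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()


def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted): a common LDL loss (assumed here, not taken from the paper)."""
    target = np.asarray(target, dtype=float)
    predicted = np.clip(predicted, eps, 1.0)
    mask = target > 0  # terms with zero target probability contribute nothing
    return float(np.sum(target[mask] * np.log(target[mask] / predicted[mask])))


def fit_logits(target, steps=500, lr=0.5):
    """Toy gradient descent on free logits; d KL / d logits = softmax(logits) - target."""
    logits = np.zeros_like(target)
    for _ in range(steps):
        logits -= lr * (softmax(logits) - target)
    return logits


# Hypothetical video: 6 comments read as joy, 3 as surprise, 1 as sadness, 0 as anger.
target = counts_to_distribution([6, 3, 1, 0])
dominant = CLASSES[int(np.argmax(target))]  # a single-label approach would keep only this
logits = np.array([2.0, 1.0, 0.0, -1.0])    # hypothetical model output for this video
loss = kl_divergence(target, softmax(logits))
fitted = softmax(fit_logits(target))         # approaches the full target distribution
```

The design point is the target: instead of collapsing the comments to the dominant label, the loss compares the full predicted distribution with the full observed one, so disagreement among viewers is modelled rather than discarded.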
Type: Poster at MIRU Symposium (Symposium on Image Recognition and Understanding)
Publication date: July 2022