Dr. Marc A. Kastner

A multi-modal dataset for analyzing the imageability of concepts across modalities

Authors: Marc A. Kastner, Chihaya Matsuhira, Ichiro Ide, Shin'ichi Satoh

Abstract:

Multi-modal applications increasingly require a human-like understanding of how perception differs across modalities. For example, while something might have a clear image in a visual context, it might be perceived as too technical in a textual context. Such differences, which relate to a semantic gap, make transferring between modalities or combining modalities in multi-modal processing a difficult task. Imageability, a concept from psycholinguistics, gives promising insight into the human perception of vision and language. To understand cross-modal differences in semantics, we create and analyze a cross-modal dataset for imageability. We estimate three imageability values grounded in 1) a visual space from a large set of images, 2) a textual space from Web-trained word embeddings, and 3) a phonetic space based on word pronunciations. A subset of the corpus is evaluated against an existing imageability dictionary to ensure basic generalization, but the dataset otherwise targets finding cross-modal differences and outliers. We visualize the dataset and analyze it regarding outliers and differences for each modality. As an additional source of knowledge, the part of speech and etymological origin of all words are estimated and analyzed in the context of the modalities. The dataset of multi-modal imageability values and an interactive browser will be made publicly available.

Type: 4th IEEE International Conference on Multimedia Information Processing and Retrieval (MIPR2021)

Publication date: September 2021

DOI: 10.1109/MIPR51284.2021.00039

Links: [ github ] [ supplemental visualizations ]

Presentation

Attached Files

slides
