Dr. Marc A. Kastner

A multi-modal dataset for analyzing the imageability of concepts across modalities

Authors: Marc A. Kastner, Chihaya Matsuhira, Ichiro Ide, Shin'ichi Satoh

Abstract:

Recently, multi-modal applications have brought a need for a human-like understanding of the perception differences across modalities. For example, while something might have a clear image in a visual context, it might be perceived as too technical in a textual context. Such differences, related to a semantic gap, make a transfer between modalities or a combination of modalities in multi-modal processing a difficult task. Imageability, a concept from psycholinguistics, gives promising insight into the human perception of vision and language. In order to understand cross-modal differences of semantics, we create and analyze a cross-modal dataset for imageability. We estimate three imageability values grounded in 1) a visual space from a large set of images, 2) a textual space from Web-trained word embeddings, and 3) a phonetic space based on word pronunciations. A subset of the corpus is evaluated with an existing imageability dictionary to ensure a basic generalization, but the dataset otherwise targets finding cross-modal differences and outliers. We visualize the dataset and analyze it regarding outliers and differences for each modality. As an additional source of knowledge, the part-of-speech and etymological origin of all words are estimated and analyzed in the context of the modalities. The dataset of multi-modal imageability values and an interactive browser will be made publicly available.

Type: 4th IEEE International Conference on Multimedia Information Processing and Retrieval (MIPR2021)

Publication date: September 2021

DOI: 10.1109/MIPR51284.2021.00039

Links: [ github ] [ supplemental visualizations ]

Presentation

Files

slides

If you have questions or comments about this research, please leave a comment below or send me an email. I will respond quickly.