Dr. Marc A. Kastner

Leverage semantic alignment of object relations for image captioning

Authors: Da Huo, Marc A. Kastner, Takatsugu Hirayama, Takahiro Komamizu, Ichiro Ide

Abstract:

Image captioning is a popular vision-and-language task that aims to generate appropriate textual descriptions of images. Recent work uses objects to ease the alignment of image and text and thereby learn better cross-modal representations, which has led to good performance on this task. In this paper, we argue that relations are also important for learning semantics, and we use relations between objects to explore whether relations as a prior can also improve performance. First, we take the annotated relations between objects and use them as tags in an image captioning model to align the image and the text. We also aim to integrate relationships between the text and the image features. To this end, we change the masking strategy from random masking to relation masking and study it as a training strategy for enhancing the semantic alignment of object relations. In the experiments, we found that considering object relations improved captioning performance on common metrics. Further, when the masking strategy was changed so that a specific part of the caption is masked during training, the model captured more object relations of an image; however, because this removes the randomness of training, performance decreased and the generated relations appeared to be incompatible with the image contents.
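
The difference between random masking and relation masking of caption tokens can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `[MASK]` token, the 15% masking ratio, the example caption, and the relation vocabulary are all assumptions made for illustration.

```python
import random

MASK = "[MASK]"  # assumed mask token, not taken from the paper


def random_mask(tokens, ratio=0.15, rng=None):
    """Mask a randomly chosen subset of caption tokens (at least one)."""
    if not tokens:
        return []
    rng = rng or random.Random()
    n = max(1, round(len(tokens) * ratio))
    picked = set(rng.sample(range(len(tokens)), n))
    return [MASK if i in picked else tok for i, tok in enumerate(tokens)]


def relation_mask(tokens, relation_words):
    """Mask every caption token found in the relation vocabulary."""
    return [MASK if tok.lower() in relation_words else tok for tok in tokens]


# Hypothetical caption and relation vocabulary, for illustration only.
caption = "a man riding a horse on the beach".split()
relation_words = {"riding", "on"}

random_masked = random_mask(caption, ratio=0.15, rng=random.Random(0))
relation_masked = relation_mask(caption, relation_words)
print(random_masked)
print(relation_masked)
```

In this sketch, relation masking is deterministic for a given caption, whereas random masking picks different positions on each call. That is one way to picture the loss of training randomness mentioned in the abstract.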

Type: Poster at MIRU Symposium (画像の認識・理解シンポジウム)

Publication date: July 2023

If you have questions or comments about this research, please leave a comment below or send me an email. I will respond quickly.