Dr. Marc A. Kastner

Action Semantic Alignment for Image Captioning

Authors: Da Huo, Marc A. Kastner, Takahiro Komamizu, Ichiro Ide

Abstract:

Image captioning, one of the main goals in vision and language processing, aims to generate proper descriptions of images. Recently, attention mechanisms have become crucial in captioning tasks, as they can capture global dependencies between modalities. Moreover, prior research has used objects detected in the input image as anchor points, so-called object tags, to ease such alignment, resulting in good performance on this task. In this paper, we introduce action information as a prior to further improve this alignment. We propose adding action tags during training for image captioning. The action tags allow the model to learn alignment at the action-semantic level and to capture the previously ignored dimension of action, which can be very important in image captioning. We found that adjusting the action tags makes it possible to describe images in a dynamic style. Furthermore, we found that this leads to a significant improvement in captioning performance on common metrics compared with other methods.

Type: 5th IEEE International Conference on Multimedia Information Processing and Retrieval (MIPR2022)

Publication date: August 2022

DOI: 10.1109/MIPR54900.2022.00041

If you have questions or comments about this research, please leave a comment below or send me an email. I will reply promptly.