Dr. Marc A. Kastner



Towards Captioning an Image Collection from a Combined Scene Graph Representation Approach


Authors: Itthisak Phueaksri, Marc A. Kastner, Yasutomo Kawanishi, Takahiro Komamizu, Ichiro Ide

Abstract:

Most content summarization models in natural language processing summarize the textual contents of a collection of documents or paragraphs. In contrast, summarizing the visual contents of a collection of images has not been researched to the same extent. In this paper, we present a framework for summarizing the visual contents of an image collection. The key idea is to collect the scene graphs of all images in the collection, combine them into a single representation, and then generate a visually summarizing caption with a scene-graph captioning model. Note that the goal is to summarize the content common to all images in a single caption rather than to describe each image individually. After aggregating the scene graphs of an image collection into a single scene graph, we normalize it with an additional concept generalization component. This component selects the common concept in each sub-graph using ConceptNet and word embedding techniques. Lastly, we refine the generated caption by replacing a specific noun phrase with the common concept found by the concept generalization component. We construct a dataset for this task based on the MS-COCO dataset, using techniques from image classification and image-caption retrieval. An evaluation of the proposed method on this dataset shows promising performance.
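To make the pipeline described in the abstract more concrete, here is a minimal sketch of its three data-handling steps: merging per-image scene graphs, normalizing concepts, and refining a caption by noun replacement. It is not the authors' implementation. Scene graphs are assumed to be lists of (subject, predicate, object) triples, and the hypothetical `GENERALIZE` table is a hand-written stand-in for the ConceptNet and word-embedding lookup used in the paper. The scene-graph captioning model itself is omitted. The names `combine_scene_graphs`, `common_triples`, `refine_caption`, and the `min_share` threshold are likewise illustrative assumptions.

```python
from collections import Counter

# A scene graph is modelled here as a list of (subject, predicate, object) triples.
Triple = tuple[str, str, str]

# HYPOTHETICAL stand-in for ConceptNet + word embeddings: a fixed table that
# maps a specific concept to a more general one. Invented for illustration.
GENERALIZE = {
    "dog": "animal",
    "cat": "animal",
    "sofa": "furniture",
    "couch": "furniture",
}


def generalize(concept: str) -> str:
    """Return the general concept for `concept`, or the concept itself if unknown."""
    return GENERALIZE.get(concept, concept)


def combine_scene_graphs(graphs: list[list[Triple]]) -> Counter:
    """Merge per-image scene graphs into one weighted graph of normalized triples."""
    combined: Counter = Counter()
    for graph in graphs:
        # Use a set so each normalized triple counts once per image; the
        # resulting weight is the number of images containing that triple.
        normalized = {(generalize(s), p, generalize(o)) for s, p, o in graph}
        combined.update(normalized)
    return combined


def common_triples(combined: Counter, n_images: int, min_share: float = 0.5) -> list[Triple]:
    """Keep triples that appear in at least `min_share` of the images (hypothetical rule)."""
    return sorted(t for t, c in combined.items() if c / n_images >= min_share)


def refine_caption(caption: str, specific: str, general: str) -> str:
    """Replace a specific noun in the caption with its general concept (word-level match)."""
    return " ".join(general if w == specific else w for w in caption.split())


if __name__ == "__main__":
    # Toy image collection with three images.
    graphs = [
        [("dog", "on", "sofa")],
        [("cat", "on", "couch")],
        [("dog", "near", "tree")],
    ]
    combined = combine_scene_graphs(graphs)
    print(common_triples(combined, len(graphs)))
    print(refine_caption("the dog sitting on the sofa", "dog", "animal"))
```

On the toy collection, "dog on sofa" and "cat on couch" both normalize to `("animal", "on", "furniture")`, which occurs in two of three images and is kept. `("animal", "near", "tree")` occurs in only one and is dropped. Counting each triple once per image is a design choice meant to reward content shared across the collection rather than content repeated within a single image.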

Type: 29th Intl. Conf. on MultiMedia Modeling (MMM2023)

Publication date: January 2023

DOI: 10.1007/978-3-031-27077-2_14

Presentation

Files

preprint

slides

If you have any questions or comments about this research, please leave a comment below or send me an email. I will reply promptly.