Dr. Marc A. Kastner


Towards captioning an image collection from a combined scene graph representation approach


Authors: Itthisak Phueaksri, Marc A. Kastner, Yasutomo Kawanishi, Takahiro Komamizu, Ichiro Ide


Most content summarization models in natural language processing summarize the textual contents of a collection of documents or paragraphs. In contrast, summarizing the visual contents of a collection of images has received far less research attention. In this paper, we present a framework for summarizing the visual contents of an image collection. The key idea is to collect the scene graphs of all images in the collection, combine them into a single representation, and then generate a visually summarizing caption using a scene-graph captioning model. Note that the goal is to summarize the contents common to all images in a single caption rather than to describe each image individually. After aggregating all the scene graphs of an image collection into a single scene graph, we normalize it with an additional concept generalization component. This component uses ConceptNet together with word embedding techniques to select a common concept for each sub-graph. Finally, we refine the captioning results by replacing specific noun phrases with the common concepts selected by the concept generalization component. We construct a dataset for this task from the MS-COCO dataset using techniques from image classification and image-caption retrieval. An evaluation of the proposed method on this dataset shows promising performance.
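The pipeline in the abstract (aggregate scene graphs, generalize concepts per sub-graph, refine the caption) can be illustrated with a minimal sketch. It rests on simplifying assumptions that are not taken from the paper: scene graphs are plain (subject, predicate, object) triples, a "sub-graph" is the set of subjects sharing the same predicate and object, and a tiny hand-written hypernym table (`IS_A`) plus made-up 3-dimensional vectors (`EMB`) stand in for ConceptNet and pretrained word embeddings. All function and variable names are hypothetical, the scene-graph captioning model itself is omitted, and this is not the authors' implementation.

```python
from collections import Counter, defaultdict
from math import sqrt

Triple = tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical stand-in for ConceptNet IsA edges (not real ConceptNet data).
IS_A = {"dog": {"animal", "pet"}, "cat": {"animal", "pet"}}

# Toy word vectors standing in for pretrained embeddings (values are made up).
EMB = {
    "dog": (0.9, 0.1, 0.0),
    "cat": (0.8, 0.2, 0.1),
    "animal": (0.85, 0.15, 0.05),
    "pet": (0.6, 0.6, 0.0),
}


def merge_scene_graphs(graphs: list[list[Triple]]) -> Counter:
    """Aggregate per-image scene graphs into one graph with per-triple image counts."""
    merged: Counter = Counter()
    for graph in graphs:
        merged.update(set(graph))  # count each triple at most once per image
    return merged


def group_subjects(merged: Counter) -> dict[tuple[str, str], list[str]]:
    """Assumed sub-graph definition: subjects sharing the same (predicate, object)."""
    groups: dict[tuple[str, str], list[str]] = defaultdict(list)
    for subj, pred, obj in sorted(merged):
        groups[(pred, obj)].append(subj)
    return dict(groups)


def cosine(u, v) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def generalize(labels: list[str]) -> str:
    """Pick the shared hypernym whose embedding is closest, on average, to all labels."""
    candidates = set.intersection(*(IS_A.get(label, set()) for label in labels))
    if not candidates:
        return labels[0]  # no shared concept: keep the first label (arbitrary fallback)
    return max(
        candidates,
        key=lambda c: sum(cosine(EMB[c], EMB[label]) for label in labels) / len(labels),
    )


def refine_caption(caption: str, specific: list[str], common: str) -> str:
    """Word-level stand-in for noun-phrase replacement: swap specific nouns for the concept."""
    return " ".join(common if word in specific else word for word in caption.split())


if __name__ == "__main__":
    graphs = [[("dog", "on", "couch")], [("cat", "on", "couch")]]
    groups = group_subjects(merge_scene_graphs(graphs))
    for (pred, obj), subjects in groups.items():
        concept = generalize(subjects)
        print(subjects, "->", concept)
        print(refine_caption("the dog sitting on the couch", subjects, concept))
```

With these toy values, the subjects `dog` and `cat` share the candidate hypernyms `animal` and `pet`, and `animal` is selected because its vector is closer to both labels. A real system would query ConceptNet and use learned embeddings, and would operate on parsed noun phrases rather than whitespace-separated words.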

Type: 29th International Conference on MultiMedia Modeling (MMM2023)

Publication date: To be published in January 2023

If you have any questions or comments about this research, feel free to leave a comment or send me an email. I will get back to you promptly.
© 2013-2022 Marc A. Kastner.