In this report, we study a method for detecting birds in panoramic video with the aid of Sound Source Localization (SSL). Birds occupy only a very small part of each panorama frame, which makes them difficult to detect directly. In the proposed method, SSL algorithms first localize birds roughly from the audio data; the corresponding regions are then cropped from the video frames and passed to a Convolutional Neural Network (CNN) for detection. Narrowing the search area with SSL makes it possible to detect small birds in large video frames and improves both detection precision and processing time. We applied the method to a bird dataset and confirmed its effectiveness.
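The cropping step can be illustrated with a minimal sketch. It is not the implementation used in the report: it assumes an equirectangular panorama in which azimuth 0° maps to the horizontal center of the frame and elevation 0° to the vertical center, and the function name `crop_from_direction` and the crop size are hypothetical. Given a direction estimated by SSL, it returns the image region that would be handed to the detector.

```python
import numpy as np


def crop_from_direction(frame, azimuth_deg, elevation_deg, crop_w=640, crop_h=640):
    """Crop a window around an SSL direction estimate (hypothetical helper).

    Assumes an equirectangular frame: azimuth in [-180, 180) spans the width,
    elevation in [-90, 90] spans the height (top row = +90 degrees).
    """
    h, w = frame.shape[:2]

    # Map the direction to pixel coordinates of the crop center.
    cx = int(round((azimuth_deg + 180.0) / 360.0 * w)) % w
    cy = int(round((90.0 - elevation_deg) / 180.0 * h))

    # Vertically, clamp the window so it stays inside the frame.
    top = int(np.clip(cy - crop_h // 2, 0, max(h - crop_h, 0)))

    # Horizontally, a panorama wraps around, so take column indices modulo w.
    left = cx - crop_w // 2
    cols = np.arange(left, left + crop_w) % w

    return frame[top:top + crop_h][:, cols]


# Example: a synthetic 1000 x 2000 frame and a source slightly left of the seam.
frame = np.zeros((1000, 2000, 3), dtype=np.uint8)
crop = crop_from_direction(frame, azimuth_deg=179.0, elevation_deg=0.0,
                           crop_w=200, crop_h=100)
print(crop.shape)
```

Only the cropped window, rather than the full panorama, would then be passed to the CNN detector, which is where the reduction in search area comes from.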
Type: Talk at Meeting of the Technical Committee on Media Experience and Virtual Environment, MVE (メディアエクスペリエンス・バーチャル環境基礎研究会)
Publication date: March 2022