Akinrinmade, A. and Adetiba, E. and Badejo, J. A. and Oshin, Oluwadamilola (2023) An Active Speaker Detection Method in Videos using Standard Deviations of Color Histogram. In: 2023 International Conference on Science, Engineering and Business for Sustainable Development Goals (SEB-SDG), 05-04-April 2023, Omu-Aran, Nigeria.
Abstract
Active Speaker Detection (ASD) is the task of determining which of the people whose faces appear in a video, if any, is speaking at any given point in the recording. This work presents a novel algorithm that detects the active speakers in each video using the standard deviations of Color Histograms (CHs) computed over the mouth region from one frame to the next. The method relies on the assumption that the lips of an active speaker are in motion: while talking, they open and close and reveal the inner parts of the mouth, such as the tongue, the teeth, and the vocal cavity, which have diverse colors. The mouth region can be located with existing algorithms and then analyzed for these changes in color activity to predict whether a person is speaking. If a person is not speaking, the lips are at rest, so the CH of that person's mouth region remains stable and its standard deviation is negligible. A threshold on this standard deviation, determined experimentally, can therefore predict whether a person is speaking. This paper uses 53 online videos from Channels TV station to create 250 video clips, each between 15 and 60 seconds long, with a total duration of 3.6 hours. Each video contains the faces of at most two speakers in no particular order: sometimes only one speaker's face appears, and at other times both appear during the video. The status of each speaker, active or not, was manually labeled and used to evaluate the performance of the proposed algorithm. The method predicted the active speakers with an accuracy of 99.19%.
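The decision rule described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that an existing detector has already cropped the mouth region and supplies it as a list of (R, G, B) pixels per frame, and it assumes one particular way of turning "standard deviations of color histograms" into a single score: the standard deviation of each histogram bin across frames, averaged over all bins. The function names, the bin count, and the threshold value are hypothetical placeholders; the paper determines its threshold experimentally.

```python
from statistics import pstdev
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel in 0..255
Frame = Sequence[Pixel]       # pixels of one mouth-region crop (assumed non-empty)


def color_histogram(pixels: Frame, bins: int = 8) -> List[float]:
    """Per-channel color histogram of one mouth crop, concatenated as R, G, B.

    Counts are divided by the pixel count, so each channel sums to 1 and
    crops of different sizes remain comparable from frame to frame.
    """
    hist = [0.0] * (3 * bins)
    bin_width = 256 / bins
    for pixel in pixels:
        for channel, value in enumerate(pixel):
            hist[channel * bins + int(value // bin_width)] += 1
    return [count / len(pixels) for count in hist]


def histogram_activity(frames: Sequence[Frame], bins: int = 8) -> float:
    """Hypothetical score: std of each bin across frames, averaged over bins."""
    hists = [color_histogram(frame, bins) for frame in frames]
    per_bin_std = [pstdev([h[i] for h in hists]) for i in range(3 * bins)]
    return sum(per_bin_std) / len(per_bin_std)


def is_speaking(frames: Sequence[Frame], threshold: float, bins: int = 8) -> bool:
    """Predict 'speaking' when mouth-region color activity exceeds the threshold.

    The threshold is an assumed placeholder; the paper sets it experimentally.
    """
    return histogram_activity(frames, bins) > threshold


def accuracy(predicted: Sequence[bool], labels: Sequence[bool]) -> float:
    """Fraction of predictions that match the manual labels."""
    return sum(p == t for p, t in zip(predicted, labels)) / len(labels)
```

For example, with 8 bins per channel, ten identical crops of closed lips give an activity score of 0.0, while crops alternating between closed lips and an open mouth that shows bright teeth and the dark cavity give 0.0625. The population standard deviation is used here because the score describes the observed window of frames rather than estimating a wider population.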
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Subjects: | T Technology > T Technology (General); T Technology > TK Electrical engineering. Electronics. Nuclear engineering |
Divisions: | Faculty of Engineering, Science and Mathematics > School of Electronics and Computer Science |
Depositing User: | ORIGBOEYEGHA |
Date Deposited: | 16 Sep 2024 12:06 |
Last Modified: | 16 Sep 2024 12:06 |
URI: | http://eprints.covenantuniversity.edu.ng/id/eprint/18407 |