Covenant University Repository

An Active Speaker Detection Method in Videos using Standard Deviations of Color Histogram

Akinrinmade, A. and Adetiba, E. and Badejo, Joke A. (2022) An Active Speaker Detection Method in Videos using Standard Deviations of Color Histogram. Research Square.


Abstract

Active Speaker Detection (ASD) refers to the process of predicting which, if any, of the speakers whose faces appear on screen is speaking at any given time within a video. This paper proposes a novel method for determining active speakers in videos based on the standard deviations of Color Histograms (CHs) of the mouth region from frame to frame.
The reasoning is that the lips of an active speaker open and close at fairly regular intervals, exposing and concealing the inner contents of the mouth, such as the vocal cavity, teeth and tongue, which are of different colors. Therefore, if the mouth region can be accurately localized and the color changes in that region analyzed, this information can be used to detect whether a person is actively speaking. The lips of a non-speaker are usually closed and at rest, so the CHs for that mouth region are expected to be fairly constant and the standard deviations should be low. An experimentally determined threshold can then draw the line between active and non-active speakers. In this work, 53 videos available online from Channels TV news, one of Nigeria’s most popular TV stations, were used to create 250 video clips totaling 3.6 hours, each ranging from 15 seconds to 1 minute, in such a way that the faces of two speakers were always simultaneously visible, in any order, for the duration of each clip. The active speakers in each second of the video clips were manually labeled and used to evaluate the performance of the proposed methodology, which achieved a prediction accuracy of up to 99.19%.
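
The thresholding idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the number of histogram bins, how the per-bin standard deviations are combined into one score, how the mouth region is localized, or the threshold value. Here the mouth regions are assumed to be already cropped RGB arrays, and the names `color_histogram`, `histogram_std_score` and `is_active_speaker`, the 8-bin setting and the mean aggregation are all hypothetical choices made for illustration.

```python
import numpy as np

def color_histogram(region, bins=8):
    """Normalized per-channel histogram of an H x W x 3 uint8 mouth crop.

    The bin count is an assumption; the abstract does not specify it.
    """
    hists = [
        np.histogram(region[..., c], bins=bins, range=(0, 256))[0]
        for c in range(3)
    ]
    hist = np.concatenate(hists).astype(np.float64)
    # Divide by the pixel count so each channel's bins sum to 1.
    return hist / region[..., 0].size

def histogram_std_score(mouth_regions, bins=8):
    """Mean, over all bins, of the frame-to-frame standard deviation.

    Averaging the per-bin deviations is an assumed way of reducing
    them to a single number.
    """
    hists = np.stack([color_histogram(r, bins) for r in mouth_regions])
    return float(hists.std(axis=0).mean())

def is_active_speaker(mouth_regions, threshold, bins=8):
    """Label a face as speaking if its score exceeds the threshold.

    The threshold must be determined experimentally; none is given
    in the abstract.
    """
    return histogram_std_score(mouth_regions, bins) > threshold

# Synthetic example: a constant crop (closed mouth at rest) versus a crop
# that alternates between two colors (mouth opening and closing).
static = [np.full((10, 20, 3), 120, dtype=np.uint8) for _ in range(10)]
moving = [
    np.full((10, 20, 3), 120 if i % 2 == 0 else 30, dtype=np.uint8)
    for i in range(10)
]
print(is_active_speaker(static, threshold=0.05))
print(is_active_speaker(moving, threshold=0.05))
```

With two faces visible in a clip, as in the dataset described above, the same score could be computed for each face over a one-second window and compared against the threshold, although the abstract does not confirm that this is how the per-second predictions were produced.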

Item Type: Article
Uncontrolled Keywords: Active Speaker Detection, Color Histograms, Standard Deviations
Subjects: T Technology > TK Electrical engineering. Electronics Nuclear engineering
Divisions: Faculty of Engineering, Science and Mathematics > School of Electronics and Computer Science
Depositing User: nwokealisi
Date Deposited: 16 Jan 2023 12:32
Last Modified: 16 Jan 2023 12:32
URI: http://eprints.covenantuniversity.edu.ng/id/eprint/16520
