Proceedings Article, Paper
@InProceedings
Contribution in Proceedings, Workshop



Author, Editor

Author(s):

Li, Zhao
Herfet, Thorsten
Grochulla, Martin
Thormählen, Thorsten

Not MPG Author(s):

Li, Zhao
Herfet, Thorsten

Editor(s):





BibTeX cite key*:

Grochulla2012b

Title, Booktitle

Title*:

Audio-Visual Multiple Active Speaker Localisation in Reverberant Environments

Booktitle*:

15th International Conference on Digital Audio Effects (DAFx-12)

Event, URLs

URL of the conference:

http://dafx12.york.ac.uk/

URL for downloading the paper:


Event Address*:

York, UK

Language:

English

Organization:


Event Start Date:

17 September 2012

Event End Date:

21 September 2012

Publisher

Name*:


These proceedings have no publisher.

URL:


Address*:

York, UK

Type:


Vol, No, Year, pp.

Series:


Volume:


Number:


Month:

September

Pages:

1-8

Year*:

2012

VG Wort Pages:


ISBN/ISSN:


Sequence Number:


DOI:




Note, Abstract, ©


(LaTeX) Abstract:

Localisation of multiple active speakers in natural environments with only two microphones is a challenging problem. Reverberation degrades the performance of speaker localisation based exclusively on directional cues. This paper presents an approach based on audio-visual fusion. The audio modality performs multiple-speaker localisation using the {\em Skeleton} method, energy weighting, and precedence-effect filtering and weighting. The video modality performs active speaker detection by analysing the lip regions of the detected speakers. On its own, the audio modality suffers from limited localisation accuracy, while the video modality suffers from false detections. The estimates of both modalities are represented as probabilities in the azimuth domain, and a Gaussian fusion method is proposed to combine them at a late stage. As a result, localisation accuracy and robustness are significantly increased compared to either the audio or the video modality alone. Experimental results in different scenarios confirm the improved performance of the proposed method.



Download
Access Level:

Internal

Correlation

MPG Unit:

Max-Planck-Institut für Informatik



MPG Subunit:

Computer Graphics Group

Appearance:

MPII WWW Server, MPII FTP Server, MPG publications list, university publications list, working group publication list, Fachbeirat (Scientific Advisory Board), VG Wort



BibTeX Entry:

@INPROCEEDINGS{Grochulla2012b,
AUTHOR = {Li, Zhao and Herfet, Thorsten and Grochulla, Martin and Thorm{\"a}hlen, Thorsten},
TITLE = {Audio-Visual Multiple Active Speaker Localisation in Reverberant Environments},
BOOKTITLE = {15th International Conference on Digital Audio Effects (DAFx-12)},
YEAR = {2012},
PAGES = {1--8},
ADDRESS = {York, UK},
MONTH = {September},
}


Entry last modified by Anja Becker, 04/04/2013
Edit History

Created by: Martin Peter Grochulla, 02/08/2013 05:06:07 PM

Revision | Editor | Edit Date
3 | Anja Becker | 04/04/2013 01:30:46 PM
2 | Oliver Klehm | 02/12/2013 07:21:12 PM
1 | Oliver Klehm | 02/12/2013 06:27:30 PM
0 | Martin Peter Grochulla | 02/08/2013 05:06:07 PM