J. Barker, Ning Ma, André Coy, M. Cooke (2010)
Speech fragment decoding techniques for simultaneous speaker identification and speech recognition. Comput. Speech Lang., 24
Ron Weiss, D. Ellis (2007)
Monaural Speech Separation using Source-Adapted Models. 2007 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics
M. Radfar, R. Dansereau (2007)
Single-Channel Speech Separation Using Soft Mask Filtering. IEEE Transactions on Audio, Speech, and Language Processing, 15
M. Gales (1998)
Maximum likelihood linear transformations for HMM-based speech recognition. Comput. Speech Lang., 12
T. Virtanen (2006)
Speech recognition using factorial hidden Markov models for separation in the feature space
Geoffrey Hinton, L. Deng, Dong Yu, George Dahl, Abdel-rahman Mohamed, N. Jaitly, A. Senior, Vincent Vanhoucke, Patrick Nguyen, Tara Sainath, Brian Kingsbury (2017)
Top Downloads in IEEE Xplore [Reader's Choice]. IEEE Signal Processing Magazine, 34
Geoffrey Hinton (2012)
A Practical Guide to Training Restricted Boltzmann Machines
Abdel-rahman Mohamed, George Dahl, Geoffrey Hinton (2012)
Acoustic Modeling Using Deep Belief Networks. IEEE Transactions on Audio, Speech, and Language Processing, 20
(2010)
In Proc. Annual Conference of the International Speech Communication Association (INTERSPEECH)
Geoffrey Hinton, Simon Osindero, Y. Teh (2006)
A Fast Learning Algorithm for Deep Belief Nets. Neural Computation, 18
Chao Weng, Dong Yu, M. Seltzer, J. Droppo (2015)
Deep Neural Networks for Single-Channel Multi-Talker Speech Recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 23
Yang Shao, Soundararajan Srinivasan, Z. Jin, Deliang Wang (2010)
A computational auditory scene analysis system for speech segregation and robust speech recognition. Comput. Speech Lang., 24
D. Wang, G. Brown (2006)
Computational Auditory Scene Analysis: Principles, Algorithms, and Applications. IEEE Trans. Neural Networks, 19
Jun Du, Yanhui Tu, Lirong Dai, Chin-Hui Lee (2016)
A Regression Approach to Single-Channel Speech Separation Via High-Resolution Deep Neural Networks. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24
Yaodong Zhang, James Glass (2009)
Unsupervised spoken keyword spotting via segmental DTW on Gaussian posteriorgrams. 2009 IEEE Workshop on Automatic Speech Recognition & Understanding
M. Cooke, J. Barker, S. Cunningham, Xu Shao (2006)
An audio-visual corpus for speech perception and automatic speech recognition. The Journal of the Acoustical Society of America, 120(5 Pt 1)
Yong Wang, Yixin Yang, Shiduo Yu (2018)
Design of unidirectional acoustic probes with flexible directivity patterns using two acoustic particle velocity sensors. The Journal of the Acoustical Society of America, 144(1)
D. Reynolds, R. Rose (1995)
Robust text-independent speaker identification using Gaussian mixture speaker models. IEEE Trans. Speech Audio Process., 3
C. Nadeu, Dusan Macho, J. Hernando (2000)
Time and frequency filtering of filter-bank energies for robust HMM speech recognition. Speech Commun., 34
M. Cooke, J. Hershey, Steven Rennie (2010)
Monaural speech separation and recognition challenge. Comput. Speech Lang., 24
Zoubin Ghahramani, Michael Jordan (2001)
Factorial Hidden Markov Models. MIT Artificial Intelligence Laboratory and Center for Biological and Computational Learning, Department of Brain and Cognitive Sciences
(2016)
http://staffwww.dcs.shef.ac.uk/people/M.Cooke/SpeechSeparationChallenge.htm
Po-Sen Huang, Minje Kim, M. Hasegawa-Johnson, P. Smaragdis (2015)
Joint Optimization of Masks and Deep Recurrent Neural Networks for Monaural Source Separation. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 23
F. Seide, Gang Li, Xie Chen, Dong Yu (2011)
Feature engineering in Context-Dependent Deep Neural Networks for conversational speech transcription. 2011 IEEE Workshop on Automatic Speech Recognition & Understanding
T. Kristjansson, J. Hershey, P. Olsen, Steven Rennie, R. Gopinath (2006)
Super-human multi-talker speech recognition: the IBM 2006 speech separation challenge system
J. Ming, Timothy Hazen, James Glass (2006)
Combining missing-feature theory, speech enhancement, and speaker-dependent/-independent modeling for speech separation. Comput. Speech Lang., 24
Mehryar Mohri, Fernando Pereira, M. Riley (2002)
Weighted finite-state transducers in speech recognition. Comput. Speech Lang., 16
(2007)
In Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) (pp. 114–117)
Matthias Zöhrer, Robert Peharz, F. Pernkopf (2015)
Representation Learning for Single-Channel Source Separation and Bandwidth Extension. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 23
Yong Xu, Jun Du, Lirong Dai, Chin-Hui Lee (2014)
An Experimental Study on Speech Enhancement Based on Deep Neural Networks. IEEE Signal Processing Letters, 21
P. de Boer, Dirk Kroese, Shie Mannor, R. Rubinstein (2005)
A Tutorial on the Cross-Entropy Method. Annals of Operations Research, 134
We propose a novel speaker-dependent (SD) multi-condition (MC) training approach to jointly learning deep neural network (DNN) acoustic models and an explicit speech separation structure for recognizing multi-talker mixed speech in a single-channel setting. First, an MC acoustic modeling framework is established to train an SD-DNN model in multi-talker scenarios. Such a recognizer significantly reduces decoding complexity and improves recognition accuracy over recognizers that use speaker-independent DNN models with a complicated joint decoding structure, even when the speaker identities in the mixed speech are assumed known. In addition, an SD regression DNN that maps the acoustic features of mixed speech to the speech features of a target speaker is jointly trained with the SD-DNN acoustic models. Experimental results on the Speech Separation Challenge (SSC) small-vocabulary recognition task show that the proposed approach under multi-condition training achieves an average word error rate (WER) of 3.8%, a relative WER reduction of 65.1% from a top-performing, DNN-based pre-processing-only approach we proposed earlier under clean-condition training (Tu et al. 2016). Furthermore, the proposed joint DNN training framework yields a relative WER reduction of 13.2% over state-of-the-art systems under multi-condition training. Finally, the effectiveness of the proposed approach is also verified on the Wall Street Journal (WSJ0) medium-vocabulary continuous speech recognition task in a simulated multi-talker setting.
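To make the joint-training idea in the abstract concrete, the following is a minimal sketch, assuming PyTorch and illustrative layer sizes, feature dimensions, and loss weighting (none of which are specified above): an SD regression DNN maps mixed-speech features toward the target speaker's features, an SD-DNN acoustic model consumes its output, and a single backward pass updates both networks through a combined regression and classification loss.

# Minimal joint-training sketch (assumed PyTorch; dimensions, depths, and the
# loss weight alpha are illustrative assumptions, not the paper's settings).
import torch
import torch.nn as nn

FEAT_DIM = 40       # e.g., filterbank feature dimension (assumption)
NUM_STATES = 2048   # number of tied HMM states for the acoustic model (assumption)

# SD regression DNN: features of mixed speech -> features of the target speaker.
separation_dnn = nn.Sequential(
    nn.Linear(FEAT_DIM, 1024), nn.ReLU(),
    nn.Linear(1024, 1024), nn.ReLU(),
    nn.Linear(1024, FEAT_DIM),
)

# SD-DNN acoustic model: (enhanced) features -> HMM-state posteriors.
acoustic_dnn = nn.Sequential(
    nn.Linear(FEAT_DIM, 1024), nn.ReLU(),
    nn.Linear(1024, 1024), nn.ReLU(),
    nn.Linear(1024, NUM_STATES),
)

mse = nn.MSELoss()
ce = nn.CrossEntropyLoss()
params = list(separation_dnn.parameters()) + list(acoustic_dnn.parameters())
opt = torch.optim.SGD(params, lr=0.01)

def joint_step(mixed_feats, clean_feats, state_labels, alpha=0.5):
    """One joint update: regression loss on the separated features plus
    cross-entropy on the acoustic-model state predictions. alpha is an
    assumed hyper-parameter trading off the two terms."""
    enhanced = separation_dnn(mixed_feats)
    logits = acoustic_dnn(enhanced)  # gradients flow back through both DNNs
    loss = alpha * mse(enhanced, clean_feats) + (1 - alpha) * ce(logits, state_labels)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Toy batch: 8 frames of mixed speech, target-speaker references, state labels.
loss = joint_step(torch.randn(8, FEAT_DIM), torch.randn(8, FEAT_DIM),
                  torch.randint(0, NUM_STATES, (8,)))

Because the cross-entropy gradient flows back through the separation DNN, the front-end is pushed to produce features that are useful for recognition, not merely close to the clean target, which is the intuition behind joint training over a pre-processing-only pipeline.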
Journal of Signal Processing Systems (Springer)
Published: Oct 4, 2017