Automatic Speech Recognition: A Deep Learning Approach

Automatic Speech Recognition (ASR) is a technology for enabling interaction between human speech and machines, and it draws on a variety of techniques such as the following.

This book introduces and explains the ASR techniques listed above.

The book also points to a number of other textbooks on ASR and deep learning.

Deep Learning: Methods and Applications, by Li Deng and Dong Yu (June 2014)
Automatic Speech and Speaker Recognition: Large Margin and Kernel Methods, by Joseph Keshet, Samy Bengio (January 2009)
Speech Recognition Over Digital Channels: Robustness and Standards, by Antonio Peinado and Jose Segura (September 2006)
Pattern Recognition in Speech and Language Processing, by Wu Chou and Biing-Hwang Juang (February 2003)
Speech Processing—A Dynamic and Optimization-Oriented Approach, by Li Deng and Doug O’Shaughnessy (June 2003)
Spoken Language Processing: A Guide to Theory, Algorithm and System Development, by Xuedong Huang, Alex Acero, and Hsiao-Wuen Hon (April 2001)
Digital Speech Processing: Synthesis, and Recognition, Second Edition, by Sadaoki Furui (June 2001)
Speech Communications: Human and Machine, Second Edition, by Douglas O’Shaughnessy (June 2000)
Speech and Language Processing—An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, by Daniel Jurafsky and James Martin (April 2000)
Speech and Audio Signal Processing, by Ben Gold and Nelson Morgan (April 2000)
Statistical Methods for Speech Recognition, by Fred Jelinek (June 1997)
Fundamentals of Speech Recognition, by Lawrence Rabiner and Biing-Hwang Juang (April 1993)
Acoustical and Environmental Robustness in Automatic Speech Recognition, by Alex Acero (November 1992)

In this series of posts, I plan to write up what I study and summarize from the topics covered in the book. The top-level headings of the book's table of contents are as follows.

Introduction
Part 1 Conventional Acoustic Models
Gaussian Mixture Models
Hidden Markov Models and the Variants
Part 2 Deep Neural Networks
Deep Neural Networks
Advanced Model Initialization Techniques
Part 3 Deep Neural Network-Hidden Markov Model Hybrid Systems for Automatic Speech Recognition
Deep Neural Network-Hidden Markov Model Hybrid Systems
Training and Decoding Speedup
Deep Neural Network Sequence-Discriminative Training
Part 4 Representation Learning in Deep Neural Networks
Feature Representation Learning in Deep Neural Networks
Fuse Deep Neural Network and Gaussian Mixture Model Systems
Adaptation of Deep Neural Networks
Part 5 Advanced Deep Models
Representation Sharing and Transfer in Deep Neural Networks
Recurrent Neural Networks and Related Models
Computational Network
Summary and Future Directions

References

[1] Dong Yu, Li Deng, Automatic Speech Recognition: A Deep Learning Approach, 2015