Study - Automatic Speech Recognition: A Deep Learning Approach


Automatic Speech Recognition: A Deep Learning Approach

Automatic Speech Recognition (ASR) is a technology that enables interaction between human speech and machines. It draws on a variety of techniques, including the following.

  • Gaussian mixture models (GMMs)
  • hidden Markov models (HMMs)
  • mel-frequency cepstral coefficients (MFCCs) and their derivatives
  • n-gram language models (LMs)
  • discriminative training, and various adaptation techniques
  • GMM-HMM sequence discriminative training

This book introduces and explains the ASR techniques listed above.
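As a rough illustration of one item on the list, an n-gram language model with n = 2 (a bigram model) can be estimated from simple counts. The sketch below is not taken from the book: the toy corpus, the function name `train_bigram_lm`, and the use of unsmoothed maximum-likelihood estimates are all assumptions made for illustration.

```python
from collections import Counter


def train_bigram_lm(sentences):
    """Estimate unsmoothed maximum-likelihood bigram probabilities P(cur | prev)."""
    context_counts = Counter()  # how often each word appears as the left context
    bigram_counts = Counter()   # how often each (prev, cur) pair appears
    for sentence in sentences:
        # Sentence-boundary markers let the model score word positions too.
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return {
        (prev, cur): count / context_counts[prev]
        for (prev, cur), count in bigram_counts.items()
    }


# Toy corpus, made up for illustration only.
corpus = ["recognize speech", "recognize speech now", "wreck a nice beach"]
bigram_probs = train_bigram_lm(corpus)
```

A practical language model would also need smoothing so that unseen word pairs do not receive zero probability; this sketch leaves that out to keep the counting step visible.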

The book also points to a number of other texts on ASR and deep learning.

  • Deep Learning: Methods and Applications, by Li Deng and Dong Yu (June 2014)
  • Automatic Speech and Speaker Recognition: Large Margin and Kernel Methods, by Joseph Keshet, Samy Bengio (January 2009)
  • Speech Recognition Over Digital Channels: Robustness and Standards, by Antonio Peinado and Jose Segura (September 2006)
  • Pattern Recognition in Speech and Language Processing, by Wu Chou and Biing-Hwang Juang (February 2003)
  • Speech Processing—A Dynamic and Optimization-Oriented Approach, by Li Deng and Doug O’Shaughnessy (June 2003)
  • Spoken Language Processing: A Guide to Theory, Algorithm and System Development, by Xuedong Huang, Alex Acero, and Hsiao-Wuen Hon (April 2001)
  • Digital Speech Processing: Synthesis, and Recognition, Second Edition, by Sadaoki Furui (June 2001)
  • Speech Communications: Human and Machine, Second Edition, by Douglas O’Shaughnessy (June 2000)
  • Speech and Language Processing—An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, by Daniel Jurafsky and James Martin (April 2000)
  • Speech and Audio Signal Processing, by Ben Gold and Nelson Morgan (April 2000)
  • Statistical Methods for Speech Recognition, by Fred Jelinek (June 1997)
  • Fundamentals of Speech Recognition, by Lawrence Rabiner and Biing-Hwang Juang (April 1993)
  • Acoustical and Environmental Robustness in Automatic Speech Recognition, by Alex Acero (November 1992)

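In this series of posts, I plan to study the topics covered in the book and write up summaries of them. The top-level chapter titles in the book's table of contents are as follows.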

  1. Introduction

    Part 1 Conventional Acoustic Models

  2. Gaussian Mixture Models
  3. Hidden Markov Models and the Variants

    Part 2 Deep Neural Networks

  4. Deep Neural Networks
  5. Advanced Model Initialization Techniques

    Part 3 Deep Neural Network-Hidden Markov Model Hybrid Systems for Automatic Speech Recognition

  6. Deep Neural Network-Hidden Markov Model Hybrid Systems
  7. Training and Decoding Speedup
  8. Deep Neural Network Sequence-Discriminative Training

    Part 4 Representation Learning in Deep Neural Networks

  9. Feature Representation Learning in Deep Neural Networks
  10. Fuse Deep Neural Network and Gaussian Mixture Model Systems
  11. Adaptation of Deep Neural Networks

    Part 5 Advanced Deep Models

  12. Representation Sharing and Transfer in Deep Neural Networks
  13. Recurrent Neural Networks and Related Models
  14. Computational Network
  15. Summary and Future Directions

References

[1] Dong Yu, Li Deng, Automatic Speech Recognition: A Deep Learning Approach, 2015