
DeepSpeech: Scaling up end-to-end speech recognition

Baidu, the Chinese search engine company, has presented a state-of-the-art speech recognition system built with end-to-end deep learning.
The authors claim that their "architecture is significantly simpler than traditional speech systems, which rely on laboriously engineered processing pipelines; these traditional systems also tend to perform poorly when used in noisy environments. In contrast, our system does not need hand-designed components to model background noise, reverberation, or speaker variation, but instead directly learns a function that is robust to such effects. We do not need a phoneme dictionary, nor even the concept of a “phoneme.” Key to our approach is a well-optimized RNN training system that uses multiple GPUs, as well as a set of novel data synthesis techniques that allow us to efficiently obtain a large amount of varied data for training. Our system, called DeepSpeech, outperforms previously published results on the widely studied Switchboard Hub5’00, achieving 16.5% error on the full test set. DeepSpeech also handles challenging noisy environments better than widely used, state-of-the-art commercial speech systems."
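The quote does not spell out how the synthetic training data is produced. As we understand the paper, one of its techniques is to superimpose recorded noise onto clean speech so the network sees many noisy variants of the same utterance. The sketch below illustrates only that basic idea, mixing a noise clip into a clean signal at a chosen signal-to-noise ratio (SNR). It is a hypothetical, pure-Python illustration under our own assumptions (the function names `rms` and `mix_at_snr`, the tiling of short noise clips and the 10 dB example value are ours), not code from DeepSpeech.

```python
import math
import random


def rms(signal):
    """Root-mean-square amplitude of a list of audio samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def mix_at_snr(clean, noise, snr_db):
    """Superimpose `noise` onto `clean` so the added noise sits at `snr_db` dB SNR.

    Hypothetical helper: the noise clip is repeated (tiled) if it is shorter
    than the clean clip, then scaled so that
    20 * log10(rms(clean) / rms(scaled_noise)) == snr_db.
    """
    if not clean or not noise:
        raise ValueError("clean and noise must be non-empty")
    # Repeat the noise to cover the whole utterance, truncated to its length.
    tiled = [noise[i % len(noise)] for i in range(len(clean))]
    noise_rms = rms(tiled)
    if noise_rms == 0:
        # Silent noise: nothing to add.
        return list(clean)
    # Noise level required for the requested SNR, and the gain to reach it.
    target_noise_rms = rms(clean) / (10 ** (snr_db / 20))
    scale = target_noise_rms / noise_rms
    return [c + scale * n for c, n in zip(clean, tiled)]


# Example: one second of a 440 Hz tone at 16 kHz, mixed with a short
# burst of uniform noise at an assumed 10 dB SNR.
random.seed(0)
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise = [random.uniform(-1, 1) for _ in range(4000)]
noisy = mix_at_snr(clean, noise, snr_db=10)
```

Drawing several SNR values and several noise clips per clean utterance is what turns a fixed corpus into a much larger and more varied one; a real pipeline would operate on arrays of audio samples rather than Python lists.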
  
Figure (from the paper): Structure of our RNN model and notation.
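The DeepSpeech RNN emits, at every time step, a probability distribution over characters rather than phonemes, and the paper trains it with the Connectionist Temporal Classification (CTC) loss. This is what lets the system skip a phoneme dictionary: the frame-level character outputs are collapsed directly into text. The sketch below shows the simplest way to read a CTC-style output, greedy (best-path) decoding. It is an illustration only: the `BLANK` symbol, the alphabet and the function names are hypothetical, and the decoder described in the paper (a beam search that also uses a language model) is more involved.

```python
BLANK = "_"  # hypothetical symbol standing in for the CTC "blank" label


def greedy_ctc_decode(frame_probs, alphabet):
    """Turn per-frame character probabilities into a transcript.

    frame_probs: one probability list per time step, aligned with `alphabet`
                 (which must contain BLANK).
    Best-path decoding: pick the most likely symbol in each frame,
    merge consecutive repeats, then drop blanks.
    """
    # Most likely symbol at each time step.
    best = [alphabet[max(range(len(p)), key=p.__getitem__)] for p in frame_probs]
    out = []
    prev = None
    for sym in best:
        # A new character starts when the symbol changes and is not blank;
        # a blank between two identical symbols keeps them separate.
        if sym != prev and sym != BLANK:
            out.append(sym)
        prev = sym
    return "".join(out)


# Example with made-up, nearly one-hot frame distributions.
alphabet = [BLANK, "a", "c", "t"]


def one_hot(sym, p=0.9):
    rest = (1 - p) / (len(alphabet) - 1)
    return [p if s == sym else rest for s in alphabet]


frames = [one_hot(s) for s in "cc_aa_t"]
print(greedy_ctc_decode(frames, alphabet))  # prints: cat
```

Greedy decoding is cheap but ignores how plausible the resulting word sequence is, which is why full systems typically rescore candidate transcripts with a language model.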
  
Here is the source of the original article: DeepSpeech: Scaling up end-to-end speech recognition
  


© 2019   BigDataNews.com is a subsidiary of DataScienceCentral LLC and not affiliated with Systap   Powered by
