Andrew Gibiansky Phones & Addresses

  • San Francisco, CA
  • North Potomac, MD
  • Claremont, CA

Publications

Us Patents

Real-Time Neural Text-To-Speech

US Patent:
20210027762, Jan 28, 2021
Filed:
Oct 1, 2020
Appl. No.:
17/061433
Inventors:
- Sunnyvale CA, US
Mike CHRZANOWSKI - Palo Alto CA, US
Adam COATES - Mountain View CA, US
Gregory DIAMOS - San Jose CA, US
Andrew GIBIANSKY - Mountain View CA, US
John MILLER - Palo Alto CA, US
Andrew NG - Mountain View CA, US
Jonathan RAIMAN - Palo Alto CA, US
Mohammad SHOEYBI - Los Altos CA, US
Assignee:
Baidu USA LLC - Sunnyvale CA
International Classification:
G10L 13/08
G10L 13/027
G10L 25/30
G06N 3/08
G06N 3/04
Abstract:
Embodiments of a production-quality text-to-speech (TTS) system constructed from deep neural networks are described. System embodiments comprise five major building blocks: a segmentation model for locating phoneme boundaries, a grapheme-to-phoneme conversion model, a phoneme duration prediction model, a fundamental frequency prediction model, and an audio synthesis model. For embodiments of the segmentation model, phoneme boundary detection was performed with deep neural networks using Connectionist Temporal Classification (CTC) loss. For embodiments of the audio synthesis model, a variant of WaveNet was created that requires fewer parameters and trains faster than the original. By using a neural network for each component, system embodiments are simpler and more flexible than traditional TTS systems, where each component requires laborious feature engineering and extensive domain expertise. Inference with system embodiments may be performed faster than real time.
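
As an illustration only, the building blocks named in the abstract could be wired together roughly as follows. This is a minimal sketch, not the patented implementation: every name here (`TTSPipeline`, the `toy_*` functions, `SAMPLE_RATE`) is hypothetical, the toy functions are trivial stand-ins for trained neural networks, and the order in which the models are chained is an assumption the abstract does not state. The segmentation model is left out on the further assumption that it serves to label training data rather than to run at synthesis time.

```python
import math
from dataclasses import dataclass
from typing import Callable, List

SAMPLE_RATE = 16000  # assumed sample rate in Hz (not given in the abstract)

Phonemes = List[str]


@dataclass
class TTSPipeline:
    """Hypothetical container chaining four of the five building blocks."""
    grapheme_to_phoneme: Callable[[str], Phonemes]
    predict_durations: Callable[[Phonemes], List[float]]  # seconds per phoneme
    predict_f0: Callable[[Phonemes], List[float]]         # Hz per phoneme
    synthesize_audio: Callable[[Phonemes, List[float], List[float]], List[float]]

    def synthesize(self, text: str) -> List[float]:
        phonemes = self.grapheme_to_phoneme(text)
        durations = self.predict_durations(phonemes)
        f0 = self.predict_f0(phonemes)
        if not (len(phonemes) == len(durations) == len(f0)):
            raise ValueError("each phoneme needs one duration and one F0 value")
        return self.synthesize_audio(phonemes, durations, f0)


# Toy stand-ins for the neural models, used only to make the sketch runnable.
def toy_g2p(text: str) -> Phonemes:
    # Treats each letter as a "phoneme"; a real model predicts phoneme symbols.
    return [c for c in text.lower() if c.isalpha()]


def toy_durations(phonemes: Phonemes) -> List[float]:
    return [0.1 for _ in phonemes]  # fixed 100 ms per phoneme


def toy_f0(phonemes: Phonemes) -> List[float]:
    return [120.0 for _ in phonemes]  # constant pitch


def toy_vocoder(phonemes: Phonemes, durations: List[float], f0: List[float]) -> List[float]:
    # Emits a sine tone per phoneme; the abstract describes a WaveNet variant here.
    samples: List[float] = []
    for seconds, hz in zip(durations, f0):
        n = int(round(seconds * SAMPLE_RATE))
        samples.extend(math.sin(2 * math.pi * hz * i / SAMPLE_RATE) for i in range(n))
    return samples


demo = TTSPipeline(toy_g2p, toy_durations, toy_f0, toy_vocoder)
audio = demo.synthesize("hi")
```

Passing each stage in as a callable mirrors the abstract's point that every component is a separate model, so any one stage can be swapped without touching the others.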

Systems And Methods For Real-Time Neural Text-To-Speech

US Patent:
20180247636, Aug 30, 2018
Filed:
Jan 29, 2018
Appl. No.:
15/882926
Inventors:
- Sunnyvale CA, US
Mike CHRZANOWSKI - Palo Alto CA, US
Adam COATES - Mountain View CA, US
Gregory DIAMOS - San Jose CA, US
Andrew GIBIANSKY - Mountain View CA, US
John MILLER - Palo Alto CA, US
Andrew NG - Mountain View CA, US
Jonathan RAIMAN - Palo Alto CA, US
Mohammad SHOEYBI - Los Altos CA, US
Assignee:
Baidu USA LLC - Sunnyvale CA
International Classification:
G10L 13/08
G10L 13/027
G10L 25/30
Abstract:
Embodiments of a production-quality text-to-speech (TTS) system constructed from deep neural networks are described. System embodiments comprise five major building blocks: a segmentation model for locating phoneme boundaries, a grapheme-to-phoneme conversion model, a phoneme duration prediction model, a fundamental frequency prediction model, and an audio synthesis model. For embodiments of the segmentation model, phoneme boundary detection was performed with deep neural networks using Connectionist Temporal Classification (CTC) loss. For embodiments of the audio synthesis model, a variant of WaveNet was created that requires fewer parameters and trains faster than the original. By using a neural network for each component, system embodiments are simpler and more flexible than traditional TTS systems, where each component requires laborious feature engineering and extensive domain expertise. Inference with system embodiments may be performed faster than real time.
Andrew Gibiansky from San Francisco, CA, age ~31