Parallel WaveNet: Fast High-Fidelity Speech Synthesis.
Aäron van den Oord, Yazhe Li, Igor Babuschkin, Karen Simonyan, Oriol Vinyals, Koray Kavukcuoglu, George van den Driessche, Edward Lockhart, Luis C. Cobo, Florian Stimberg, Norman Casagrande, Dominik...
[Invited] Generative Model-Based Text-to-Speech Synthesis.
Heiga Zen: [Invited] Generative Model-Based Text-to-Speech Synthesis. GCCE 2018: 327-328
Learning to Speak Fluently in a Foreign Language: Multilingual Speech...
Yu Zhang, Ron J. Weiss, Heiga Zen, Yonghui Wu, Zhifeng Chen, R. J. Skerry-Ryan, Ye Jia, Andrew Rosenberg, Bhuvana Ramabhadran: Learning to Speak Fluently in a Foreign Language: Multilingual Speech...
LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech.
Heiga Zen, Viet Dang, Rob Clark, Yu Zhang, Ron J. Weiss, Ye Jia, Zhifeng Chen, Yonghui Wu: LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech. CoRR abs/1904.02882 (2019)
Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling.
Jonathan Shen, Patrick Nguyen, Yonghui Wu, Zhifeng Chen, Mia Xu Chen, Ye Jia, Anjuli Kannan, Tara N. Sainath, Yuan Cao, Chung-Cheng Chiu, Yanzhang He, Jan Chorowski, Smit Hinsu, Stella Laurenzo, James...
LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech.
Heiga Zen, Viet Dang, Rob Clark, Yu Zhang, Ron J. Weiss, Ye Jia, Zhifeng Chen, Yonghui Wu: LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech. INTERSPEECH 2019: 1526-1530
Hierarchical Generative Modeling for Controllable Speech Synthesis.
Wei-Ning Hsu, Yu Zhang, Ron J. Weiss, Heiga Zen, Yonghui Wu, Yuxuan Wang, Yuan Cao, Ye Jia, Zhifeng Chen, Jonathan Shen, Patrick Nguyen, Ruoming Pang: Hierarchical Generative Modeling for Controllable...
Sample Efficient Adaptive Text-to-Speech.
Yutian Chen, Yannis M. Assael, Brendan Shillingford, David Budden, Scott E. Reed, Heiga Zen, Quan Wang, Luis C. Cobo, Andrew Trask, Ben Laurie, Çaglar Gülçehre, Aäron van den Oord, Oriol Vinyals, Nando...
Speech Processing for Digital Home Assistants: Combining signal processing...
Reinhold Haeb-Umbach, Shinji Watanabe, Tomohiro Nakatani, Michiel Bacchiani, Björn Hoffmeister, Michael L. Seltzer, Heiga Zen, Mehrez Souden: Speech Processing for Digital Home Assistants: Combining...
Parallel Tacotron: Non-Autoregressive and Controllable TTS.
Isaac Elias, Heiga Zen, Jonathan Shen, Yu Zhang, Ye Jia, Ron J. Weiss, Yonghui Wu: Parallel Tacotron: Non-Autoregressive and Controllable TTS. CoRR abs/2010.11439 (2020)
Non-Attentive Tacotron: Robust and Controllable Neural TTS Synthesis...
Jonathan Shen, Ye Jia, Mike Chrzanowski, Yu Zhang, Isaac Elias, Heiga Zen, Yonghui Wu: Non-Attentive Tacotron: Robust and Controllable Neural TTS Synthesis Including Unsupervised Duration Modeling....
WaveGrad: Estimating Gradients for Waveform Generation.
Nanxin Chen, Yu Zhang, Heiga Zen, Ron J. Weiss, Mohammad Norouzi, William Chan: WaveGrad: Estimating Gradients for Waveform Generation. CoRR abs/2009.00713 (2020)
Generating diverse and natural text-to-speech samples using a quantized...
Guangzhi Sun, Yu Zhang, Ron J. Weiss, Yuan Cao, Heiga Zen, Andrew Rosenberg, Bhuvana Ramabhadran, Yonghui Wu: Generating diverse and natural text-to-speech samples using a quantized fine-grained VAE...
Fully-hierarchical fine-grained prosody modeling for interpretable speech...
Guangzhi Sun, Yu Zhang, Ron J. Weiss, Yuan Cao, Heiga Zen, Yonghui Wu: Fully-hierarchical fine-grained prosody modeling for interpretable speech synthesis. CoRR abs/2002.03785 (2020)
Generating Diverse and Natural Text-to-Speech Samples Using a Quantized...
Guangzhi Sun, Yu Zhang, Ron J. Weiss, Yuan Cao, Heiga Zen, Andrew Rosenberg, Bhuvana Ramabhadran, Yonghui Wu: Generating Diverse and Natural Text-to-Speech Samples Using a Quantized Fine-Grained VAE...
Fully-Hierarchical Fine-Grained Prosody Modeling For Interpretable Speech...
Guangzhi Sun, Yu Zhang, Ron J. Weiss, Yuan Cao, Heiga Zen, Yonghui Wu: Fully-Hierarchical Fine-Grained Prosody Modeling For Interpretable Speech Synthesis. ICASSP 2020: 6264-6268
WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis.
Nanxin Chen, Yu Zhang, Heiga Zen, Ron J. Weiss, Mohammad Norouzi, Najim Dehak, William Chan: WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis. CoRR abs/2106.09660 (2021)
PnG BERT: Augmented BERT on Phonemes and Graphemes for Neural TTS.
Ye Jia, Heiga Zen, Jonathan Shen, Yu Zhang, Yonghui Wu: PnG BERT: Augmented BERT on Phonemes and Graphemes for Neural TTS. CoRR abs/2103.15060 (2021)
Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with...
Isaac Elias, Heiga Zen, Jonathan Shen, Yu Zhang, Ye Jia, R. J. Skerry-Ryan, Yonghui Wu: Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling. CoRR...
WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis.
Nanxin Chen, Yu Zhang, Heiga Zen, Ron J. Weiss, Mohammad Norouzi, Najim Dehak, William Chan: WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis. Interspeech 2021: 3765-3769
Semi-Supervision in ASR: Sequential MixMatch and Factorized TTS-Based...
Zhehuai Chen, Andrew Rosenberg, Yu Zhang, Heiga Zen, Mohammadreza Ghodsi, Yinghui Huang, Jesse Emond, Gary Wang, Bhuvana Ramabhadran, Pedro J. Moreno: Semi-Supervision in ASR: Sequential MixMatch and...
PnG BERT: Augmented BERT on Phonemes and Graphemes for Neural TTS.
Ye Jia, Heiga Zen, Jonathan Shen, Yu Zhang, Yonghui Wu: PnG BERT: Augmented BERT on Phonemes and Graphemes for Neural TTS. Interspeech 2021: 151-155
Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with...
Isaac Elias, Heiga Zen, Jonathan Shen, Yu Zhang, Ye Jia, R. J. Skerry-Ryan, Yonghui Wu: Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling. Interspeech...
WaveGrad: Estimating Gradients for Waveform Generation.
Nanxin Chen, Yu Zhang, Heiga Zen, Ron J. Weiss, Mohammad Norouzi, William Chan: WaveGrad: Estimating Gradients for Waveform Generation. ICLR 2021
Parallel Tacotron: Non-Autoregressive and Controllable TTS.
Isaac Elias, Heiga Zen, Jonathan Shen, Yu Zhang, Ye Jia, Ron J. Weiss, Yonghui Wu: Parallel Tacotron: Non-Autoregressive and Controllable TTS. ICASSP 2021: 5709-5713
Residual Adapters for Few-Shot Text-to-Speech Speaker Adaptation.
Nobuyuki Morioka, Heiga Zen, Nanxin Chen, Yu Zhang, Yifan Ding: Residual Adapters for Few-Shot Text-to-Speech Speaker Adaptation. CoRR abs/2210.15868 (2022)
Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for...
Takaaki Saeki, Heiga Zen, Zhehuai Chen, Nobuyuki Morioka, Gary Wang, Yu Zhang, Ankur Bapna, Andrew Rosenberg, Bhuvana Ramabhadran: Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised...
WaveFit: An Iterative and Non-autoregressive Neural Vocoder based on...
Yuma Koizumi, Kohei Yatabe, Heiga Zen, Michiel Bacchiani: WaveFit: An Iterative and Non-autoregressive Neural Vocoder based on Fixed-Point Iteration. CoRR abs/2210.01029 (2022)
Training Text-To-Speech Systems From Synthetic Data: A Practical Approach For...
Lev Finkelstein, Heiga Zen, Norman Casagrande, Chun-an Chan, Ye Jia, Tom Kenter, Alexey Petelin, Jonathan Shen, Vincent Wan, Yu Zhang, Yonghui Wu, Rob Clark: Training Text-To-Speech Systems From...
MAESTRO: Matched Speech Text Representations through Modality Matching.
Zhehuai Chen, Yu Zhang, Andrew Rosenberg, Bhuvana Ramabhadran, Pedro J. Moreno, Ankur Bapna, Heiga Zen: MAESTRO: Matched Speech Text Representations through Modality Matching. CoRR abs/2204.03409 (2022)
SpecGrad: Diffusion Probabilistic Model based Neural Vocoder with Adaptive...
Yuma Koizumi, Heiga Zen, Kohei Yatabe, Nanxin Chen, Michiel Bacchiani: SpecGrad: Diffusion Probabilistic Model based Neural Vocoder with Adaptive Noise Spectral Shaping. CoRR abs/2203.16749 (2022)
CVSS Corpus and Massively Multilingual Speech-to-Speech Translation.
Ye Jia, Michelle Tadmor Ramanovich, Quan Wang, Heiga Zen: CVSS Corpus and Massively Multilingual Speech-to-Speech Translation. CoRR abs/2201.03713 (2022)
Wavefit: an Iterative and Non-Autoregressive Neural Vocoder Based on...
Yuma Koizumi, Kohei Yatabe, Heiga Zen, Michiel Bacchiani: Wavefit: an Iterative and Non-Autoregressive Neural Vocoder Based on Fixed-Point Iteration. SLT 2022: 884-891
CVSS Corpus and Massively Multilingual Speech-to-Speech Translation.
Ye Jia, Michelle Tadmor Ramanovich, Quan Wang, Heiga Zen: CVSS Corpus and Massively Multilingual Speech-to-Speech Translation. LREC 2022: 6691-6703
MAESTRO: Matched Speech Text Representations through Modality Matching.
Zhehuai Chen, Yu Zhang, Andrew Rosenberg, Bhuvana Ramabhadran, Pedro J. Moreno, Ankur Bapna, Heiga Zen: MAESTRO: Matched Speech Text Representations through Modality Matching. INTERSPEECH 2022: 4093-4097
SpecGrad: Diffusion Probabilistic Model based Neural Vocoder with Adaptive...
Yuma Koizumi, Heiga Zen, Kohei Yatabe, Nanxin Chen, Michiel Bacchiani: SpecGrad: Diffusion Probabilistic Model based Neural Vocoder with Adaptive Noise Spectral Shaping. INTERSPEECH 2022: 803-807
SayTap: Language to Quadrupedal Locomotion.
Yujin Tang, Wenhao Yu, Jie Tan, Heiga Zen, Aleksandra Faust, Tatsuya Harada: SayTap: Language to Quadrupedal Locomotion. CoRR abs/2306.07580 (2023)
LibriTTS-R: A Restored Multi-Speaker Text-to-Speech Corpus.
Yuma Koizumi, Heiga Zen, Shigeki Karita, Yifan Ding, Kohei Yatabe, Nobuyuki Morioka, Michiel Bacchiani, Yu Zhang, Wei Han, Ankur Bapna: LibriTTS-R: A Restored Multi-Speaker Text-to-Speech Corpus. CoRR...
Translatotron 3: Speech to Speech Translation with Monolingual Data.
Eliya Nachmani, Alon Levkovitch, Yifan Ding, Chulayuth Asawaroengchai, Heiga Zen, Michelle Tadmor Ramanovich: Translatotron 3: Speech to Speech Translation with Monolingual Data. CoRR abs/2305.17547...
Miipher: A Robust Speech Restoration Model Integrating Self-Supervised Speech...
Yuma Koizumi, Heiga Zen, Shigeki Karita, Yifan Ding, Kohei Yatabe, Nobuyuki Morioka, Yu Zhang, Wei Han, Ankur Bapna, Michiel Bacchiani: Miipher: A Robust Speech Restoration Model Integrating...
Guest Editorial: Special Issue on Affective Speech and Language Synthesis,...
Shahin Amiriparian, Björn W. Schuller, Nabiha Asghar, Heiga Zen, Felix Burkhardt: Guest Editorial: Special Issue on Affective Speech and Language Synthesis, Generation, and Conversion. IEEE Trans....
Twenty-Five Years of Evolution in Speech and Language Processing.
Dong Yu, Yifan Gong, Michael A. Picheny, Bhuvana Ramabhadran, Dilek Hakkani-Tür, Rohit Prasad, Heiga Zen, Jan Skoglund, Jan Honza Cernocký, Lukás Burget, Abdelrahman Mohamed: Twenty-Five Years of...
Extracting representative subset from extensive text data for training...
Jun Suzuki, Heiga Zen, Hideto Kazawa: Extracting representative subset from extensive text data for training pre-trained language models. Inf. Process. Manag. 60(3): 103249 (2023)
Lightweight, Multi-Speaker, Multi-Lingual Indic Text-to-Speech.
Abhayjeet Singh, Amala Nagireddi, Deekshitha G, Jesuraja Bandekar, Roopa R., Sandhya Badiger, Sathvik Udupa, Prasanta Kumar Ghosh, Hema A. Murthy, Heiga Zen, Pranaw Kumar, Kamal Kant, Amol Bole, Bira...
SayTap: Language to Quadrupedal Locomotion.
Yujin Tang, Wenhao Yu, Jie Tan, Heiga Zen, Aleksandra Faust, Tatsuya Harada: SayTap: Language to Quadrupedal Locomotion. CoRL 2023: 3556-3570
Extending Multilingual Speech Synthesis to 100+ Languages without Transcribed...
Takaaki Saeki, Gary Wang, Nobuyuki Morioka, Isaac Elias, Kyle Kastner, Andrew Rosenberg, Bhuvana Ramabhadran, Heiga Zen, Françoise Beaufays, Hadar Shemtov: Extending Multilingual Speech Synthesis to...