Channel: dblp: Heiga Zen
Browsing latest articles
Browse All 139 View Live

Parallel WaveNet: Fast High-Fidelity Speech Synthesis.

Aäron van den Oord, Yazhe Li, Igor Babuschkin, Karen Simonyan, Oriol Vinyals, Koray Kavukcuoglu, George van den Driessche, Edward Lockhart, Luis C. Cobo, Florian Stimberg, Norman Casagrande, Dominik...

[Invited] Generative Model-Based Text-to-Speech Synthesis.

Heiga Zen: [Invited] Generative Model-Based Text-to-Speech Synthesis. GCCE 2018: 327-328

Learning to Speak Fluently in a Foreign Language: Multilingual Speech...

Yu Zhang, Ron J. Weiss, Heiga Zen, Yonghui Wu, Zhifeng Chen, R. J. Skerry-Ryan, Ye Jia, Andrew Rosenberg, Bhuvana Ramabhadran: Learning to Speak Fluently in a Foreign Language: Multilingual Speech...

LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech.

Heiga Zen, Viet Dang, Rob Clark, Yu Zhang, Ron J. Weiss, Ye Jia, Zhifeng Chen, Yonghui Wu: LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech. CoRR abs/1904.02882 (2019)

Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling.

Jonathan Shen, Patrick Nguyen, Yonghui Wu, Zhifeng Chen, Mia Xu Chen, Ye Jia, Anjuli Kannan, Tara N. Sainath, Yuan Cao, Chung-Cheng Chiu, Yanzhang He, Jan Chorowski, Smit Hinsu, Stella Laurenzo, James...

Learning to Speak Fluently in a Foreign Language: Multilingual Speech...

Yu Zhang, Ron J. Weiss, Heiga Zen, Yonghui Wu, Zhifeng Chen, R. J. Skerry-Ryan, Ye Jia, Andrew Rosenberg, Bhuvana Ramabhadran: Learning to Speak Fluently in a Foreign Language: Multilingual Speech...

LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech.

Heiga Zen, Viet Dang, Rob Clark, Yu Zhang, Ron J. Weiss, Ye Jia, Zhifeng Chen, Yonghui Wu: LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech. INTERSPEECH 2019: 1526-1530

Hierarchical Generative Modeling for Controllable Speech Synthesis.

Wei-Ning Hsu, Yu Zhang, Ron J. Weiss, Heiga Zen, Yonghui Wu, Yuxuan Wang, Yuan Cao, Ye Jia, Zhifeng Chen, Jonathan Shen, Patrick Nguyen, Ruoming Pang: Hierarchical Generative Modeling for Controllable...

Sample Efficient Adaptive Text-to-Speech.

Yutian Chen, Yannis M. Assael, Brendan Shillingford, David Budden, Scott E. Reed, Heiga Zen, Quan Wang, Luis C. Cobo, Andrew Trask, Ben Laurie, Çaglar Gülçehre, Aäron van den Oord, Oriol Vinyals, Nando...

Speech Processing for Digital Home Assistants: Combining signal processing...

Reinhold Haeb-Umbach, Shinji Watanabe, Tomohiro Nakatani, Michiel Bacchiani, Björn Hoffmeister, Michael L. Seltzer, Heiga Zen, Mehrez Souden: Speech Processing for Digital Home Assistants: Combining...

Parallel Tacotron: Non-Autoregressive and Controllable TTS.

Isaac Elias, Heiga Zen, Jonathan Shen, Yu Zhang, Ye Jia, Ron J. Weiss, Yonghui Wu: Parallel Tacotron: Non-Autoregressive and Controllable TTS. CoRR abs/2010.11439 (2020)

Non-Attentive Tacotron: Robust and Controllable Neural TTS Synthesis...

Jonathan Shen, Ye Jia, Mike Chrzanowski, Yu Zhang, Isaac Elias, Heiga Zen, Yonghui Wu: Non-Attentive Tacotron: Robust and Controllable Neural TTS Synthesis Including Unsupervised Duration Modeling....

WaveGrad: Estimating Gradients for Waveform Generation.

Nanxin Chen, Yu Zhang, Heiga Zen, Ron J. Weiss, Mohammad Norouzi, William Chan: WaveGrad: Estimating Gradients for Waveform Generation. CoRR abs/2009.00713 (2020)

Generating diverse and natural text-to-speech samples using a quantized...

Guangzhi Sun, Yu Zhang, Ron J. Weiss, Yuan Cao, Heiga Zen, Andrew Rosenberg, Bhuvana Ramabhadran, Yonghui Wu: Generating diverse and natural text-to-speech samples using a quantized fine-grained VAE...

Fully-hierarchical fine-grained prosody modeling for interpretable speech...

Guangzhi Sun, Yu Zhang, Ron J. Weiss, Yuan Cao, Heiga Zen, Yonghui Wu: Fully-hierarchical fine-grained prosody modeling for interpretable speech synthesis. CoRR abs/2002.03785 (2020)

Generating Diverse and Natural Text-to-Speech Samples Using a Quantized...

Guangzhi Sun, Yu Zhang, Ron J. Weiss, Yuan Cao, Heiga Zen, Andrew Rosenberg, Bhuvana Ramabhadran, Yonghui Wu: Generating Diverse and Natural Text-to-Speech Samples Using a Quantized Fine-Grained VAE...

Fully-Hierarchical Fine-Grained Prosody Modeling For Interpretable Speech...

Guangzhi Sun, Yu Zhang, Ron J. Weiss, Yuan Cao, Heiga Zen, Yonghui Wu: Fully-Hierarchical Fine-Grained Prosody Modeling For Interpretable Speech Synthesis. ICASSP 2020: 6264-6268

WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis.

Nanxin Chen, Yu Zhang, Heiga Zen, Ron J. Weiss, Mohammad Norouzi, Najim Dehak, William Chan: WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis. CoRR abs/2106.09660 (2021)

PnG BERT: Augmented BERT on Phonemes and Graphemes for Neural TTS.

Ye Jia, Heiga Zen, Jonathan Shen, Yu Zhang, Yonghui Wu: PnG BERT: Augmented BERT on Phonemes and Graphemes for Neural TTS. CoRR abs/2103.15060 (2021)

Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with...

Isaac Elias, Heiga Zen, Jonathan Shen, Yu Zhang, Ye Jia, R. J. Skerry-Ryan, Yonghui Wu: Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling. CoRR...

WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis.

Nanxin Chen, Yu Zhang, Heiga Zen, Ron J. Weiss, Mohammad Norouzi, Najim Dehak, William Chan: WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis. Interspeech 2021: 3765-3769

Semi-Supervision in ASR: Sequential MixMatch and Factorized TTS-Based...

Zhehuai Chen, Andrew Rosenberg, Yu Zhang, Heiga Zen, Mohammadreza Ghodsi, Yinghui Huang, Jesse Emond, Gary Wang, Bhuvana Ramabhadran, Pedro J. Moreno: Semi-Supervision in ASR: Sequential MixMatch and...

PnG BERT: Augmented BERT on Phonemes and Graphemes for Neural TTS.

Ye Jia, Heiga Zen, Jonathan Shen, Yu Zhang, Yonghui Wu: PnG BERT: Augmented BERT on Phonemes and Graphemes for Neural TTS. Interspeech 2021: 151-155

Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with...

Isaac Elias, Heiga Zen, Jonathan Shen, Yu Zhang, Ye Jia, R. J. Skerry-Ryan, Yonghui Wu: Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling. Interspeech...

WaveGrad: Estimating Gradients for Waveform Generation.

Nanxin Chen, Yu Zhang, Heiga Zen, Ron J. Weiss, Mohammad Norouzi, William Chan: WaveGrad: Estimating Gradients for Waveform Generation. ICLR 2021

Parallel Tacotron: Non-Autoregressive and Controllable TTS.

Isaac Elias, Heiga Zen, Jonathan Shen, Yu Zhang, Ye Jia, Ron J. Weiss, Yonghui Wu: Parallel Tacotron: Non-Autoregressive and Controllable TTS. ICASSP 2021: 5709-5713

Residual Adapters for Few-Shot Text-to-Speech Speaker Adaptation.

Nobuyuki Morioka, Heiga Zen, Nanxin Chen, Yu Zhang, Yifan Ding: Residual Adapters for Few-Shot Text-to-Speech Speaker Adaptation. CoRR abs/2210.15868 (2022)

Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for...

Takaaki Saeki, Heiga Zen, Zhehuai Chen, Nobuyuki Morioka, Gary Wang, Yu Zhang, Ankur Bapna, Andrew Rosenberg, Bhuvana Ramabhadran: Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised...

WaveFit: An Iterative and Non-autoregressive Neural Vocoder based on...

Yuma Koizumi, Kohei Yatabe, Heiga Zen, Michiel Bacchiani: WaveFit: An Iterative and Non-autoregressive Neural Vocoder based on Fixed-Point Iteration. CoRR abs/2210.01029 (2022)

Training Text-To-Speech Systems From Synthetic Data: A Practical Approach For...

Lev Finkelstein, Heiga Zen, Norman Casagrande, Chun-an Chan, Ye Jia, Tom Kenter, Alexey Petelin, Jonathan Shen, Vincent Wan, Yu Zhang, Yonghui Wu, Rob Clark: Training Text-To-Speech Systems From...

MAESTRO: Matched Speech Text Representations through Modality Matching.

Zhehuai Chen, Yu Zhang, Andrew Rosenberg, Bhuvana Ramabhadran, Pedro J. Moreno, Ankur Bapna, Heiga Zen: MAESTRO: Matched Speech Text Representations through Modality Matching. CoRR abs/2204.03409 (2022)

SpecGrad: Diffusion Probabilistic Model based Neural Vocoder with Adaptive...

Yuma Koizumi, Heiga Zen, Kohei Yatabe, Nanxin Chen, Michiel Bacchiani: SpecGrad: Diffusion Probabilistic Model based Neural Vocoder with Adaptive Noise Spectral Shaping. CoRR abs/2203.16749 (2022)

CVSS Corpus and Massively Multilingual Speech-to-Speech Translation.

Ye Jia, Michelle Tadmor Ramanovich, Quan Wang, Heiga Zen: CVSS Corpus and Massively Multilingual Speech-to-Speech Translation. CoRR abs/2201.03713 (2022)

Wavefit: an Iterative and Non-Autoregressive Neural Vocoder Based on...

Yuma Koizumi, Kohei Yatabe, Heiga Zen, Michiel Bacchiani: Wavefit: an Iterative and Non-Autoregressive Neural Vocoder Based on Fixed-Point Iteration. SLT 2022: 884-891

CVSS Corpus and Massively Multilingual Speech-to-Speech Translation.

Ye Jia, Michelle Tadmor Ramanovich, Quan Wang, Heiga Zen: CVSS Corpus and Massively Multilingual Speech-to-Speech Translation. LREC 2022: 6691-6703

Training Text-To-Speech Systems From Synthetic Data: A Practical Approach For...

Lev Finkelstein, Heiga Zen, Norman Casagrande, Chun-an Chan, Ye Jia, Tom Kenter, Alexey Petelin, Jonathan Shen, Vincent Wan, Yu Zhang, Yonghui Wu, Rob Clark: Training Text-To-Speech Systems From...

MAESTRO: Matched Speech Text Representations through Modality Matching.

Zhehuai Chen, Yu Zhang, Andrew Rosenberg, Bhuvana Ramabhadran, Pedro J. Moreno, Ankur Bapna, Heiga Zen: MAESTRO: Matched Speech Text Representations through Modality Matching. INTERSPEECH 2022: 4093-4097

SpecGrad: Diffusion Probabilistic Model based Neural Vocoder with Adaptive...

Yuma Koizumi, Heiga Zen, Kohei Yatabe, Nanxin Chen, Michiel Bacchiani: SpecGrad: Diffusion Probabilistic Model based Neural Vocoder with Adaptive Noise Spectral Shaping. INTERSPEECH 2022: 803-807

SayTap: Language to Quadrupedal Locomotion.

Yujin Tang, Wenhao Yu, Jie Tan, Heiga Zen, Aleksandra Faust, Tatsuya Harada: SayTap: Language to Quadrupedal Locomotion. CoRR abs/2306.07580 (2023)

LibriTTS-R: A Restored Multi-Speaker Text-to-Speech Corpus.

Yuma Koizumi, Heiga Zen, Shigeki Karita, Yifan Ding, Kohei Yatabe, Nobuyuki Morioka, Michiel Bacchiani, Yu Zhang, Wei Han, Ankur Bapna: LibriTTS-R: A Restored Multi-Speaker Text-to-Speech Corpus. CoRR...

Translatotron 3: Speech to Speech Translation with Monolingual Data.

Eliya Nachmani, Alon Levkovitch, Yifan Ding, Chulayuth Asawaroengchai, Heiga Zen, Michelle Tadmor Ramanovich: Translatotron 3: Speech to Speech Translation with Monolingual Data. CoRR abs/2305.17547...

Miipher: A Robust Speech Restoration Model Integrating Self-Supervised Speech...

Yuma Koizumi, Heiga Zen, Shigeki Karita, Yifan Ding, Kohei Yatabe, Nobuyuki Morioka, Yu Zhang, Wei Han, Ankur Bapna, Michiel Bacchiani: Miipher: A Robust Speech Restoration Model Integrating...

Miipher: A Robust Speech Restoration Model Integrating Self-Supervised Speech...

Yuma Koizumi, Heiga Zen, Shigeki Karita, Yifan Ding, Kohei Yatabe, Nobuyuki Morioka, Yu Zhang, Wei Han, Ankur Bapna, Michiel Bacchiani: Miipher: A Robust Speech Restoration Model Integrating...

Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for...

Takaaki Saeki, Heiga Zen, Zhehuai Chen, Nobuyuki Morioka, Gary Wang, Yu Zhang, Ankur Bapna, Andrew Rosenberg, Bhuvana Ramabhadran: Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised...

Guest Editorial: Special Issue on Affective Speech and Language Synthesis,...

Shahin Amiriparian, Björn W. Schuller, Nabiha Asghar, Heiga Zen, Felix Burkhardt: Guest Editorial: Special Issue on Affective Speech and Language Synthesis, Generation, and Conversion. IEEE Trans....

Twenty-Five Years of Evolution in Speech and Language Processing.

Dong Yu, Yifan Gong, Michael A. Picheny, Bhuvana Ramabhadran, Dilek Hakkani-Tür, Rohit Prasad, Heiga Zen, Jan Skoglund, Jan Honza Cernocký, Lukás Burget, Abdelrahman Mohamed: Twenty-Five Years of...

Extracting representative subset from extensive text data for training...

Jun Suzuki, Heiga Zen, Hideto Kazawa: Extracting representative subset from extensive text data for training pre-trained language models. Inf. Process. Manag. 60(3): 103249 (2023)

Lightweight, Multi-Speaker, Multi-Lingual Indic Text-to-Speech.

Abhayjeet Singh, Amala Nagireddi, Deekshitha G, Jesuraja Bandekar, Roopa R., Sandhya Badiger, Sathvik Udupa, Prasanta Kumar Ghosh, Hema A. Murthy, Heiga Zen, Pranaw Kumar, Kamal Kant, Amol Bole, Bira...

SayTap: Language to Quadrupedal Locomotion.

Yujin Tang, Wenhao Yu, Jie Tan, Heiga Zen, Aleksandra Faust, Tatsuya Harada: SayTap: Language to Quadrupedal Locomotion. CoRL 2023: 3556-3570

Extending Multilingual Speech Synthesis to 100+ Languages without Transcribed...

Takaaki Saeki, Gary Wang, Nobuyuki Morioka, Isaac Elias, Kyle Kastner, Andrew Rosenberg, Bhuvana Ramabhadran, Heiga Zen, Françoise Beaufays, Hadar Shemtov: Extending Multilingual Speech Synthesis to...
