Xingyu Na

Senior Speech R&D Engineer, Apple
Mailing Address: No. 2 Kexueyuan South Street, Beijing, China
Office: Raycom Tower A
Email: asr.naxingyu -at-

A printable version of my CV is here.

I received my Ph.D. degree from Beijing Institute of Technology in 2014 under the supervision of Prof. Jingming Kuang and Prof. Xiang Xie. I was a visiting Ph.D. student in Dr. Philip N. Garner's group at Idiap Research Institute in 2012 and 2013.


  • 07/09/2020: I'm joining Apple!
  • 30/03/2020: Our book "Speech Recognition with Kaldi" is available on JoyBuy, Amazon and DangDang
  • 18/09/2017: Our paper "AISHELL-1: An Open-Source Mandarin Speech Corpus and A Speech Recognition Baseline" was accepted by Oriental COCOSDA 2017 as an oral presentation!
  • 21/08/2017: I'm joining Microsoft China!
  • 02/12/2016: I'm joining Alibaba Robotics Corp. as a Senior Staff Engineer!
  • 05/09/2016: Attending Interspeech 2016 in San Francisco!
  • 10/12/2015: I'm joining Letv as a Senior Researcher on speech recognition.
  • 19/03/2015: Our paper "Incremental Syllable-Context Phonetic Vocoding" was accepted by TASLP.
  • 14/07/2014: I gave a talk about "Real-Time Speech Synthesis" at IEEE ICME 2014.
  • 07/03/2014: I'm joining the Institute of Acoustics, Chinese Academy of Sciences as an Assistant Researcher.


    Playing with Kaldi (the most popular open-source speech recognition toolkit)
    My contributions:
    • created convolutional neural network components in nnet2
    • created and tuned left-biphone setups for the chain model
    • modified the transition model and HMM topology kernel
    • maintain the aishell, fisher_swbd, hkust, gale_mandarin and thchs30 benchmarks
    Playing with HTS (the most popular open-source speech synthesis toolkit)
    I used HTS a lot for my PhD thesis. I shared the tools that were useful to me on GitHub, and some of them are included as HTS extensions.
    • HTS_PDFparser: a lightweight parser for hts_engine models
    • StreamGenerator: a single stream speech parameter generator for customizable hts_engine
    • MGETraining: HTS training scripts supporting minimum-generation-error training
    Voxforge for Chinese: calling for volunteer participants!


    Enhancing CTC-based speech recognition with diverse modeling units
    Shiyi Han, Zhihong Lei, Mingbin Xu, Xingyu Na, Zhen Huang
    Interspeech, Kos, Greece, 2024 [paper]

    Data Augmentation For Children's Speech Recognition -- The "Ethiopian" System For The SLT 2021 Children Speech Recognition Challenge
    Guoguo Chen, Xingyu Na, Yongqing Wang, Zhiyong Yan, Junbo Zhang, Sifan Ma, Yujun Wang
    arXiv, 2011.04547, 2020 [paper]

    AISHELL-2: Transforming Mandarin ASR Research Into Industrial Scale
    Jiayu Du, Xingyu Na, Xuechen Liu, Hui Bu
    arXiv, 1808.10583, 2018 [paper] [Kaldi recipe]

    AISHELL-1: An Open-Source Mandarin Speech Corpus and A Speech Recognition Baseline
    Hui Bu, Jiayu Du, Xingyu Na, Bengu Wu, Hao Zheng
    O-COCOSDA, Seoul, R. O. Korea, 2017 [paper] [Kaldi recipe]

    Purely Sequence-trained Neural Networks for ASR based on Lattice-free MMI
    Daniel Povey, Vijayaditya Peddinti, Daniel Galvez, Pegah Ghahremani, Vimal Manohar, Xingyu Na, Yiming Wang, Sanjeev Khudanpur
    Interspeech, San Francisco, US, 2016 [paper] [Kaldi recipe]

    An Empirical Exploration of CTC Acoustic Models
    Yajie Miao, Mohammad Gowayyed, Xingyu Na, Tom Ko, Florian Metze, Alexander Waibel
    IEEE International Conference on Acoustics, Speech and Signal Processing, Shanghai, China, 2016 [paper] [Eesen recipe]

    Two-stage ASGD Framework for Parallel Training of DNN Acoustic Models using Ethernet
    Zhichao Wang, Xingyu Na, Yonghong Yan
    IEEE Automatic Speech Recognition and Understanding Workshop, Arizona, US, 2015 [paper]

    Incremental Syllable-Context Phonetic Vocoding
    Milos Cernak, Phil Garner, Alexandros Lazaridis, Petr Motlicek, Xingyu Na
    IEEE/ACM Transactions on Audio, Speech, and Language Processing, 23(6), 2015 [paper] [project]

    Syllabic Pitch Tuning for Neutral-to-Emotional Voice Conversion
    Lakshmi Saheer, Xingyu Na, Milos Cernak
    Idiap Research Report, Martigny, Switzerland, 2015 [paper]

    Low-Latency Parameter Generation for Real-time Embedded Speech Synthesis System
    Xingyu Na, Xiang Xie, Jingming Kuang
    IEEE International Conference on Multimedia and Expo, Chengdu, China, 2014 [paper]

    Improving Voice Quality of HMM-based Speech Synthesis Using Voice Conversion Method
    Yishan Jiao, Xiang Xie, Xingyu Na, Ming Tu
    IEEE International Conference on Acoustics, Speech and Signal Processing, Florence, Italy, 2014 [paper]

    Syllable-based Pitch Encoding for Low Bit Rate Speech Coding with Recognition/Synthesis Architecture
    Milos Cernak, Xingyu Na, Phil Garner
    Interspeech, Lyon, France, 2013 [paper]

    Convolutional Pitch Target Approximation Model for Speech Synthesis
    Xingyu Na, Phil Garner
    Idiap Research Report, Martigny, Switzerland, 2013 [paper]

    An Improved Tone Labeling and Prediction Method with Non-uniform Segmentation of F0 Contour
    Xingyu Na, Xiang Xie, Jingming Kuang, Yaling He
    IEEE International Symposium on Chinese Spoken Language Processing, Hong Kong, China, 2012 [paper]

    Tone Generation by Maximizing Joint Likelihood of Syllabic HMMs for Mandarin Speech Synthesis
    Xingyu Na, Chaomin Wang, Xiang Xie, Jingming Kuang, Yaling He
    Speech Prosody, Shanghai, China, 2012 [paper]

Professional Activities

  • Reviewer:
    • Speech Communication
    • EURASIP Journal on Audio, Speech, and Music Processing
    • KSII Transactions on Internet and Information Systems
    • IEEE Signal Processing Letters
    • Journal of the Audio Engineering Society
    • Pattern Recognition Letters
