Xingyu NA

Homepage contact (86)-13811763663 LinkedIn GitHub

Education

Beijing Institute of Technology `2008.9 - 2014.3`

Beijing, China

Ph.D. thesis: Personalization of HMM-based Speech Synthesis
Co-advised by Jingming Kuang and Xiang Xie

Beijing Institute of Technology `2004.9 - 2008.7`

Beijing, China

B.Eng. in Mechanical and Electronic Engineering
GPA 4.0, ranking 1 / 60

Experience

Apple, AIML `2020.9 -`

Senior Speech R&D Engineer
Work on speech recognition that powers:

Siri
Dictation
Live audio transcription shipped with Apple Intelligence on iOS 18 and macOS Sequoia (WWDC24).

Microsoft, STC Asia `2017.8 - 2020.8`

Senior Applied Scientist
Worked on speech recognition features for Xiaoice, in both full-duplex and half-duplex modes, covering various application scenarios such as IoT. My duties were:

Designed and developed the acoustic model training system for speech recognition
Delivered AMs for Xiaoice and Rinna applications
Led the optimization of the SR decoder and cloud service

Alibaba, Robotics Subsidiary `2016.12 - 2017.6`

Senior Staff Engineer
Alibaba Robotics was founded for localized operations of SoftBank's robot, Pepper. I led the Speech & Dialog team. My contributions were:

Designed the architecture of a lightweight voice interaction system for the robot.
Optimized audio noise suppression modules on Pepper.

LeTV, LeLe Innovation Subsidiary `2015.12 - 2016.12`

Senior Researcher
Worked on acoustic modelling for SR and voice wake-up.

Chinese Academy of Sciences, Institute of Acoustics `2014.3 - 2015.12`

Assistant Researcher

Samsung R&D Institute of China, Language Computing Lab `2014.1 - 2014.2`

Intern Engineer
Worked on optimization of TTS training pipelines.

Idiap Research Institute, Speech and Audio Group `2012.9 - 2013.8`

Research Intern
I was sponsored by the China Scholarship Council as a joint Ph.D. student at Idiap for a year, advised by Phil Garner.

Publications

Speech Recognition with Kaldi (Chinese)

Guoguo Chen, Jiayu Du, Xingyu Na, Junbo Zhang.
Publishing House of Electronics Industry, available on JoyBuy, Amazon, and DangDang

AISHELL-2: Transforming Mandarin ASR Research Into Industrial Scale

Jiayu Du, Xingyu Na, Xuechen Liu, Hui Bu.
[pdf] [code]

AISHELL-1: An Open-Source Mandarin Speech Corpus and A Speech Recognition Baseline

Hui Bu, Jiayu Du, Xingyu Na, Bengu Wu, Hao Zheng.
O-COCOSDA, Seoul, R. O. Korea, 2017.
[paper] [data] [code]

Purely Sequence-trained Neural Networks for ASR based on Lattice-free MMI

Daniel Povey, Vijayaditya Peddinti, Daniel Galvez, Pegah Ghahremani, Vimal Manohar, Xingyu Na, Yiming Wang, Sanjeev Khudanpur.
Interspeech, San Francisco, US, 2016.
[pdf] [code]

An Empirical Exploration of CTC Acoustic Models

Yajie Miao, Mohammad Gowayyed, Xingyu Na, Tom Ko, Florian Metze, Alexander Waibel.
IEEE International Conference on Acoustics, Speech and Signal Processing, Shanghai, China, 2016.
[paper] [code]

Two-stage ASGD Framework for Parallel Training of DNN Acoustic Models using Ethernet

Zhichao Wang, Xingyu Na, Yonghong Yan.
IEEE Automatic Speech Recognition and Understanding Workshop, Arizona, US, 2015. [paper]

Incremental Syllable-Context Phonetic Vocoding

Milos Cernak, Phil Garner, Alexandros Lazaridis, Petr Motlicek, Xingyu Na.
IEEE/ACM Transactions on Audio, Speech, and Language Processing, 23(6), 2015. [paper]

Low-Latency Parameter Generation for Real-time Embedded Speech Synthesis System

Xingyu Na, Xiang Xie, Jingming Kuang.
IEEE International Conference on Multimedia and Expo, Chengdu, China, 2014. [paper]

Improving Voice Quality of HMM-based Speech Synthesis Using Voice Conversion Method

Yishan Jiao, Xiang Xie, Xingyu Na, Ming Tu.
IEEE International Conference on Acoustics, Speech and Signal Processing, Florence, Italy, 2014. [paper]

Syllable-based Pitch Encoding for Low Bit Rate Speech Coding with Recognition/Synthesis Architecture

Milos Cernak, Xingyu Na, Phil Garner.
Interspeech, Lyon, France, 2013. [pdf]

Convolutional Pitch Target Approximation Model for Speech Synthesis

Xingyu Na, Phil Garner.
Idiap Research Report, Martigny, Switzerland, 2013. [pdf]

An Improved Tone Labeling and Prediction Method with Non-uniform Segmentation of F0 Contour

Xingyu Na, Xiang Xie, Jingming Kuang, Yaling He.
IEEE International Symposium on Chinese Spoken Language Processing, Hong Kong, China, 2012. [paper]

Tone Generation by Maximizing Joint Likelihood of Syllabic HMMs for Mandarin Speech Synthesis

Xingyu Na, Chaomin Wang, Xiang Xie, Jingming Kuang, Yaling He.
Speech Prosody, Shanghai, China, 2012. [pdf]

Service

Reviewer:

Speech Communication
EURASIP Journal on Audio, Speech, and Music Processing
KSII Transactions on Internet and Information Systems
IEEE Signal Processing Letters
Journal of the Audio Engineering Society