Xingyu NA
Education
Beijing Institute of Technology 2008.9 - 2014.3
Beijing, China
- Ph.D. thesis: Personalization of HMM-based Speech Synthesis
- Co-advised by Jingming Kuang and Xiang Xie
Beijing Institute of Technology 2004.9 - 2008.7
Beijing, China
- B.Eng. in Mechanical and Electronic Engineering
- GPA 4.0, ranking 1 / 60
Experience
Apple, AIML 2020.9 -
Senior Speech R&D Engineer
Work on speech recognition that powers
- Siri
- Dictation
- Live audio transcription shipped with Apple Intelligence on iOS 18 and macOS Sequoia (WWDC24).
Microsoft, STC Asia 2017.8 - 2020.8
Senior Applied Scientist
Worked on speech recognition features for Xiaoice in both full-duplex and half-duplex modes, covering various application scenarios such as IoT. My duties were:
- Designed and developed an acoustic model training system for speech recognition
- Delivered acoustic models (AMs) for Xiaoice and Rinna applications
- Led the optimization of the SR decoder and cloud service
Alibaba, Robotics Subsidiary 2016.12 - 2017.6
Senior Staff Engineer
Alibaba Robotics was founded for localized operations of SoftBank's robot, Pepper. I led the Speech & Dialog team. My contributions were:
- Designed the architecture of a lightweight voice interaction system for the robot.
- Optimized audio noise suppression modules on Pepper.
LeTV, LeLe Innovation Subsidiary 2015.12 - 2016.12
Senior Researcher
Worked on acoustic modelling for SR and voice wake-up.
Chinese Academy of Sciences, Institute of Acoustics 2014.3 - 2015.12
Assistant Researcher
Samsung R&D Institute of China, Language Computing Lab 2014.1 - 2014.2
Intern Engineer
Worked on optimization of TTS training pipelines.
Idiap Research Institute, Speech and Audio Group 2012.9 - 2013.8
Research Intern
I was sponsored by the China Scholarship Council as a joint Ph.D. student at Idiap for a year, advised by Phil Garner.
Publications
Speech Recognition with Kaldi (Chinese)
Guoguo Chen, Jiayu Du, Xingyu Na, Junbo Zhang.
Publishing House of Electronics Industry, available on JoyBuy, Amazon, and DangDang.
AISHELL-2: Transforming Mandarin ASR Research Into Industrial Scale
Jiayu Du, Xingyu Na, Xuechen Liu, Hui Bu.
[pdf]
[code]
AISHELL-1: An Open-Source Mandarin Speech Corpus and A Speech Recognition Baseline
Hui Bu, Jiayu Du, Xingyu Na, Bengu Wu, Hao Zheng.
O-COCOSDA, Seoul, R. O. Korea, 2017.
[paper]
[data]
[code]
Purely Sequence-trained Neural Networks for ASR based on Lattice-free MMI
Daniel Povey, Vijayaditya Peddinti, Daniel Galvez, Pegah Ghahremani, Vimal Manohar, Xingyu Na, Yiming Wang, Sanjeev Khudanpur.
Interspeech, San Francisco, US, 2016.
[pdf]
[code]
An Empirical Exploration of CTC Acoustic Models
Yajie Miao, Mohammad Gowayyed, Xingyu Na, Tom Ko, Florian Metze, Alexander Waibel.
IEEE International Conference on Acoustics, Speech and Signal Processing, Shanghai, China, 2016.
[paper]
[code]
Two-stage ASGD Framework for Parallel Training of DNN Acoustic Models using Ethernet
Zhichao Wang, Xingyu Na, Yonghong Yan.
IEEE Automatic Speech Recognition and Understanding Workshop, Arizona, US, 2015. [paper]
Incremental Syllable-Context Phonetic Vocoding
Milos Cernak, Phil Garner, Alexandros Lazaridis, Petr Motlicek, Xingyu Na.
IEEE/ACM Transactions on Audio, Speech, and Language Processing, 23(6), 2015. [paper]
Low-Latency Parameter Generation for Real-time Embedded Speech Synthesis System
Xingyu Na, Xiang Xie, Jingming Kuang.
IEEE International Conference on Multimedia And Expo, Chengdu, China, 2014 [paper]
Improving Voice Quality of HMM-based Speech Synthesis Using Voice Conversion Method
Yishan Jiao, Xiang Xie, Xingyu Na, Ming Tu.
IEEE International Conference on Acoustics, Speech and Signal Processing, Florence, Italy, 2014. [paper]
Syllable-based Pitch Encoding for Low Bit Rate Speech Coding with Recognition/Synthesis Architecture
Milos Cernak, Xingyu Na, Phil Garner.
Interspeech, Lyon, France, 2013. [pdf]
Convolutional Pitch Target Approximation Model for Speech Synthesis
Xingyu Na, Phil Garner.
Idiap Research Report, Martigny, Switzerland, 2013. [pdf]
An Improved Tone Labeling and Prediction Method with Non-uniform Segmentation of F0 Contour
Xingyu Na, Xiang Xie, Jingming Kuang, Yaling He.
IEEE International Symposium on Chinese Spoken Language Processing, Hong Kong, China, 2012. [paper]
Tone Generation by Maximizing Joint Likelihood of Syllabic HMMs for Mandarin Speech Synthesis
Xingyu Na, Chaomin Wang, Xiang Xie, Jingming Kuang, Yaling He.
Speech Prosody, Shanghai, China, 2012. [pdf]
Service
Reviewer:
- Speech Communication
- EURASIP Journal on Audio, Speech, and Music Processing
- KSII Transactions on Internet and Information Systems
- IEEE Signal Processing Letters
- Journal of the Audio Engineering Society