2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information

Paper Detail

Paper ID: HLT-15.2
Paper Title: ATTENTION-BASED MULTI-ENCODER AUTOMATIC PRONUNCIATION ASSESSMENT
Authors: Binghuai Lin, Liyuan Wang, Tencent Technology Co., Ltd, China
Session: HLT-15: Language Assessment
Location: Gather.Town
Session Time: Thursday, 10 June, 16:30 - 17:15
Presentation Time: Thursday, 10 June, 16:30 - 17:15
Presentation: Poster
Topic: Human Language Technology: [HLT-LACL] Language Acquisition and Learning
Abstract: Automatic pronunciation assessment plays an important role in Computer-Assisted Pronunciation Training (CAPT). Traditional methods for pronunciation assessment of read-aloud tasks rely on features derived from automatic speech recognition (ASR) and are therefore sensitive to ASR accuracy and to the effectiveness of the features. Moreover, the representational capability of these features is limited by the mismatch between the optimization goals of the ASR and scoring tasks. In this paper we propose an end-to-end (E2E) pronunciation scoring network based on an attention mechanism and a multi-encoder architecture consisting of audio and text encoders. The network, optimized within a multi-task learning (MTL) framework, provides sentence-level scores as well as detailed word-level scores. Because pronunciation-scoring data are scarce, we pre-train the network in two steps using ASR data and synthetic data, and then fine-tune it on the limited high-quality scoring data. Experimental results on a dataset recorded by Chinese English-as-a-second-language (ESL) learners and labeled by three experts demonstrate that the proposed model outperforms the baseline in Pearson correlation coefficient (PCC).
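
The abstract describes the architecture only at a high level, so the following is a minimal PyTorch sketch of the general idea, not the authors' implementation: an audio encoder and a text encoder are combined by cross-attention, and two heads produce word-level and sentence-level scores trained jointly (MTL). The layer types, dimensions, pooling choice, and loss weighting here are illustrative assumptions.

import torch
import torch.nn as nn


class MultiEncoderScorer(nn.Module):
    """Illustrative attention-based multi-encoder scorer (hypothetical sketch).

    A BiLSTM audio encoder and a BiLSTM text encoder are fused with
    cross-attention: each text position queries the audio frames, and the
    attended context feeds a word-level head and a sentence-level head.
    """

    def __init__(self, n_mels=80, vocab_size=1000, d_model=256):
        super().__init__()
        self.audio_enc = nn.LSTM(n_mels, d_model // 2,
                                 batch_first=True, bidirectional=True)
        self.embed = nn.Embedding(vocab_size, d_model)
        self.text_enc = nn.LSTM(d_model, d_model // 2,
                                batch_first=True, bidirectional=True)
        self.cross_attn = nn.MultiheadAttention(d_model, num_heads=4,
                                                batch_first=True)
        self.word_head = nn.Linear(d_model, 1)   # word-level score
        self.sent_head = nn.Linear(d_model, 1)   # sentence-level score

    def forward(self, feats, tokens):
        # feats:  (B, T, n_mels) acoustic features of the learner's speech
        # tokens: (B, L) ids of the reference (prompt) text
        audio, _ = self.audio_enc(feats)             # (B, T, d_model)
        text, _ = self.text_enc(self.embed(tokens))  # (B, L, d_model)
        # Each text position attends over the audio frames.
        ctx, _ = self.cross_attn(query=text, key=audio, value=audio)
        word_scores = self.word_head(ctx).squeeze(-1)             # (B, L)
        sent_score = self.sent_head(ctx.mean(dim=1)).squeeze(-1)  # (B,)
        return word_scores, sent_score


# MTL objective: sum of word- and sentence-level regression losses.
if __name__ == "__main__":
    model = MultiEncoderScorer()
    feats = torch.randn(2, 300, 80)
    tokens = torch.randint(0, 1000, (2, 12))
    word_tgt, sent_tgt = torch.rand(2, 12), torch.rand(2)
    w, s = model(feats, tokens)
    loss = nn.functional.mse_loss(w, word_tgt) + nn.functional.mse_loss(s, sent_tgt)
    loss.backward()

In practice, the paper's two-step pre-training (first on ASR data, then on synthetic scoring data) would be applied to such a network before fine-tuning on the small expert-labeled set; that schedule is not shown above.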