2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information

2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information
Login Paper Search My Schedule Paper Index Help

My ICASSP 2021 Schedule

Note: Your custom schedule will not be saved unless you create a new account or login to an existing account.
  1. Create a login based on your email (takes less than one minute)
  2. Perform 'Paper Search'
  3. Select papers that you desire to save in your personalized schedule
  4. Click on 'My Schedule' to see the current list of selected papers
  5. Click on 'Printable Version' to create a separate window suitable for printing (the header and menu will appear, but will not actually print)

Paper Detail

Paper IDHLT-16.1
Paper Title Recent Advances in Arabic Syntactic Diacritics Restoration
Authors Yasser Hifny, University of Helwan, Egypt
SessionHLT-16: Applications in Natural Language
LocationGather.Town
Session Time:Thursday, 10 June, 16:30 - 17:15
Presentation Time:Thursday, 10 June, 16:30 - 17:15
Presentation Poster
Topic Human Language Technology: [HLT-STPA] Segmentation, Tagging, and Parsing
IEEE Xplore Open Preview  Click here to view in IEEE Xplore
Abstract Restoring Arabic syntactic diacritics based on Long Short-Term Memory (LSTM) networks leads to state-of-the-art performance. These LSTM networks are commonly augmented with Maximum Entropy (MaxEnt) sparse direct connections between the input and the output layers of the tagger. One way to improve such tagger performance is to use an ensemble of taggers. However, an ensemble of taggers may require huge computational and memory resources. In this paper, we implement a knowledge distillation technique where an ensemble of teachers/taggers is used to train a single student tagger. On the other hand, Arabic is a morphologically rich language and has a high Out-Of-Vocabulary (OOV) rate. In addition to word embeddings, we propose to use character embeddings encoded using LSTMs for each word to overcome this problem. On the Arabic tree bank task, our hybrid LSTM/MaxEnt tagger achieves 1.0% absolute WER improvement over a strong baseline using the proposed two techniques.