IqraEval 2 Challenge, Interspeech 2026


Overview

The IqraEval 2 challenge at Interspeech 2026 is a shared task aimed at advancing automatic assessment of Modern Standard Arabic (MSA) pronunciation through computational methods for detecting and diagnosing pronunciation errors. The focus on MSA provides a standardized and well-defined context for evaluating Arabic pronunciation.

Participants will develop systems capable of detecting mispronunciations (e.g., substitution, deletion, or insertion of phonemes).

Timeline

Task Description: MSA Mispronunciation Detection System

Design a model that detects and provides detailed feedback on mispronunciations in MSA speech. Users read vowelized sentences; the model predicts the spoken phoneme sequence and flags deviations from the reference. Evaluation is performed on the MSA-Test dataset, which contains human-annotated errors.

System Overview

Figure: Overview of the Mispronunciation Detection Workflow

1. Read the Sentence

The system shows a Reference Sentence together with its Reference Phoneme Sequence.

Example:

2. Save Recording

The user speaks; the system captures and stores the audio waveform.

3. Mispronunciation Detection

The model predicts the spoken phoneme sequence; deviations from the reference indicate mispronunciations.

Example of Mispronunciation:

Here, the substitution of one phoneme for another represents a common pronunciation error.
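As a rough illustration of this step, the deviation types (substitution, deletion, insertion) can be recovered by aligning the predicted sequence against the reference. Below is a minimal sketch using Python's difflib, assuming phoneme sequences are plain lists of symbols; the phonemes in the example are hypothetical, and a real system would use an edit-distance alignment tuned for phoneme confusability.

```python
from difflib import SequenceMatcher

def diff_phonemes(reference, predicted):
    """Align two phoneme sequences and label each deviation."""
    errors = []
    sm = SequenceMatcher(a=reference, b=predicted, autojunk=False)
    for op, i1, i2, j1, j2 in sm.get_opcodes():
        if op == "replace":
            errors.append(("substitution", reference[i1:i2], predicted[j1:j2]))
        elif op == "delete":
            errors.append(("deletion", reference[i1:i2], []))
        elif op == "insert":
            errors.append(("insertion", [], predicted[j1:j2]))
    return errors

# Hypothetical example: /a/ mispronounced as /i/.
print(diff_phonemes(["b", "a", "t"], ["b", "i", "t"]))
```

The opcode walk keeps "equal" spans silent and reports only the three deviation types, which is the same taxonomy the task description uses.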

Phoneme Set Description

The phoneme set used in this work is based on a specialized phonetizer developed for vowelized MSA. It includes a comprehensive range of phonemes designed to capture key phonetic and prosodic features of standard Arabic speech, such as stress, pausing, intonation, emphaticness, and notably, gemination. Gemination—the doubling of consonant sounds—is explicitly represented by duplicating the consonant symbol (e.g., /b/ becomes /bb/). This phoneme set provides a detailed yet practical representation of the speech sounds relevant for accurate mispronunciation detection in MSA. For further details, including the full phoneme inventory, see Phoneme Inventory.
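Because gemination is encoded by duplicating the consonant symbol, pipelines that emit one symbol per segment may need to merge immediate repeats. Here is a minimal sketch under two assumptions: phoneme sequences are lists of symbols, and the vowel set shown is a hypothetical stand-in for the official inventory linked above.

```python
# Hypothetical short/long vowel symbols; the authoritative list is in the
# linked Phoneme Inventory.
VOWELS = {"a", "i", "u", "aa", "ii", "uu"}

def mark_gemination(phonemes):
    """Merge an immediately repeated consonant into a doubled symbol,
    e.g. ['b', 'b'] -> ['bb'], mirroring the phonetizer's convention."""
    out = []
    for p in phonemes:
        if out and out[-1] == p and p not in VOWELS:
            out[-1] = p + p  # geminate: duplicated consonant symbol
        else:
            out.append(p)
    return out
```

Note that repeated vowels are left untouched, since long vowels are separate symbols rather than geminates.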

Training Dataset: Description

Hosted on Hugging Face:

Columns:

Training Dataset: TTS Data (Optional)

Auxiliary high-quality TTS corpus for augmentation (to be released on 15 December 2025):

Test Dataset: MSA-Test

98 sentences × 18 speakers ≈ 2 h of audio, with deliberate errors and human annotations. Load it with:

load_dataset("Interspeech26/MSA_Test_v2")

Submission Details (Draft)

Submit a UTF-8 CSV named teamID_submission.csv with two columns:

ID,Labels
0000_0001,y a t a H a d d a ...
0000_0002,m a a n a n s a ...
...

Note: no extra spaces, single CSV, no archives.
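The file can be produced with Python's csv module; "teamID" and the predictions below are placeholders, not real outputs.

```python
import csv

# Placeholder predictions: utterance ID -> space-separated phoneme labels.
predictions = {
    "0000_0001": "y a t a H a d d a",
    "0000_0002": "m a a n a n s a",
}

with open("teamID_submission.csv", "w", encoding="utf-8", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["ID", "Labels"])  # required two-column header
    for utt_id, labels in sorted(predictions.items()):
        writer.writerow([utt_id, labels])  # no extra spaces around fields
```

Writing through csv.writer (rather than string concatenation) keeps the output valid even if a label ever contains a comma.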

Evaluation Criteria

The leaderboard is based on phoneme-level F1-score. We use a hierarchical evaluation (detection + diagnosis), as described in the MDD Overview.

From these we compute:

Rates:

Plus standard Precision, Recall, F1 for detection:
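The exact definitions are in the linked MDD Overview; as an assumption, the sketch below uses the four hierarchical counts standard in the MDD literature: true acceptance (TA), false rejection (FR), false acceptance (FA), and true rejection (TR).

```python
def mdd_metrics(ta, fr, fa, tr):
    """Detection rates and scores from the four hierarchical MDD counts
    (definitions assumed from the common MDD literature, not official)."""
    frr = fr / (ta + fr)        # False Rejection Rate: correct phones flagged
    far = fa / (fa + tr)        # False Acceptance Rate: errors missed
    precision = tr / (tr + fr)  # flagged phones that were truly mispronounced
    recall = tr / (tr + fa)     # true mispronunciations that were flagged
    f1 = 2 * precision * recall / (precision + recall)
    return {"FRR": frr, "FAR": far, "Precision": precision,
            "Recall": recall, "F1": f1}
```

Under these conventions, precision and recall treat a mispronunciation as the positive class, so F1 rewards systems that both catch errors and avoid over-flagging correct phones.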

Suggested Research Directions

  1. Advanced Mispronunciation Detection Models
    Apply state-of-the-art self-supervised models (e.g., wav2vec 2.0, HuBERT), using variants pre-trained or fine-tuned on Arabic speech. These models can then be fine-tuned on MSA datasets to improve phoneme-level accuracy.
  2. Data Augmentation Strategies
    Create synthetic mispronunciation examples using pipelines like SpeechBlender. Augmenting limited Arabic speech data helps mitigate data scarcity and improves model robustness.
  3. Analysis of Common Mispronunciation Patterns
    Perform statistical analysis on the MSA-Test dataset to identify prevalent errors (e.g., substituting similar phonemes, swapping vowels). These insights can drive targeted training and tailored feedback rules.
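For direction 2, a toy version of substitution-based augmentation might look like the sketch below. The confusion pairs are hypothetical, and unlike SpeechBlender (which blends audio), this sketch only corrupts phoneme labels.

```python
import random

# Hypothetical confusable phoneme pairs (illustrative only).
CONFUSIONS = {"s": ["S"], "t": ["T"], "d": ["D"], "a": ["i", "u"]}

def inject_errors(phonemes, rate=0.1, rng=None):
    """Substitute confusable phonemes at random to synthesize a
    mispronounced version of a correct phoneme sequence."""
    rng = rng or random.Random(0)
    corrupted = []
    for p in phonemes:
        if p in CONFUSIONS and rng.random() < rate:
            corrupted.append(rng.choice(CONFUSIONS[p]))
        else:
            corrupted.append(p)
    return corrupted
```

Pairing each corrupted sequence with its clean reference yields labeled (reference, spoken) training pairs without additional recordings.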

Registration

Teams and individual participants must register to gain access to the test set. Please complete the registration form using the link below:

Registration Form

Registration opens on December 1, 2025.

Future Updates

Further details on the open-set leaderboard submission will be posted on the shared task website (December 15, 2025). Stay tuned!

Contact and Support

For inquiries and support, reach out to the task coordinators.

References