UY/CH-CHILD-MA

A Children's Chinese L2 Speech Database (Multi-Annotator Edition)

UY/CH-CHILD-MA is an expanded release of UY/CH-CHILD. It provides multi-annotator pronunciation labels to support research on children’s L2 Mandarin acquisition, mispronunciation detection, and perceptually grounded assessment.

Speakers: 106 Uyghur children
Speech segments: 24,958 (after segmentation and QC)
Age range: 4–12 years

What is “MA”? MA stands for Multi-Annotator. This release includes labels produced by (i) a virtual collective annotator (a majority-vote-style workflow) and (ii) three independent annotators, which makes it possible to study inter-annotator variation and perceptual uncertainty.
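As an illustration of what multi-annotator labels make possible, the sketch below derives a simple majority-vote consensus and pairwise agreement rates from three annotators. It is only a sketch under stated assumptions: the annotator names, the binary label coding (1 = mispronounced, 0 = correct), and the toy values are hypothetical, and the code does not reproduce the workflow used to build the virtual collective annotator in this release.

```python
from collections import Counter
from itertools import combinations


def majority_vote(labels):
    """Return the label chosen by a strict majority, or None if there is none."""
    label, count = Counter(labels).most_common(1)[0]
    return label if count > len(labels) / 2 else None


def pairwise_agreement(annotations):
    """Fraction of items on which each pair of annotators gives the same label.

    `annotations` maps an annotator name to a list of labels; all lists are
    assumed to have the same length and the same item order.
    """
    result = {}
    for a, b in combinations(sorted(annotations), 2):
        labels_a, labels_b = annotations[a], annotations[b]
        matches = sum(x == y for x, y in zip(labels_a, labels_b))
        result[(a, b)] = matches / len(labels_a)
    return result


# Hypothetical toy labels for four segments (1 = mispronounced, 0 = correct).
annotations = {
    "A1": [0, 1, 1, 0],
    "A2": [0, 1, 0, 0],
    "A3": [0, 0, 1, 0],
}

# One consensus label per segment, taken across the three annotators.
consensus = [majority_vote(votes) for votes in zip(*annotations.values())]
agreement = pairwise_agreement(annotations)
```

Raw agreement rates do not correct for chance agreement; a chance-corrected statistic such as Cohen's or Fleiss' kappa may be more appropriate for reporting.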

About the database

Recordings were collected in Ili Prefecture (Xinjiang, China) from children of native Uyghur families who attend schools where Mandarin Chinese is the primary language. Speech was recorded in quiet rooms as 16-bit, single-channel audio at a 16 kHz sampling rate.
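A quick way to check that a received file matches the stated recording format is sketched below with Python's standard `wave` module. It assumes the audio is distributed as WAV files, which this page does not state; the function names and the `EXPECTED` dictionary are illustrative only.

```python
import wave

# Format stated above: 16 kHz, 16-bit (2 bytes per sample), single channel.
EXPECTED = {"sample_rate": 16000, "sample_width_bytes": 2, "channels": 1}


def read_format(path):
    """Read the sampling rate, sample width, and channel count of a WAV file."""
    with wave.open(path, "rb") as wav:
        return {
            "sample_rate": wav.getframerate(),
            "sample_width_bytes": wav.getsampwidth(),
            "channels": wav.getnchannels(),
        }


def matches_expected(path):
    """Return True if the file has the 16 kHz, 16-bit, mono format."""
    return read_format(path) == EXPECTED
```

Calling `matches_expected("some_segment.wav")` on each file (the file name here is a placeholder) flags any segment that was resampled or converted along the way.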

Download

The database is available to universities and research institutes for research purposes only.

To request a copy of the database, please send an email to Prof. Dong Wang:

wangdong99@mails.tsinghua.edu.cn

License

All resources in the dataset are free for use by research institutes and individuals. Copyright remains with the original owners of the audio/video. Commercial use is not permitted.

Publications

Please cite the following if you make use of the database.

1) UY/CH-CHILD (original release)

Mewlude Nijat, Chen Chen, Dong Wang*, Askar Hamdulla*

UY/CH-CHILD — A Public Chinese L2 Speech Database of Uyghur Children

INTERSPEECH 2024. (* Corresponding author)

@inproceedings{nijat2024uychchild,
  title     = {UY/CH-CHILD -- A Public Chinese L2 Speech Database of Uyghur Children},
  author    = {Nijat, Mewlude and Chen, Chen and Wang, Dong and Hamdulla, Askar},
  booktitle = {Proc. Interspeech},
  year      = {2024}
}

2) UY/CH-CHILD-MA (multi-annotator expansion)

Mewlude Nijat, Yang Wei, Askar Hamdulla

Perception Norm for Mispronunciation Detection: Modeling the Perceptual and Psychological Process with Multi-annotator Annotations on Uyghur-Accented Child Mandarin

(Preprint / extended journal version).

@article{nijat20xxuychchildma,
  title     = {Perception Norm for Mispronunciation Detection: Modeling the Perceptual and Psychological Process with Multi-annotator Annotations on Uyghur-Accented Child Mandarin},
  author    = {Nijat, Mewlude and Wei, Yang and Hamdulla, Askar},
  journal   = {TBD},
  year      = {20xx},
  note      = {Update this BibTeX entry once the paper is formally published.}
}
Copyright © THU-CSLT.