UY/CH-CHILD

We constructed UY/CH-CHILD, a new speech database consisting of 29,061 Chinese samples spoken by 106 Uyghur children.

106
children

We collected speech production data from 106 children aged 4 to 8 in Ili Prefecture, Xinjiang Uyghur Autonomous Region, China. The participating children come from native Uyghur families and attend kindergartens or primary schools where Chinese is the primary language.

29,061
samples

The recorded speech was uploaded to the annotation platform for phonetic labeling. We invited 13 students from the International Cultural Exchange College of Xinjiang University to carry out the labeling. All of the students are native Chinese speakers with considerable experience and knowledge of Chinese pronunciation. After phonetic labeling, there were 29,061 valid samples in total.

Download

The database is available to universities and research institutes for research purposes only.

To request a copy of the database, please send an email to Prof. Dong Wang.

Data

Please send an email to Prof. Dong Wang:

wangdong99@mails.tsinghua.edu.cn

License

All resources contained in the dataset are free for research institutes and individuals. The copyright remains with the original owners of the audio/video. No commercial use is permitted.

Publications

Please cite the following if you make use of the database.

Mewlude Nijat, Chen Chen, Dong Wang*, Askar Hamdulla*

UY/CH-CHILD -- A Public Chinese L2 Speech Database of Uyghur Children

Bibtex


@inproceedings{nijat2024uychchild,
  title     = {UY/CH-CHILD -- A Public Chinese L2 Speech Database of Uyghur Children},
  author    = {Nijat, Mewlude and Chen, Chen and Wang, Dong and Hamdulla, Askar},
  booktitle = {Proc. Interspeech},
  year      = {2024}
}

* Corresponding author