Deep Learning

MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training

08/2022 – 05/2023. Supervised by Dr Emmanouil Benetos, Centre for Digital Music, Queen Mary University of London. Built self-supervised learning systems whose checkpoints have been downloaded 50k+ times on Hugging Face. Replaced the MFCC-based pseudo-labels with Chroma music features to capture harmonic information. Used deep features such as EnCodec instead of k-means to scale models up to 1B parameters.
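To illustrate why Chroma features carry harmonic information, the sketch below folds spectral peaks into 12 pitch classes, so that notes an octave apart land in the same bin. It is a toy example and not the MERT training pipeline: the function names `pitch_class` and `chroma_vector`, the A4 = 440 Hz tuning, and the input format of (frequency, energy) pairs are all assumptions made for illustration.

```python
import math


def pitch_class(freq_hz, a4_hz=440.0):
    """Map a frequency to a pitch class 0..11 (0 = C) via its nearest MIDI note."""
    midi = round(69 + 12 * math.log2(freq_hz / a4_hz))
    return midi % 12


def chroma_vector(peaks):
    """Fold (frequency_hz, energy) pairs into a normalised 12-bin chroma vector."""
    bins = [0.0] * 12
    for freq_hz, energy in peaks:
        bins[pitch_class(freq_hz)] += energy
    total = sum(bins)
    return [b / total for b in bins] if total > 0 else bins


# Toy frame (hypothetical data): C4, E4, G4 and C5, i.e. a C major chord
# with the root doubled one octave higher.
frame = [(261.63, 1.0), (329.63, 1.0), (392.00, 1.0), (523.25, 1.0)]
chroma = chroma_vector(frame)
```

In this toy frame both C notes fall into bin 0, so the chroma vector emphasises the chord root regardless of octave, which timbre-oriented features such as MFCCs do not expose directly.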

MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training

09/2021 – 07/2022. Research Assistant, supervised by Prof. Richard Stern, Carnegie Mellon University. Constructed 2-layer learnable front ends in a Temporal Modulation Neural Network (TMNN) that combines Mel-like data-driven front ends with temporal modulation filters. Found that the proposed front ends surpass state-of-the-art (SOTA) methods on the MagnaTagATune dataset in automatic music tagging and also help keyword spotting on speech commands. Analysed model performance across genre tags and instrument tags.

Learnable Frontend for Music, Speech and Audio

Design learnable frontends for deep learning models inspired by classic filters, multi-rate sampling & modulation.

Research & implementation of Chinese flute playing technique recognition based on ML (original title: 基于机器学习的笛子演奏技法识别研究与实现)

Feb. 2020 – May 2020. Beijing, CHN. One of the graduation theses awarded the outstanding paper honor of the School of Mathematical Sciences, Peking University. Supervised by Prof. CHEN Xiaoou at the Wangxuan Institute of Computer Technology, PKU. See the project page for more information.

Tempo Detection of Chinese Pop Music

Jun. 2020 – Sept. 2020. Beijing, CHN. Summer internship at Beijing Deepmusic Technology Co. LTD. Wrote a literature review on song tempo/speed detection. Designed new tempo detection models based on BiLSTM and Temporal Convolutional Network (TCN) architectures and compared them with the librosa and madmom baselines, using data provided by Renren Karaoke Company (more than 2000 songs manually annotated by my colleagues). For music with a stable tempo or a slightly slower ending, tempo recognition accuracy is above 87% without considering the double frequency (error is less than or equal to 0.
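A tolerance-based tempo accuracy of the kind described above can be sketched as follows. This is a minimal illustration and not the internship's evaluation code: the helper names, the use of a relative (rather than absolute) error, and the toy BPM values are assumptions. The error threshold in the description above is cut off, so the tolerance used here is purely illustrative and should not be read as the original value.

```python
def tempo_correct(estimate_bpm, reference_bpm, tol, allow_octave=False):
    """Return True if the estimate lies within a relative tolerance of the reference.

    With allow_octave=True, half and double the reference tempo also count as
    correct, which forgives the common double/half-tempo confusion.
    """
    factors = (1.0, 0.5, 2.0) if allow_octave else (1.0,)
    return any(
        abs(estimate_bpm - f * reference_bpm) <= tol * f * reference_bpm
        for f in factors
    )


def tempo_accuracy(estimates, references, tol, allow_octave=False):
    """Fraction of songs whose estimated tempo counts as correct."""
    hits = sum(
        tempo_correct(e, r, tol, allow_octave)
        for e, r in zip(estimates, references)
    )
    return hits / len(references)


# Hypothetical annotations and estimates, in beats per minute.
references = [120.0, 90.0, 100.0]
estimates = [121.0, 180.0, 70.0]

ILLUSTRATIVE_TOL = 0.04  # assumption for this example only, not the source's threshold

strict = tempo_accuracy(estimates, references, ILLUSTRATIVE_TOL)
lenient = tempo_accuracy(estimates, references, ILLUSTRATIVE_TOL, allow_octave=True)
```

In this toy data the second estimate is exactly double the annotated tempo, so it is counted as an error by the strict score and as correct by the lenient one. Reporting both makes it clear how much of the error comes from double/half-tempo confusion.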

Research & implementation of Chinese flute playing technique recognition based on machine learning

Feb. 2020 – May 2020. Beijing, CHN. One of the graduation theses awarded the outstanding paper honor of the School of Mathematical Sciences, Peking University. Research Assistant for Prof. CHEN Xiaoou at the Wangxuan Institute of Computer Technology, Peking University. Recognising and recording musical objects is an essential component of music information retrieval. Unlike melody extraction and music transcription, research on musical instrument playing technique detection is still at an early stage.

Chinese instrument recognition

Mar 2019 – Jun 2019. Beijing, CHN. Research Assistant for Prof. CHEN Xiaoou at the Wangxuan Institute of Computer Technology, Peking University. Main information: Built a series of quartet datasets from the DCMI database shared by the China Conservatory of Music. Constructed an audio event detection model based on CRNN to detect and recognise instruments. Evaluated the precision, recall and F-measure of the model against a CNN baseline, and compared the results across quartet datasets generated from different playing techniques or music types.
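The evaluation metrics for such a detection model can be sketched as below. This is a minimal illustration, not the project's evaluation code: it assumes predictions and annotations are given as sets of (frame index, instrument label) pairs, and the function name and the instrument labels in the toy data are hypothetical.

```python
def precision_recall_f1(predicted, reference):
    """Compute precision, recall and F-measure over sets of (frame, label) events."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Toy annotations for two frames of a polyphonic recording (hypothetical data).
reference = {(0, "erhu"), (0, "dizi"), (1, "erhu"), (1, "pipa")}
predicted = {(0, "erhu"), (1, "erhu"), (1, "guzheng")}

precision, recall, f_measure = precision_recall_f1(predicted, reference)
```

Here two of the three predictions are correct and two of the four annotated events are found, so precision is 2/3, recall is 1/2, and the F-measure is their harmonic mean, 4/7.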

Detection of Chinese Instrumental Quartet based on CRNN (original title: 基于卷积循环神经网的复音音乐中国民族乐器检测)

CSMT. Dec. 26 – Dec. 29, 2019. Harbin, CHN.