08/2022 – 05/2023
Supervised by Dr Emmanouil Benetos, Centre for Digital Music, Queen Mary University of London
Built self-supervised learning systems whose checkpoints have been downloaded 50k+ times on Hugging Face. Switched the pseudo-tags from MFCCs to chroma features to capture harmonic information. Used deep features such as EnCodec instead of k-means to scale models up to 1B parameters.
09/2021 – 07/2022
Research Assistant, Supervised by Prof. Richard Stern, Carnegie Mellon University
Constructed 2-layer learnable front ends in the Temporal Modulation Neural Network (TMNN), which combines Mel-like data-driven front ends with temporal modulation filters. Showed that the proposed front ends surpass state-of-the-art (SOTA) methods on the MagnaTagATune dataset for automatic music tagging and are also helpful for keyword spotting on speech commands. Analysed model performance across tags, including different genre and instrument tags.
Designed learnable front ends for deep learning models inspired by classic filters, multi-rate sampling, and modulation.
Feb. 2020 – May 2020. Beijing, CHN. One of the graduation theses awarded the Outstanding Paper honor by the School of Mathematical Sciences, Peking University. Supervised by Prof. CHEN Xiaoou at the Wangxuan Institute of Computer Technology, PKU. See the project page for more information.
Jun. 2020 – Sept. 2020. Beijing, CHN. Summer internship at Beijing Deepmusic Technology Co., Ltd. Wrote a literature review on song tempo/speed detection.
Designed new tempo detection models based on BiLSTM and Temporal Convolutional Network (TCN), and compared them with the Librosa and madmom baselines using data provided by Renren Karaoke Company (more than 2,000 songs manually annotated by my colleagues).
For music with a stable speed or a slightly slower ending, the accuracy of tempo recognition is above 87% without considering the double frequency (error is less than or equal to 0.
Feb. 2020 – May 2020. Beijing, CHN. One of the graduation theses awarded the Outstanding Paper honor by the School of Mathematical Sciences, Peking University. Research Assistant for Prof. CHEN Xiaoou at the Wangxuan Institute of Computer Technology, Peking University. Music object recognition and recording is an essential component of music information retrieval. Unlike the related fields of melody extraction and music transcription, research on musical instrument technique detection is still at an early stage.
Mar. 2019 – Jun. 2019. Beijing, CHN. Research Assistant for Prof. CHEN Xiaoou at the Wangxuan Institute of Computer Technology, Peking University.
Set up a series of quartet databases from the DCMI database shared by the China Conservatory of Music. Constructed a CRNN-based audio event detection model to detect and recognize instruments. Evaluated the precision, recall, and F-measure of the model against a CNN baseline, and compared the results across quartet databases generated from different musical techniques or music types.
CSMT. Dec. 26 – Dec. 29, 2019. Harbin, CHN.