MIR | MIRer

Map-music2vec: A simple and effective baseline for self-supervised music audio representation learning

Abstract: The deep learning community has witnessed an exponentially growing interest in self-supervised learning (SSL). However, it still remains unexplored how to build a framework for learning useful representations of raw music waveforms in a self-supervised manner. In this work, we design Music2Vec, a framework exploring different SSL algorithmic components and tricks for music audio recordings. Our model achieves comparable results to the state-of-the-art (SOTA) music SSL model Jukebox, despite being significantly smaller with less than 2% of parameters of the latter.

MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised TrainingCCF none

09/2021 – 07/2022 Research Assistant, Supervised by Prof. Richard Stern, Carnegie Mellon University Constructed 2-layer learnable front ends in Temporal Modulation Neural Network (TMNN) that combines Mel-like data-driven front ends and temporal modulation filters. Examined the proposed front ends surpass state-of-the-art (SOTA) methods on the MagnaTagATune dataset in automatic music tagging, and they are also helpful for keyword spotting on speech commands. Analysis of the model performance among tags with different genres and instrument tags.

Learnable Frontend for Music, Speech and Audiow

Design learnable frontends for deep learning models inspired by classic filters, multi-rate sampling & modulation.

Cover Song Detection & Evaluation of Automatic Speech Recognition

May 2020 – Aug. 2021. Beijing, CHN. Summer internship in Tencent Holdings Limited. (Beijing) Write literature review on coversong detection. Reproduced and refined music separation & covering detection. Evaluating different ASR models on business data.

基于机器学习的笛子演奏技法识别研究与实现 (Research & implementation of Chinese flute playing technique recognition based on ML)

Feb. 2020 – May 2020. Beijing, CHN. One of the graduation theses that awarded the outstanding paper honor of School of Mathematical Science, Peking University. Supervised by prof. CHEN Xiaoou in Wangxuan Institute of Computer Technology, PKU. see at project for more information.

Tempo Detection of Chinese Pop Music

Jun. 2020 – Sept. 2020. Beijing, CHN. Summer internship in Beijing Deepmusic Technology Co. LTD Write literature review on song tempo/speed detection. Designing new model on tempo detection based on BiLSTM and Temporal Convolution Network(TCN) and compared them with the baselines of Librosa and MadMOM using the data provided by Renren Karaoke Company (more than 2000 songs manually marked by my colleagues). In the music with stable speed or with or a slightly slower ending, the accuracy of tempo recognition is above 87% without considering the double frequency (error is less than or equal to 0.

Research & implementation of Chinese flute playing technique recognition based on machine learning

Feb. 2020 – May 2020. Beijing, CHN. One of the graduation theses that awarded the outstanding paper honor of School of Mathematical Science, Peking University. Research Assistant for prof. CHEN Xiaoou in Wangxuan Institute of Computer Technology at Peking University. Music object recognition and recording is the essential component of music information retrieval. Different from other fields of melody extraction and music transcription, the research on musical instrument technique detection is still in the early stage.

Chinese instrument recognition

Mar 2019 – Jun 2019. Beijing, CHN. Research Assistant for prof. CHEN Xiaoou in Wangxuan Institute of Computer Technology at Peking University. Main Information Set up a series of quartet database from DCMI database shared by China Conservatory of Music. Constructed an audio event detection model based on CRNN to detect and recognize instruments. Evaluated the percussion, recall rate and F-measure of the model and CNN baseline model, and compared the difference among different quartet databases generate from different music skills or music types.

基于卷积循环神经网的复音音乐中国民族乐器检测 (Detection of Chinese Instrumental Quartet based on CRNN)

CSMT. Dec. 26 -- Dec. 29, 2019. Haerbin, CHN.

Correspondence between Speech Melody and Pitch Contour in Sichuan Folk Song

Jul. 2019 – Sept. 2019. Rochester, NY, US. Research Assistant supervised by prof. DUAN Zhiyao, Deparment of Electronic Computer Engineer, University of Rochester. Main Information Made a literature review of linguistics papers on relations between speech melody and notes in tone language song. Set up a database on Sichuan folk song with music scores in MusicXML form, lyrics audio in Sichuan dialect in wav form and note-audio alignment character by character.