Abstract: The deep learning community has witnessed an exponentially growing interest in self-supervised learning (SSL). However, it remains unexplored how to build a framework for learning useful representations of raw music waveforms in a self-supervised manner. In this work, we design Music2Vec, a framework exploring different SSL algorithmic components and tricks for music audio recordings. Our model achieves results comparable to the state-of-the-art (SOTA) music SSL model Jukebox, despite being significantly smaller, with less than 2% of the latter's parameters.
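For illustration only, a minimal sketch of a generic teacher-student masked-prediction objective on raw waveforms, one common family of SSL objectives; the actual Music2Vec components, loss, and hyperparameters are not specified here, and every name below is a placeholder:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TinyEncoder(nn.Module):
        def __init__(self, dim=256):
            super().__init__()
            # Strided 1-D convolutions turn the raw waveform into frame-level features.
            self.conv = nn.Sequential(
                nn.Conv1d(1, dim, kernel_size=10, stride=5), nn.GELU(),
                nn.Conv1d(dim, dim, kernel_size=8, stride=4), nn.GELU(),
            )
            self.context = nn.TransformerEncoder(
                nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True),
                num_layers=2,
            )

        def forward(self, wav):                              # wav: (batch, samples)
            x = self.conv(wav.unsqueeze(1)).transpose(1, 2)  # (batch, frames, dim)
            return self.context(x)

    student, teacher = TinyEncoder(), TinyEncoder()
    teacher.load_state_dict(student.state_dict())            # in practice an EMA copy

    wav = torch.randn(2, 16000)                              # 1 s of placeholder audio
    with torch.no_grad():
        targets = teacher(wav)                               # contextual targets
    masked = wav.clone()
    masked[:, 4000:8000] = 0.0                               # crude span mask
    loss = F.mse_loss(student(masked), targets)              # regress teacher features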
Sept. 2021 – Jul. 2022
Research Assistant, Supervised by Prof. Richard Stern, Carnegie Mellon University
Constructed 2-layer learnable front ends in the Temporal Modulation Neural Network (TMNN), which combines Mel-like data-driven front ends with temporal modulation filters. Showed that the proposed front ends surpass state-of-the-art (SOTA) methods on the MagnaTagATune dataset for automatic music tagging and are also helpful for keyword spotting on speech commands. Analyzed model performance across genre and instrument tags.
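As a hedged illustration of the idea (not the actual TMNN code), such a 2-layer learnable front end can be sketched as a filterbank-like 1-D convolution over the waveform followed by per-channel temporal modulation filters; all layer sizes and the compressive nonlinearity below are assumptions:

    import torch
    import torch.nn as nn

    class LearnableFrontEnd(nn.Module):
        def __init__(self, n_filters=40, n_modulations=8):
            super().__init__()
            # Layer 1: filterbank-like 1-D convolution applied to the raw waveform
            # (learned, so it can become Mel-like in a data-driven way).
            self.filterbank = nn.Conv1d(1, n_filters, kernel_size=401, stride=160,
                                        padding=200, bias=False)
            # Layer 2: per-channel temporal modulation filters (depthwise conv in time).
            self.modulation = nn.Conv1d(n_filters, n_filters * n_modulations,
                                        kernel_size=9, padding=4,
                                        groups=n_filters, bias=False)

        def forward(self, wav):                          # wav: (batch, samples)
            x = self.filterbank(wav.unsqueeze(1))        # (batch, filters, frames)
            x = torch.log1p(x.abs())                     # compressive nonlinearity
            return self.modulation(x)                    # (batch, filters*mods, frames)

    feats = LearnableFrontEnd()(torch.randn(2, 16000))   # feed into a tagging backbone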
May 2020 – Aug. 2021. Beijing, CHN. Summer internship at Tencent Holdings Limited.
Wrote a literature review on cover song detection.
Reproduced and refined music source separation and cover song detection models.
Evaluated different ASR models on business data.
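A minimal sketch of the ASR evaluation step, assuming word error rate as the metric and the jiwer package as the scorer; the transcripts below are placeholders, not business data:

    import jiwer

    # Reference transcripts vs. ASR hypotheses (placeholder examples).
    references = ["play the next song", "turn up the volume"]
    hypotheses = ["play the next song", "turn off the volume"]

    wer = jiwer.wer(references, hypotheses)   # aggregate word error rate
    print(f"WER: {wer:.3f}")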
Feb. 2020 – May 2020. Beijing, CHN. One of the graduation theses awarded the outstanding paper honor of the School of Mathematical Sciences, Peking University. Supervised by Prof. CHEN Xiaoou at the Wangxuan Institute of Computer Technology, PKU. See the project page for more information.
Jun. 2020 – Sept. 2020. Beijing, CHN. Summer internship at Beijing Deepmusic Technology Co., Ltd. Wrote a literature review on song tempo/speed detection.
Designed new tempo detection models based on BiLSTM and a Temporal Convolutional Network (TCN) and compared them with Librosa and madmom baselines on data provided by Renren Karaoke Company (more than 2,000 songs manually annotated by my colleagues).
For music with a stable tempo or with a slightly slower ending, tempo recognition accuracy is above 87% when double-tempo (octave) errors are not counted (error is less than or equal to 0.
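For illustration, a sketch of a Librosa-style tempo baseline together with an octave-tolerant scoring rule in the spirit of "not counting double-tempo errors"; the 4% tolerance and the helper function are assumptions, not the evaluation protocol used on the Renren Karaoke data:

    import numpy as np
    import librosa

    def tempo_correct(est, ref, tol=0.04, ignore_octave=True):
        # Count an estimate as correct within tol of the reference tempo,
        # optionally also accepting double/half tempo (octave errors ignored).
        candidates = [ref, 2.0 * ref, 0.5 * ref] if ignore_octave else [ref]
        return any(abs(est - c) / c <= tol for c in candidates)

    sr = 22050
    # Synthetic 120 BPM click track (two clicks per second for ten seconds).
    y = librosa.clicks(times=np.arange(0, 10, 0.5), sr=sr, length=10 * sr)
    est_tempo, _ = librosa.beat.beat_track(y=y, sr=sr)
    print(float(est_tempo), tempo_correct(float(est_tempo), ref=120.0))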
Feb. 2020 – May 2020. Beijing, CHN. One of the graduation theses awarded the outstanding paper honor of the School of Mathematical Sciences, Peking University. Research Assistant for Prof. CHEN Xiaoou at the Wangxuan Institute of Computer Technology, Peking University. Music object recognition is an essential component of music information retrieval; unlike melody extraction and music transcription, research on musical instrument technique detection is still at an early stage.
Mar. 2019 – Jun. 2019. Beijing, CHN. Research Assistant for Prof. CHEN Xiaoou at the Wangxuan Institute of Computer Technology, Peking University.
Set up a series of quartet datasets from the DCMI database shared by the China Conservatory of Music. Constructed a CRNN-based audio event detection model to detect and recognize instruments. Evaluated the precision, recall, and F-measure of the model and a CNN baseline model, and compared the differences among quartet datasets generated from different playing techniques and music types.
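A minimal sketch of a CRNN audio event detector of the kind described above, assuming log-mel spectrogram input and frame-wise multi-label instrument outputs; all hyperparameters are illustrative, not the original configuration:

    import torch
    import torch.nn as nn

    class CRNN(nn.Module):
        def __init__(self, n_mels=64, n_instruments=4, hidden=128):
            super().__init__()
            self.cnn = nn.Sequential(
                nn.Conv2d(1, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
                nn.MaxPool2d((2, 1)),                        # pool frequency only
                nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
                nn.MaxPool2d((2, 1)),
            )
            self.rnn = nn.GRU(64 * (n_mels // 4), hidden, batch_first=True,
                              bidirectional=True)
            self.head = nn.Linear(2 * hidden, n_instruments)

        def forward(self, mel):                              # mel: (batch, mels, frames)
            x = self.cnn(mel.unsqueeze(1))                   # (batch, 64, mels/4, frames)
            x = x.permute(0, 3, 1, 2).flatten(2)             # (batch, frames, features)
            x, _ = self.rnn(x)
            return torch.sigmoid(self.head(x))               # per-frame instrument probs

    probs = CRNN()(torch.randn(2, 64, 500))   # threshold, then score precision/recall/F-measure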
Jul. 2019 – Sept. 2019. Rochester, NY, US. Research Assistant supervised by Prof. DUAN Zhiyao, Department of Electrical and Computer Engineering, University of Rochester.
Conducted a literature review of linguistics papers on the relation between speech melody and musical notes in tonal-language songs. Set up a database of Sichuan folk songs with music scores in MusicXML format, lyric audio in Sichuan dialect in WAV format, and character-by-character note-to-audio alignment.
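A sketch of how one entry of such a database might be assembled, assuming music21 for MusicXML parsing and soundfile for the audio; the file names and the (character, pitch, start, end) alignment records are hypothetical:

    import music21
    import soundfile as sf

    # Hypothetical file names; the real database pairs each song's MusicXML score
    # with its Sichuan-dialect recording.
    score = music21.converter.parse("sichuan_song_01.musicxml")
    notes = list(score.recurse().notes)
    audio, sr = sf.read("sichuan_song_01.wav")

    # One hypothetical alignment record per lyric character:
    # (character, MIDI pitch, start time in seconds, end time in seconds).
    alignment = [("太", 67, 0.52, 1.10), ("阳", 69, 1.10, 1.84)]

    for char, midi, start, end in alignment:
        segment = audio[int(start * sr):int(end * sr)]
        print(char, midi, round(len(segment) / sr, 2))

    print(f"{len(notes)} score notes, {len(alignment)} aligned characters")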