Using Colab for AI Singing Covers

!!! Explanation:

This tutorial is intended only for AI learning and exchange.
To avoid infringement, you must source all data and models used in this project yourself.
Technology itself is neither good nor bad, but using it for illegal activities is prohibited.

Intro#

Over the past few days, your feeds on various video platforms have probably been flooded with videos of "obscure singers" such as Sun Yanzi. AI has synthesized a realistic version of Sun Yanzi's voice and used it to sing other artists' songs. Yes, the distorted version of "Wumeizi Sauce" above was made with this technology. Today, we will briefly introduce the technology behind it and show you how to use Colab to make singing covers.

Project Introduction#

"Sovits" (So-vits-svc) is an open-source and free AI voice conversion software developed by Rcell, a Chinese amateur voice synthesis enthusiast, based on a series of projects such as VITS, soft-vc, and VISinger2. It can reproduce the timbre and can be simply understood as a powerful voice changer.

Introduction to Colab#

Why use Colab#

If your computer is powerful enough, you can run everything locally (an NVIDIA GPU is required). My computer is a lightweight laptop that cannot run this project, so I use Google Colab for the singing cover demonstration.

What is Colab#

Simply put, Colab is an online computing platform that Google provides for developers. It suits people like me who need computing power while learning but whose personal computers cannot supply it.

Colab has a free tier and a paid tier. The free tier has somewhat lower performance, and the paid tier is billed by the computing power you use, but it is not expensive. I used it to run "stable diffusion" before, but too many people ran it for free, so Google later prohibited free users from running stable diffusion on Colab. I do not know when Google might restrict this project in the same way.

Data and Model Preparation#

In addition to AI singing covers, this project can make the AI repeat what you say, just like a voice changer, and you can train your own model. Here, I will only show you how to use an existing model for singing covers, using Li Ronghao's "Wumeizi Sauce" as an example.

First, prepare the song you want to cover. Since the model converts only the voice, you need to separate the vocals from the accompaniment. You can use this online tool for separation.

Download the separated vocals and background music. Only the vocals are needed for the cover itself. After the cover is done, you can combine the converted vocals with the background music.

A song is usually three to four minutes long, and the GPU often cannot handle that much audio in one pass. Therefore, slice the vocals into segments of less than one minute each, convert them separately, and join the results at the end.
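Recombining the vocals and the background music can be done in any audio editor. As a rough sketch of the same idea in code, the function below mixes two WAV files using only the Python standard library. It assumes both files are 16-bit PCM with the same sample rate and channel count, which may not hold for your files; the function name `mix_wavs` and the gain parameters are made up for this illustration.

```python
import array
import sys
import wave


def _read_samples(wav):
    """Read all frames of an open 16-bit WAV as native-endian integers."""
    samples = array.array("h", wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":  # WAV data is little-endian
        samples.byteswap()
    return samples


def mix_wavs(vocal_path, backing_path, out_path, vocal_gain=1.0, backing_gain=1.0):
    """Mix two 16-bit PCM WAV files sample by sample (hypothetical helper)."""
    with wave.open(vocal_path, "rb") as v, wave.open(backing_path, "rb") as b:
        v_format = (v.getnchannels(), v.getsampwidth(), v.getframerate())
        b_format = (b.getnchannels(), b.getsampwidth(), b.getframerate())
        if v_format != b_format:
            raise ValueError("Both files must share channels, sample width and rate")
        if v.getsampwidth() != 2:
            raise ValueError("This sketch only handles 16-bit PCM audio")
        params = v.getparams()
        vocal = _read_samples(v)
        backing = _read_samples(b)

    length = max(len(vocal), len(backing))
    mixed = array.array("h", bytes(2 * length))  # starts as silence
    for i in range(length):
        value = 0.0
        if i < len(vocal):
            value += vocal[i] * vocal_gain
        if i < len(backing):
            value += backing[i] * backing_gain
        # Clamp to the 16-bit range so loud passages do not wrap around.
        mixed[i] = max(-32768, min(32767, int(value)))

    if sys.byteorder == "big":
        mixed.byteswap()
    with wave.open(out_path, "wb") as out:
        out.setparams(params)
        out.writeframes(mixed.tobytes())
```

Lowering `backing_gain` slightly (for example to 0.8) is one way to keep the voice in front, but the right balance depends on the song.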
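The slicing and joining described above can also be scripted. The sketch below uses Python's standard `wave` module and assumes the vocals are an uncompressed PCM WAV file; `slice_wav`, `join_wavs`, and the 40-second default are illustrative choices, not part of the project.

```python
import wave


def slice_wav(src_path, out_prefix, max_seconds=40):
    """Split a PCM WAV file into consecutive segments of at most max_seconds."""
    out_paths = []
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames_per_segment = int(src.getframerate() * max_seconds)
        index = 0
        while True:
            frames = src.readframes(frames_per_segment)
            if not frames:
                break
            out_path = f"{out_prefix}_{index:03d}.wav"
            with wave.open(out_path, "wb") as dst:
                dst.setparams(params)
                dst.writeframes(frames)  # header is corrected to the real length
            out_paths.append(out_path)
            index += 1
    return out_paths


def join_wavs(segment_paths, out_path):
    """Concatenate WAV segments that share the same format, in the given order."""
    with wave.open(out_path, "wb") as dst:
        for i, path in enumerate(segment_paths):
            with wave.open(path, "rb") as seg:
                if i == 0:
                    dst.setparams(seg.getparams())
                dst.writeframes(seg.readframes(seg.getnframes()))
```

Note that cutting at fixed intervals can split a sung note in half. Cutting by hand at a pause between phrases usually gives cleaner joins.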

Download the pre-trained voice model of Sun Yanzi

Vocal Separation

Please download the data and models yourself.

Okay, now let's start learning how to use Colab for AI singing covers.

Open the Project#

First, open the project's GitHub page, scroll to the bottom, and find "Colab notebook scripts". Click the link the arrow points to, which is the cover notebook; the other link is the training notebook.

Project Address

As you can see, the Colab notebook page looks like the Jupyter notebooks I mentioned before; in fact, it is the same thing. Since this is someone else's notebook, we need to click "Save a copy to Drive" to save it to our own Google Drive.

Save a Copy

Save a Copy 2

Configuration#

After saving, check that the notebook is running on a GPU. You can click "Connect" first, which starts the server, and then run the first cell, or simply run the first cell directly. The "Tesla T4" shown here is the GPU model. You may get a different model, because Google assigns GPUs automatically based on current demand.

GPU Check

GPU
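The notebook's own first cell already performs this check. For reference, a minimal way to do something similar from Python is to ask `nvidia-smi` for the GPU name; the helper `gpu_name` below is a made-up name for this sketch and is not taken from the notebook.

```python
import shutil
import subprocess


def gpu_name():
    """Return the first GPU name reported by nvidia-smi, or None if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None  # no NVIDIA driver tools on this machine
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None
    names = result.stdout.strip().splitlines()
    return names[0] if names else None


print(gpu_name() or "No GPU detected: select a GPU runtime before continuing")
```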

Next, run the two setup cells one at a time. The free machine is slow, so be patient and wait until "Setup 1" has finished before running "Setup 2". After that, continue running the cells that download ContentVec and the models hosted on Hugging Face. As you can see, the download speed is very fast.

ContentVec

After the HF model list has been downloaded, you can pick a specific model from the list to download. I use a Sun Yanzi voice model, which is not in that list, so I need to upload it myself.

Connect Drive and Upload Data#

Click the Drive button in the upper left corner, and you will be prompted to run a cell that connects to Drive. Follow the instructions to run it and authorize access. This connects the notebook to your own Drive and does not grant access to third parties, so you can use it with confidence.

Next, open your Google Drive, upload your own model, and click the share button to change the permissions so that anyone with the sharing link can access the file. Copy the sharing link, paste it into the box below, and run the cell to download the model automatically.
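The cell that Colab inserts for this step typically calls `drive.mount` from the `google.colab` package. A small wrapper such as the following (the name `mount_drive` is only illustrative) does the same thing and returns `False` when it is run outside Colab:

```python
def mount_drive(mount_point="/content/drive"):
    """Mount Google Drive when running inside Colab; return False elsewhere."""
    try:
        from google.colab import drive  # only available inside Colab
    except ImportError:
        return False
    drive.mount(mount_point)  # opens the authorization prompt
    return True
```

After `mount_drive()` succeeds, your files normally appear under `/content/drive/MyDrive`.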

Sharing Link

Sharing Link 2
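How the notebook turns the sharing link into a download is not shown here. As an assumption about what such a step involves, the sketch below extracts the file ID from the two common Google Drive link shapes and builds the widely used `uc?export=download` URL. Both function names are hypothetical, and large files may still require an extra confirmation step that this sketch does not handle.

```python
import re
from urllib.parse import parse_qs, urlparse


def drive_file_id(share_link):
    """Extract the file ID from a Google Drive sharing link."""
    parsed = urlparse(share_link)
    # Shape 1: https://drive.google.com/file/d/<id>/view?usp=sharing
    match = re.search(r"/file/d/([A-Za-z0-9_-]+)", parsed.path)
    if match:
        return match.group(1)
    # Shape 2: https://drive.google.com/open?id=<id>
    ids = parse_qs(parsed.query).get("id")
    if ids:
        return ids[0]
    raise ValueError(f"No file ID found in link: {share_link}")


def direct_download_url(share_link):
    """Build a direct-download URL from a sharing link (assumed URL form)."""
    return f"https://drive.google.com/uc?export=download&id={drive_file_id(share_link)}"
```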

Then run the decompression program below to unzip our model.

Unzip Model
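The notebook's decompression cell handles this for you. If you ever need to unpack a model archive yourself, a minimal sketch with Python's standard `zipfile` module looks like this; the function name and the example paths in the note below are placeholders.

```python
import zipfile
from pathlib import Path


def unzip_model(archive_path, target_dir):
    """Extract a zip archive and return the sorted list of files it contained."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
        names = archive.namelist()
    # Skip directory entries so only real files are reported.
    return sorted(name for name in names if not name.endswith("/"))
```

For example, `unzip_model("model.zip", "models")` returns the extracted file names, which makes it easy to confirm that the archive contained the files you expected.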

Training#

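Upload the sliced vocal audio files to the "raw" folder, set the parameters, and click "Convert" to start. Strictly speaking, this step runs inference with the pre-trained model rather than training it, although it is loosely called training here.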

Parameters

Start Training

Tips:

Keep each audio segment short: no more than 1 minute, preferably around 40 seconds.

Upload only one audio segment at a time. After its conversion finishes, download the result, then upload the next segment.

Start with the default parameters and adjust them gradually based on the results.
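Before uploading, it can help to confirm that every segment really is under the limit. The following sketch, which assumes PCM WAV files and uses made-up helper names, reports any segment that is too long:

```python
import wave


def wav_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()


def segments_too_long(paths, limit_seconds=60):
    """Return the paths whose duration exceeds limit_seconds."""
    return [p for p in paths if wav_seconds(p) > limit_seconds]
```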

Summary#

Today, we briefly introduced the "Sovits" project and used Colab to make an AI cover of a favorite song. Give it a try yourself. If you are interested, you can also use the GitHub project to train models of your own, and there are tutorials on Bilibili as well. Be sure to avoid illegal activities and infringement. Just have fun and learn AI.