00:00:00
hello and welcome to more Nerdy Rodent
00:00:02
geekery voice to voice technology in
00:00:06
case you're not aware of what this is it
00:00:08
basically allows you to change one voice
00:00:10
into another voice a bit like having an
00:00:13
AI voice changer and the best part
00:00:16
everything you need is now all in one
00:00:20
app plus it's really quick to train as
00:00:23
well so here it is the retrieval based
00:00:27
voice conversion web user interface
00:00:29
hardly a mouthful at all you may be
00:00:33
aware that AI singing voice conversion
00:00:35
can be a bit of a task as there are
00:00:38
multiple stages involved before you can
00:00:41
create your masterpiece video with John
00:00:43
Cena dancing while listening to Abraham
00:00:46
Lincoln singing the very latest K-Pop
00:00:48
song first you need to collect a bunch
00:00:50
of voice samples process them train a
00:00:52
model separate the vocals from the music
00:00:55
track you're changing if you don't
00:00:57
already have them separately run your
00:00:59
new AI model on those vocals and finally
00:01:02
mix them back in with the music
00:01:04
thankfully that can now all be done via
00:01:07
this one web interface and what is the
00:01:09
quality like well let's have a listen
00:01:12
I've used an example song from Pixabay
00:01:14
there it is meaning that in less than 30
00:01:17
minutes of training time I can be the
00:01:19
one singing instead so let's take a
00:01:21
quick listen to a clip of the original
00:01:23
so we know what I'm going to convert
00:01:25
from
00:01:26
[Music]
00:01:35
and then now with that voice changed by
00:01:38
this AI to sound like me instead
00:01:41
[Music]
00:01:48
want to do this yourself then stick with
00:01:50
me here and I'll show you exactly how as
00:01:53
with anything Python installation is an
00:01:56
absolute breeze and the best part is
00:01:58
that it works on a range of operating
00:02:01
systems even Microsoft Windows here's a
00:02:05
little table with some of the
00:02:07
requirements
00:02:08
if you use Microsoft Windows sorry if
00:02:11
you are using that I do hope things get
00:02:13
better what you could do is download and
00:02:16
install 7-Zip download the
00:02:19
RVC-beta 7-Zip file from the Hugging
00:02:22
Face page unzip it and then use
00:02:25
go-web.bat
00:02:28
a normal install can also be done just
00:02:31
like they have here though you may want
00:02:33
to download the 7-Zip archive anyway as
00:02:35
that has all the models in it personally
00:02:38
I did the normal install using an
00:02:40
Anaconda virtual Python 3.10 environment
00:02:43
as I like simple app management there is
00:02:47
also a Google Colab available if you
00:02:50
prefer to use Google Colab so with
00:02:53
whatever installation method you chose
00:02:54
you should now have your web interface
00:02:56
up and running let's dive into this
00:02:58
fascinating world of voice to voice
00:03:00
technology and see what amazing things
00:03:03
we can create
00:03:06
if you already have a model you can do
00:03:08
model inference straight away or like me
00:03:11
you can begin with training one if you
00:03:14
don't there is the training tab however
00:03:16
before we delve into the training
00:03:19
process let's just quickly go over these
00:03:21
five tabs so first of all you've got
00:03:23
model inference you've got separation of
00:03:26
accompaniment and vocal train checkpoint
00:03:30
processing so you can mix checkpoints
00:03:32
together there export ONNX which I've
00:03:34
never used and also an FAQ as well to
00:03:38
begin with as mentioned we're going to
00:03:40
start with the training tab as this is
00:03:42
where you will create your very first
00:03:44
voice model step one for the experiment
00:03:48
name simply enter the name you want to
00:03:50
give your project so you could do for
00:03:52
example nerdy because that's me as for
00:03:55
the sample rate I personally prefer
00:03:57
always using 40K and I always leave this
00:04:01
on true as well as that seems to be the
00:04:03
best model architecture you can select
00:04:06
either version 1 or version 2.
00:04:08
personally I prefer version two number
00:04:11
of threads I think is probably picked
00:04:13
automatically
00:04:14
congratulations you have now completed
00:04:17
step one the next step is step two a the
00:04:20
first thing it asks for here is the path
00:04:23
to the training directory if you're not
00:04:26
familiar with terms like files and
00:04:27
directories on your computer this part
00:04:29
can be quite confusing you could think
00:04:33
of directories as computer boxes where
00:04:36
you organize your things files in this
00:04:38
case and I've put them into a training
00:04:41
directory so there is my path training
00:04:44
nerd
00:04:45
if we have a quick look at that
00:04:47
directory as you can see it's absolutely
00:04:50
full of audio files
00:04:52
if your name is different you may wish
00:04:54
to use something else but it's entirely
00:04:57
up to you even though I'd already split
00:04:59
my samples up into around 250 segments
00:05:02
you don't actually need to worry too
00:05:04
much about that because this program
00:05:06
will automatically handle long audio and
00:05:10
split it accordingly generally speaking
00:05:13
between 10 and 50 minutes total audio is
00:05:16
required any vocals are fine singing
00:05:19
Talking whatever just make sure that you
00:05:21
don't have any music in the background
00:05:23
it should be all one person vocals only
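If you want to check whether your sample folder lands in that 10 to 50 minute range before training, here is a minimal sketch. It is not part of the web interface: the function names are made up for illustration, it assumes your samples are plain PCM WAV files (which is what Python's built-in `wave` module can read), and the example path in the usage note is hypothetical.

```python
import wave
from pathlib import Path


def total_minutes(directory):
    """Sum the duration of every .wav file in a directory, in minutes."""
    seconds = 0.0
    for path in sorted(Path(directory).glob("*.wav")):
        with wave.open(str(path), "rb") as wav:
            # duration of one file = number of frames / frames per second
            seconds += wav.getnframes() / wav.getframerate()
    return seconds / 60.0


def enough_audio(minutes, low=10.0, high=50.0):
    """True when the total sits inside the 10 to 50 minute range mentioned above."""
    return low <= minutes <= high
```

You could then call `enough_audio(total_minutes("path/to/your/samples"))` with your own training directory in place of that placeholder path.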
00:05:28
okay so now you've put in the directory
00:05:30
with all your samples in you can just
00:05:32
click process data that will take a few
00:05:35
seconds and process all of the samples
00:05:37
for you
00:05:38
now you're ready to move on to step two
00:05:41
B if you have multiple graphics cards
00:05:44
then you can put them in here but I've
00:05:47
only got a single GPU so I just leave
00:05:50
that as is the defaults are absolutely
00:05:51
fine next you have pitch extraction
00:05:54
which has three options personally I
00:05:57
always go with harvest pm is fast but
00:06:00
low quality dio is a bit slower but
00:06:03
better quality and harvest is the
00:06:05
slowest but the best quality so with
00:06:08
harvest selected there I just click
00:06:10
feature extraction that will take a few
00:06:12
seconds and finish that task
00:06:16
step three well here for the most part
00:06:18
you can just go ahead and click that one
00:06:20
click training button come back in about
00:06:22
10 minutes and you'll have a model
00:06:24
however if you are like me and you do
00:06:27
like to change things a little bit
00:06:28
you've got some options there for how
00:06:30
often you want to save the full model
00:06:32
the total number of epochs the GPU batch
00:06:35
size and some options for saving
00:06:38
personally the way I like to set this up
00:06:40
for a version 2 model is to set that to
00:06:43
10 total training epochs I do to 200
00:06:47
which is about the maximum you'll ever
00:06:49
need as I have a very large GPU I've got
00:06:53
24 gig of VRAM I set the batch size up to 40
00:06:55
as that's the maximum my GPU will handle
00:06:58
I like to click yes to only save the
00:07:01
latest checkpoint I'll keep cache all on
00:07:04
no and I say yes to save small finished
00:07:07
models
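To get a feel for what those two numbers mean together, here is a rough sketch of the arithmetic. It assumes the value of 10 is the save frequency in epochs and that a save happens at every multiple of it up to the 200 total; the function name is hypothetical and this is not how the app itself is written.

```python
def saved_epochs(total_epochs, save_every):
    """Epoch numbers at which a periodic save would happen (assumed: every multiple)."""
    return list(range(save_every, total_epochs + 1, save_every))


epochs = saved_epochs(200, 10)
print(len(epochs), epochs[:3], epochs[-1])  # prints: 20 [10, 20, 30] 200
```

With only the latest checkpoint kept, as chosen above, you would expect just the most recent of those saves to stay on disk rather than all twenty.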
00:07:08
so with your model training via that one
00:07:11
click training I would suggest also
00:07:13
going and having a look over at the
00:07:15
frequently asked questions tab there's
00:07:18
quite a lot of information here
00:07:19
particularly useful are question nine
00:07:22
and question 10 how many total epochs
00:07:24
are optimal and how much training set
00:07:27
duration is needed
00:07:29
now that you've got your very first
00:07:31
voice model it's time to do that AI
00:07:34
voice to voice thing if you already have
00:07:36
the voice that you want to convert you
00:07:39
can skip straight to model inference
00:07:41
however if you want to do something like
00:07:43
change the singer of a song that you
00:07:46
don't have the vocal stems for like I
00:07:48
did here then you'll first need to
00:07:50
separate those vocals out from the
00:07:53
background music and this is where the
00:07:56
separation tab comes in handy
00:07:58
once again those files and directories
00:08:01
come into play as you'll need to know
00:08:03
where you've saved your music files the
00:08:06
first box is if you want to convert
00:08:08
multiple files from a given directory as
00:08:11
I tend to do just one at a time I delete
00:08:13
that and then use the box underneath
00:08:15
instead
00:08:17
model selection has two options like it
00:08:20
says at the top there HP2 is for input
00:08:23
without harmony or if with harmony and
00:08:26
extracted vocals do not need harmony
00:08:28
use HP5 basically if you're unsure use
00:08:31
both have a listen to the output and see
00:08:34
which is best for you in my case I'm
00:08:37
going to use HP2 here
00:08:40
by default the output goes into the opt
00:08:43
directory so feel free to change that if
00:08:46
you like
00:08:47
when you're ready push the huge orange
00:08:49
convert button and you'll have split the
00:08:51
vocals from the music
00:08:54
let's have a quick listen to that
00:08:57
of course there's a few seconds of
00:08:58
silence
00:09:03
[Music]
00:09:05
there we go anyway that's done quite
00:09:06
well we've got the vocals there without
00:09:08
the music
00:09:09
even if there is a little bit of an echo
00:09:12
or something there in the voice alright
00:09:14
so now we're ready to go with inference
00:09:17
the page does look huge but really it's
00:09:20
two things in one the top half there is
00:09:22
for single voice conversion and there
00:09:25
you've got a batch as well so I'll just
00:09:27
be going through the one the batch is
00:09:29
essentially the same but you're doing
00:09:30
loads at a time again everything is
00:09:33
pretty straightforward here push that
00:09:35
huge refresh button and then you should
00:09:38
see your options appear in this little
00:09:41
pull down here my list is absolutely
00:09:43
huge as all the girls would agree but
00:09:46
you'll probably only have one option in
00:09:49
there the first time so pick that I'm
00:09:52
going to pick that one because that's my
00:09:53
trained voice
00:09:54
next you have to select a pitch just
00:09:57
like it says above for low to high
00:09:59
conversion use plus 12 if it's about
00:10:01
the same use zero and for high to low
00:10:05
voice conversion use minus 12. the
00:10:08
source voice in this case is quite high
00:10:10
my voice is a bit lower so I'm going to
00:10:13
use minus 12.
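If you're wondering why the numbers are 12 and minus 12, here is a small sketch, assuming the pitch value is counted in semitones (the video only gives the numbers). In equal temperament each semitone multiplies the frequency by the twelfth root of two, so 12 semitones is exactly one octave. The function names are just for illustration.

```python
def semitone_ratio(semitones):
    """Frequency multiplier for a shift of the given number of semitones."""
    return 2.0 ** (semitones / 12.0)


def shift_frequency(hz, semitones):
    """Frequency in Hz after shifting by the given number of semitones."""
    return hz * semitone_ratio(semitones)
```

So `shift_frequency(440.0, -12)` gives 220.0, one octave down, which is the high to low case used here, and a shift of 0 leaves the pitch alone.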
00:10:15
once again those files and directories
00:10:18
come into play here so put the path to
00:10:21
your vocals in if you did that default
00:10:23
voice separation then you'll have the
00:10:25
two files in your opt directory you want
00:10:29
the one which starts vocal so there in
00:10:31
my opt directory I have the long name of
00:10:34
that WAV file the one that starts with
00:10:36
vocal for pitch extraction again pm is
00:10:39
fast and harvest is best so I like
00:10:42
to select harvest everything else I
00:10:44
leave at the default apart from this
00:10:46
path to index which should have a pull
00:10:49
down menu there is the one that I want
00:10:51
to use because it matches that inference
00:10:54
voice
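For the input path step a moment ago, if you'd rather not hunt for that long file name by hand, here is a minimal sketch that lists the WAV files starting with vocal in the output directory. The function name is hypothetical, and the default of `opt` simply mirrors the default output directory mentioned in the video.

```python
from pathlib import Path


def find_vocal_files(output_dir="opt"):
    """Return WAV files whose name starts with 'vocal', sorted by name."""
    return sorted(Path(output_dir).glob("vocal*.wav"))
```

Each entry is a `Path`, so `str(find_vocal_files()[0])` would give you something to paste into the input box, assuming at least one separated vocal file exists.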
00:10:54
okay so now you can go ahead and click
00:10:57
that very tiny convert button and in
00:11:00
just a few seconds you should have your
00:11:02
output
00:11:04
and there it is
00:11:07
[Music]
00:11:14
yeah there we go that's pretty cool
00:11:16
that's pretty cool that's me now you can
00:11:19
right click that save audio as I'm going
00:11:22
to put it in my opt directory as well
00:11:24
I'm using Audacity here I've got the
00:11:26
instrumental so I just drag that other
00:11:29
voice in and then I can File Export as
00:11:32
whatever I want and it will mix those
00:11:34
two voices together
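If you're curious what mixing two tracks boils down to, here is a rough sketch on raw 16-bit samples. It is only an illustration of the idea of adding samples together and clipping, not a description of what Audacity does internally, and the function name and gain parameters are made up for the example.

```python
from array import array


def mix_int16(vocals, instrumental, vocal_gain=1.0, music_gain=1.0):
    """Mix two sequences of 16-bit samples by summing them and clipping."""
    length = max(len(vocals), len(instrumental))
    mixed = array("h")  # signed 16-bit integers
    for i in range(length):
        # treat the shorter track as silence once it runs out
        v = vocals[i] if i < len(vocals) else 0
        m = instrumental[i] if i < len(instrumental) else 0
        total = int(round(v * vocal_gain + m * music_gain))
        # keep the result inside the 16-bit range to avoid wrap-around
        mixed.append(max(-32768, min(32767, total)))
    return mixed
```

Lowering `vocal_gain` or `music_gain` below 1.0 is one way to leave headroom when both tracks are loud, since anything past the 16-bit limits is simply clipped here.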
00:11:36
thank you
00:11:40
on your bones
00:11:47
[Music]
00:11:56
plus if you thought that was cool then
00:11:59
you may also like this nerdy rodent
00:12:01
video