Use AI to Clone ANY Voice & Sing ANY Song for FREE | RVC WebUI Tutorial

00:12:14
https://www.youtube.com/watch?v=-JcvdDErkAU

Summary

TLDR: This video introduces the Retrieval-based Voice Conversion (RVC) web interface, a voice-to-voice AI tool that transforms one voice into another. The interface consolidates steps that previously required separate tools: processing voice samples, training a model, separating vocals from music, and running inference. It runs on multiple operating systems, including Windows, and can be installed from a prepackaged 7-Zip archive, through a normal Python install (the host uses an Anaconda Python 3.10 environment), or via Google Colab. The video walks through the training configuration, pitch extraction, vocal separation, and model inference tabs, and points to the built-in FAQ tab for further guidance. With under 30 minutes of training time, the host converts the singer of an example song to his own voice, which makes the tool well suited to creative applications such as AI singing voice conversion.

Takeaways

  • 🎤 This voice conversion interface can transform voices using AI technology.
  • 🖥️ The application is compatible with multiple operating systems, including Windows.
  • 🛠️ Installation is simplified through Python, Anaconda, and 7-Zip tools.
  • ⏱️ Training a voice model can take as little as 30 minutes.
  • 🎶 Users can separate vocals from music tracks for conversion purposes.
  • ⚙️ Features a user-friendly interface with tabs for different processing tasks.
  • 🔎 Offers three pitch extraction methods (PM, DIO, and Harvest), with Harvest being the slowest but giving the best quality.
  • 📁 Organizes audio training samples efficiently for voice model creation.
  • 💾 Provides options for single and batch voice conversions.
  • 📚 Includes an FAQ section for user guidance and troubleshooting.

Timeline

  • 00:00:00 - 00:05:00

    The video introduces a new all-in-one app for voice-to-voice conversion, emphasizing its quick training process. Initially, the host explains the complexity of AI singing voice conversion, involving various stages like collecting voice samples, training models, separating vocals from music, and finally mixing them back. The host demonstrates this by using a retrieval-based voice conversion web interface, showcasing how in under 30 minutes, one's voice can be used to sing an example song from Pixabay. They then transition to showing how to set up the app on various operating systems, including Windows, and recommend using certain installation methods like Anaconda for ease of management.

  • 00:05:00 - 00:12:14

    The host guides the viewer through training a first voice model: setting the experiment name, sample rate, and model version, then pointing the app at a training directory. The app splits long audio files automatically and needs only single-speaker vocal samples without background music. The host recommends Harvest for pitch extraction and shares his training configuration, including a batch size chosen to fit his 24 GB GPU. After training, the model is ready for inference. For songs without separate vocal stems, the separation tab first splits the vocals from the music, with guidance on directory management and choosing between the HP2 and HP5 models. Finally, the host sets the pitch to match the target voice, converts the vocals, and mixes them back with the instrumental.
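
The pitch setting in the inference step takes values such as +12, 0, and -12. Assuming these are semitones (so 12 is one octave; the video does not name the unit), the sketch below shows what such a shift means for frequency in 12-tone equal temperament. The function name is illustrative and not part of the app.

```python
def semitones_to_ratio(semitones: float) -> float:
    """Frequency ratio for a pitch shift in 12-tone equal temperament."""
    return 2.0 ** (semitones / 12.0)


# The three values mentioned in the video: up an octave, unchanged, down an octave.
for shift in (12, 0, -12):
    print(f"{shift:+d} semitones -> frequency multiplied by {semitones_to_ratio(shift):.2f}")
```

Under that assumption, -12 halves every frequency in the source vocal, which matches the host's choice for converting a high source voice to his lower one.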

Video Q&A

  • What is voice-to-voice technology?

    Voice-to-voice technology allows the transformation of one voice into another, often using AI-based applications.

  • What applications does voice-to-voice technology have?

    It can be used for AI voice changers, voice acting, music production, and any application requiring voice transformation.

  • How simple is the installation of this voice conversion web interface?

    The installation process is user-friendly and works on various operating systems, including Windows, with options for a normal setup or using tools like Anaconda or Google Colab.

  • What is the typical training time required for this voice conversion technology?

    Training can take as little as 30 minutes to create a voice model using the provided web interface.

  • What are the key steps in using the voice conversion interface?

    Key steps include model inference, separating accompaniment from vocals, training checkpoints, and adjusting settings for optimal voice conversion.

  • What is the best pitch extraction method according to the video?

    The Harvest method is recommended for the best quality, though it is slower compared to alternatives like PM and DIO.

  • Can I convert a song without having separate vocal stems?

    Yes, the interface provides tools to separate vocals from background music to facilitate conversion.

  • What type of audio files are required for training?

    Audio files should contain only vocal segments without any music background for effective training.

  • Is there support available for those working with the interface for the first time?

    The interface includes an FAQ section, which provides detailed guidance and addresses common questions.

  • What are some installation tools suggested for Windows users?

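    For Windows, the video suggests installing 7-Zip, downloading the RVC-beta 7-Zip archive from the Hugging Face page, unzipping it, and running go-web.bat. A normal install (the host used an Anaconda Python 3.10 environment) and a Google Colab notebook are the alternatives.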
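
On the training-data question above: in the transcript the host suggests between 10 and 50 minutes of total audio. A minimal sketch for checking a training directory against that range is shown below. It uses only Python's standard `wave` module, so it assumes uncompressed PCM `.wav` files; the function names and the example path are hypothetical and are not part of the app.

```python
import wave
from pathlib import Path


def total_wav_minutes(directory: str) -> float:
    """Sum the duration, in minutes, of every .wav file directly inside `directory`."""
    total_seconds = 0.0
    for path in sorted(Path(directory).glob("*.wav")):
        with wave.open(str(path), "rb") as wav_file:
            # Duration of one file = number of frames / frames per second.
            total_seconds += wav_file.getnframes() / wav_file.getframerate()
    return total_seconds / 60.0


def within_suggested_range(minutes: float, low: float = 10.0, high: float = 50.0) -> bool:
    """True if the total falls inside the 10 to 50 minute range suggested in the video."""
    return low <= minutes <= high
```

For example, `within_suggested_range(total_wav_minutes("training/nerd"))` would check a directory named like the one in the video. Other audio formats would need a different reader.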

Subtitles (en)
  • 00:00:00
    hello and welcome to More nerdy rodent
  • 00:00:02
    geekery voice to voice technology in
  • 00:00:06
    case you're not aware of what this is it
  • 00:00:08
    basically allows you to change one voice
  • 00:00:10
    into another voice a bit like having an
  • 00:00:13
    AI voice changer and the best part
  • 00:00:16
    everything you need is now all in one
  • 00:00:20
    app plus it's really quick to train as
  • 00:00:23
    well so here it is the retrieval based
  • 00:00:27
    voice conversion web user interface
  • 00:00:29
    hardly a mouthful at all you may be
  • 00:00:33
    aware that AI singing voice conversion
  • 00:00:35
    can be a bit of a task as there are
  • 00:00:38
    multiple stages involved before you can
  • 00:00:41
    create your Masterpiece video with John
  • 00:00:43
    Cena dancing while listening to Abraham
  • 00:00:46
    Lincoln singing the very latest K-Pop
  • 00:00:48
    song first you need to collect a bunch
  • 00:00:50
    of voice samples process them train a
  • 00:00:52
    model separate the vocals from the music
  • 00:00:55
    track you're changing if you don't
  • 00:00:57
    already have them separately run your
  • 00:00:59
    new AI model on those vocals and finally
  • 00:01:02
    mix them back in with the music
  • 00:01:04
    thankfully that can now all be done via
  • 00:01:07
    this one web interface and what is the
  • 00:01:09
    quality like well let's have a listen
  • 00:01:12
    I've used an example song from Pixabay
  • 00:01:14
    there it is meaning that in less than 30
  • 00:01:17
    minutes of training time I can be the
  • 00:01:19
    one singing instead so let's take a
  • 00:01:21
    quick listen to a clip of the original
  • 00:01:23
    so we know what I'm going to convert
  • 00:01:25
    from
  • 00:01:26
    [Music]
  • 00:01:35
    and then now with that voice changed by
  • 00:01:38
    this AI to sound like me instead
  • 00:01:41
    [Music]
  • 00:01:48
    want to do this yourself then stick with
  • 00:01:50
    me here and I'll show you exactly how as
  • 00:01:53
    with anything Python installation is an
  • 00:01:56
    absolute Breeze and the best part is
  • 00:01:58
    that it works on a range of operating
  • 00:02:01
    systems even Microsoft Windows here's a
  • 00:02:05
    little table with some of the
  • 00:02:07
    requirements
  • 00:02:08
    if you use Microsoft Windows sorry if
  • 00:02:11
    you are using that I do hope things get
  • 00:02:13
    better what you could do is download and
  • 00:02:16
    install 7-Zip download the
  • 00:02:19
    rvc-beta 7-Zip file from the Hugging
  • 00:02:22
    Face page unzip it and then use
  • 00:02:25
    go-web.bat
  • 00:02:28
    a normal install can also be done just
  • 00:02:31
    like they have here though you may want
  • 00:02:33
    to download the 7-Zip archive anyway as
  • 00:02:35
    that has all the models in it personally
  • 00:02:38
    I did the normal install using an
  • 00:02:40
    Anaconda virtual Python 3.10 environment
  • 00:02:43
    as I like simple App Management there is
  • 00:02:47
    also a Google Colab available if you
  • 00:02:50
    prefer to use Google Colab so with
  • 00:02:53
    whatever installation method you chose
  • 00:02:54
    you should now have your web interface
  • 00:02:56
    up and running let's dive into this
  • 00:02:58
    fascinating world of voice to voice
  • 00:03:00
    technology and see what amazing things
  • 00:03:03
    we can create
  • 00:03:06
    if you already have a model you can do
  • 00:03:08
    model inference straight away or like me
  • 00:03:11
    you can begin with training one if you
  • 00:03:14
    don't there is the training tab however
  • 00:03:16
    before we delve into the training
  • 00:03:19
    process let's just quickly go over these
  • 00:03:21
    five tabs so first of all you've got
  • 00:03:23
    model inference you've got separation of
  • 00:03:26
    accompaniment and vocal train checkpoint
  • 00:03:30
    processing so you can mix checkpoints
  • 00:03:32
    together there export ONNX which I've
  • 00:03:34
    never used and also an FAQ as well to
  • 00:03:38
    begin with as mentioned we're going to
  • 00:03:40
    start with the training tab as this is
  • 00:03:42
    where you will create your very first
  • 00:03:44
    voice model step one for the experiment
  • 00:03:48
    name simply enter the name you want to
  • 00:03:50
    give your project so you could do for
  • 00:03:52
    example nerdy because that's me as for
  • 00:03:55
    the sample rate I personally prefer
  • 00:03:57
    always using 40K and I always leave this
  • 00:04:01
    on true as well as that seems to be the
  • 00:04:03
    best model architecture you can select
  • 00:04:06
    either version 1 or version 2.
  • 00:04:08
    personally I prefer version two number
  • 00:04:11
    of threads I think is probably picked
  • 00:04:13
    automatically
  • 00:04:14
    congratulations you have now completed
  • 00:04:17
    step one the next step is step two a the
  • 00:04:20
    first thing it asks for here is the path
  • 00:04:23
    to the training directory if you're not
  • 00:04:26
    familiar with terms like files and
  • 00:04:27
    directories on your computer this part
  • 00:04:29
    can be quite confusing you could think
  • 00:04:33
    of directories as computer boxes where
  • 00:04:36
    you organize your things files in this
  • 00:04:38
    case and I've put them into a training
  • 00:04:41
    directory so there is my path training
  • 00:04:44
    nerd
  • 00:04:45
    if we have a quick look at that
  • 00:04:47
    directory as you can see it's absolutely
  • 00:04:50
    full of audio files
  • 00:04:52
    if your name is different you may wish
  • 00:04:54
    to use something else but it's entirely
  • 00:04:57
    up to you even though I'd already split
  • 00:04:59
    my samples up into around 250 segments
  • 00:05:02
    you don't actually need to worry too
  • 00:05:04
    much about that because this program
  • 00:05:06
    will automatically handle long audio and
  • 00:05:10
    split it accordingly generally speaking
  • 00:05:13
    between 10 and 50 minutes total audio is
  • 00:05:16
    required any vocals are fine singing
  • 00:05:19
    Talking whatever just make sure that you
  • 00:05:21
    don't have any music in the background
  • 00:05:23
    it should be all one person vocals only
  • 00:05:28
    okay so now you've put in the directory
  • 00:05:30
    with all your samples in you can just
  • 00:05:32
    click process data that will take a few
  • 00:05:35
    seconds and process all of the samples
  • 00:05:37
    for you
  • 00:05:38
    now you're ready to move on to step two
  • 00:05:41
    B if you have multiple graphics cards
  • 00:05:44
    then you can put them in here but I've
  • 00:05:47
    only got a single GPU so I just leave
  • 00:05:50
    that as is the defaults are absolutely
  • 00:05:51
    fine next you have pitch extraction
  • 00:05:54
    which has three options personally I
  • 00:05:57
    always go with Harvest PM is fast but
  • 00:06:00
    low quality DIO is a bit slower but
  • 00:06:03
    better quality and harvest is the
  • 00:06:05
    slowest but the best quality so with
  • 00:06:08
    Harvest selected there I just click
  • 00:06:10
    feature extraction that will take a few
  • 00:06:12
    seconds and finish that task
  • 00:06:16
    step three well here for the most part
  • 00:06:18
    you can just go ahead and click that one
  • 00:06:20
    click training button come back in about
  • 00:06:22
    10 minutes and you'll have a model
  • 00:06:24
    however if you are like me and you do
  • 00:06:27
    like to change things a little bit
  • 00:06:28
    you've got some options there for how
  • 00:06:30
    often you want to save the full model
  • 00:06:32
    the total number of epochs the GPU batch
  • 00:06:35
    size and some options for saving
  • 00:06:38
    personally the way I like to set this up
  • 00:06:40
    for a version 2 model is to set that to
  • 00:06:43
    10 total training epochs I do to 200
  • 00:06:47
    which is about the maximum you'll ever
  • 00:06:49
    need as I have a very large GPU I've got
  • 00:06:53
    24 gig of VRAM the batch size up to 40
  • 00:06:55
    as that's the maximum my GPU will handle
  • 00:06:58
    I like to click yes to only save the
  • 00:07:01
    latest checkpoint I'll keep cache all on
  • 00:07:04
    no and I say yes to save small finished
  • 00:07:07
    models
  • 00:07:08
    so with your model training via that one
  • 00:07:11
    click training I would suggest also
  • 00:07:13
    going and having a look over at the
  • 00:07:15
    frequently asked questions tab there's
  • 00:07:18
    quite a lot of information here
  • 00:07:19
    particularly useful are question nine
  • 00:07:22
    and question 10 how many total epochs
  • 00:07:24
    are optimal and how much training set
  • 00:07:27
    duration is needed
  • 00:07:29
    now that you've got your very first
  • 00:07:31
    voice model it's time to do that AI
  • 00:07:34
    voice to voice thing if you already have
  • 00:07:36
    the voice that you want to convert you
  • 00:07:39
    can skip straight to model inference
  • 00:07:41
    however if you want to do something like
  • 00:07:43
    change the singer of a song that you
  • 00:07:46
    don't have the Vocal Stems for like I
  • 00:07:48
    did here then you'll first need to
  • 00:07:50
    separate those vocals out from the
  • 00:07:53
    background music and this is where the
  • 00:07:56
    separation tab comes in handy
  • 00:07:58
    once again those files and directories
  • 00:08:01
    come into play as you'll need to know
  • 00:08:03
    where you've saved your music files the
  • 00:08:06
    first box is if you want to convert
  • 00:08:08
    multiple files from a given directory as
  • 00:08:11
    I tend to do just one at a time I delete
  • 00:08:13
    that and then use the box underneath
  • 00:08:15
    instead
  • 00:08:17
    model selection has two options like it
  • 00:08:20
    says at the top there hp2 is for input
  • 00:08:23
    without Harmony or if with Harmony and
  • 00:08:26
    extracted vocals do not need harmony
  • 00:08:28
    use hp5 basically if you're unsure use
  • 00:08:31
    both have a listen to the output and see
  • 00:08:34
    which is best for you in my case I'm
  • 00:08:37
    going to use hp2 here
  • 00:08:40
    by default the output goes into the opt
  • 00:08:43
    directory so feel free to change that if
  • 00:08:46
    you like
  • 00:08:47
    when you're ready push the huge orange
  • 00:08:49
    convert button and you'll have split the
  • 00:08:51
    vocals from the music
  • 00:08:54
    let's have a quick listen to that
  • 00:08:57
    of course there's a few seconds of
  • 00:08:58
    Silence
  • 00:09:03
    [Music]
  • 00:09:05
    there we go anyway that's done quite
  • 00:09:06
    well we've got the vocals there without
  • 00:09:08
    the music
  • 00:09:09
    even if there is a little bit of an echo
  • 00:09:12
    or something there in the voice alright
  • 00:09:14
    so now we're ready to go with inference
  • 00:09:17
    the page does look huge but really it's
  • 00:09:20
    two things in one the top half there is
  • 00:09:22
    for single voice conversion and there
  • 00:09:25
    you've got a batch as well so I'll just
  • 00:09:27
    be going through the one the batch is
  • 00:09:29
    essentially the same but you're doing
  • 00:09:30
    loads at a time again everything is
  • 00:09:33
    pretty straightforward here push that
  • 00:09:35
    huge refresh button and then you should
  • 00:09:38
    see your options appear in this little
  • 00:09:41
    pull down here my list is absolutely
  • 00:09:43
    huge as all the girls would agree but
  • 00:09:46
    you'll probably only have one option in
  • 00:09:49
    there the first time so pick that I'm
  • 00:09:52
    going to pick that one because that's my
  • 00:09:53
    trained voice
  • 00:09:54
    next you have to select a pitch just
  • 00:09:57
    like it says above for low to high
  • 00:09:59
    conversion use plus 12 if it's about
  • 00:10:01
    the same use zero and for high to low
  • 00:10:05
    voice conversion use minus 12. the
  • 00:10:08
    source voice in this case is quite High
  • 00:10:10
    my voice is a bit lower so I'm going to
  • 00:10:13
    use minus 12.
  • 00:10:15
    once again those files and directories
  • 00:10:18
    come into play here so put the path to
  • 00:10:21
    your vocals in if you did that default
  • 00:10:23
    voice separation then you'll have the
  • 00:10:25
    two files in your opt directory you want
  • 00:10:29
    the one which starts vocal so there in
  • 00:10:31
    my opt directory I have the long name of
  • 00:10:34
    that WAV file the one that starts with
  • 00:10:36
    vocal for pitch extraction again PM is
  • 00:10:39
    fast and Harvest is best so I like
  • 00:10:42
    to select Harvest everything else I
  • 00:10:44
    leave at the default apart from this
  • 00:10:46
    path to index which should have a pull
  • 00:10:49
    down menu there is the one that I want
  • 00:10:51
    to use because it matches that inference
  • 00:10:54
    voice
  • 00:10:54
    okay so now you can go ahead and click
  • 00:10:57
    that very tiny convert button and in
  • 00:11:00
    just a few seconds you should have your
  • 00:11:02
    output
  • 00:11:04
    and there it is
  • 00:11:07
    [Music]
  • 00:11:14
    yeah there we go that's pretty cool
  • 00:11:16
    that's pretty cool that's me now you can
  • 00:11:19
    right click that save audio as I'm going
  • 00:11:22
    to put it in my opt directory as well
  • 00:11:24
    I'm using Audacity here I've got the
  • 00:11:26
    instrumental so I just drag that other
  • 00:11:29
    voice in and then I can File Export as
  • 00:11:32
    whatever I want and it will mix those
  • 00:11:34
    two voices together
  • 00:11:36
    thank you
  • 00:11:40
    on your bones
  • 00:11:47
    [Music]
  • 00:11:56
    plus if you thought that was cool then
  • 00:11:59
    you may also like this nerdy rodent
  • 00:12:01
    video
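
In the video the final mix of converted vocals and instrumental is done in Audacity. As a rough illustration of what mixing two tracks means, the sketch below sums two mono sequences of 16-bit samples and clips the result to the valid range. It is not how Audacity or the web interface does it internally; the function name and the `vocal_gain` parameter are hypothetical.

```python
def mix_tracks(vocals: list[int], instrumental: list[int], vocal_gain: float = 1.0) -> list[int]:
    """Mix two mono 16-bit sample sequences by summing them and clipping."""
    length = max(len(vocals), len(instrumental))
    mixed = []
    for i in range(length):
        # Treat the shorter track as silence once it runs out.
        v = vocals[i] if i < len(vocals) else 0
        m = instrumental[i] if i < len(instrumental) else 0
        sample = int(round(v * vocal_gain + m))
        # Clip to the signed 16-bit range to avoid wrap-around distortion.
        mixed.append(max(-32768, min(32767, sample)))
    return mixed
```

Lowering `vocal_gain` below 1.0 is one simple way to leave headroom when both tracks are loud, since plain summation clips easily.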
Tags
  • AI
  • voice conversion
  • web interface
  • voice changer
  • training model
  • pitch extraction
  • Python installation
  • vocal separation
  • audio processing
  • high-quality conversion