RVC

Learn how to use the rvc_cli.py script to perform various operations with RVC.

Usage

To use the RVC CLI, navigate to the directory containing rvc_cli.py in your terminal and execute the script using the following syntax:

python rvc_cli.py <mode> [arguments]

Replace <mode> with the desired mode of operation (e.g., infer, train, index) and provide the necessary arguments. For a detailed list of arguments available for each mode, run:

python rvc_cli.py <mode> -h

This will display a help message with explanations for each argument.
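
When the CLI is called from another Python script, it can help to assemble the command as a list of arguments. The following is a minimal sketch, not part of rvc_cli.py: the helper name build_command is hypothetical, and it assumes every option is passed as a `--name value` pair.

```python
import shlex


def build_command(mode, **options):
    """Build an argv list for `python rvc_cli.py <mode> [arguments]`."""
    cmd = ["python", "rvc_cli.py", mode]
    for name, value in options.items():
        # Each keyword becomes a `--name value` pair; values are passed as text.
        cmd += [f"--{name}", str(value)]
    return cmd


# Print the help command for the infer mode, then a command with one argument.
print(shlex.join(build_command("infer") + ["-h"]))
print(shlex.join(build_command("infer", pitch=5)))
```

The resulting list can be passed to subprocess.run once the working directory contains rvc_cli.py.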

Modes

Infer

Performs voice conversion on a single audio file.

| Argument | Description | Type | Default | Required |
| --- | --- | --- | --- | --- |
| --pitch | Set the pitch of the audio. Higher values result in a higher pitch. | int | 0 | No |
| --filter_radius | Apply median filtering to the extracted pitch values if this value is greater than or equal to three. This can help reduce breathiness in the output audio. | int | 3 | No |
| --index_rate | Control the influence of the index file on the output. Higher values mean stronger influence. Lower values can help reduce artifacts but may result in less accurate voice cloning. | float | 0.3 | No |
| --volume_envelope | Control the blending of the output's volume envelope. A value of 1 means the output envelope is fully used. | float | 1 | No |
| --protect | Protect consonants and breathing sounds from artifacts. A value of 0.5 offers the strongest protection, while lower values may reduce the protection level but potentially mitigate the indexing effect. | float | 0.33 | No |
| --hop_length | Only applicable for the Crepe pitch extraction method. Determines the time it takes for the system to react to a significant pitch change. Smaller values require more processing time but can lead to better pitch accuracy. | int | 128 | No |
| --f0_method | Choose the pitch extraction algorithm for the conversion. 'rmvpe' is the default and generally recommended. | str | rmvpe | No |
| --input_path | Full path to the input audio file. | str | | Yes |
| --output_path | Full path to the output audio file. | str | | Yes |
| --pth_path | Full path to the RVC model file (.pth). | str | | Yes |
| --index_path | Full path to the index file (.index). | str | | Yes |
| --split_audio | Split the audio into smaller segments before inference. This can improve the quality of the output for longer audio files. | bool | False | No |
| --f0_autotune | Apply a light autotune to the inferred audio. Particularly useful for singing voice conversions. | bool | False | No |
| --clean_audio | Clean the output audio using noise reduction algorithms. Recommended for speech conversions. | bool | False | No |
| --clean_strength | Adjust the intensity of the audio cleaning process. Higher values result in stronger cleaning, but may lead to a more compressed sound. | float | 0.7 | No |
| --export_format | Select the desired output audio format. | str | WAV | No |
| --embedder_model | Choose the model used for generating speaker embeddings. | str | contentvec | No |
| --embedder_model_custom | Specify the path to a custom model for speaker embedding. Only applicable if 'embedder_model' is set to 'custom'. | str | None | No |
| --upscale_audio | Upscale the input audio to a higher quality before processing. This can improve the overall quality of the output, especially for low-quality input audio. | bool | False | No |
| --f0_file | Full path to an external F0 file (.f0). This allows you to use pre-computed pitch values for the input audio. | str | None | No |
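
To give a feel for what --filter_radius does, here is a small illustration of median filtering on a pitch track. It is a generic sketch in plain Python, not the code used by RVC: the function names, the fixed 3-sample window, and the edge handling (repeating the first and last values) are all assumptions made for the example.

```python
from statistics import median


def median_filter(values, window=3):
    """Return values smoothed with a sliding median of odd width `window`."""
    if window < 3 or window % 2 == 0:
        raise ValueError("window must be an odd integer >= 3")
    if not values:
        return []
    half = window // 2
    # Repeat the edge values so the output has the same length as the input.
    padded = [values[0]] * half + list(values) + [values[-1]] * half
    return [median(padded[i:i + window]) for i in range(len(values))]


def smooth_pitch(values, filter_radius=3):
    """Filter only when filter_radius >= 3, mirroring the documented rule."""
    # Using a fixed 3-sample window here is an assumption of this sketch.
    if filter_radius >= 3:
        return median_filter(values, window=3)
    return list(values)


# A pitch track in Hz with one spurious octave jump at index 2.
pitch_hz = [220.0, 221.0, 440.0, 222.0, 223.0]
print(smooth_pitch(pitch_hz, filter_radius=3))  # the 440.0 outlier is removed
print(smooth_pitch(pitch_hz, filter_radius=2))  # below 3: returned unchanged
```

The median replaces isolated spikes with a neighboring value while leaving smooth regions almost untouched, which is why this kind of filter is a common choice for cleaning pitch estimates.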

Batch Infer

Performs voice conversion on all supported audio files in a specified folder. This mode takes the same arguments as the infer mode, except that it requires --input_folder and --output_folder in place of --input_path and --output_path, respectively.

| Argument | Description | Type | Required |
| --- | --- | --- | --- |
| --input_folder | Path to the folder containing input audio files. | str | Yes |
| --output_folder | Path to the folder for saving output audio files. | str | Yes |
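
A batch run could be scripted as in the sketch below. The mode name batch_infer is assumed from the section heading (confirm it with the CLI's help output), the paths are placeholders, and the script only executes the command when rvc_cli.py is present in the current directory.

```python
import shlex
import subprocess
from pathlib import Path

cmd = [
    "python", "rvc_cli.py", "batch_infer",  # mode name assumed from the heading
    "--input_folder", "path/to/input_folder",
    "--output_folder", "path/to/output_folder",
    "--pth_path", "path/to/model.pth",
    "--index_path", "path/to/index.index",
    "--pitch", "0",
]

# Show the command as it would be typed in a terminal.
print(shlex.join(cmd))

# Run it only when the script is actually available.
if Path("rvc_cli.py").exists():
    subprocess.run(cmd, check=True)
```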

TTS

Synthesizes text into speech using the specified voice and then applies voice conversion using the provided RVC model.

| Argument | Description | Type | Default | Required |
| --- | --- | --- | --- | --- |
| --tts_text | Text to be synthesized. | str | | Yes |
| --tts_voice | Voice to be used for TTS synthesis. Refer to Microsoft's TTS voice list for available options. | str | | Yes |
| --tts_rate | Control the speaking rate of the TTS. Values range from -100 (slower) to 100 (faster). | int | 0 | No |
| --output_tts_path | Full path to save the synthesized TTS audio. | str | | Yes |
| --output_rvc_path | Full path to save the voice-converted audio using the synthesized TTS. | str | | Yes |

This mode utilizes the same arguments as the infer mode for voice conversion settings.
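
The sketch below builds a TTS command and checks the documented range for --tts_rate before doing so. The helper name tts_command is hypothetical, the mode name tts is assumed from the section heading, and the sketch assumes --pth_path and --index_path carry over from the infer mode. The voice shown is only an example value.

```python
import shlex


def tts_command(text, voice, output_tts_path, output_rvc_path,
                pth_path, index_path, rate=0):
    """Build the argv list for the TTS mode, validating --tts_rate."""
    if not -100 <= rate <= 100:
        raise ValueError("tts_rate must be between -100 and 100")
    return [
        "python", "rvc_cli.py", "tts",  # mode name assumed from the heading
        "--tts_text", text,
        "--tts_voice", voice,
        "--tts_rate", str(rate),
        "--output_tts_path", output_tts_path,
        "--output_rvc_path", output_rvc_path,
        "--pth_path", pth_path,      # assumed to carry over from infer
        "--index_path", index_path,  # assumed to carry over from infer
    ]


cmd = tts_command(
    text="Hello from RVC.",
    voice="en-US-AriaNeural",  # example value; choose one from the voice list
    output_tts_path="path/to/tts.wav",
    output_rvc_path="path/to/converted.wav",
    pth_path="path/to/model.pth",
    index_path="path/to/index.index",
    rate=10,
)
print(shlex.join(cmd))
```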

Preprocess

Preprocesses a dataset for training an RVC model.

| Argument | Description | Type | Required |
| --- | --- | --- | --- |
| --model_name | Name of the model to be trained. | str | Yes |
| --dataset_path | Path to the dataset directory. | str | Yes |
| --sample_rate | Target sampling rate for the audio data. | int | Yes |
| --cpu_cores | Number of CPU cores to use for preprocessing. | int | No |

Extract

Extracts features from a dataset for training an RVC model.

| Argument | Description | Type | Default | Required |
| --- | --- | --- | --- | --- |
| --model_name | Name of the model. | str | | Yes |
| --rvc_version | Version of the RVC model ('v1' or 'v2'). | str | v2 | No |
| --f0_method | Pitch extraction method to use. | str | rmvpe | No |
| --pitch_guidance | Enable or disable pitch guidance during feature extraction. | bool | True | No |
| --hop_length | Hop length for feature extraction. Only applicable for Crepe pitch extraction. | int | 128 | No |
| --cpu_cores | Number of CPU cores to use for feature extraction (optional). | int | None | No |
| --sample_rate | Target sampling rate for the audio data. | int | | Yes |
| --embedder_model | Choose the model used for generating speaker embeddings. | str | contentvec | No |
| --embedder_model_custom | Specify the path to a custom model for speaker embedding. Only applicable if 'embedder_model' is set to 'custom'. | str | None | No |

Train

Trains an RVC model.

| Argument | Description | Type | Default | Required |
| --- | --- | --- | --- | --- |
| --model_name | Name of the model to be trained. | str | | Yes |
| --rvc_version | Version of the RVC model to train ('v1' or 'v2'). | str | v2 | No |
| --save_every_epoch | Save the model every specified number of epochs. | int | | Yes |
| --save_only_latest | Save only the latest model checkpoint. | bool | False | No |
| --save_every_weights | Save model weights every epoch. | bool | True | No |
| --total_epoch | Total number of epochs to train for. | int | 1000 | No |
| --sample_rate | Sampling rate of the training data. | int | | Yes |
| --batch_size | Batch size for training. | int | 8 | No |
| --gpu | GPU device to use for training (e.g., '0'). | str | 0 | No |
| --pitch_guidance | Enable or disable pitch guidance during training. | bool | True | No |
| --pretrained | Use a pretrained model for initialization. | bool | True | No |
| --custom_pretrained | Use a custom pretrained model. | bool | False | No |
| --g_pretrained_path | Path to the pretrained generator model file. | str | None | No |
| --d_pretrained_path | Path to the pretrained discriminator model file. | str | None | No |
| --overtraining_detector | Enable overtraining detection. | bool | False | No |
| --overtraining_threshold | Threshold for overtraining detection. | int | 50 | No |
| --sync_graph | Enable graph synchronization for distributed training. | bool | False | No |
| --cache_data_in_gpu | Cache training data in GPU memory. | bool | False | No |

Index

Generates an index file for an RVC model.

| Argument | Description | Type | Default | Required |
| --- | --- | --- | --- | --- |
| --model_name | Name of the model. | str | | Yes |
| --rvc_version | Version of the RVC model ('v1' or 'v2'). | str | v2 | No |
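
The Preprocess, Extract, Train, and Index modes are typically run one after another for the same model name. The sketch below chains them in that order. The mode names preprocess and extract are assumed from the section headings, the paths and the value 50 for --save_every_epoch are placeholders, and the commands only execute when rvc_cli.py is present in the current directory.

```python
import shlex
import subprocess
from pathlib import Path

MODEL = "my_model"
SAMPLE_RATE = "48000"

# One entry per mode, listing only the arguments documented for that mode.
steps = [
    ["preprocess", "--model_name", MODEL,
     "--dataset_path", "path/to/dataset", "--sample_rate", SAMPLE_RATE],
    ["extract", "--model_name", MODEL,
     "--sample_rate", SAMPLE_RATE, "--f0_method", "rmvpe"],
    ["train", "--model_name", MODEL, "--sample_rate", SAMPLE_RATE,
     "--save_every_epoch", "50", "--total_epoch", "500"],
    ["index", "--model_name", MODEL],
]
commands = [["python", "rvc_cli.py", *step] for step in steps]

for cmd in commands:
    print(shlex.join(cmd))
    if Path("rvc_cli.py").exists():
        # check=True stops the pipeline at the first failing step.
        subprocess.run(cmd, check=True)
```

Keeping the model name and sample rate in one place avoids mismatches between steps, since each later mode looks up the data produced for that model.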

Model Extract

Extracts a checkpoint of the trained model.

| Argument | Description | Type | Default | Required |
| --- | --- | --- | --- | --- |
| --pth_path | Path to the main .pth model file. | str | | Yes |
| --model_name | Name of the model. | str | | Yes |
| --sample_rate | Sampling rate of the extracted model. | int | | Yes |
| --pitch_guidance | Enable or disable pitch guidance for the extracted model. | bool | | Yes |
| --rvc_version | Version of the extracted RVC model ('v1' or 'v2'). | str | v2 | No |
| --epoch | Epoch number to extract from the model. | int | | Yes |
| --step | Step number to extract from the model (optional). | int | None | No |

Model Information

Displays information about a trained model.

| Argument | Description | Type | Required |
| --- | --- | --- | --- |
| --pth_path | Path to the .pth model file. | str | Yes |

Model Blender

Fuses two RVC models together.

| Argument | Description | Type | Default | Required |
| --- | --- | --- | --- | --- |
| --model_name | Name of the new fused model. | str | | Yes |
| --pth_path_1 | Path to the first .pth model file. | str | | Yes |
| --pth_path_2 | Path to the second .pth model file. | str | | Yes |
| --ratio | Ratio for blending the two models (0.0 to 1.0). | float | 0.5 | No |
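
One common way to blend two sets of weights is linear interpolation, which the sketch below illustrates on plain Python lists. This is an assumption about how a ratio like this is usually applied, not a description of the CLI's internals: the function name blend is hypothetical, and whether the ratio weights the first or the second model should be checked against the tool itself.

```python
def blend(weights_a, weights_b, ratio=0.5):
    """Linearly interpolate two weight dictionaries with matching keys."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0.0 and 1.0")
    if weights_a.keys() != weights_b.keys():
        raise ValueError("both models must have the same parameter names")
    # Assumption: ratio is the share taken from the first model.
    return {
        name: [ratio * a + (1.0 - ratio) * b
               for a, b in zip(weights_a[name], weights_b[name])]
        for name in weights_a
    }


model_1 = {"layer.weight": [0.0, 2.0]}
model_2 = {"layer.weight": [4.0, 6.0]}
print(blend(model_1, model_2, ratio=0.5))   # midpoint of the two models
print(blend(model_1, model_2, ratio=0.25))  # closer to the second model
```

Under this assumption, a ratio of 1.0 reproduces the first model and 0.0 reproduces the second.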

Tensorboard

Launches TensorBoard for monitoring training progress. This mode requires no arguments.

Download

Downloads a model from a provided link.

| Argument | Description | Type | Required |
| --- | --- | --- | --- |
| --model_link | Direct link to the model file. | str | Yes |

Prerequisites

Installs prerequisites for RVC.

| Argument | Description | Type | Default | Required |
| --- | --- | --- | --- | --- |
| --pretraineds_v1 | Download pretrained models for RVC v1. | bool | True | No |
| --pretraineds_v2 | Download pretrained models for RVC v2. | bool | True | No |
| --models | Download additional models. | bool | True | No |
| --exe | Download required executables. | bool | True | No |

Audio Analyzer

Analyzes an audio file and displays its information.

| Argument | Description | Type | Required |
| --- | --- | --- | --- |
| --input_path | Path to the input audio file. | str | Yes |

Examples

Here are a few examples of how to use the RVC CLI:

  • Inferring voice on an audio file:
python rvc_cli.py infer --pitch 5 --input_path "path/to/input.wav" --output_path "path/to/output.wav" --pth_path "path/to/model.pth" --index_path "path/to/index.index"
  • Training a new RVC model:
python rvc_cli.py train --model_name "my_model" --save_every_epoch 50 --sample_rate 48000 --total_epoch 500 --gpu 0
  • Generating an index file for a trained model:
python rvc_cli.py index --model_name "my_model"