RVC CLI Documentation

Learn how to use the rvc_cli.py script to perform various operations with Retrieval-based Voice Conversion (RVC).

Usage

To use the RVC CLI, navigate to the directory containing rvc_cli.py in your terminal and execute the script using the following syntax:

python rvc_cli.py <mode> [arguments]

Replace <mode> with the desired mode of operation (e.g., infer, train, index) and provide the necessary arguments. For a detailed list of arguments available for each mode, run:

python rvc_cli.py <mode> -h

This will display a help message with explanations for each argument.
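You may want to drive the CLI from another Python script. One way is to build the argument list in code and pass it to subprocess.run. The helper below is a hypothetical sketch and is not part of rvc_cli.py. It turns keyword arguments into --name value pairs and skips options set to None so that the CLI default applies. It passes booleans as the strings True/False. That boolean format is an assumption based on the bool types in the tables below, so confirm it with python rvc_cli.py <mode> -h.

```python
import shlex
import sys


def build_command(mode, script="rvc_cli.py", **options):
    """Build the argument list for `python rvc_cli.py <mode> [arguments]`.

    Keyword arguments become `--name value` pairs. Options set to None are
    skipped so the CLI default is used. Booleans become "True"/"False"
    (assumed format for bool arguments).
    """
    cmd = [sys.executable, script, mode]
    for name, value in options.items():
        if value is None:
            continue  # leave the CLI default in place
        cmd.append(f"--{name}")
        cmd.append(str(value))  # str(True) == "True"
    return cmd


if __name__ == "__main__":
    cmd = build_command(
        "infer",
        pitch=5,
        input_path="path/to/input.wav",
        output_path="path/to/output.wav",
        pth_path="path/to/model.pth",
        index_path="path/to/index.index",
        split_audio=True,
    )
    # Print a shell-ready version of the command instead of running it.
    print(shlex.join(cmd))
```

To run the command, call subprocess.run(cmd, check=True) from the directory that contains rvc_cli.py. With check=True, a non-zero exit status raises subprocess.CalledProcessError, so failures are not silently ignored.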

Modes

Infer

Converts a single audio file to the target voice using an RVC model.

Argument | Description | Type | Default | Required
--pitch | Set the pitch shift of the audio, in semitones. Higher values result in a higher pitch. | int | 0 | No
--filter_radius | If this value is greater than or equal to 3, apply median filtering to the extracted pitch values. This can help reduce breathiness in the output audio. | int | 3 | No
--index_rate | Control the influence of the index file on the output. Higher values mean stronger influence. Lower values can help reduce artifacts but may result in less accurate voice cloning. | float | 0.3 | No
--volume_envelope | Control how the output's volume envelope is blended with the input's. A value of 1 keeps the output's envelope fully; lower values follow the input's envelope more closely. | float | 1 | No
--protect | Protect voiceless consonants and breathing sounds from artifacts. Lower values give stronger protection but may weaken the effect of the index; a value of 0.5 disables protection. | float | 0.33 | No
--hop_length | Only applies to the Crepe pitch extraction method. Determines how quickly the system reacts to a significant pitch change. Smaller values take more processing time but can improve pitch accuracy. | int | 128 | No
--f0_method | Choose the pitch extraction algorithm for the conversion. 'rmvpe' is the default and is generally recommended. | str | rmvpe | No
--input_path | Full path to the input audio file. | str | - | Yes
--output_path | Full path to the output audio file. | str | - | Yes
--pth_path | Full path to the RVC model file (.pth). | str | - | Yes
--index_path | Full path to the index file (.index). | str | - | Yes
--split_audio | Split the audio into smaller segments before inference. This can improve output quality for longer audio files. | bool | False | No
--f0_autotune | Apply light autotune to the converted audio. Particularly useful for singing voice conversion. | bool | False | No
--clean_audio | Clean the output audio using noise reduction. Recommended for speech conversion. | bool | False | No
--clean_strength | Adjust the intensity of the audio cleaning. Higher values clean more strongly but may make the sound more compressed. | float | 0.7 | No
--export_format | Select the output audio format. | str | WAV | No
--embedder_model | Choose the embedder model used to extract speech features (embeddings) from the audio. | str | contentvec | No
--embedder_model_custom | Path to a custom embedder model. Only applicable if --embedder_model is set to 'custom'. | str | None | No
--upscale_audio | Upscale the input audio to a higher quality before processing. This can improve overall output quality, especially for low-quality input audio. | bool | False | No
--f0_file | Full path to an external F0 file (.f0), so that precomputed pitch values are used for the input audio. | str | None | No
--formant_shifting | Apply formant shifting to the input audio. This can help adjust the timbre of the voice. | bool | False | No
--formant_qfrency | Control the quefrency of the formant shifting effect. Higher values result in a more pronounced effect. | float | 1.0 | No
--formant_timbre | Control the timbre of the formant shifting effect. Higher values result in a more pronounced effect. | float | 1.0 | No
--sid | Speaker ID for multi-speaker models. | int | 0 | No
--post_process | Apply post-processing effects to the output audio. | bool | False | No
--reverb | Apply a reverb effect to the output audio. | bool | False | No
--pitch_shift | Apply a pitch shifting effect to the output audio. | bool | False | No
--limiter | Apply a limiter effect to the output audio. | bool | False | No
--gain | Apply a gain effect to the output audio. | bool | False | No
--distortion | Apply a distortion effect to the output audio. | bool | False | No
--chorus | Apply a chorus effect to the output audio. | bool | False | No
--bitcrush | Apply a bitcrush effect to the output audio. | bool | False | No
--clipping | Apply a clipping effect to the output audio. | bool | False | No
--compressor | Apply a compressor effect to the output audio. | bool | False | No
--delay | Apply a delay effect to the output audio. | bool | False | No
--reverb_room_size | Control the room size of the reverb. Higher values simulate a larger room. | float | 0.5 | No
--reverb_damping | Control the damping of the reverb. Higher values result in a more damped sound. | float | 0.5 | No
--reverb_wet_gain | Control the wet (processed) gain of the reverb. Higher values result in a stronger reverb effect. | float | 0.5 | No
--reverb_dry_gain | Control the dry (unprocessed) gain of the reverb. Higher values result in a stronger dry signal. | float | 0.5 | No
--reverb_width | Control the stereo width of the reverb. Higher values result in a wider stereo image. | float | 0.5 | No
--reverb_freeze_mode | Control the freeze mode of the reverb. Higher values result in a stronger freeze effect. | float | 0.5 | No
--pitch_shift_semitones | Control the pitch shift in semitones. Positive values raise the pitch; negative values lower it. | float | 0.0 | No
--limiter_threshold | Control the threshold of the limiter. Lower values limit more of the signal. | float | -6 | No
--limiter_release_time | Control the release time of the limiter. Higher values result in a longer release time. | float | 0.01 | No
--gain_db | Control the gain in decibels. Positive values increase the level; negative values decrease it. | float | 0.0 | No
--distortion_gain | Control the gain of the distortion effect. Higher values result in stronger distortion. | float | 25 | No
--chorus_rate | Control the rate of the chorus. Higher values result in a faster chorus effect. | float | 1.0 | No
--chorus_depth | Control the depth of the chorus. Higher values result in a stronger chorus effect. | float | 0.25 | No
--chorus_center_delay | Control the center delay of the chorus. Higher values result in a longer center delay. | float | 7 | No
--chorus_feedback | Control the feedback of the chorus. Higher values result in stronger feedback. | float | 0.0 | No
--chorus_mix | Control the mix of the chorus. Higher values result in a stronger chorus effect. | float | 0.5 | No
--bitcrush_bit_depth | Control the bit depth of the bitcrush effect. Lower values result in a stronger bitcrush effect. | int | 8 | No
--clipping_threshold | Control the threshold of the clipping effect. Lower values clip more of the signal, resulting in stronger clipping. | float | -6 | No
--compressor_threshold | Control the threshold of the compressor. Lower values compress more of the signal. | float | 0 | No
--compressor_ratio | Control the ratio of the compressor. Higher values result in stronger compression; a ratio of 1 applies no compression. | float | 1 | No
--compressor_attack | Control the attack time of the compressor. Higher values make the compressor react more slowly to rising levels. | float | 1.0 | No
--compressor_release | Control the release time of the compressor. Higher values make the compressor recover more slowly once the level drops. | float | 100 | No
--delay_seconds | Control the delay time in seconds. Higher values result in a longer delay. | float | 0.5 | No
--delay_feedback | Control the feedback of the delay. Higher values result in stronger feedback (more repeats). | float | 0.0 | No
--delay_mix | Control the mix of the delay. Higher values result in a stronger delay effect. | float | 0.5 | No
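A short example makes --volume_envelope more concrete. One way to blend two volume envelopes is a per-frame geometric interpolation, and as far as its public code shows, the original RVC WebUI takes this approach. Each output sample is scaled by source_rms^(1 - v) * output_rms^(v - 1), where v is the --volume_envelope value. With v = 1 the output is left untouched, and with v = 0 it takes on the input's loudness contour. The sketch below assumes that formula and a 0 to 1 range. The function names and the tiny frame size are illustrative only.

```python
import math


def frame_rms(samples, frame):
    """RMS of each non-overlapping frame, repeated once per sample in that frame."""
    env = []
    for start in range(0, len(samples), frame):
        chunk = samples[start:start + frame]
        rms = math.sqrt(sum(x * x for x in chunk) / len(chunk))
        env.extend([rms] * len(chunk))
    return env


def blend_volume_envelope(source, converted, rate, frame=4):
    """Rescale `converted` so its envelope lies between the source's and its own.

    rate=1.0 keeps the converted audio's own envelope (no change);
    rate=0.0 imposes the source audio's envelope.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    if len(source) != len(converted):
        raise ValueError("source and converted must have the same length")
    src_env = frame_rms(source, frame)
    out_env = frame_rms(converted, frame)
    blended = []
    for x, r_src, r_out in zip(converted, src_env, out_env):
        r_out = max(r_out, 1e-6)  # avoid dividing by zero on silent frames
        gain = r_src ** (1.0 - rate) * r_out ** (rate - 1.0)
        blended.append(x * gain)
    return blended
```

With rate=0.5, a frame whose input RMS is 0.5 and output RMS is 0.1 ends up at sqrt(0.5 * 0.1), about 0.224. That is the geometric mean of the two levels.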

Batch Infer

Runs voice conversion on every supported audio file in a folder. This mode accepts the same arguments as the infer mode, except that it takes --input_folder and --output_folder instead of --input_path and --output_path.

Argument | Description | Type | Required
--input_folder | Path to the folder containing input audio files. | str | Yes
--output_folder | Path to the folder for saving output audio files. | str | Yes
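Batch inference reuses the infer settings. An existing infer configuration can therefore become a batch one by swapping the two path arguments for the folder arguments. The helpers below are hypothetical. The subcommand name batch_infer is an assumption, so check it against the help output of your copy of rvc_cli.py.

```python
import sys


def to_batch_options(infer_options, input_folder, output_folder):
    """Copy infer-mode options, replacing the single-file paths with folders."""
    batch = {
        name: value
        for name, value in infer_options.items()
        if name not in ("input_path", "output_path")
    }
    batch["input_folder"] = input_folder
    batch["output_folder"] = output_folder
    return batch


def batch_command(infer_options, input_folder, output_folder, script="rvc_cli.py"):
    """Build the argument list for a batch run (subcommand name is assumed)."""
    cmd = [sys.executable, script, "batch_infer"]
    options = to_batch_options(infer_options, input_folder, output_folder)
    for name, value in options.items():
        cmd += [f"--{name}", str(value)]
    return cmd
```

Because to_batch_options returns a new dictionary, the same infer settings can still be used for single-file runs afterwards.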

TTS

Synthesizes text into speech using the specified voice and then applies voice conversion using the provided RVC model.

Argument | Description | Type | Default | Required
--tts_text | Text to be synthesized. | str | - | Yes
--tts_voice | Voice to be used for TTS synthesis. Refer to Microsoft's TTS voice list for available options. | str | - | Yes
--tts_rate | Control the speaking rate of the TTS. Values range from -100 (slower) to 100 (faster). | int | 0 | No
--output_tts_path | Full path to save the synthesized TTS audio. | str | - | Yes
--output_rvc_path | Full path to save the voice-converted version of the synthesized TTS audio. | str | - | Yes

For the voice conversion step, this mode also accepts the infer mode's arguments, such as --pth_path, --index_path, and --pitch.
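The TTS mode needs both the TTS options and the voice conversion settings, so it can help to validate them before launching a run. The check below is a hypothetical sketch. It treats --pth_path and --index_path as required, on the assumption that they carry over from the infer mode as required arguments. It also enforces the documented -100 to 100 range for --tts_rate.

```python
REQUIRED_TTS_ARGS = (
    "tts_text",
    "tts_voice",
    "output_tts_path",
    "output_rvc_path",
    # Assumed to be required here because they are required in infer mode.
    "pth_path",
    "index_path",
)


def check_tts_options(options):
    """Raise ValueError if a TTS options dict is incomplete or out of range."""
    missing = [name for name in REQUIRED_TTS_ARGS if not options.get(name)]
    if missing:
        raise ValueError("missing: " + ", ".join(f"--{name}" for name in missing))
    rate = options.get("tts_rate", 0)  # 0 is the documented default
    if not isinstance(rate, int) or not -100 <= rate <= 100:
        raise ValueError("--tts_rate must be an integer from -100 to 100")
```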

Preprocess

Preprocesses a dataset for training an RVC model.

Argument | Description | Type | Default | Required
--model_name | Name of the model to be trained. | str | - | Yes
--dataset_path | Path to the dataset directory. | str | - | Yes
--sample_rate | Target sampling rate for the audio data. | int | - | Yes
--cpu_cores | Number of CPU cores to use for preprocessing. | int | - | No
--cut_preprocess | Cut the dataset into smaller segments for faster preprocessing. | bool | True | No
--process_effects | Disable all filters during preprocessing. | bool | False | No
--noise_reduction | Enable noise reduction during preprocessing. | bool | False | No
--noise_reduction_strength | Strength of the noise reduction filter. | float | 0.7 | No

Extract

Extracts features from a dataset for training an RVC model.

Argument | Description | Type | Default | Required
--model_name | Name of the model. | str | - | Yes
--rvc_version | Version of the RVC model ('v1' or 'v2'). | str | v2 | No
--f0_method | Pitch extraction method to use. | str | rmvpe | No
--pitch_guidance | Enable or disable pitch guidance during feature extraction. | bool | True | No
--hop_length | Hop length for feature extraction. Only applicable for Crepe pitch extraction. | int | 128 | No
--cpu_cores | Number of CPU cores to use for feature extraction. | int | None | No
--gpu | GPU device to use for feature extraction. | int | - | No
--sample_rate | Target sampling rate for the audio data. | int | - | Yes
--embedder_model | Choose the embedder model used to extract speech features (embeddings) from the audio. | str | contentvec | No
--embedder_model_custom | Path to a custom embedder model. Only applicable if --embedder_model is set to 'custom'. | str | None | No
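Two of these arguments only take effect in combination with another one. --hop_length applies only to Crepe, and --embedder_model_custom applies only when --embedder_model is 'custom'. The same rules appear in the infer mode. The hypothetical helper below flags inconsistent combinations and missing required arguments before a long extraction starts. It assumes the Crepe method names begin with 'crepe', so confirm the accepted --f0_method values with the mode's -h output.

```python
def extract_option_warnings(options):
    """Return human-readable warnings for an extract-mode options dict."""
    warnings = []
    for name in ("model_name", "sample_rate"):
        if options.get(name) is None:
            warnings.append(f"--{name} is required")
    # Assumption: Crepe variants are named 'crepe...'.
    f0_method = options.get("f0_method", "rmvpe")
    if "hop_length" in options and not f0_method.startswith("crepe"):
        warnings.append(f"--hop_length has no effect with --f0_method {f0_method}")
    embedder = options.get("embedder_model", "contentvec")
    custom_path = options.get("embedder_model_custom")
    if embedder == "custom" and not custom_path:
        warnings.append("--embedder_model custom needs --embedder_model_custom")
    if custom_path and embedder != "custom":
        warnings.append("--embedder_model_custom is ignored unless --embedder_model is 'custom'")
    return warnings
```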

Train

Trains an RVC model.

Argument | Description | Type | Default | Required
--model_name | Name of the model to be trained. | str | - | Yes
--rvc_version | Version of the RVC model to train ('v1' or 'v2'). | str | v2 | No
--save_every_epoch | Save the model every specified number of epochs. | int | - | Yes
--save_only_latest | Save only the latest model checkpoint. | bool | False | No
--save_every_weights | Save model weights every epoch. | bool | True | No
--total_epoch | Total number of epochs to train for. | int | 1000 | No
--sample_rate | Sampling rate of the training data. | int | - | Yes
--batch_size | Batch size for training. | int | 8 | No
--gpu | GPU device to use for training (e.g., '0'). | str | 0 | No
--pitch_guidance | Enable or disable pitch guidance during training. | bool | True | No
--pretrained | Use a pretrained model for initialization. | bool | True | No
--custom_pretrained | Use a custom pretrained model. | bool | False | No
--g_pretrained_path | Path to the pretrained generator model file. | str | None | No
--d_pretrained_path | Path to the pretrained discriminator model file. | str | None | No
--overtraining_detector | Enable overtraining detection. | bool | False | No
--overtraining_threshold | Threshold for overtraining detection. | int | 50 | No
--sync_graph | Enable graph synchronization for distributed training. | bool | False | No
--cache_data_in_gpu | Cache training data in GPU memory. | bool | False | No
--index_algorithm | Choose the method for generating the index file. | str | Auto | No

Index

Generates an index file for an RVC model.

Argument | Description | Type | Default | Required
--model_name | Name of the model. | str | - | Yes
--rvc_version | Version of the RVC model ('v1' or 'v2'). | str | v2 | No
--index_algorithm | Choose the method for generating the index file. | str | Auto | No
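Preprocess, Extract, Train, and Index together form the training workflow. A typical RVC training run calls them in that order for the same --model_name, and the sketch below scripts that sequence on that assumption. The subcommand names preprocess and extract are also assumptions, as is the choice of which shared options to repeat in each step. Check them against the help output of your copy of rvc_cli.py.

```python
import sys


def training_commands(model_name, dataset_path, sample_rate, save_every_epoch,
                      total_epoch=1000, rvc_version="v2", script="rvc_cli.py"):
    """Return the argument lists for preprocess -> extract -> train -> index."""
    base = [sys.executable, script]
    name = ["--model_name", model_name]
    version = ["--rvc_version", rvc_version]
    rate = ["--sample_rate", str(sample_rate)]
    return [
        base + ["preprocess", *name, "--dataset_path", dataset_path, *rate],
        base + ["extract", *name, *version, *rate],
        base + ["train", *name, *version, *rate,
                "--save_every_epoch", str(save_every_epoch),
                "--total_epoch", str(total_epoch)],
        base + ["index", *name, *version],
    ]
```

You can run the commands one after another with subprocess.run(cmd, check=True). The loop then stops at the first step that fails, so a broken preprocessing run does not feed into training.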

Model Extract

Extracts a checkpoint of the trained model.

Argument | Description | Type | Default | Required
--pth_path | Path to the main .pth model file. | str | - | Yes
--model_name | Name of the model. | str | - | Yes
--sample_rate | Sampling rate of the extracted model. | int | - | Yes
--pitch_guidance | Enable or disable pitch guidance for the extracted model. | bool | - | Yes
--rvc_version | Version of the extracted RVC model ('v1' or 'v2'). | str | v2 | No
--epoch | Epoch number to extract from the model. | int | - | Yes
--step | Step number to extract from the model. | int | None | No

Model Information

Displays information about a trained model.

Argument | Description | Type | Required
--pth_path | Path to the .pth model file. | str | Yes

Model Blender

Fuses two RVC models together.

Argument | Description | Type | Default | Required
--model_name | Name of the new fused model. | str | - | Yes
--pth_path_1 | Path to the first .pth model file. | str | - | Yes
--pth_path_2 | Path to the second .pth model file. | str | - | Yes
--ratio | Ratio for blending the two models (0.0 to 1.0). | float | 0.5 | No
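A common way to fuse two checkpoints with the same architecture is per-parameter linear interpolation: blended = ratio * first + (1 - ratio) * second. The sketch below illustrates that technique, with plain floats standing in for the tensors inside a .pth file. It assumes rvc_cli.py uses exactly this formula and that --ratio weights the first model (--pth_path_1). Both models must share the same parameter names and shapes for the blend to make sense.

```python
def blend_weights(weights_1, weights_2, ratio=0.5):
    """Linearly interpolate two parameter dictionaries.

    ratio=1.0 returns the first model's values, ratio=0.0 the second's.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0.0 and 1.0")
    if weights_1.keys() != weights_2.keys():
        raise ValueError("the two models have different parameters")
    return {
        name: ratio * weights_1[name] + (1.0 - ratio) * weights_2[name]
        for name in weights_1
    }
```

For example, blend_weights({"a": 1.0}, {"a": 3.0}, 0.5) gives {"a": 2.0}. With real checkpoints, the same expression applies element-wise to each tensor.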

Tensorboard

Launches TensorBoard for monitoring training progress. This mode requires no arguments.

Download

Downloads a model from a provided link.

Argument | Description | Type | Required
--model_link | Direct link to the model file. | str | Yes

Prerequisites

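Installs the prerequisites for RVC by downloading pretrained models, additional models, and required executables.

Argument | Description | Type | Default | Required
--pretraineds_v1_f0 | Download the pitch-guided (f0) pretrained models for RVC v1. | bool | True | No
--pretraineds_v2_f0 | Download the pitch-guided (f0) pretrained models for RVC v2. | bool | True | No
--pretraineds_v1_nof0 | Download the non-f0 pretrained models (without pitch guidance) for RVC v1. | bool | False | No
--pretraineds_v2_nof0 | Download the non-f0 pretrained models (without pitch guidance) for RVC v2. | bool | False | No
--models | Download additional models. | bool | True | No
--exe | Download required executables. | bool | True | No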

Audio Analyzer

Analyzes an audio file and displays its information.

Argument | Description | Type | Required
--input_path | Path to the input audio file. | str | Yes

Examples

Here are a few examples of how to use the RVC CLI:

  • Running voice conversion on an audio file:
python rvc_cli.py infer --pitch 5 --input_path "path/to/input.wav" --output_path "path/to/output.wav" --pth_path "path/to/model.pth" --index_path "path/to/index.index"
  • Training a new RVC model:
python rvc_cli.py train --model_name "my_model" --save_every_epoch 10 --sample_rate 48000 --total_epoch 500 --gpu 0
  • Generating an index file for a trained model:
python rvc_cli.py index --model_name "my_model"