RVC

Learn how to use the rvc_cli.py script to perform various operations with RVC.

Usage

To use the RVC CLI, navigate to the directory containing rvc_cli.py in your terminal and execute the script using the following syntax:

python rvc_cli.py <mode> [arguments]

Replace <mode> with the desired mode of operation (e.g., infer, train, index) and provide the necessary arguments. For a detailed list of arguments available for each mode, run:

python rvc_cli.py <mode> -h

This will display a help message with explanations for each argument.

Modes

Infer

Performs a voice cloning conversion on a single audio file.

Argument	Description	Type	Default	Required
`--pitch`	Set the pitch of the audio. Higher values result in a higher pitch.	int	0	No
`--filter_radius`	Apply median filtering to the extracted pitch values if this value is greater than or equal to three. This can help reduce breathiness in the output audio.	int	3	No
`--index_rate`	Control the influence of the index file on the output. Higher values mean stronger influence. Lower values can help reduce artifacts but may result in less accurate voice cloning.	float	0.3	No
`--volume_envelope`	Control the blending of the output's volume envelope. A value of 1 means the output envelope is fully used.	float	1	No
`--protect`	Protect consonants and breathing sounds from artifacts. A value of 0.5 offers the strongest protection, while lower values may reduce the protection level but potentially mitigate the indexing effect.	float	0.33	No
`--hop_length`	Only applicable for the Crepe pitch extraction method. Determines the time it takes for the system to react to a significant pitch change. Smaller values require more processing time but can lead to better pitch accuracy.	int	128	No
`--f0_method`	Choose the pitch extraction algorithm for the conversion. 'rmvpe' is the default and generally recommended.	str	rmvpe	No
`--input_path`	Full path to the input audio file.	str		Yes
`--output_path`	Full path to the output audio file.	str		Yes
`--pth_path`	Full path to the RVC model file (.pth).	str		Yes
`--index_path`	Full path to the index file (.index).	str		Yes
`--split_audio`	Split the audio into smaller segments before inference. This can improve the quality of the output for longer audio files.	bool	False	No
`--f0_autotune`	Apply a light autotune to the inferred audio. Particularly useful for singing voice conversions.	bool	False	No
`--clean_audio`	Clean the output audio using noise reduction algorithms. Recommended for speech conversions.	bool	False	No
`--clean_strength`	Adjust the intensity of the audio cleaning process. Higher values result in stronger cleaning, but may lead to a more compressed sound.	float	0.7	No
`--export_format`	Select the desired output audio format.	str	WAV	No
`--embedder_model`	Choose the model used for generating speaker embeddings.	str	contentvec	No
`--embedder_model_custom`	Specify the path to a custom model for speaker embedding. Only applicable if 'embedder_model' is set to 'custom'.	str	None	No
`--upscale_audio`	Upscale the input audio to a higher quality before processing. This can improve the overall quality of the output, especially for low-quality input audio.	bool	False	No
`--f0_file`	Full path to an external F0 file (.f0). This allows you to use pre-computed pitch values for the input audio.	str	None	No

Batch Infer

Performs real-time voice cloning on all supported audio files within a specified folder. This mode utilizes the same arguments as the infer mode, except it requires an --input_folder and --output_folder instead of --input_path and --output_path, respectively.

Argument	Description	Type	Required
`--input_folder`	Path to the folder containing input audio files.	str	Yes
`--output_folder`	Path to the folder for saving output audio files.	str	Yes

TTS

Synthesizes text into speech using the specified voice and then applies voice conversion using the provided RVC model.

Argument	Description	Type	Default	Required
`--tts_text`	Text to be synthesized.	str		Yes
`--tts_voice`	Voice to be used for TTS synthesis. Refer to Microsoft's TTS voice list (opens in a new tab) for available options.	str		Yes
`--tts_rate`	Control the speaking rate of the TTS. Values range from -100 (slower) to 100 (faster).	int	0	No
`--output_tts_path`	Full path to save the synthesized TTS audio.	str		Yes
`--output_rvc_path`	Full path to save the voice-converted audio using the synthesized TTS.	str		Yes

This mode utilizes the same arguments as the infer mode for voice conversion settings.

Preprocess

Preprocesses a dataset for training an RVC model.

Argument	Description	Type	Required
`--model_name`	Name of the model to be trained.	str	Yes
`--dataset_path`	Path to the dataset directory.	str	Yes
`--sample_rate`	Target sampling rate for the audio data.	int	Yes
`--cpu_cores`	Number of CPU cores to use for preprocessing.	int	No

Extract

Extracts features from a dataset for training an RVC model.

Argument	Description	Type	Default	Required
`--model_name`	Name of the model.	str		Yes
`--rvc_version`	Version of the RVC model ('v1' or 'v2').	str	v2	No
`--f0_method`	Pitch extraction method to use.	str	rmvpe	No
`--pitch_guidance`	Enable or disable pitch guidance during feature extraction.	bool	True	No
`--hop_length`	Hop length for feature extraction. Only applicable for Crepe pitch extraction.	int	128	No
`--cpu_cores`	Number of CPU cores to use for feature extraction (optional).	int	None	No
`--sample_rate`	Target sampling rate for the audio data.	int		Yes
`--embedder_model`	Choose the model used for generating speaker embeddings.	str	contentvec	No
`--embedder_model_custom`	Specify the path to a custom model for speaker embedding. Only applicable if 'embedder_model' is set to 'custom'.	str	None	No

Train

Trains an RVC model.

Argument	Description	Type	Default	Required
`--model_name`	Name of the model to be trained.	str		Yes
`--rvc_version`	Version of the RVC model to train ('v1' or 'v2').	str	v2	No
`--save_every_epoch`	Save the model every specified number of epochs.	int		Yes
`--save_only_latest`	Save only the latest model checkpoint.	bool	False	No
`--save_every_weights`	Save model weights every epoch.	bool	True	No
`--total_epoch`	Total number of epochs to train for.	int	1000	No
`--sample_rate`	Sampling rate of the training data.	int		Yes
`--batch_size`	Batch size for training.	int	8	No
`--gpu`	GPU device to use for training (e.g., '0').	str	0	No
`--pitch_guidance`	Enable or disable pitch guidance during training.	bool	True	No
`--pretrained`	Use a pretrained model for initialization.	bool	True	No
`--custom_pretrained`	Use a custom pretrained model.	bool	False	No
`--g_pretrained_path`	Path to the pretrained generator model file.	str	None	No
`--d_pretrained_path`	Path to the pretrained discriminator model file.	str	None	No
`--overtraining_detector`	Enable overtraining detection.	bool	False	No
`--overtraining_threshold`	Threshold for overtraining detection.	int	50	No
`--sync_graph`	Enable graph synchronization for distributed training.	bool	False	No
`--cache_data_in_gpu`	Cache training data in GPU memory.	bool	False	No

Index

Generates an index file for an RVC model.

Argument	Description	Type	Default	Required
`--model_name`	Name of the model.	str		Yes
`--rvc_version`	Version of the RVC model ('v1' or 'v2').	str	v2	No

Model Extract

Extracts a checkpoint of the trained model.

Argument	Description	Type	Default	Required
`--pth_path`	Path to the main .pth model file.	str		Yes
`--model_name`	Name of the model.	str		Yes
`--sample_rate`	Sampling rate of the extracted model.	int		Yes
`--pitch_guidance`	Enable or disable pitch guidance for the extracted model.	bool		Yes
`--rvc_version`	Version of the extracted RVC model ('v1' or 'v2').	str	v2	No
`--epoch`	Epoch number to extract from the model.	int		Yes
`--step`	Step number to extract from the model (optional).	int	None	No

Model Information

Displays information about a trained model.

Argument	Description	Type	Required
`--pth_path`	Path to the .pth model file.	str	Yes

Model Blender

Fuses two RVC models together.

Argument	Description	Type	Default	Required
`--model_name`	Name of the new fused model.	str		Yes
`--pth_path_1`	Path to the first .pth model file.	str		Yes
`--pth_path_2`	Path to the second .pth model file.	str		Yes
`--ratio`	Ratio for blending the two models (0.0 to 1.0).	float	0.5	No

Tensorboard

Launches TensorBoard for monitoring training progress. This mode requires no arguments.

Download

Downloads a model from a provided link.

Argument	Description	Type	Required
`--model_link`	Direct link to the model file.	str	Yes

Prerequisites

Installs prerequisites for RVC.

Argument	Description	Type	Default	Required
`--pretraineds_v1`	Download pretrained models for RVC v1.	bool	True	No
`--pretraineds_v2`	Download pretrained models for RVC v2.	bool	True	No
`--models`	Download additional models.	bool	True	No
`--exe`	Download required executables.	bool	True	No

Audio Analyzer

Analyzes an audio file and displays its information.

Argument	Description	Type	Required
`--input_path`	Path to the input audio file.	str	Yes

Examples

Here are a few examples of how to use the RVC CLI:

Inferring voice on an audio file:

python rvc_cli.py infer --pitch 5 --input_path "path/to/input.wav" --output_path "path/to/output.wav" --pth_path "path/to/model.pth" --index_path "path/to/index.index"

Training a new RVC model:

python rvc_cli.py train --model_name "my_model" --dataset_path "path/to/dataset" --sample_rate 48000 --total_epoch 500 --gpu 0

Generating an index file for a trained model:

python rvc_cli.py index --model_name "my_model"

Windows UVR