RVC CLI Documentation

Learn how to use the rvc_cli.py script to perform various operations with Retrieval-based Voice Conversion (RVC).

Usage

To use the RVC CLI, navigate to the directory containing rvc_cli.py in your terminal and execute the script using the following syntax:

python rvc_cli.py <mode> [arguments]

Replace <mode> with the desired mode of operation (e.g., infer, train, index) and provide the necessary arguments. For a detailed list of arguments available for each mode, run:

python rvc_cli.py <mode> -h

This will display a help message with explanations for each argument.
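
When the CLI is driven from another Python program, the syntax above can be assembled as an argument list instead of a hand-typed string. The following is a minimal sketch, not part of rvc_cli.py: the helper names `build_command` and `help_command` are hypothetical, and it assumes every option is passed as a `--name value` pair.

```python
import shlex


def build_command(mode, script="rvc_cli.py", **options):
    """Return an argv list of the form: python rvc_cli.py <mode> [arguments]."""
    argv = ["python", script, mode]
    for name, value in options.items():
        # Assumption: each option is written as "--name value".
        argv += [f"--{name}", str(value)]
    return argv


def help_command(mode, script="rvc_cli.py"):
    """Return the argv list that asks a mode for its help message."""
    return ["python", script, mode, "-h"]


# shlex.join renders the list as a shell-safe command line for display.
print(shlex.join(help_command("infer")))
print(shlex.join(build_command("index", model_name="my_model")))
```

The resulting list can be handed to `subprocess.run(argv, check=True)` from the directory that contains rvc_cli.py.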

Modes

Infer

Performs voice conversion on a single audio file.

Argument	Description	Type	Default	Required
`--pitch`	Set the pitch of the audio. Higher values result in a higher pitch.	int	0	No
`--filter_radius`	Apply median filtering to the extracted pitch values if this value is greater than or equal to three. This can help reduce breathiness in the output audio.	int	3	No
`--index_rate`	Control the influence of the index file on the output. Higher values mean stronger influence. Lower values can help reduce artifacts but may result in less accurate voice cloning.	float	0.3	No
`--volume_envelope`	Control the blending of the output's volume envelope. A value of 1 means the output envelope is fully used.	float	1	No
`--protect`	Protect consonants and breathing sounds from artifacts. A value of 0.5 offers the strongest protection, while lower values may reduce the protection level but potentially mitigate the indexing effect.	float	0.33	No
`--hop_length`	Only applicable for the Crepe pitch extraction method. Determines the time it takes for the system to react to a significant pitch change. Smaller values require more processing time but can lead to better pitch accuracy.	int	128	No
`--f0_method`	Choose the pitch extraction algorithm for the conversion. 'rmvpe' is the default and generally recommended.	str	rmvpe	No
`--input_path`	Full path to the input audio file.	str		Yes
`--output_path`	Full path to the output audio file.	str		Yes
`--pth_path`	Full path to the RVC model file (.pth).	str		Yes
`--index_path`	Full path to the index file (.index).	str		Yes
`--split_audio`	Split the audio into smaller segments before inference. This can improve the quality of the output for longer audio files.	bool	False	No
`--f0_autotune`	Apply a light autotune to the inferred audio. Particularly useful for singing voice conversions.	bool	False	No
`--clean_audio`	Clean the output audio using noise reduction algorithms. Recommended for speech conversions.	bool	False	No
`--clean_strength`	Adjust the intensity of the audio cleaning process. Higher values result in stronger cleaning, but may lead to a more compressed sound.	float	0.7	No
`--export_format`	Select the desired output audio format.	str	WAV	No
`--embedder_model`	Choose the model used for generating speaker embeddings.	str	contentvec	No
`--embedder_model_custom`	Specify the path to a custom model for speaker embedding. Only applicable if 'embedder_model' is set to 'custom'.	str	None	No
`--upscale_audio`	Upscale the input audio to a higher quality before processing. This can improve the overall quality of the output, especially for low-quality input audio.	bool	False	No
`--f0_file`	Full path to an external F0 file (.f0). This allows you to use pre-computed pitch values for the input audio.	str	None	No
`--formant_shifting`	Apply formant shifting to the input audio. This can help adjust the timbre of the voice.	bool	False	No
`--formant_qfrency`	Control the frequency of the formant shifting effect. Higher values result in a more pronounced effect.	float	1.0	No
`--formant_timbre`	Control the timbre of the formant shifting effect. Higher values result in a more pronounced effect.	float	1.0	No
`--sid`	Speaker ID for multi-speaker models.	int	0	No
`--post_process`	Apply post-processing effects to the output audio.	bool	False	No
`--reverb`	Apply reverb effect to the output audio.	bool	False	No
`--pitch_shift`	Apply pitch shifting effect to the output audio.	bool	False	No
`--limiter`	Apply limiter effect to the output audio.	bool	False	No
`--gain`	Apply gain effect to the output audio.	bool	False	No
`--distortion`	Apply distortion effect to the output audio.	bool	False	No
`--chorus`	Apply chorus effect to the output audio.	bool	False	No
`--bitcrush`	Apply bitcrush effect to the output audio.	bool	False	No
`--clipping`	Apply clipping effect to the output audio.	bool	False	No
`--compressor`	Apply compressor effect to the output audio.	bool	False	No
`--delay`	Apply delay effect to the output audio.	bool	False	No
`--reverb_room_size`	Control the room size of the reverb effect. Higher values result in a larger room size.	float	0.5	No
`--reverb_damping`	Control the damping of the reverb effect. Higher values result in a more damped sound.	float	0.5	No
`--reverb_wet_gain`	Control the wet gain of the reverb effect. Higher values result in a stronger reverb effect.	float	0.5	No
`--reverb_dry_gain`	Control the dry gain of the reverb effect. Higher values result in a stronger dry signal.	float	0.5	No
`--reverb_width`	Control the stereo width of the reverb effect. Higher values result in a wider stereo image.	float	0.5	No
`--reverb_freeze_mode`	Control the freeze mode of the reverb effect. Higher values result in a stronger freeze effect.	float	0.5	No
`--pitch_shift_semitones`	Control the pitch shift in semitones. Positive values increase the pitch, while negative values decrease it.	float	0.0	No
`--limiter_threshold`	Control the threshold of the limiter effect. Lower values result in a stronger limiting effect.	float	-6	No
`--limiter_release_time`	Control the release time of the limiter effect. Higher values result in a longer release time.	float	0.01	No
`--gain_db`	Control the gain in decibels. Positive values increase the gain, while negative values decrease it.	float	0.0	No
`--distortion_gain`	Control the gain of the distortion effect. Higher values result in a stronger distortion effect.	float	25	No
`--chorus_rate`	Control the rate of the chorus effect. Higher values result in a faster chorus effect.	float	1.0	No
`--chorus_depth`	Control the depth of the chorus effect. Higher values result in a stronger chorus effect.	float	0.25	No
`--chorus_center_delay`	Control the center delay of the chorus effect. Higher values result in a longer center delay.	float	7	No
`--chorus_feedback`	Control the feedback of the chorus effect. Higher values result in a stronger feedback effect.	float	0.0	No
`--chorus_mix`	Control the mix of the chorus effect. Higher values result in a stronger chorus effect.	float	0.5	No
`--bitcrush_bit_depth`	Control the bit depth of the bitcrush effect. Lower values result in a stronger bitcrush effect.	int	8	No
`--clipping_threshold`	Control the threshold of the clipping effect. Lower values result in a stronger clipping effect.	float	-6	No
`--compressor_threshold`	Control the threshold of the compressor effect. Lower values result in a stronger compressor effect.	float	0	No
`--compressor_ratio`	Control the ratio of the compressor effect. Higher values result in a stronger compressor effect.	float	1	No
`--compressor_attack`	Control the attack time of the compressor effect. Higher values make the compressor react more slowly.	float	1.0	No
`--compressor_release`	Control the release time of the compressor effect. Higher values result in a longer release time.	float	100	No
`--delay_seconds`	Control the delay time in seconds. Higher values result in a longer delay time.	float	0.5	No
`--delay_feedback`	Control the feedback of the delay effect. Higher values result in a stronger feedback effect.	float	0.0	No
`--delay_mix`	Control the mix of the delay effect. Higher values result in a stronger delay effect.	float	0.5	No
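
Because most infer arguments have defaults, a wrapper only needs to pass the four required paths plus whatever differs from the table. The sketch below illustrates that idea under stated assumptions: `infer_command` is a hypothetical helper, `DEFAULTS` copies only a subset of the table, and values are passed as `--name value` pairs. How boolean flags are written on the command line is not covered here; check `python rvc_cli.py infer -h` before passing them.

```python
REQUIRED = ("input_path", "output_path", "pth_path", "index_path")

# Defaults copied from the table above (a subset, for illustration only).
DEFAULTS = {
    "pitch": 0,
    "filter_radius": 3,
    "index_rate": 0.3,
    "volume_envelope": 1,
    "protect": 0.33,
    "hop_length": 128,
    "f0_method": "rmvpe",
    "export_format": "WAV",
}


def infer_command(**options):
    """Build an infer command, leaving out options equal to their default."""
    missing = [name for name in REQUIRED if name not in options]
    if missing:
        raise ValueError("missing required arguments: " + ", ".join(missing))

    argv = ["python", "rvc_cli.py", "infer"]
    for name, value in options.items():
        if name in DEFAULTS and DEFAULTS[name] == value:
            continue  # same as the documented default, so it can be omitted
        argv += [f"--{name}", str(value)]
    return argv


command = infer_command(
    pitch=5,
    index_rate=0.3,  # equals the default, so it is dropped from the command
    input_path="path/to/input.wav",
    output_path="path/to/output.wav",
    pth_path="path/to/model.pth",
    index_path="path/to/index.index",
)
print(command)
```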

Batch Infer

Performs voice conversion on all supported audio files in a specified folder. This mode takes the same arguments as the infer mode, except that it requires --input_folder and --output_folder instead of --input_path and --output_path.

Argument	Description	Type	Required
`--input_folder`	Path to the folder containing input audio files.	str	Yes
`--output_folder`	Path to the folder for saving output audio files.	str	Yes
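
The relationship between the two modes can be shown by reusing a set of infer options and swapping only the path arguments. This is a sketch with hypothetical helper names; the mode name `batch_infer` is an assumption based on the section heading, so confirm it against the script's help output.

```python
def to_batch_options(infer_options, input_folder, output_folder):
    """Copy infer options, replacing the single-file paths with folder paths."""
    batch = {
        name: value
        for name, value in infer_options.items()
        if name not in ("input_path", "output_path")
    }
    batch["input_folder"] = input_folder
    batch["output_folder"] = output_folder
    return batch


def batch_infer_command(options, mode="batch_infer"):
    """Build the command line; the mode name is an assumption (see above)."""
    argv = ["python", "rvc_cli.py", mode]
    for name, value in options.items():
        argv += [f"--{name}", str(value)]
    return argv


single = {
    "pitch": 5,
    "input_path": "path/to/input.wav",
    "output_path": "path/to/output.wav",
    "pth_path": "path/to/model.pth",
    "index_path": "path/to/index.index",
}
batch = to_batch_options(single, "path/to/inputs", "path/to/outputs")
print(batch_infer_command(batch))
```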

TTS

Synthesizes text into speech using the specified voice and then applies voice conversion using the provided RVC model.

Argument	Description	Type	Default	Required
`--tts_text`	Text to be synthesized.	str		Yes
`--tts_voice`	Voice to be used for TTS synthesis. Refer to Microsoft's TTS voice list for available options.	str		Yes
`--tts_rate`	Control the speaking rate of the TTS. Values range from -100 (slower) to 100 (faster).	int	0	No
`--output_tts_path`	Full path to save the synthesized TTS audio.	str		Yes
`--output_rvc_path`	Full path to save the voice-converted audio using the synthesized TTS.	str		Yes

This mode utilizes the same arguments as the infer mode for voice conversion settings.
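
A TTS call combines the arguments from the table with the infer-style conversion settings. The sketch below checks the documented `--tts_rate` range before building the command. The helper `tts_command` is hypothetical, the mode name `tts` is assumed from the section heading, and the voice name is only an example value.

```python
def tts_command(text, voice, output_tts_path, output_rvc_path,
                rate=0, mode="tts", **infer_options):
    """Build a TTS command; extra keyword arguments are infer-mode settings."""
    if not -100 <= rate <= 100:
        raise ValueError("tts_rate must be between -100 and 100")

    argv = [
        "python", "rvc_cli.py", mode,  # mode name is an assumption
        "--tts_text", text,
        "--tts_voice", voice,
        "--tts_rate", str(rate),
        "--output_tts_path", output_tts_path,
        "--output_rvc_path", output_rvc_path,
    ]
    for name, value in infer_options.items():
        argv += [f"--{name}", str(value)]
    return argv


command = tts_command(
    "Hello from RVC.",
    "en-US-AriaNeural",  # example voice name; pick one from the voice list
    "path/to/tts.wav",
    "path/to/rvc.wav",
    rate=10,
    pth_path="path/to/model.pth",
    index_path="path/to/index.index",
)
print(command)
```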

Preprocess

Preprocesses a dataset for training an RVC model.

Argument	Description	Type	Default	Required
`--model_name`	Name of the model to be trained.	str		Yes
`--dataset_path`	Path to the dataset directory.	str		Yes
`--sample_rate`	Target sampling rate for the audio data.	int		Yes
`--cpu_cores`	Number of CPU cores to use for preprocessing.	int		No
`--cut_preprocess`	Cut the dataset into smaller segments for faster preprocessing.	bool	True	No
`--process_effects`	Disable all filters during preprocessing.	bool	False	No
`--noise_reduction`	Enable noise reduction during preprocessing.	bool	False	No
`--noise_reduction_strength`	Strength of the noise reduction filter.	float	0.7	No

Extract

Extracts features from a dataset for training an RVC model.

Argument	Description	Type	Default	Required
`--model_name`	Name of the model.	str		Yes
`--rvc_version`	Version of the RVC model ('v1' or 'v2').	str	v2	No
`--f0_method`	Pitch extraction method to use.	str	rmvpe	No
`--pitch_guidance`	Enable or disable pitch guidance during feature extraction.	bool	True	No
`--hop_length`	Hop length for feature extraction. Only applicable for Crepe pitch extraction.	int	128	No
`--cpu_cores`	Number of CPU cores to use for feature extraction (optional).	int	None	No
`--gpu`	GPU device to use for feature extraction (optional).	int	-	No
`--sample_rate`	Target sampling rate for the audio data.	int		Yes
`--embedder_model`	Choose the model used for generating speaker embeddings.	str	contentvec	No
`--embedder_model_custom`	Specify the path to a custom model for speaker embedding. Only applicable if 'embedder_model' is set to 'custom'.	str	None	No

Train

Trains an RVC model.

Argument	Description	Type	Default	Required
`--model_name`	Name of the model to be trained.	str		Yes
`--rvc_version`	Version of the RVC model to train ('v1' or 'v2').	str	v2	No
`--save_every_epoch`	Save the model every specified number of epochs.	int		Yes
`--save_only_latest`	Save only the latest model checkpoint.	bool	False	No
`--save_every_weights`	Save model weights every epoch.	bool	True	No
`--total_epoch`	Total number of epochs to train for.	int	1000	No
`--sample_rate`	Sampling rate of the training data.	int		Yes
`--batch_size`	Batch size for training.	int	8	No
`--gpu`	GPU device to use for training (e.g., '0').	str	0	No
`--pitch_guidance`	Enable or disable pitch guidance during training.	bool	True	No
`--pretrained`	Use a pretrained model for initialization.	bool	True	No
`--custom_pretrained`	Use a custom pretrained model.	bool	False	No
`--g_pretrained_path`	Path to the pretrained generator model file.	str	None	No
`--d_pretrained_path`	Path to the pretrained discriminator model file.	str	None	No
`--overtraining_detector`	Enable overtraining detection.	bool	False	No
`--overtraining_threshold`	Threshold for overtraining detection.	int	50	No
`--sync_graph`	Enable graph synchronization for distributed training.	bool	False	No
`--cache_data_in_gpu`	Cache training data in GPU memory.	bool	False	No
`--index_algorithm`	Choose the method for generating the index file.	str	Auto	No

Index

Generates an index file for an RVC model.

Argument	Description	Type	Default	Required
`--model_name`	Name of the model.	str		Yes
`--rvc_version`	Version of the RVC model ('v1' or 'v2').	str	v2	No
`--index_algorithm`	Choose the method for generating the index file.	str	Auto	No
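
Taken together, the preprocess, extract, train, and index modes read as one workflow that shares a model name and sample rate. The following sketch lists the four commands in that order, using only arguments marked as required in the tables above plus `--total_epoch`. The function name is hypothetical, the order is inferred from the mode descriptions, and the mode names `preprocess` and `extract` are assumed from the section headings.

```python
def training_pipeline(model_name, dataset_path, sample_rate,
                      save_every_epoch, total_epoch=1000):
    """Return the four command lines of an assumed training workflow, in order."""
    base = ["python", "rvc_cli.py"]
    rate = str(sample_rate)
    return [
        base + ["preprocess", "--model_name", model_name,
                "--dataset_path", dataset_path, "--sample_rate", rate],
        base + ["extract", "--model_name", model_name, "--sample_rate", rate],
        base + ["train", "--model_name", model_name,
                "--save_every_epoch", str(save_every_epoch),
                "--sample_rate", rate, "--total_epoch", str(total_epoch)],
        base + ["index", "--model_name", model_name],
    ]


steps = training_pipeline("my_model", "path/to/dataset", 48000,
                          save_every_epoch=10, total_epoch=500)
for argv in steps:
    print(" ".join(argv))
```

Each list can be run in turn with `subprocess.run(argv, check=True)`, which stops the sequence at the first failing step.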

Model Extract

Extracts a checkpoint of the trained model.

Argument	Description	Type	Default	Required
`--pth_path`	Path to the main .pth model file.	str		Yes
`--model_name`	Name of the model.	str		Yes
`--sample_rate`	Sampling rate of the extracted model.	int		Yes
`--pitch_guidance`	Enable or disable pitch guidance for the extracted model.	bool		Yes
`--rvc_version`	Version of the extracted RVC model ('v1' or 'v2').	str	v2	No
`--epoch`	Epoch number to extract from the model.	int		Yes
`--step`	Step number to extract from the model (optional).	int	None	No

Model Information

Displays information about a trained model.

Argument	Description	Type	Required
`--pth_path`	Path to the .pth model file.	str	Yes

Model Blender

Fuses two RVC models together.

Argument	Description	Type	Default	Required
`--model_name`	Name of the new fused model.	str		Yes
`--pth_path_1`	Path to the first .pth model file.	str		Yes
`--pth_path_2`	Path to the second .pth model file.	str		Yes
`--ratio`	Ratio for blending the two models (0.0 to 1.0).	float	0.5	No
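
One way to picture the `--ratio` argument is as a linear interpolation weight between matching parameters of the two models. The table above does not say how the blend is computed, so the sketch below is an illustration under that assumption, with plain Python lists standing in for model weights and the ratio applied to the first model.

```python
def blend(weights_1, weights_2, ratio=0.5):
    """Interpolate two weight dictionaries: ratio * first + (1 - ratio) * second."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0.0 and 1.0")
    if weights_1.keys() != weights_2.keys():
        raise ValueError("both models must have the same parameter names")

    return {
        name: [ratio * a + (1.0 - ratio) * b
               for a, b in zip(weights_1[name], weights_2[name])]
        for name in weights_1
    }


model_a = {"layer": [1.0, 3.0]}
model_b = {"layer": [3.0, 1.0]}
print(blend(model_a, model_b, ratio=0.5))
print(blend(model_a, model_b, ratio=1.0))
```

Under this reading, a ratio of 0.5 averages the two models and a ratio of 1.0 reproduces the first one.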

Tensorboard

Launches TensorBoard for monitoring training progress. This mode requires no arguments.

Download

Downloads a model from a provided link.

Argument	Description	Type	Required
`--model_link`	Direct link to the model file.	str	Yes

Prerequisites

Installs prerequisites for RVC.

Argument	Description	Type	Default	Required
`--pretraineds_v1_f0`	Download pretrained models for RVC v1.	bool	True	No
`--pretraineds_v2_f0`	Download pretrained models for RVC v2.	bool	True	No
`--pretraineds_v1_nof0`	Download non f0 pretrained models for RVC v1.	bool	False	No
`--pretraineds_v2_nof0`	Download non f0 pretrained models for RVC v2.	bool	False	No
`--models`	Download additional models.	bool	True	No
`--exe`	Download required executables.	bool	True	No

Audio Analyzer

Analyzes an audio file and displays its information.

Argument	Description	Type	Required
`--input_path`	Path to the input audio file.	str	Yes

Examples

Here are a few examples of how to use the RVC CLI:

Inferring voice on an audio file:

python rvc_cli.py infer --pitch 5 --input_path "path/to/input.wav" --output_path "path/to/output.wav" --pth_path "path/to/model.pth" --index_path "path/to/index.index"

Training a new RVC model:

python rvc_cli.py train --model_name "my_model" --save_every_epoch 10 --sample_rate 48000 --total_epoch 500 --gpu 0

Generating an index file for a trained model:

python rvc_cli.py index --model_name "my_model"
