UVR

Learn how to use the uvr_cli.py script to separate audio files into stems with UVR (Ultimate Vocal Remover) models, list the available models, and print environment information.

Usage

To use the UVR CLI, navigate to the directory containing uvr_cli.py in your terminal and execute the script using the following syntax:

python uvr_cli.py --audio_file <path/to/audio_file> [options]

Replace <path/to/audio_file> with the path to the audio file you want to process and [options] with any of the optional arguments described below. For a detailed list of the arguments available in each mode, run:

python uvr_cli.py <mode> -h

This will display a help message with explanations for each argument.
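
If you drive the CLI from another Python program, a small helper can assemble the argument list for you. The sketch below uses hypothetical helpers (`build_uvr_command` and `run_uvr` are not part of UVR). It assumes boolean switches such as `--vr_enable_tta` take no value and passes every other option as `--name=value`, so an option like `--demucs_segments_enabled=False` should be given as the string `"False"`.

```python
import subprocess
import sys


def build_uvr_command(audio_file, options=None, script="uvr_cli.py"):
    """Build an argv list for uvr_cli.py (hypothetical helper, not part of UVR).

    True becomes a bare switch (e.g. --vr_enable_tta), False/None are skipped,
    and any other value is passed as --name=value.
    """
    cmd = [sys.executable, script, "--audio_file", str(audio_file)]
    for name, value in (options or {}).items():
        if value is True:
            cmd.append(f"--{name}")
        elif value is False or value is None:
            continue  # switch left off: omit it entirely
        else:
            cmd.append(f"--{name}={value}")
    return cmd


def run_uvr(cmd):
    """Run the assembled command; raises CalledProcessError on failure."""
    return subprocess.run(cmd, check=True)


# Usage: print the command instead of running it.
cmd = build_uvr_command("my_song.mp3", {"output_format": "MP3", "vr_enable_tta": True})
print(" ".join(cmd))
```

Building a list rather than a single string avoids shell quoting problems with paths that contain spaces.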

Modes

Info and Debugging

Argument	Description	Type	Default	Required
`-d`, `--debug`	Enable debug logging. Equivalent to `--log_level=debug`.	bool	False	No
`-e`, `--env_info`	Print environment information and exit.	bool	False	No
`-l`, `--list_models`	List all supported models and exit.	bool	False	No
`--log_level`	Log level, e.g. `info`, `debug`, `warning` (default: `info`).	str	info	No
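
The relationship between `--debug` and `--log_level` can be illustrated with Python's standard `argparse` and `logging` modules. This is a hypothetical sketch of how such flags are commonly wired together, not the actual source of uvr_cli.py; `resolve_log_level` is an illustrative name.

```python
import argparse
import logging


def resolve_log_level(argv):
    """Map CLI flags to a logging level; --debug wins over --log_level."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("-d", "--debug", action="store_true")
    parser.add_argument("--log_level", default="info")
    # parse_known_args ignores options this sketch does not define.
    args, _unknown = parser.parse_known_args(argv)
    name = "debug" if args.debug else args.log_level
    return getattr(logging, name.upper())


print(logging.getLevelName(resolve_log_level(["-d"])))
```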

Separation I/O Params

Argument	Description	Type	Default	Required
`-m`, `--model_filename`	Model to use for separation. Example: `-m 2_HP-UVR.pth`	str	`model_mel_band_roformer_ep_3005_sdr_11.4360.ckpt`	No
`--output_format`	Output format for separated files, any common format (default: `WAV`). Example: `--output_format=MP3`	str	`WAV`	No
`--output_dir`	Directory to write output files (default: `<current dir>`). Example: `--output_dir=/app/separated`	str	`None` (current directory)	No
`--model_file_dir`	Model files directory (default: `uvr/tmp/audio-separator-models/`). Example: `--model_file_dir=/app/models`	str	`uvr/tmp/audio-separator-models/`	No

Common Separation Parameters

Argument	Description	Type	Default	Required
`--invert_spect`	Invert secondary stem using spectrogram (default: `False`). Example: `--invert_spect`	bool	False	No
`--normalization`	Max peak amplitude to normalize input and output audio to (default: `0.9`). Example: `--normalization=0.7`	float	0.9	No
`--single_stem`	Output only single stem, e.g. `Instrumental`, `Vocals`, `Drums`, `Bass`, `Guitar`, `Piano`, `Other`. Example: `--single_stem=Instrumental`	str	None	No
`--sample_rate`	Modify the sample rate of the output audio (default: `44100`). Example: `--sample_rate=44100`	int	44100	No
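
`--normalization` sets the maximum peak amplitude of the audio. A minimal sketch of peak normalization in plain Python follows. It assumes the audio is a sequence of float samples in the -1.0 to 1.0 range and only scales audio down when its peak exceeds the limit; whether uvr_cli.py also raises quiet audio up to the limit is not stated here, so treat that as an assumption to check. `normalize_peak` is a hypothetical name.

```python
def normalize_peak(samples, max_peak=0.9):
    """Scale samples down so the largest absolute value is at most max_peak.

    Quiet input (peak already <= max_peak) is returned unchanged.
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak <= max_peak:
        return list(samples)
    scale = max_peak / peak  # one gain factor preserves the waveform shape
    return [s * scale for s in samples]


print(normalize_peak([0.5, -1.8, 0.9], max_peak=0.9))
```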

MDX Architecture Parameters

Argument	Description	Type	Default	Required
`--mdx_segment_size`	Segment size; larger values use more resources but may give better results (default: `256`). Example: `--mdx_segment_size=256`	int	256	No
`--mdx_overlap`	Amount of overlap between prediction windows, 0.001-0.999. Higher is better but slower (default: `0.25`). Example: `--mdx_overlap=0.25`	float	0.25	No
`--mdx_batch_size`	Larger values use more RAM but may process slightly faster (default: `1`). Example: `--mdx_batch_size=4`	int	1	No
`--mdx_hop_length`	Usually called stride in neural networks; change it only if you know what you're doing (default: `1024`). Example: `--mdx_hop_length=1024`	int	1024	No
`--mdx_enable_denoise`	Enable denoising during separation (default: `False`). Example: `--mdx_enable_denoise`	bool	False	No
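
To see what `--mdx_overlap` means in practice, the sketch below computes where overlapping prediction windows would start for a given overlap fraction. Each window advances by `window_len * (1 - overlap)` samples, so a higher overlap produces more windows and therefore more processing. This is a generic sliding-window illustration with a hypothetical `window_starts` helper, not UVR's exact chunking code, and the lengths used are arbitrary example values.

```python
def window_starts(total_len, window_len, overlap):
    """Start offsets of overlapping windows over a signal of total_len samples.

    overlap is a fraction in (0, 1), like --mdx_overlap; consecutive windows
    advance by window_len * (1 - overlap) samples (at least 1).
    """
    if not 0 < overlap < 1:
        raise ValueError("overlap must be between 0 and 1 (exclusive)")
    step = max(1, int(window_len * (1 - overlap)))
    starts = list(range(0, max(total_len - window_len, 0) + 1, step))
    # Add a final window if the regular steps leave the tail uncovered.
    if starts[-1] + window_len < total_len:
        starts.append(total_len - window_len)
    return starts


print(window_starts(11, 4, 0.5))
```

For example, `window_starts(11, 4, 0.5)` returns `[0, 2, 4, 6, 7]`: a step of 2 samples plus one extra window so the last sample is covered.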

VR Architecture Parameters

Argument	Description	Type	Default	Required
`--vr_batch_size`	Number of batches to process at a time. Higher = more RAM, slightly faster processing (default: `4`). Example: `--vr_batch_size=16`	int	4	No
`--vr_window_size`	Balances quality and speed: `1024` = fast but lower quality, `320` = slower but better quality (default: `512`). Example: `--vr_window_size=320`	int	512	No
`--vr_aggression`	Intensity of primary stem extraction, from `-100` to `100`. Typically `5` for vocals and instrumentals (default: `5`). Example: `--vr_aggression=2`	int	5	No
`--vr_enable_tta`	Enable Test-Time Augmentation (TTA); slow but improves quality (default: `False`). Example: `--vr_enable_tta`	bool	False	No
`--vr_high_end_process`	Mirror the missing frequency range of the output (default: `False`). Example: `--vr_high_end_process`	bool	False	No
`--vr_enable_post_process`	Identify leftover artifacts within vocal output; may improve separation for some songs (default: `False`). Example: `--vr_enable_post_process`	bool	False	No
`--vr_post_process_threshold`	Threshold for the post-process feature, `0.1`-`0.3` (default: `0.2`). Example: `--vr_post_process_threshold=0.1`	float	0.2	No
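
Because several VR options only make sense within a range, a wrapper script might check them before launching a long separation. The hypothetical `validate_vr_options` below uses the ranges from the table. Restricting `--vr_window_size` to `320`, `512`, and `1024` is an assumption based on the values the table mentions, not a documented limit.

```python
# Assumption: only the window sizes named in the table are accepted.
VR_WINDOW_SIZES = (320, 512, 1024)


def validate_vr_options(window_size=512, aggression=5, post_process_threshold=0.2):
    """Return a list of problems with VR options (empty list means OK)."""
    problems = []
    if window_size not in VR_WINDOW_SIZES:
        problems.append(f"vr_window_size {window_size} not in {VR_WINDOW_SIZES}")
    if not -100 <= aggression <= 100:
        problems.append(f"vr_aggression {aggression} outside -100..100")
    if not 0.1 <= post_process_threshold <= 0.3:
        problems.append(
            f"vr_post_process_threshold {post_process_threshold} outside 0.1..0.3"
        )
    return problems


for problem in validate_vr_options(window_size=256, aggression=150):
    print(problem)
```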

Demucs Architecture Parameters

Argument	Description	Type	Default	Required
`--demucs_segment_size`	Size of segments into which the audio is split, `1-100`, or `Default`. Higher = slower but better quality (default: `Default`). Example: `--demucs_segment_size=40`	str	`Default`	No
`--demucs_shifts`	Number of predictions with random shifts, higher = slower but better quality (default: `2`). Example: `--demucs_shifts=4`	int	2	No
`--demucs_overlap`	Overlap between prediction windows, 0.001-0.999. Higher = slower but better quality (default: `0.25`). Example: `--demucs_overlap=0.25`	float	0.25	No
`--demucs_segments_enabled`	Enable segment-wise processing (default: `True`). Example: `--demucs_segments_enabled=False`	bool	`True`	No
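
`--demucs_segment_size` is typed as a string because it accepts either `Default` or a number. A hypothetical parser for that value, following the `1-100` range stated in the table, might look like this. Returning `None` for `Default` (meaning "use the model's own setting") is an assumption of this sketch.

```python
def parse_demucs_segment_size(value):
    """Interpret a --demucs_segment_size string.

    Returns None for "Default" (case-insensitive) or an int in 1..100.
    Raises ValueError for anything else.
    """
    if value.strip().lower() == "default":
        return None
    size = int(value)  # raises ValueError for non-numeric text
    if not 1 <= size <= 100:
        raise ValueError(f"demucs segment size must be 1-100, got {size}")
    return size


print(parse_demucs_segment_size("Default"), parse_demucs_segment_size("40"))
```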

MDXC Architecture Parameters

Argument	Description	Type	Default	Required
`--mdxc_segment_size`	Segment size; larger values use more resources but may give better results (default: `256`). Example: `--mdxc_segment_size=256`	int	256	No
`--mdxc_override_model_segment_size`	Use the value of `--mdxc_segment_size` instead of the model's default segment size (default: `False`). Example: `--mdxc_override_model_segment_size`	bool	False	No
`--mdxc_overlap`	Amount of overlap between prediction windows, `2-50`. Higher is better but slower (default: `8`). Example: `--mdxc_overlap=8`	int	8	No
`--mdxc_batch_size`	Larger values use more RAM but may process slightly faster (default: `1`). Example: `--mdxc_batch_size=4`	int	1	No
`--mdxc_pitch_shift`	Shift audio pitch by a number of semitones while processing; may improve output for deep or high vocals (default: `0`). Example: `--mdxc_pitch_shift=2`	int	0	No
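
`--mdxc_pitch_shift` is measured in semitones. In twelve-tone equal temperament a shift of n semitones multiplies frequencies by 2^(n/12), so `--mdxc_pitch_shift=2` corresponds to a factor of about 1.12. The snippet below only illustrates that arithmetic with a hypothetical `semitones_to_ratio` helper; how uvr_cli.py applies the shift internally is not covered here.

```python
def semitones_to_ratio(semitones):
    """Frequency ratio for a shift of `semitones` in 12-tone equal temperament."""
    return 2 ** (semitones / 12)


# Show how large typical --mdxc_pitch_shift values are as frequency factors.
for n in (-2, 0, 2, 12):
    print(f"{n:+d} semitones -> x{semitones_to_ratio(n):.4f}")
```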

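Example: separate my_song.mp3 with the 2_HP-UVR.pth model at an aggression of 10, writing MP3 files to /path/to/output:

python uvr_cli.py --audio_file "my_song.mp3" --output_format MP3 --output_dir "/path/to/output" --model_filename "2_HP-UVR.pth" --vr_aggression 10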
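
uvr_cli.py processes one `--audio_file` per run. To apply the example settings above to many files, a wrapper can build one command per file. The sketch below uses hypothetical helpers (`batch_commands`, `run_all`), and the set of accepted file extensions is an assumption; adjust it to the formats you actually have.

```python
import subprocess
import sys
from pathlib import Path

# Assumption: extensions of the files you want to separate.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".flac"}


def batch_commands(files, output_dir, model="2_HP-UVR.pth", script="uvr_cli.py"):
    """Build one uvr_cli.py command per audio file, in sorted order."""
    commands = []
    for path in sorted(Path(f) for f in files):
        if path.suffix.lower() not in AUDIO_EXTENSIONS:
            continue  # skip non-audio files such as notes or artwork
        commands.append([
            sys.executable, script,
            "--audio_file", str(path),
            "--output_format", "MP3",
            "--output_dir", str(output_dir),
            "--model_filename", model,
            "--vr_aggression", "10",
        ])
    return commands


def run_all(commands):
    """Run each command in turn, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


# Usage: build commands for every file in a folder, then run them.
# run_all(batch_commands(Path("songs").iterdir(), "separated"))
```

Running the files one after another keeps memory use predictable, since each separation can already consume substantial RAM on its own.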

RVC