UVR

Learn how to use the uvr_cli.py script to perform various operations with UVR.

Usage

To use the UVR CLI, navigate to the directory containing uvr_cli.py in your terminal and execute the script using the following syntax:

python uvr_cli.py --audio_file <path/to/audio_file> [options]

Replace <path/to/audio_file> with the path to the audio file you want to process and [options] with the necessary arguments. For a detailed list of arguments available for each mode, run:

python uvr_cli.py <mode> -h

This will display a help message with explanations for each argument.
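When the CLI is driven from another Python program, the same syntax can be assembled as an argument list. The sketch below is a minimal illustration: `build_uvr_command` is a hypothetical helper, not part of UVR, and it assumes that boolean switches are passed as bare flags and that every other option is passed as `--name=value`, as in the examples on this page.

```python
import sys


def build_uvr_command(audio_file, **options):
    """Return an argv list for uvr_cli.py (hypothetical helper, not part of UVR)."""
    cmd = [sys.executable, "uvr_cli.py", "--audio_file", str(audio_file)]
    for name, value in options.items():
        flag = f"--{name}"
        if value is True:
            # Boolean switch such as --vr_enable_tta: pass the bare flag.
            cmd.append(flag)
        elif value is None:
            # None means "leave this option at its default".
            continue
        else:
            # Everything else, including an explicit False, becomes --name=value.
            cmd.append(f"{flag}={value}")
    return cmd
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` from the directory containing uvr_cli.py. Pass `None`, not `False`, to omit a switch such as `debug`.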

Modes

Info and Debugging

| Argument | Description | Type | Default | Required |
|---|---|---|---|---|
| -d, --debug | Enable debug logging. Equivalent to --log_level=debug. | bool | False | No |
| -e, --env_info | Print environment information and exit. | bool | False | No |
| -l, --list_models | List all supported models and exit. | bool | False | No |
| --log_level | Log level, e.g. info, debug, warning (default: info). | str | info | No |
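The table states that -d is equivalent to --log_level=debug. One way such an equivalence could be wired up with argparse is sketched below. This is an illustration only, not the actual uvr_cli.py source; `parse_logging_args` is a hypothetical name.

```python
import argparse
import logging


def parse_logging_args(argv):
    """Resolve -d/--debug and --log_level into a logging level (illustrative only)."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("-d", "--debug", action="store_true")
    parser.add_argument("--log_level", default="info")
    # Ignore every other option so the sketch can be fed a full command line.
    args, _unknown = parser.parse_known_args(argv)
    # Assumption: --debug wins over whatever --log_level says.
    level_name = "debug" if args.debug else args.log_level
    # Map "info" -> logging.INFO, "debug" -> logging.DEBUG, and so on.
    return getattr(logging, level_name.upper())
```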

Separation I/O Params

| Argument | Description | Type | Default | Required |
|---|---|---|---|---|
| -m, --model_filename | Model to use for separation. Example: -m 2_HP-UVR.pth | str | model_mel_band_roformer_ep_3005_sdr_11.4360.ckpt | No |
| --output_format | Output format for separated files, any common format (default: WAV). Example: --output_format=MP3 | str | WAV | No |
| --output_dir | Directory to write output files (default: <current dir>). Example: --output_dir=/app/separated | str | None | No |
| --model_file_dir | Model files directory (default: uvr/tmp/audio-separator-models/). Example: --model_file_dir=/app/models | str | uvr/tmp/audio-separator-models/ | No |

Common Separation Parameters

| Argument | Description | Type | Default | Required |
|---|---|---|---|---|
| --invert_spect | Invert secondary stem using spectrogram (default: False). Example: --invert_spect | bool | False | No |
| --normalization | Max peak amplitude to normalize input and output audio to (default: 0.9). Example: --normalization=0.7 | float | 0.9 | No |
| --single_stem | Output only single stem, e.g. Instrumental, Vocals, Drums, Bass, Guitar, Piano, Other. Example: --single_stem=Instrumental | str | None | No |
| --sample_rate | Modify the sample rate of the output audio (default: 44100). Example: --sample_rate=44100 | int | 44100 | No |
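To make the --normalization value concrete, the sketch below shows one plausible reading of "max peak amplitude": audio whose loudest sample exceeds the limit is scaled down so that its peak equals the limit. Whether UVR also boosts quieter audio is not stated here, so this sketch assumes it does not. `normalize_peak` is a hypothetical function, not UVR code.

```python
def normalize_peak(samples, max_peak=0.9):
    """Scale samples down so the largest absolute value is at most max_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak <= max_peak:
        # Assumption: audio already under the ceiling is left untouched.
        return list(samples)
    scale = max_peak / peak
    return [s * scale for s in samples]
```

With the default of 0.9, a clip peaking at 1.0 is multiplied by 0.9, while a clip peaking at 0.5 passes through unchanged.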

MDX Architecture Parameters

| Argument | Description | Type | Default | Required |
|---|---|---|---|---|
| --mdx_segment_size | Larger consumes more resources, but may give better results (default: 256). Example: --mdx_segment_size=256 | int | 256 | No |
| --mdx_overlap | Amount of overlap between prediction windows, 0.001-0.999. Higher is better but slower (default: 0.25). Example: --mdx_overlap=0.25 | float | 0.25 | No |
| --mdx_batch_size | Larger consumes more RAM but may process slightly faster (default: 1). Example: --mdx_batch_size=4 | int | 1 | No |
| --mdx_hop_length | Usually called stride in neural networks, only change if you know what you're doing (default: 1024). Example: --mdx_hop_length=1024 | int | 1024 | No |
| --mdx_enable_denoise | Enable denoising during separation (default: False). Example: --mdx_enable_denoise | bool | False | No |
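The trade-off behind --mdx_overlap ("higher is better but slower") can be seen in a generic sliding-window sketch. This is not the MDX implementation: `window_starts` is a hypothetical function, and it assumes the step between windows is the window size multiplied by (1 - overlap), which is a common convention but not confirmed for UVR.

```python
def window_starts(total, window, overlap):
    """Start offsets of overlapping windows over `total` frames (generic sketch)."""
    if not 0.0 < overlap < 1.0:
        raise ValueError("overlap must be strictly between 0 and 1")
    # Assumed convention: each window advances by the non-overlapping part.
    step = max(1, int(window * (1.0 - overlap)))
    # The last windows may run past the end and would need padding in practice.
    return list(range(0, total, step))
```

Under these assumptions, 1000 frames with a window of 256 need 6 windows at an overlap of 0.25 but 16 windows at 0.75, which is why higher overlap costs more processing time.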

VR Architecture Parameters

| Argument | Description | Type | Default | Required |
|---|---|---|---|---|
| --vr_batch_size | Number of batches to process at a time. Higher = more RAM, slightly faster processing (default: 4). Example: --vr_batch_size=16 | int | 4 | No |
| --vr_window_size | Balance quality and speed. 1024 = fast but lower quality, 320 = slower but better quality (default: 512). Example: --vr_window_size=320 | int | 512 | No |
| --vr_aggression | Intensity of primary stem extraction, -100 to 100. Typically 5 for vocals & instrumentals (default: 5). Example: --vr_aggression=2 | int | 5 | No |
| --vr_enable_tta | Enable Test-Time-Augmentation; slow but improves quality (default: False). Example: --vr_enable_tta | bool | False | No |
| --vr_high_end_process | Mirror the missing frequency range of the output (default: False). Example: --vr_high_end_process | bool | False | No |
| --vr_enable_post_process | Identify leftover artifacts within vocal output; may improve separation for some songs (default: False). Example: --vr_enable_post_process | bool | False | No |
| --vr_post_process_threshold | Threshold for post_process feature: 0.1-0.3 (default: 0.2). Example: --vr_post_process_threshold=0.1 | float | 0.2 | No |
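Two of the VR options have documented ranges, so a wrapper script may want to check values before launching a long separation run. The following is a minimal sketch; `VR_RANGES` and `check_vr_options` are hypothetical names, and the limits are copied from the table above, not read from UVR itself.

```python
# Documented ranges from the VR table (inclusive bounds assumed).
VR_RANGES = {
    "vr_aggression": (-100, 100),
    "vr_post_process_threshold": (0.1, 0.3),
}


def check_vr_options(options):
    """Return a list of problems; an empty list means every value is in range."""
    problems = []
    for name, value in options.items():
        if name not in VR_RANGES:
            continue  # No documented range for this option.
        low, high = VR_RANGES[name]
        if not low <= value <= high:
            problems.append(f"--{name}={value} is outside {low}..{high}")
    return problems
```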

Demucs Architecture Parameters

| Argument | Description | Type | Default | Required |
|---|---|---|---|---|
| --demucs_segment_size | Size of segments into which the audio is split, 1-100. Higher = slower but better quality (default: Default). Example: --demucs_segment_size=256 | str | Default | No |
| --demucs_shifts | Number of predictions with random shifts, higher = slower but better quality (default: 2). Example: --demucs_shifts=4 | int | 2 | No |
| --demucs_overlap | Overlap between prediction windows, 0.001-0.999. Higher = slower but better quality (default: 0.25). Example: --demucs_overlap=0.25 | float | 0.25 | No |
| --demucs_segments_enabled | Enable segment-wise processing (default: True). Example: --demucs_segments_enabled=False | bool | True | No |

MDXC Architecture Parameters

| Argument | Description | Type | Default | Required |
|---|---|---|---|---|
| --mdxc_segment_size | Larger consumes more resources, but may give better results (default: 256). Example: --mdxc_segment_size=256 | int | 256 | No |
| --mdxc_override_model_segment_size | Override model default segment size instead of using the model default value. Example: --mdxc_override_model_segment_size | bool | False | No |
| --mdxc_overlap | Amount of overlap between prediction windows, 2-50. Higher is better but slower (default: 8). Example: --mdxc_overlap=8 | int | 8 | No |
| --mdxc_batch_size | Larger consumes more RAM but may process slightly faster (default: 1). Example: --mdxc_batch_size=4 | int | 1 | No |
| --mdxc_pitch_shift | Shift audio pitch by a number of semitones while processing. May improve output for deep/high vocals (default: 0). Example: --mdxc_pitch_shift=2 | int | 0 | No |

Example:

python uvr_cli.py --audio_file "my_song.mp3" --output_format MP3 --output_dir "/path/to/output" --model_filename "2_HP-UVR.pth" --vr_aggression 10
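The same options can be applied to a whole folder by generating one command per file. The sketch below only builds the argument lists and does not run anything. `batch_commands` is a hypothetical helper, and the `*.mp3` pattern and folder names are placeholders.

```python
import sys
from pathlib import Path


def batch_commands(input_dir, output_dir, model_filename="2_HP-UVR.pth", pattern="*.mp3"):
    """Build one uvr_cli.py argv list per matching file (hypothetical helper)."""
    commands = []
    # Sort so the processing order is predictable.
    for audio in sorted(Path(input_dir).glob(pattern)):
        commands.append([
            sys.executable, "uvr_cli.py",
            "--audio_file", str(audio),
            "--output_format", "MP3",
            "--output_dir", str(output_dir),
            "--model_filename", model_filename,
            "--vr_aggression", "10",
        ])
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from the directory containing uvr_cli.py.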