Audio source separation library for .NET that uses ONNX models, modeled after Python's Ultimate Vocal Remover.

ModernMube/OwnVocalRemover

OwnVocalRemover

A high-performance audio source separation library for .NET that uses ONNX models to separate vocals and instrumental tracks from mixed audio files. I spent a long time looking for C# code for vocal and music separation that delivers decent quality, but could only find such code in Python. So I decided to write a pure C# vocal separator that matches the quality of the Python implementations.

Features

  • ONNX Model Support: Works with pre-trained ONNX models for audio separation
  • GPU Acceleration: Automatic CUDA support with CPU fallback
  • Parallel Processing: Multi-threaded chunk processing with session pooling
  • Memory Management: Adaptive memory pressure monitoring
  • Chunked Processing: Handles large files by processing in configurable chunks
  • Noise Reduction: Optional denoising for improved separation quality
  • Batch Processing: Process multiple files efficiently
  • Progress Tracking: Real-time progress reporting with events
  • Auto-Configuration: Automatically detects model parameters from ONNX metadata

Dependencies

  • MathNet.Numerics - FFT operations
  • Microsoft.ML.OnnxRuntime - ONNX model inference
  • Ownaudio - Audio I/O operations
  • Microsoft.Extensions.ObjectPool - Session pooling for parallel processing

Support My Work

If you find this project helpful, consider buying me a coffee!

Buy Me A Coffee

Quick Start

Basic Usage (Traditional Mode)

// Basic usage with included model
var service = AudioSeparationExtensions.CreateDefaultService(InternalModel.Default);
await service.InitializeAsync();

var result = await service.SeparateAsync("input_song.wav");
Console.WriteLine($"Vocals: {result.VocalsPath}");
Console.WriteLine($"Instrumental: {result.InstrumentalPath}");

service.Dispose();

Parallel Processing Mode

// Parallel processing for faster performance
var service = AudioSeparationExtensions.CreateDefaultService(InternalModel.Default);

var parallelOptions = new ParallelProcessingOptions
{
    MaxDegreeOfParallelism = 4,
    SessionPoolSize = 3,
    EnableMemoryPressureMonitoring = true
};

await service.InitializeParallelAsync(parallelOptions);
var result = await service.SeparateAsync("input_song.wav");

service.Dispose();

Configuration Options

SeparationOptions

  • ModelPath: Path to ONNX model file
  • OutputDirectory: Output directory for separated files
  • DisableNoiseReduction: Disable denoising (default: false)
  • Margin: Overlap margin for chunks (default: 44100 samples)
  • ChunkSizeSeconds: Chunk duration in seconds (0 = process entire file)
  • NFft: FFT size (default: 6144)
  • DimT: Temporal dimension parameter (default: 8)
  • DimF: Frequency dimension parameter (default: 2048)
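
To make Margin and ChunkSizeSeconds more concrete, here is a minimal sketch of how a signal could be split into chunks that are padded by an overlap margin on both sides. It is not the library's actual implementation: ChunkPlanner is a hypothetical name, and a 44.1 kHz sample rate is assumed.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of OwnVocalRemover.
public static class ChunkPlanner
{
    // Splits [0, totalSamples) into chunks of chunkSizeSeconds, each extended
    // by `margin` samples on both sides and clamped to the signal bounds.
    // chunkSizeSeconds == 0 means "process the entire file as one chunk".
    public static List<(long Start, long End)> Plan(
        long totalSamples, int chunkSizeSeconds,
        int margin = 44100, int sampleRate = 44100)
    {
        var ranges = new List<(long Start, long End)>();
        if (totalSamples <= 0) return ranges;

        long chunkSize = chunkSizeSeconds <= 0
            ? totalSamples
            : Math.Min(totalSamples, (long)chunkSizeSeconds * sampleRate);

        for (long pos = 0; pos < totalSamples; pos += chunkSize)
        {
            long start = Math.Max(0, pos - margin);
            long end = Math.Min(totalSamples, pos + chunkSize + margin);
            ranges.Add((start, end));
        }
        return ranges;
    }
}
```

With these assumptions, a 60 second file and ChunkSizeSeconds = 20 yields three chunks, and neighbouring chunks share up to two margins' worth of samples. Overlap like this is commonly trimmed or cross-faded after inference to avoid audible seams at chunk borders.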

ParallelProcessingOptions

  • MaxDegreeOfParallelism: Maximum concurrent chunks (0 = auto-detect)
  • SessionPoolSize: Number of ONNX sessions in pool (0 = auto-detect)
  • EnableMemoryPressureMonitoring: Monitor memory usage (default: true)
  • MemoryPressureThreshold: Memory threshold in bytes (default: 2GB)
  • ChunkQueueCapacity: Queue capacity for chunks (default: 10)

Usage Examples

Custom Configuration with Parallel Processing

var options = new SeparationOptions
{
    ModelPath = "",
    Model = InternalModel.Default,
    OutputDirectory = "output",
    ChunkSizeSeconds = 20,
    DisableNoiseReduction = false
};

var parallelOptions = new ParallelProcessingOptions
{
    MaxDegreeOfParallelism = 6,
    SessionPoolSize = 4,
    EnableMemoryPressureMonitoring = true,
    MemoryPressureThreshold = 3_000_000_000 // 3GB
};

var service = new AudioSeparationService(options);
await service.InitializeParallelAsync(parallelOptions);

System-Optimized Configuration

// Automatically configure based on system capabilities
var (service, parallelOptions) = AudioSeparationFactory.CreateSystemOptimized(InternalModel.Default, @"output");
await service.InitializeParallelAsync(parallelOptions);

Progress Monitoring

service.ProgressChanged += (sender, progress) =>
{
    Console.WriteLine($"Progress: {progress.OverallProgress:F1}% - {progress.Status}");
    Console.WriteLine($"Chunks: {progress.ProcessedChunks}/{progress.TotalChunks}");
};

service.ProcessingStarted += (sender, file) =>
{
    Console.WriteLine($"Started processing: {file}");
};

service.ProcessingCompleted += (sender, result) =>
{
    Console.WriteLine($"Completed in {result.ProcessingTime}");
};

Batch Processing

var files = new[] { "song1.wav", "song2.wav", "song3.wav" };
var results = await service.SeparateMultipleAsync(files);

foreach (var result in results)
{
    Console.WriteLine($"Processed: {result.VocalsPath}");
}

Pre-configured Factory Methods

Mobile Optimized (Faster)

var service = AudioSeparationFactory.CreateMobileOptimized(
    InternalModel.Default, 
    "output", 
    disableNoiseReduction: true
);
await service.InitializeAsync(); // Traditional mode for mobile

Desktop Optimized (Better Quality)

var service = AudioSeparationFactory.CreateDesktopOptimized(
    InternalModel.Default, 
    "output"
);
await service.InitializeParallelAsync(); // Parallel mode for desktop

System-Optimized with Parallel Processing

var (service, parallelOptions) = AudioSeparationFactory.CreateSystemOptimized(
    InternalModel.Default, 
    "output"
);
await service.InitializeParallelAsync(parallelOptions);

Choosing the Right Model

For general use: Start with default model

var service = AudioSeparationFactory.CreateBatchOptimized(InternalModel.Default, "output");

For best quality: Use best model with desktop settings

var service = AudioSeparationFactory.CreateDesktopOptimized(InternalModel.Best, "output");

For karaoke creation: Use karaoke model

var service = AudioSeparationExtensions.CreateDefaultService(InternalModel.Karaoke);

For custom MDXNET models: Any compatible model works

var service = AudioSeparationExtensions.CreateDefaultService("models/custom_mdxnet.onnx");

Processing Modes

Traditional Mode

  • Single-threaded processing
  • Lower memory usage
  • Suitable for mobile/low-end devices
  • Initialize with InitializeAsync()

Parallel Processing Mode

  • Multi-threaded chunk processing
  • Higher performance on multi-core systems
  • Session pooling for better resource utilization
  • Memory pressure monitoring
  • Initialize with InitializeParallelAsync()

Supported Audio Formats

  • WAV (.wav)
  • MP3 (.mp3)
  • FLAC (.flac)
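
Before handing a file to the service, it can be handy to reject unsupported extensions early. The following is a small sketch of such a check; AudioFormats is a hypothetical helper, not a library type, and it only inspects the file extension, not the file contents.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper for illustration only; not part of OwnVocalRemover.
public static class AudioFormats
{
    private static readonly string[] Supported = { ".wav", ".mp3", ".flac" };

    // Returns true when the path ends in one of the listed extensions,
    // ignoring case (so "SONG.WAV" is accepted).
    public static bool IsSupported(string path)
    {
        string ext = Path.GetExtension(path);
        return Supported.Contains(ext, StringComparer.OrdinalIgnoreCase);
    }
}
```

A batch job could filter its input list with AudioFormats.IsSupported before calling SeparateMultipleAsync.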

Output Files

The service generates two files per input:

  • {filename}_vocals.wav - Extracted vocals
  • {filename}_music.wav - Instrumental track
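
If you need to know the output locations in advance (for example, to skip files that were already processed), the naming pattern above can be reproduced with a few lines. This is a sketch based only on the pattern listed here; OutputNaming is a hypothetical helper, and the paths in SeparationResult remain the authoritative values.

```csharp
using System.IO;

// Hypothetical helper for illustration only; not part of OwnVocalRemover.
public static class OutputNaming
{
    // Builds the expected output paths for an input file, following the
    // {filename}_vocals.wav / {filename}_music.wav pattern.
    public static (string Vocals, string Music) For(string inputPath, string outputDirectory)
    {
        string name = Path.GetFileNameWithoutExtension(inputPath);
        return (Path.Combine(outputDirectory, name + "_vocals.wav"),
                Path.Combine(outputDirectory, name + "_music.wav"));
    }
}
```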

Error Handling

try
{
    var result = await service.SeparateAsync("input.wav");
}
catch (FileNotFoundException ex)
{
    Console.WriteLine($"File not found: {ex.Message}");
}
catch (InvalidOperationException ex)
{
    Console.WriteLine($"Service error: {ex.Message}");
}
catch (AggregateException ex) when (ex.InnerExceptions.Any())
{
    Console.WriteLine("Parallel processing errors occurred:");
    foreach (var innerEx in ex.InnerExceptions)
    {
        Console.WriteLine($"- {innerEx.Message}");
    }
}

Performance Tips

  1. Processing Mode: Use parallel processing on multi-core systems
  2. GPU Acceleration: Ensure CUDA is available for faster processing
  3. Chunk Size: Adjust ChunkSizeSeconds based on available memory
  4. Session Pool: Increase SessionPoolSize for better parallel performance
  5. Memory Management: Enable memory pressure monitoring for large files
  6. Noise Reduction: Disable for faster processing in batch scenarios

Memory Management

The parallel processing mode includes adaptive memory management:

  • Memory Pressure Monitoring: Automatically detects high memory usage
  • Garbage Collection: Forces GC under memory pressure
  • Throttling: Reduces parallelism when memory is constrained
  • Session Pooling: Efficient reuse of ONNX sessions
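
The throttling idea can be sketched as follows. This is an illustration of the general technique, not the library's internal logic: MemoryThrottle is a hypothetical name, and halving the worker count under pressure is an arbitrary policy chosen for the example.

```csharp
using System;

// Hypothetical helper for illustration only; not part of OwnVocalRemover.
public static class MemoryThrottle
{
    // Returns how many chunks to process concurrently for a given memory usage.
    public static int EffectiveParallelism(long usedBytes, long thresholdBytes, int maxParallelism)
    {
        if (maxParallelism < 1)
            throw new ArgumentOutOfRangeException(nameof(maxParallelism));

        if (usedBytes < thresholdBytes) return maxParallelism;

        // Under pressure: halve the parallelism but always keep one worker.
        return Math.Max(1, maxParallelism / 2);
    }

    // Convenience overload that samples the current managed heap size.
    public static int EffectiveParallelism(long thresholdBytes, int maxParallelism)
        => EffectiveParallelism(GC.GetTotalMemory(false), thresholdBytes, maxParallelism);
}
```

Note that GC.GetTotalMemory only reports managed heap usage. Native allocations made by ONNX Runtime are not included, so a real monitor would likely need to look at process-level memory as well.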

Statistics and Analysis

The SeparationResult includes audio statistics:

var stats = result.Statistics;
Console.WriteLine($"Vocals RMS: {stats.VocalsRMS:F4}");
Console.WriteLine($"Instrumental RMS: {stats.InstrumentalRMS:F4}");
Console.WriteLine($"Sample Rate: {stats.SampleRate} Hz");
Console.WriteLine($"Processing Time: {result.ProcessingTime}");
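
Assuming VocalsRMS and InstrumentalRMS are standard root-mean-square levels (the README does not spell out the formula), the same value can be computed for any sample buffer like this. AudioStats is a hypothetical helper used for illustration.

```csharp
using System;

// Hypothetical helper for illustration only; not part of OwnVocalRemover.
public static class AudioStats
{
    // Root mean square of a buffer: sqrt(mean(sample^2)).
    // Returns 0 for an empty buffer instead of dividing by zero.
    public static double Rms(ReadOnlySpan<float> samples)
    {
        if (samples.IsEmpty) return 0.0;

        double sumSquares = 0.0;
        foreach (float s in samples)
            sumSquares += (double)s * s;

        return Math.Sqrt(sumSquares / samples.Length);
    }
}
```

An RMS close to zero on the vocals track usually indicates that the input contained little or no vocal content.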

Included Models

The library comes with three pre-trained models:

Default model

  • Type: Basic instrumental separation
  • Quality: Good baseline performance
  • Use case: General purpose separation, fastest processing
  • Output: Clean vocals and instrumental tracks

Best model

  • Type: High-quality instrumental separation
  • Quality: Superior separation accuracy
  • Use case: When quality is more important than speed
  • Output: High-fidelity vocals and instrumental tracks

Karaoke model

  • Type: Karaoke model (lead vocal removal)
  • Quality: Specialized for karaoke creation
  • Use case: Remove lead vocals while preserving backing vocals
  • Output: Lead vocals and music with backing vocals intact

Model Usage Examples

// Using the default model
var defaultService = AudioSeparationExtensions.CreateDefaultService(InternalModel.Default);

// Using the best quality model with parallel processing
var bestService = AudioSeparationExtensions.CreateDefaultService(InternalModel.Best);
await bestService.InitializeParallelAsync();

// Using the karaoke model
var karaokeService = AudioSeparationExtensions.CreateDefaultService(InternalModel.Karaoke);

MDXNET Model Support

The library works with MDXNET models that meet the requirements listed under Model Requirements:

// Using custom MDXNET model with parallel processing
var mdxService = AudioSeparationExtensions.CreateDefaultService("models/my_mdxnet_model.onnx");
await mdxService.InitializeParallelAsync(); // Auto-detects model parameters

Model Requirements

ONNX models should:

  • Accept input shape: [batch, 4, frequency, time]
  • Output same shape as input
  • Support 44.1kHz stereo audio
  • Use STFT-based processing
  • Be compatible with MDXNET architecture
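
In MDX-Net style models, the 4 input channels typically hold the real and imaginary STFT parts of the left and right audio channels. The sketch below shows how the input shape relates to the DimF and DimT options, under the assumption that this library follows the Ultimate Vocal Remover convention in which DimT is an exponent (2^DimT time frames). MdxShape is a hypothetical helper, and the README does not confirm this convention.

```csharp
using System;

// Hypothetical helper for illustration only; not part of OwnVocalRemover.
public static class MdxShape
{
    // Assumption: as in UVR's MDX-Net code, DimT is an exponent, so the
    // number of STFT frames per model input is 2^DimT.
    public static int[] InputShape(int dimF, int dimT, int batch = 1)
        => new[] { batch, 4, dimF, 1 << dimT };

    // Total number of float values in a tensor of the given shape.
    public static long ElementCount(int[] shape)
    {
        long count = 1;
        foreach (int d in shape) count *= d;
        return count;
    }
}
```

Under that assumption, the defaults DimF = 2048 and DimT = 8 give a shape of [1, 4, 2048, 256], or 2,097,152 floats (8 MiB) per input tensor. This is useful when estimating how many pooled sessions fit in memory.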

Thread Safety

  • The AudioSeparationService is not thread-safe for concurrent operations on the same instance
  • Parallel processing is handled internally and is thread-safe
  • Create separate instances for concurrent processing of different files
  • Session pooling ensures safe concurrent access to ONNX models
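
The "separate instances" rule can be expressed as a small generic helper that creates one service per file and disposes it when that file is done. This is a sketch: PerFileRunner is a hypothetical name, and the helper is written against IDisposable so it does not depend on any specific library type.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper for illustration only; not part of OwnVocalRemover.
public static class PerFileRunner
{
    // Runs `work` for every file with its own freshly created service
    // instance, so no instance is shared between concurrent operations.
    // Results are returned in the same order as the input files.
    public static async Task<TResult[]> RunAsync<TService, TResult>(
        IEnumerable<string> files,
        Func<TService> createService,
        Func<TService, string, Task<TResult>> work)
        where TService : IDisposable
    {
        var tasks = files.Select(async file =>
        {
            using TService service = createService();
            return await work(service, file);
        });
        return await Task.WhenAll(tasks);
    }
}
```

With this library, createService could wrap AudioSeparationExtensions.CreateDefaultService(InternalModel.Default), and work could call InitializeAsync followed by SeparateAsync. Keep in mind that every instance loads its own model, so memory use grows with the number of files processed at once.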

Best Practices

For Single Files

using var service = AudioSeparationExtensions.CreateDefaultService(InternalModel.Default);
await service.InitializeParallelAsync();
var result = await service.SeparateAsync("song.wav");

For Batch Processing

var service = AudioSeparationFactory.CreateBatchOptimized(InternalModel.Default, "output");
await service.InitializeParallelAsync();

var files = Directory.GetFiles("input", "*.wav");
var results = await service.SeparateMultipleAsync(files);

service.Dispose();

For System-Specific Optimization

var (service, options) = AudioSeparationFactory.CreateSystemOptimized(
    InternalModel.Default, 
    "output",
    Environment.ProcessorCount,
    GC.GetGCMemoryInfo().TotalAvailableMemoryBytes / (1024.0 * 1024.0 * 1024.0) // Memory available to the runtime, in GB
);
await service.InitializeParallelAsync(options);

Disposal

Always dispose the service to free ONNX resources and session pools:

using var service = new AudioSeparationService(options);
await service.InitializeParallelAsync();
// Use service...
// Automatically disposed, including session pool cleanup
