Remove or isolate vocals from songs using phase cancellation, for creating karaoke tracks, extracting acapellas, and separating audio components. The Vocal Remover works by analyzing the stereo mix and removing audio that appears identically in both channels (typically centered vocals), leaving instruments and other effects intact. It works best with professionally mixed stereo tracks where vocals are centered in the mix; amateur recordings, live performances, and heavily processed vocals may produce less clean results.
Choose from multiple processing modes: Karaoke mode removes centered vocals to create backing tracks for singing along, Vocals Only mode extracts vocals by inverting one channel, and Center Isolation mode isolates the center of the stereo image for maximum flexibility.
Fine-tune results with adjustable reduction strength to control how aggressively vocals are removed, bass preservation settings to maintain low-end punch and clarity, and center width control for precise tuning. Real-time waveform visualization shows your audio during processing, and before/after comparison playback lets you evaluate quality before exporting.
Useful for creating karaoke backing tracks without purchasing karaoke versions, extracting vocals for remixes and mashups, isolating instruments for music practice, creating acapella versions of songs, removing vocals to repurpose music as background content, and music production sampling.
Transform any professionally-mixed song into a karaoke backing track by removing the lead vocal while preserving instruments and harmonies.
Isolate vocal tracks from songs to sample, remix, or use as raw material in your own music production projects.
Remove vocals from songs to practice along with individual instruments, improving your ability to play specific parts.
Extract vocal-only (a cappella) versions of songs for vocal groups, remixes, and creative audio projects.
Create instrumental versions of songs for use as background music in videos, presentations, and retail environments.
Extract and manipulate vocal and instrumental elements from existing tracks to use as building blocks in new compositions.
Vocal removal from mixed audio recordings is an application of signal processing that exploits how music is typically mixed and mastered in stereo. The predominant technique, known as phase cancellation or mid-side processing, relies on the industry convention of placing lead vocals at the center of the stereo image, meaning the vocal signal is reproduced at equal amplitude and identical phase in both the left and right channels.
The mathematical principle behind phase cancellation is simple. When you subtract the right channel from the left channel (or vice versa), any signal component that is identical in both channels, including the centered vocal, cancels to zero, leaving only the components that differ between channels. Since instruments in a professional mix are typically panned to various positions across the stereo field, they differ between left and right and survive the subtraction, each in proportion to how far it is panned from center. The result is a single difference signal (effectively mono) that retains most of the instrumental content while significantly attenuating the centered vocal.
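The subtraction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tool's actual code: the sine waves stand in for a vocal and a guitar, and the sample rate, pan gains (0.8 and 0.2), and the helper names `sine` and `cancel_center` are all made up for the example.

```python
import math

SAMPLE_RATE = 8000  # Hz; small illustrative rate, not taken from the tool


def sine(freq_hz, n_samples, amplitude=1.0):
    """Return n_samples of a sine wave as a list of floats."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n_samples)]


def cancel_center(left, right):
    """Subtract right from left; content identical in both channels cancels."""
    return [l - r for l, r in zip(left, right)]


n = 800
vocal = sine(440, n)   # stand-in for a centered vocal
guitar = sine(660, n)  # stand-in for an instrument panned toward the left

# The centered vocal goes equally to both channels.
# The guitar is mixed at 0.8 in the left channel and 0.2 in the right.
left = [v + 0.8 * g for v, g in zip(vocal, guitar)]
right = [v + 0.2 * g for v, g in zip(vocal, guitar)]

out = cancel_center(left, right)

# The vocal term cancels; what remains is (0.8 - 0.2) * guitar = 0.6 * guitar.
residual = max(abs(o - 0.6 * g) for o, g in zip(out, guitar))
print(f"max deviation from 0.6 * guitar: {residual:.2e}")
```

Note that the output is one channel: the subtraction collapses the stereo pair into a single difference signal, and an instrument panned only slightly off center survives at a correspondingly low level.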
This technique can also be expressed through mid-side decomposition, a concept from stereo microphone technique and mastering engineering. The mid signal is the sum of the left and right channels (L+R, conventionally scaled by one half); it contains everything centered in the stereo image, including the vocal, along with a share of any panned content. The side signal is the difference between the channels (L-R, with the same scaling); it contains only what differs between left and right, so perfectly centered content is absent from it. By lowering the level of the mid signal relative to the side signal, you can attenuate the vocal (which lives almost entirely in the mid signal) while preserving the panned instruments (which dominate the side signal).
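A minimal sketch of mid-side processing with an adjustable amount of center reduction follows. The function names and the `strength` parameter are hypothetical; `strength` is meant to resemble an adjustable reduction control, but how the tool actually maps its setting is not stated in the text, and the 0.5 scaling is just one common convention.

```python
def encode_mid_side(left, right):
    """Split stereo into mid (sum) and side (difference) signals.

    The 0.5 scaling is a convention that makes decoding a plain sum/difference.
    """
    mid = [0.5 * (l + r) for l, r in zip(left, right)]
    side = [0.5 * (l - r) for l, r in zip(left, right)]
    return mid, side


def decode_mid_side(mid, side):
    """Rebuild left and right channels from mid and side signals."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


def reduce_center(left, right, strength):
    """Attenuate the mid signal.

    strength is a hypothetical control: 0.0 leaves the audio unchanged,
    1.0 removes the mid signal entirely.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0.0 and 1.0")
    mid, side = encode_mid_side(left, right)
    mid = [(1.0 - strength) * m for m in mid]
    return decode_mid_side(mid, side)


# Tiny example: the first sample is identical in both channels (centered).
left = [1.0, 0.5, -0.25]
right = [1.0, 0.1, 0.25]
half_left, half_right = reduce_center(left, right, 0.5)
print(half_left, half_right)
```

At `strength=1.0` the left output equals the side signal and the right output is its polarity-inverted copy, which is the plain L-R subtraction up to scaling. Intermediate values trade vocal reduction against how much of the other centered content is kept.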
However, this approach has limitations inherent to the method. Any non-vocal element that is also centered in the mix (typically bass guitar, kick drum, and sometimes snare drum) is cancelled along with the vocal. This is why bass preservation controls matter: they route low frequencies below a cutoff point (typically 100-200 Hz) around the cancellation process, maintaining the low-end energy that is almost universally panned center. Additionally, any spatial processing applied to the vocal, such as reverb, stereo delay, or chorus, creates differences between the left and right channels for the vocal signal, so those processed components survive the cancellation as ghostly remnants.
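The bass preservation idea can be sketched as follows: remove the center, but let the low-frequency part of the center bypass the removal. This is an assumption-laden illustration, not the tool's implementation. The 150 Hz cutoff is simply picked from the 100-200 Hz range mentioned above, the first-order low-pass filter is a deliberately simple choice, and the names `one_pole_lowpass` and `remove_center_keep_bass` are hypothetical.

```python
import math

SAMPLE_RATE = 44100  # Hz, assumed for the example
CUTOFF_HZ = 150      # assumed; within the 100-200 Hz range quoted above


def one_pole_lowpass(samples, cutoff_hz, sample_rate=SAMPLE_RATE):
    """Simple first-order low-pass filter (gentle 6 dB/octave slope)."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out, state = [], 0.0
    for x in samples:
        state += alpha * (x - state)
        out.append(state)
    return out


def remove_center_keep_bass(left, right, cutoff_hz=CUTOFF_HZ):
    """Drop the center above the cutoff, pass the center below it through."""
    mid = [0.5 * (l + r) for l, r in zip(left, right)]
    side = [0.5 * (l - r) for l, r in zip(left, right)]
    bass = one_pole_lowpass(mid, cutoff_hz)  # centered lows that bypass removal
    out_left = [b + s for b, s in zip(bass, side)]
    out_right = [b - s for b, s in zip(bass, side)]
    return out_left, out_right


def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


n = SAMPLE_RATE // 10  # 0.1 seconds of audio
bass_tone = [math.sin(2 * math.pi * 60 * i / SAMPLE_RATE) for i in range(n)]
vocal_tone = [math.sin(2 * math.pi * 2000 * i / SAMPLE_RATE) for i in range(n)]

# Both test tones are perfectly centered (identical in left and right).
bass_out, _ = remove_center_keep_bass(bass_tone, bass_tone)
vocal_out, _ = remove_center_keep_bass(vocal_tone, vocal_tone)

bass_kept = rms(bass_out) / rms(bass_tone)
vocal_kept = rms(vocal_out) / rms(vocal_tone)
print(f"60 Hz kept: {bass_kept:.2f}, 2000 Hz kept: {vocal_kept:.2f}")
```

Because the filter slope is gentle, some centered content above the cutoff still leaks through, including the lowest part of a vocal. A steeper crossover filter would reduce that leakage at the cost of more processing.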
Modern machine learning approaches to source separation, such as deep neural networks trained on separated stem data, have dramatically improved vocal isolation quality by learning spectral patterns that distinguish vocal content from instrumental content regardless of stereo positioning. However, these approaches require significant computational resources and large trained models, so the traditional phase cancellation technique, with its simplicity and negligible processing delay, remains highly relevant for browser-based implementations.
Phase cancellation works by removing audio that's identical in both stereo channels. If vocals have reverb, panning, or other stereo effects, they may not be fully removed.
Professionally mixed stereo tracks with centered, dry vocals work best. Mono recordings do not work with plain cancellation, because their two channels are identical and everything cancels; live recordings and heavily processed vocals give poorer results.
Bass is often centered like vocals. Use the Bass Preserve setting to keep low frequencies intact. This may also leave some low-frequency vocal content in the result.
All processing happens directly in your browser. Your files never leave your device and are never uploaded to any server.