Transcribe audio into sheet music using the Basic Pitch neural network running in your browser. Handles humming, singing, solo instruments, piano, guitar, and full mixes. Auto-detects tempo, key, and staff type (treble/bass/grand); supports title/composer metadata, snap-to-key cleanup, live transpose, and exports to MIDI, MusicXML, SVG, PNG, or print.
Audio-to-sheet-music transcription estimates which pitches are sounding at each moment in a recording, groups those estimates into discrete notes, and aligns them to a musical grid of beats and measures. Under the hood the tool runs the Basic Pitch model directly in your browser: a convolutional neural network over log-Mel spectrograms that emits three per-frame probability maps (note onset, note active, and pitch contour), which are decoded into overlapping note events at ~11 ms resolution across a piano range of pitches. The model is ~900 KB gzipped, downloaded on first use and cached thereafter; inference runs on the WebGL backend when available and falls back to CPU otherwise. Basic Pitch is polyphonic by design: it handles humming, singing, solo instruments, piano recordings, guitar strums, and full mixes in a single code path, with no mode switch to get wrong. Once notes are detected, a register-aware single-melody collapse can reduce chord-heavy material to a clean lead line, picking at each moment the note that balances loudness against proximity to the running pitch center so the output doesn't jump between bass and melody octaves. An octave-outlier filter drops spurious notes more than two octaves from the main register, a common Basic Pitch artifact where the model fires on a harmonic partial alongside the real pitch. A tempo estimator autocorrelates the RMS envelope and blends the result with an inter-onset-interval histogram; a Krumhansl-Schmuckler key estimator picks the best-correlating major or minor key from the duration-weighted pitch-class distribution. Tempo, key, and staff type are auto-detected starting points you can override after seeing the first pass.
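As a concrete illustration, the octave-outlier filter described above can be sketched in a few lines. This is a minimal sketch, not the tool's actual code: the note format is assumed, and the median MIDI pitch stands in for the "main register".

```python
# Hypothetical sketch of the octave-outlier filter: drop notes more than
# two octaves (24 semitones) from the piece's median MIDI pitch.
# Note format and median-as-register are assumptions for illustration.
from statistics import median

def drop_octave_outliers(notes, max_semitones=24):
    """notes: list of dicts with a 'pitch' key holding a MIDI number."""
    if not notes:
        return []
    center = median(n["pitch"] for n in notes)
    return [n for n in notes if abs(n["pitch"] - center) <= max_semitones]
```

A note three octaves above everything else (the classic harmonic-partial misfire) falls outside the ±24-semitone window and is dropped, while everything near the median register survives.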
The raw output of Basic Pitch is decoded with the `outputToNotesPoly` procedure: peaks in the onset map above a threshold spawn candidate notes, the frame-activation map determines how long each note stays on, and a "Melodia trick" post-pass prefers continuous melodic lines over short isolated blips. Notes below a loudness floor (a fraction of the loudest detected note) are dropped to suppress hallucinated faint notes. An optional "single-melody" collapse keeps only the most prominent note at each moment, scored by loudness and register proximity, so a full-mix song collapses to a readable lead sheet instead of a chord stack. Turning that off leaves chord voicings intact.
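The single-melody collapse can be illustrated like this. It is a sketch under stated assumptions (a flat note format, "simultaneous" meaning identical onset, made-up scoring weights), not the tool's implementation:

```python
# Illustrative single-melody collapse: among simultaneous notes, keep the
# one scoring best on loudness minus distance from a running pitch center.
# register_weight and smoothing values are assumptions, not the tool's.
def collapse_to_melody(notes, register_weight=0.5, smoothing=0.8):
    """notes: dicts with 'start' (sec), 'pitch' (MIDI), 'amp' (0..1),
    sorted by onset. For brevity, 'simultaneous' means identical start."""
    melody = []
    center = None          # running pitch center, updated as an EMA
    i = 0
    while i < len(notes):
        group = [notes[i]]                     # chord sounding at this onset
        while i + 1 < len(notes) and notes[i + 1]["start"] == notes[i]["start"]:
            i += 1
            group.append(notes[i])
        if center is None:
            center = sum(n["pitch"] for n in group) / len(group)
        # score = loudness minus a penalty per octave of distance from center
        best = max(group,
                   key=lambda n: n["amp"]
                   - register_weight * abs(n["pitch"] - center) / 12)
        melody.append(best)
        center = smoothing * center + (1 - smoothing) * best["pitch"]
        i += 1
    return melody
```

The exponential moving average is what makes the collapse "register-aware": a loud bass note three octaves below the running center loses to a quieter note near it, so the lead line doesn't lurch between registers.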
Tempo estimation runs two passes. First, the RMS envelope of the signal is autocorrelated to find the most consistent periodicity in the 40–240 BPM range; this catches the underlying pulse even on inputs where articulated onsets are weak (legato singing, for example). Second, the inter-onset intervals of the detected notes are histogrammed, their median is octave-normalized into the same BPM range, and the two estimates are averaged. Tempo is ambiguous by a factor of two: 60 BPM with eighth-note pulses looks identical to 120 BPM with quarter-note pulses in the autocorrelation, so the detector sometimes picks the wrong octave. The tempo slider overrides the estimate, and the score re-notates in real time.
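The two tempo passes can be sketched as follows, under simplified assumptions (an RMS envelope sampled at a fixed frame rate, brute-force autocorrelation); this is an illustrative reimplementation, not the tool's code:

```python
def tempo_from_envelope(env, frame_rate, lo=40, hi=240):
    """Pass 1 sketch: autocorrelate a mean-removed envelope and return the
    BPM whose beat-period lag correlates best within [lo, hi] BPM."""
    mean = sum(env) / len(env)
    x = [v - mean for v in env]

    def ac(lag):
        return sum(x[i] * x[i + lag] for i in range(len(x) - lag))

    min_lag = int(frame_rate * 60 / hi)   # shortest candidate beat period
    max_lag = int(frame_rate * 60 / lo)   # longest candidate beat period
    best_lag = max(range(min_lag, max_lag + 1), key=ac)
    return 60 * frame_rate / best_lag

def fold_to_range(bpm, lo=40, hi=240):
    """Pass 2 helper sketch: octave-normalize a BPM estimate (e.g. a median
    inter-onset interval converted to BPM) into [lo, hi) by doubling/halving."""
    while bpm < lo:
        bpm *= 2
    while bpm >= hi:
        bpm /= 2
    return bpm
```

An envelope pulsing every half second at a 100 Hz frame rate comes back as 120 BPM; note that `fold_to_range` maps 60, 120, and 240 BPM candidates onto each other, which is exactly the octave ambiguity the text describes.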
Key estimation uses the Krumhansl-Schmuckler profile algorithm: each pitch class is weighted by the total time that notes of that class sound in the piece, and the resulting 12-vector is correlated against the 24 major and minor key templates from empirical tonal-hierarchy studies. The best-correlating key wins. This works well on tonal music with 20+ notes and poorly on atonal or very short examples, so the detected key is a starting point you can override.
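The Krumhansl-Schmuckler step is compact enough to sketch in full. The profile values below are the published Krumhansl-Kessler tonal-hierarchy ratings; the note format and function names are assumptions for illustration:

```python
# Published Krumhansl-Kessler tonal-hierarchy profiles (C as tonic).
MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]
NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb)

def detect_key(notes):
    """notes: (midi_pitch, duration_seconds) pairs. Returns e.g. 'C major'."""
    weights = [0.0] * 12
    for pitch, dur in notes:
        weights[pitch % 12] += dur          # duration-weighted pitch classes
    best_r, best_key = -2.0, None
    for tonic in range(12):
        for profile, mode in ((MAJOR, "major"), (MINOR, "minor")):
            # rotate so rotated[p] == profile[(p - tonic) % 12]
            rotated = profile[-tonic:] + profile[:-tonic]
            r = pearson(weights, rotated)
            if r > best_r:
                best_r, best_key = r, f"{NAMES[tonic]} {mode}"
    return best_key
```

Because the algorithm only compares rotations of two fixed templates, it is transposition-invariant: a C major scale and the same scale shifted up a fifth produce "C major" and "G major" respectively.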
Quantization rounds each note's start and end to the chosen subdivision (16th, 8th, or quarter) at the active tempo, then splits notes across measure boundaries with tied note values. The decomposer prefers larger note values when they fit (a half note becomes a half note, not two quarters) but respects beat boundaries so a note that crosses beats 2–3 is written as two tied quarters rather than a half note dangling across the bar. A monophonic-resolver de-aliases the case where two short adjacent notes round to the same grid slot, preventing phantom chord stacks. Rendering uses VexFlow, which handles beaming, stem direction, and accidental placement; ties that span system breaks are split into partial ties (a tail on the source line, a head on the destination line) so the eye can trace the tied pitch across the page turn. Exports flow through to standard MIDI (Type 0, 480 ppq), MusicXML 3.1 (Partwise, openable in MuseScore, Finale, Sibelius, Dorico, including staff assignments for grand staff), SVG, PNG, and browser print, plus in-browser playback of the quantized score via a synthesized triangle tone.
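The quantize-and-split step can be illustrated like this. It is a simplified sketch assuming 4/4 time and a flat note format; the real decomposer additionally prefers larger note values and respects beat boundaries as described above:

```python
# Illustrative quantizer: snap note edges to a grid, then split notes at
# measure boundaries into tied pieces. Assumes 4/4; output positions are
# in grid steps (subdivision=8 means an eighth-note grid).
def quantize(notes, bpm, subdivision=8, beats_per_measure=4):
    """notes: (start_sec, end_sec, midi_pitch) tuples."""
    beat = 60.0 / bpm
    step = beat * 4 / subdivision           # grid step duration in seconds
    steps_per_measure = subdivision * beats_per_measure // 4
    out = []
    for start, end, pitch in notes:
        s = round(start / step)
        e = max(s + 1, round(end / step))   # keep at least one grid step
        # emit a tied piece for each measure boundary the note crosses
        while s // steps_per_measure != (e - 1) // steps_per_measure:
            boundary = (s // steps_per_measure + 1) * steps_per_measure
            out.append({"pitch": pitch, "start": s, "end": boundary, "tie": True})
            s = boundary
        out.append({"pitch": pitch, "start": s, "end": e, "tie": False})
    return out
```

At 120 BPM on an eighth-note grid, a note running from 1.5 s to 2.5 s crosses the bar line at 2.0 s and comes out as two tied halves, one ending at the boundary and one starting there.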
Hum or sing the line into your microphone, and get a readable lead sheet you can edit in MuseScore or import into a DAW.
Upload a recording of a flute, violin, sax, or synth lead and extract its melody as notation for practice or arrangement.
Use "single melody line" to pull out just the top line, or turn it off to see chord voicings rendered on a grand staff.
Transpose to match your instrument or vocal range; the key signature follows so the score stays readable.
Export MusicXML and hand the notation to a collaborator who works in Finale, Dorico, or Sibelius without having to describe the line in words.
Yes, but dense mixes with drums and many overlapping instruments tend to come out cluttered. For the cleanest result, isolate the melody first (our Vocal Remover or a source-separation tool can help) and then transcribe. If you want just the lead vocal, keep "Single melody line" on: the register-aware collapse keeps whichever note sits closest to the running melodic center at each moment, so the output tracks the lead voice instead of jumping to whichever bass note happens to be loudest at that instant.
Basic Pitch is a ~900 KB neural network that has to be fetched and compiled for TensorFlow.js the first time you use the tool. After that it's cached and inference runs offline on your device; subsequent transcriptions are much faster.
Key detection uses statistical profiles derived from Western tonal music. It needs 20+ notes of clear material in one key to work reliably. Short excerpts, modal or chromatic music, and microtonal input can throw it off. Use the key dropdown to override.
Tempo is ambiguous by an octave: 60 BPM with eighth-note pulses looks identical to 120 BPM with quarter-note pulses in the autocorrelation, so the detector sometimes picks the wrong octave. Slide the tempo up or down by 2x and everything re-notates.
Every detected note's start time and duration are rounded to the nearest grid point at the chosen resolution. 16th-note quantize captures the most detail but shows fine rhythmic noise; quarter-note quantize hides that noise but also loses real ornaments. Start with 8ths and adjust.
Yes. Export as MusicXML and open in MuseScore (free) or any notation software. The XML preserves notes, durations, key, time signature, tempo, ties, and (for grand staff) separate treble/bass staff assignments, so you can clean up, add dynamics, or re-voice without re-transcribing.
No. The Basic Pitch model, pitch detection, tempo and key estimation, and sheet-music rendering all run locally in your browser. The audio file never leaves your device.
All processing happens directly in your browser. Your files never leave your device and are never uploaded to any server.