Add to favorites

Audio to Sheet Music

Transcribe audio into sheet music using Basic Pitch neural network running in your browser. Handles humming, singing, solo instruments, piano, guitar, and full mixes. Auto-detects tempo, key, and staff type (treble/bass/grand); supports title/composer metadata, snap-to-key cleanup, live transpose, and exports to MIDI, MusicXML, SVG, PNG, or print.

Audio-to-sheet-music transcription estimates which pitches are sounding at each moment in a recording, groups those estimates into discrete notes, and aligns them to a musical grid of beats and measures. Under the hood the tool runs Basic Pitch convolutional neural network directly in your browser, a CNN over log-Mel spectrograms that emits three per-frame probability maps (note-onset, note-active, and pitch-contour) decoded into overlapping note events at ~11 ms resolution across a piano-range of pitches. The model is ~900 KB gzipped, downloaded on first use and cached thereafter; inference runs on the WebGL backend when available and falls back to CPU otherwise. Basic Pitch is polyphonic by design: it handles humming, singing, solo instruments, piano recordings, guitar strums, and full mixes in a single code path, with no mode switch to get wrong. Once notes are detected, a register-aware single-melody collapse can reduce chord-heavy material to a clean lead line (picking at each moment the note that balances loudness against proximity to the running pitch center, which prevents the output from jumping between bass and melody octaves). An octave-outlier filter drops spurious notes more than two octaves from the main register, a common Basic Pitch artifact where the model fires on a harmonic partial alongside the real pitch. A tempo estimator autocorrelates the RMS envelope and blends the result with an inter-onset-interval histogram; a Krumhansl-Schmuckler key estimator picks the best-correlating major or minor key from the weighted pitch-class distribution. All three are starting points you can override after seeing the first pass.

Runs in your browser and files never uploadedMore audio processing Jump to full guide

Initializing in your browser…

Audio Waveform Generator

Generate beautiful waveform visualizations from audio files. Choose from bars, mirror, line, or circular styles. Customize colors and export as PNG.

Tone Generator

Generate pure audio tones with sine, square, sawtooth, and triangle waveforms. Create multiple oscillators, binaural beats, and export as WAV.

BPM Detector

Detect the tempo (BPM) of any audio file. Includes tap tempo feature and genre reference guide.

Audio to Sheet Music: a worked example

You hummed a melody into a recording and want it written as notation to share with a bandmate.

Input

melody.wav (monophonic hum)

Audio to Sheet Music produces

Output

Detected key C major, ~96 BPM → notated staff + exportable MusicXML/MIDI

Pitch tracking estimates notes, key, and tempo from a monophonic recording and renders standard notation you can export to MusicXML or MIDI for a notation app. It works best on a clear single-line melody, which the tool states plainly.

What is Audio to Sheet Music?

How to use

1Upload an audio file, humming, singing, a solo instrument, or a full recording all work
2Click "Transcribe to Sheet Music". The Basic Pitch model downloads on first use (~900 KB) and analyzes your file entirely in the browser
3Review the detected tempo, key, and time signature; use the controls above the staff to override any of them, the score re-notates in real time
4Click "Play score" to hear a synthesized playback of the quantized notation so you can judge if the transcription matches what you intended
5Adjust the quantize grid, transpose, staff mode, and snap-to-key to taste, then export as MIDI (DAW-ready), MusicXML (MuseScore / Finale / Dorico), SVG / PNG, or print

Key features

Basic Pitch CNN running fully in the browser, handles humming, singing, solo instruments, and polyphonic recordings with one engine
Automatic tempo estimation via RMS autocorrelation combined with an inter-onset-interval histogram
Automatic key and mode detection via Krumhansl-Schmuckler profile matching
Auto clef/staff selection: picks treble, bass, or grand staff (piano-style, two staves connected by a brace) based on the detected pitch distribution, or override with a manual staff mode
Register-aware single-melody collapse that balances loudness with proximity to the running pitch center, so the output doesn't jump between bass and melody octaves
Optional snap-to-key post-processing that rounds chromatic notes to the nearest scale tone for cleaner hum/whistle transcriptions without stray sharps and flats
Octave-outlier filter removes spurious low or high harmonics that Basic Pitch occasionally emits alongside the real note
Quantization-aliasing resolver: when two short adjacent notes would land in the same grid slot, one winner is picked instead of rendering them as a chord stack
Line-break-aware ties render correctly across system breaks, splitting into a tail on the source line and a head on the destination line instead of drawing a diagonal across the page
Live re-quantization as you adjust tempo, time signature, key signature, quantize grid, transpose, staff mode, and snap-to-key
Transpose preserves the key signature, shifting up a minor third in F major retakes the score in Ab major, not F-with-accidentals
In-browser synthesized playback of the quantized score so you can hear what was transcribed
Exports to MIDI (SMF Type 0, 480 ppq), MusicXML 3.1 (with staff assignments for grand staff), SVG, PNG, and print-ready sheet music

Tips & best practices

For humming or singing, record close to the mic in a quiet room, background noise triggers spurious notes
Sustain each note clearly for at least 200 ms; extremely short staccato notes may be dropped as glitches
Use "Play score" after transcription to sanity-check the result, often a wrong tempo or key is obvious the moment you hear the synthesized version
Double the detected tempo if the notation looks half as slow as you sang it; halve it if everything is sixteenths
Use "Quarter notes" quantize for a first sanity check of the melodic contour, then switch to "Eighth notes" for final detail
Try the "Snap to key" toggle on whistle, hum, or choral recordings, chromatic slides between notes disappear and the score reads as pure scale tones, which is usually what you want
Force "Grand staff" if the auto picker puts your piano recording on a single staff; force "Treble only" or "Bass only" if you specifically want a lead-sheet or a bass-line transcription even with a wide range
Narrow the pitch range (G3–C6 by default) to whichever register actually contains your melody, this prevents the model from latching onto octave harmonics or bass-line bleed
Turn off "Single melody line" if your source is a piano or guitar and you want the chord voicings preserved

Common use cases

Capture a melody that's stuck in your head
Hum or sing the line into your microphone, and get a readable lead sheet you can edit in MuseScore or import into a DAW.
Transcribe a solo instrument line
Upload a recording of a flute, violin, sax, or synth lead and extract its melody as notation for practice or arrangement.
Get a lead sheet out of a piano recording
Use "single melody line" to pull out just the top line, or turn it off to see chord voicings rendered on a grand staff.
Generate practice sheet music in a friendlier key
Transpose to match your instrument or vocal range; the key signature follows so the score stays readable.
Share musical ideas with collaborators
Export MusicXML and hand the notation to a collaborator who works in Finale, Dorico, or Sibelius without having to describe the line in words.

How it works

The raw output of Basic Pitch is decoded with the `outputToNotesPoly` procedure: peaks in the onset map above a threshold spawn candidate notes, the frame-activation map determines how long each note stays on, and a "Melodia trick" post-pass prefers continuous melodic lines over short isolated blips. Notes below a loudness floor (a fraction of the loudest detected note) are dropped to suppress hallucinated faint notes. An optional "single-melody" collapse keeps only the most prominent note at each moment, scored by loudness and register proximity, so a full-mix song collapses to a readable lead sheet instead of a chord stack. Turning that off leaves chord voicings intact.

Tempo estimation runs two passes. First, the RMS envelope of the signal is autocorrelated to find the most consistent periodicity in the 40–240 BPM range; this catches the underlying pulse even on inputs where articulated onsets are weak (legato singing). Second, the inter-onset intervals of the detected notes are histogrammed and their median is octave-normalized into the same BPM range, and the two estimates are averaged. Tempo is ambiguous by a factor of two, 60 BPM with eighth-note pulses looks identical to 120 BPM with quarter-note pulses in the autocorrelation, so the detector sometimes picks the wrong octave. The tempo slider overrides the estimate and the score re-notates in real time.

Key estimation uses the Krumhansl-Schmuckler profile algorithm: each pitch class is weighted by the total duration notes in that class sound for in the piece, and the resulting 12-vector is cross-correlated against the 24 major and minor key templates from empirical tonal-hierarchy studies. The best-correlating key wins. This works well on tonal music with 20+ notes and poorly on atonal or very short examples, so the detected key is a starting point you can override.

Quantization rounds each note's start and end to the chosen subdivision (16th, 8th, or quarter) at the active tempo, then splits notes across measure boundaries with tied note values. The decomposer prefers larger note values when they fit (a half note becomes a half note, not two quarters) but respects beat boundaries so a note that crosses beats 2–3 is written as two tied quarters rather than a half note dangling across the bar. A monophonic-resolver de-aliases the case where two short adjacent notes round to the same grid slot, preventing phantom chord stacks. Rendering uses VexFlow, which handles beaming, stem direction, and accidental placement; ties that span system breaks are split into partial ties (a tail on the source line, a head on the destination line) so the eye can trace the tied pitch across the page turn. Exports flow through to standard MIDI (Type 0, 480 ppq), MusicXML 3.1 (Partwise, openable in MuseScore, Finale, Sibelius, Dorico, including staff assignments for grand staff), SVG, PNG, and browser print, plus in-browser playback of the quantized score via a synthesized triangle tone.

Frequently asked questions

Will this transcribe a full song with vocals and band?

Yes, but dense mixes with drums and many overlapping instruments tend to come out cluttered. For the cleanest result, isolate the melody first (our Vocal Remover or a source-separation tool can help) and then transcribe. If you want just the lead vocal, keep "Single melody line" on, the register-aware collapse keeps whichever note sits closest to the running melodic center at each moment, so the output tracks the lead voice instead of jumping to whichever bass note happens to be loudest at that instant.

Why does the first run take so long?

Basic Pitch is a ~900 KB neural network that has to be fetched and compiled for TensorFlow.js the first time you use the tool. After that it's cached and inference runs offline on your device; subsequent transcriptions are much faster.

Why is the detected key wrong?

Key detection uses statistical profiles derived from Western tonal music. It needs 20+ notes of clear material in one key to work reliably. Short excerpts, modal or chromatic music, and microtonal input can throw it off. Use the key dropdown to override.

Why did it pick 60 BPM when the song is clearly at 120?

Tempo is ambiguous by an octave, 60 BPM with eighth-note pulses looks identical to 120 BPM with quarter-note pulses in the autocorrelation. The detector sometimes picks the wrong octave. Slide the tempo up or down by 2x and everything re-notates.

What does "quantize to 16th notes" mean?

Every detected note's start time and duration are rounded to the nearest grid point at the chosen resolution. 16th-note quantize captures the most detail but shows fine rhythmic noise; quarter-note quantize hides that noise but also loses real ornaments. Start with 8ths and adjust.

Can I edit the score after exporting?

Yes. Export as MusicXML and open in MuseScore (free) or any notation software. The XML preserves notes, durations, key, time signature, tempo, ties, and (for grand staff) separate treble/bass staff assignments, so you can clean up, add dynamics, or re-voice without re-transcribing.

Is my audio sent to a server?

No. The Basic Pitch model, pitch detection, tempo and key estimation, and sheet-music rendering all run locally in your browser. The audio file never leaves your device.

Private by design

Audio is decoded and processed locally with the Web Audio API. Your files are never uploaded to a server.

Audio to Sheet Music

You might also like

Audio Waveform Generator

Tone Generator

BPM Detector

Audio to Sheet Music: a worked example

What is Audio to Sheet Music?

How to use

Key features

Tips & best practices

Common use cases

Capture a melody that's stuck in your head

Transcribe a solo instrument line

Get a lead sheet out of a piano recording

Generate practice sheet music in a friendlier key

Share musical ideas with collaborators

How it works

Frequently asked questions

Will this transcribe a full song with vocals and band?

Why does the first run take so long?

Why is the detected key wrong?

Why did it pick 60 BPM when the song is clearly at 120?

What does "quantize to 16th notes" mean?

Can I edit the score after exporting?

Is my audio sent to a server?

Private by design

You might also like

Audio Waveform Generator

Tone Generator

BPM Detector

Audio to Sheet Music: a worked example

What is Audio to Sheet Music?

How to use

Key features

Tips & best practices

Common use cases

Capture a melody that's stuck in your head

Transcribe a solo instrument line

Get a lead sheet out of a piano recording

Generate practice sheet music in a friendlier key

Share musical ideas with collaborators

How it works

Frequently asked questions

Will this transcribe a full song with vocals and band?

Why does the first run take so long?

Why is the detected key wrong?

Why did it pick 60 BPM when the song is clearly at 120?

What does "quantize to 16th notes" mean?

Can I edit the score after exporting?

Is my audio sent to a server?

Private by design