Web Audio API · Canvas API · Vanilla JS

TX-Z / Dual Deck Browser Sampler

I wanted to build a DJ instrument that lives entirely in the browser. Near-zero latency, client-side onset detection, and a brutalist interface that feels like physical hardware.


How It Works

The technical decisions behind each feature

01 | AUDIO ENGINE

Why I Ditched the Audio Tag

My first attempt used a standard <audio> element. It worked, but felt wrong. Every time I hit a pad, there was a tiny delay before the sound played. Maybe 50 milliseconds. Doesn't sound like much, but when you're trying to hit a beat, it's the difference between feeling like you're playing an instrument versus fighting with software.

The problem is that <audio> was designed for streaming music, not for musical performance. When you seek to a new position, the browser has to decode that chunk of audio first. That takes time.

So I switched to the Web Audio API. The trick is to decode the entire track upfront into an AudioBuffer. Now when you hit a pad, the browser doesn't need to decode anything. It just starts reading from memory. The response feels instant because it basically is.
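The decode-once pattern can be sketched in a few lines. These function names (`loadTrack`, `playFrom`, `decodedBytes`) are illustrative, not the project's actual API:

```javascript
// Decode the whole file up front: one slow step at load time.
async function loadTrack(ctx, url) {
  const response = await fetch(url);
  const bytes = await response.arrayBuffer();
  return await ctx.decodeAudioData(bytes); // full track -> AudioBuffer in RAM
}

// Triggering a pad is then just a memory read: create a throwaway
// one-shot source node and start it at the slice offset.
// No decoding, no seek delay.
function playFrom(ctx, buffer, offsetSec) {
  const src = ctx.createBufferSource();
  src.buffer = buffer;
  src.connect(ctx.destination);
  src.start(0, offsetSec);
  return src; // keep a handle so the pad can be stopped or retriggered
}

// The memory tradeoff: decoded audio is Float32, 4 bytes per sample.
// At 44.1kHz that's roughly 10.6MB per minute per channel.
function decodedBytes(seconds, sampleRate = 44100, channels = 1) {
  return seconds * sampleRate * channels * 4;
}
```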

The tradeoff is memory. A 5-minute track takes about 50MB of RAM. But for an instrument that needs to feel responsive, it's worth it.

Latency
< 3ms
Sample Rate
44.1kHz
Context
Single, shared
Memory
~10MB/min
AudioBufferSourceNode
BiquadFilter (Low-pass)
AnalyserNode (FFT)
GainNode (Crossfade)
GainNode (Trim)
Master → Speakers
02 | WAVEFORM DISPLAY

Drawing Without Killing Performance

The naive approach to drawing a waveform is to read the audio data every frame and draw it. I tried this. It worked for about 3 seconds before the UI started stuttering.

The issue is that a 3-minute track at 44.1kHz has about 8 million samples per channel. Reading and processing that every 16 milliseconds (60fps) is way too much work.

Instead, I pre-compute a simplified dataset when the track loads. I chunk the audio into ~400 blocks, calculate the peak amplitude for each block, and store that. Now the canvas just draws 400 bars instead of millions of samples.
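The pre-computation step is a straightforward peak-per-block pass. A sketch of the idea (`computePeaks` is a hypothetical name); this runs once per track load, never per frame:

```javascript
// Downsample millions of samples into ~400 peak values for drawing.
function computePeaks(samples, numBlocks = 400) {
  const blockSize = Math.ceil(samples.length / numBlocks);
  const peaks = new Float32Array(numBlocks);
  for (let b = 0; b < numBlocks; b++) {
    let peak = 0;
    const start = b * blockSize;
    const end = Math.min(start + blockSize, samples.length);
    for (let i = start; i < end; i++) {
      const v = Math.abs(samples[i]); // peak magnitude, sign doesn't matter
      if (v > peak) peak = v;
    }
    peaks[b] = peak;
  }
  return peaks;
}
```

The per-frame draw loop then strokes 400 bars from this array plus one playhead line, which Canvas 2D handles at 60fps without breaking a sweat.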

The only thing that updates each frame is the playhead position. One red line moving across the screen. Everything else is static until you load a new track.

Frame Rate
60 fps
Dataset
400 points
Realtime
Playhead only
Renderer
Canvas 2D
03 | ONSET SLICING

Finding the Drum Hits Automatically

The lazy way to map 16 pads is to divide the track into 16 equal chunks. I tried this first. It's terrible. You end up landing in the middle of a word, between drum hits, nowhere musically useful.

What I actually want is to land on the attacks: the moment a kick hits, a snare cracks, a vocal starts. These are called transients, and detecting them is a whole field of audio research.

I implemented a simplified version: chunk the audio into 20ms frames, calculate the energy (loudness) of each frame, then look for sudden jumps in energy. A big jump usually means something percussive happened. I filter out duplicates by requiring at least 80ms between detected hits.

The algorithm finds the 16 strongest transients and maps them to pads in order. Now when you hit pad 1, you get the first kick. Pad 3 might be a snare. The track becomes playable without any manual slicing.
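The energy-jump detector described above can be sketched as two pure functions. Names and the spike threshold (`ratio`) are my illustrative choices; the frame/hop/gap values come from the spec block below:

```javascript
// Sliding-window energy: sum of squared samples per 20ms frame, 10ms hop.
function frameEnergies(samples, sampleRate, frameMs = 20, hopMs = 10) {
  const frame = Math.floor(sampleRate * frameMs / 1000);
  const hop = Math.floor(sampleRate * hopMs / 1000);
  const energies = [];
  for (let start = 0; start + frame <= samples.length; start += hop) {
    let e = 0;
    for (let i = start; i < start + frame; i++) e += samples[i] * samples[i];
    energies.push(e);
  }
  return energies;
}

// A transient = a frame whose energy jumps well above the previous frame.
// The 80ms minimum gap stops one drum hit from being detected twice.
function detectOnsets(energies, hopMs = 10, minGapMs = 80, ratio = 2.0) {
  const onsets = [];
  let lastMs = -Infinity;
  for (let i = 1; i < energies.length; i++) {
    const tMs = i * hopMs;
    if (energies[i] > energies[i - 1] * ratio + 1e-6 && tMs - lastMs >= minGapMs) {
      onsets.push({ timeMs: tMs, strength: energies[i] });
      lastMs = tMs;
    }
  }
  // Keep the 16 strongest, then restore chronological order for the pads.
  return onsets
    .sort((a, b) => b.strength - a.strength)
    .slice(0, 16)
    .sort((a, b) => a.timeMs - b.timeMs)
    .map(o => o.timeMs);
}
```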

Frame Size
20ms
Hop
10ms
Min Gap
80ms
Output
16 slices
Detected transients (energy spikes) → mapped to performance pads 1–8
04 | JOG WHEEL

Making a Circle Feel Like Vinyl

On a real CDJ, the jog wheel does two things: you can spin it to scrub through the track, and when the track is playing, the platter spins to show you it's moving.

My first implementation animated the entire wheel element. This caused a weird problem: when I dragged to seek, the rotation transform fought with my drag position calculations, making the wheel jump around.

The fix was to separate concerns. The outer wheel stays still and handles drag events. Inside it, there's a "rotor" element that spins independently when the track is playing. The red marker is attached to the rotor, so it spins with the music but doesn't interfere with seeking.

Dragging left/right maps to seeking backward/forward in the track. The sensitivity is tuned so it feels roughly like a real platter. Not too fast, not too sluggish.
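The drag-to-seek mapping is just the sensitivity constant from the spec table plus a clamp. A minimal sketch (`seekFromDrag` is a hypothetical name):

```javascript
// 0.012 seconds of track per pixel of horizontal drag.
const SENSITIVITY = 0.012;

function seekFromDrag(currentTimeSec, deltaPx, durationSec) {
  const target = currentTimeSec + deltaPx * SENSITIVITY;
  // Clamp so a long drag can't scrub past either end of the track.
  return Math.min(Math.max(target, 0), durationSec);
}
```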

Seek Sensitivity
0.012s/px
Rotor
Independent
Spin Speed
30 RPM
Input
Mouse drag
05 | CROSSFADER

Why Linear Fading Sounds Wrong

The obvious way to crossfade is linear: as you slide from A to B, deck A goes from 100% to 0% while deck B goes from 0% to 100%. Simple math.

But it sounds wrong. When both decks are at 50%, the mix sounds quieter than either deck alone. This is because volume perception is logarithmic, not linear. Two signals at 50% don't add up to 100%. They add up to about 70%.

The fix is an "equal power" curve. Instead of linear interpolation, I use sine and cosine. At the center position, both decks output at ~0.707 (which is √0.5). Mathematically, this keeps the total energy constant across the entire sweep.

The result: smooth transitions with no volume dip in the middle.
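The whole curve is two trig calls. In the real app these values would be written to the two decks' `GainNode.gain` params; here it's just the math:

```javascript
// Equal-power crossfade. x is the fader position:
// 0 = full deck A, 1 = full deck B.
function crossfadeGains(x) {
  return {
    a: Math.cos(x * Math.PI / 2), // deck A: 1 at left, 0 at right
    b: Math.sin(x * Math.PI / 2), // deck B: 0 at left, 1 at right
  };
}
```

At every position, a² + b² = 1, which is exactly the "constant total energy" property: no dip at the center, where both gains sit at √0.5 ≈ 0.707.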

Curve
Equal Power
A @ Center
0.707
B @ Center
0.707
Range
0.0 → 1.0
DECK A ↔ DECK B
Deck A = cos(x × π/2)
Deck B = sin(x × π/2)
06 | SYNTH KEYBOARD

Four Waveforms, One Octave, Zero Latency

Sometimes you want to layer a tone on top of what you're playing. I added a monophonic synthesizer that generates tones through Web Audio oscillators. One octave of keys, mapped to your keyboard from Z to /.

The four waveforms follow the same order as the sampler's oscillator buttons: SIN, SQR, SAW, TRI. Each one has a distinct character that works differently in a mix.

SIN (Sine) is the purest tone. No overtones, just the fundamental frequency. Good for sub-bass drops or clean melodic lines that sit underneath a busy track without competing for space.

SQR (Square) is the harshest. It adds only odd harmonics, which gives it that retro, 8-bit, hollow buzzing quality. Cuts through a dense mix easily, but can be fatiguing, so I use it mostly for short stabs.

SAW (Sawtooth) contains all harmonics, odd and even. It's the richest, buzziest waveform. Sounds closest to a real synthesizer lead. Good for melodic lines where you want presence and warmth.

TRI (Triangle) is between sine and square. Soft, mellow, slightly nasal. Works well for pads or background tones where you want some harmonic content without dominating the mix.

The keyboard layout matches a real piano: white keys for naturals (C through B), black keys for sharps. Everything routes through the same master output as the decks, so your synth notes get captured if you're recording.
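A monophonic oscillator synth boils down to very little code. The pitch math is standard equal temperament; the `noteOn`/`noteOff` shape below is illustrative, not the project's actual code:

```javascript
// A4 = MIDI note 69 = 440Hz; each semitone is a factor of 2^(1/12).
function midiToFreq(note) {
  return 440 * Math.pow(2, (note - 69) / 12);
}

let activeOsc = null;

// type is one of 'sine' | 'square' | 'sawtooth' | 'triangle',
// matching the SIN / SQR / SAW / TRI buttons.
function noteOn(ctx, masterGain, midiNote, type) {
  if (activeOsc) activeOsc.stop(); // monophonic: a new note cuts the old one
  const osc = ctx.createOscillator();
  osc.type = type;
  osc.frequency.value = midiToFreq(midiNote);
  osc.connect(masterGain); // same master bus as the decks -> gets recorded
  osc.start();
  activeOsc = osc;
}

function noteOff() {
  if (activeOsc) { activeOsc.stop(); activeOsc = null; }
}
```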

Type
Monophonic
Range
1 Octave
Waveforms
4 Types
Keys
Z to /
White keys: naturals | Black keys: sharps
07 | VISUAL DESIGN

Looking Like Hardware, Not Software

Early versions looked like a "cool synth app": neon colors, glowing effects, rounded corners. It felt like a toy. I wanted something that felt like actual equipment you'd trust on stage.

I studied the interfaces of gear I respect: Teenage Engineering's OP-1, Elektron's Digitakt, Pioneer's CDJs. They share a common language: matte surfaces, hard edges, restrained color, typography that's functional rather than decorative.

The constraint I set: monochrome base (black, gray, white only), maximum two accent colors. Red marks Deck A and primary actions. Amber marks Deck B and secondary states. That's it. No gradients on controls, no glows, no RGB gaming aesthetics.

JetBrains Mono became the typeface because it's technical but readable, and the tabular figures prevent the timecode displays from jittering as numbers change.

Typeface
JetBrains Mono
Border Radius
0px
Base
#040404
Accents
2 colors only
Deck A | #F90022
Deck B | #FFB300
Base | #040404
Surface | #0E0E0E
08 | DISPLAY MODES

Four Ways to See the Same Sound

Audio is invisible. That's a problem when you're performing. You need to see what's happening in the sound so you can make decisions about what to do next. Each display mode gives you a different lens on the same audio data, and each one is useful for different tasks.

WAV (Waveform) shows the pre-computed amplitude over time. This is your bird's-eye view of the track. You can see where the drops are, where it gets quiet, where the build-ups happen. I use this for navigation: finding the right moment to jump to or loop around.

FFT (Spectrum) runs a Fast Fourier Transform on the live audio and renders it as a frequency bar chart. Low frequencies on the left, highs on the right. This is how you see whether the bass is clashing between two tracks, or whether the EQ needs adjusting before a transition.

OSC (Oscilloscope) draws the raw waveform in real time. Unlike WAV, which shows the whole track, this shows the actual signal shape right now. It's the most visceral mode. You can literally watch the wave change shape when you sweep the filter or switch oscillator types on the synth.

SEQ (Sequencer) divides the canvas into 16 numbered regions, one per pad. It highlights the current region as the track plays and shows loop boundaries when they're active. This is your structural view. It tells you where you are in relation to your slicing grid, which matters when you're building patterns from onset-detected hits.

Together, these four modes map the core dimensions of audio: time-domain overview (WAV), frequency content (FFT), instantaneous signal (OSC), and musical structure (SEQ). You're always one click away from the view that answers your current question.
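For the FFT mode specifically, an AnalyserNode hands you magnitude bins each frame, and mapping a bin index back to a frequency is simple arithmetic. A sketch (the fftSize of 2048 is an assumption, not necessarily what TX-Z uses):

```javascript
// Bin index -> frequency in Hz: bins span 0 to sampleRate/2 linearly.
function binToFreq(bin, sampleRate = 44100, fftSize = 2048) {
  return bin * sampleRate / fftSize;
}

// Per-frame draw loop (browser-only): read live magnitudes and
// draw one bar per bin, lows on the left, highs on the right.
function drawSpectrum(analyser, canvasCtx, width, height) {
  const bins = new Uint8Array(analyser.frequencyBinCount);
  analyser.getByteFrequencyData(bins); // 0..255 magnitudes
  const barW = width / bins.length;
  canvasCtx.clearRect(0, 0, width, height);
  for (let i = 0; i < bins.length; i++) {
    const barH = (bins[i] / 255) * height;
    canvasCtx.fillRect(i * barW, height - barH, barW, barH);
  }
}
```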

WAV
Navigation
FFT
Mix clarity
OSC
Signal shape
SEQ
Structure
09 | RECORDING

Capturing Everything You Hear

An instrument isn't much use if you can't record what you play. I wanted recording to capture exactly what comes out of the speakers: both decks mixed together, with all your crossfader moves, EQ sweeps, and cue jumps included.

The Web Audio API makes this possible with MediaStreamDestination. I route the master output to both the speakers and a stream destination. The MediaRecorder API then captures that stream in real time.

When you hit stop, it packages everything into a blob and triggers a download. The file format is whatever the browser's MediaRecorder can encode natively: typically WebM in Chrome, OGG in Firefox, MP4 in Safari. Not ideal for every use case, but it avoids shipping additional encoding libraries.
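The routing plus capture fits in one function. A sketch under the assumptions above (names like `startRecording` and `pickMime` are mine, and `pickMime` takes an injectable support check so the fallback logic is testable):

```javascript
// Route the master bus into a MediaStreamDestination and record the stream.
function startRecording(ctx, masterGain) {
  const dest = ctx.createMediaStreamDestination();
  masterGain.connect(dest); // master still also feeds ctx.destination
  const rec = new MediaRecorder(dest.stream, { mimeType: pickMime() });
  const chunks = [];
  rec.ondataavailable = (e) => chunks.push(e.data);
  rec.onstop = () => {
    // Package the captured chunks and trigger a download.
    const blob = new Blob(chunks, { type: rec.mimeType });
    const a = document.createElement('a');
    a.href = URL.createObjectURL(blob);
    a.download = 'tx-z-mix.webm';
    a.click();
  };
  rec.start();
  return rec; // call rec.stop() to finish and download
}

// Prefer WebM, fall back to whatever this browser can actually encode.
function pickMime(isSupported = (t) => MediaRecorder.isTypeSupported(t)) {
  const candidates = ['audio/webm', 'audio/ogg', 'audio/mp4'];
  return candidates.find(isSupported) || '';
}
```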

The button says "REC MIX" instead of just "REC" to make it clear you're recording the output, not your microphone.

API
MediaRecorder
Format
WebM / OGG
Shortcut
Shift + R
Output
Auto-download
REC MIX
Master Output → Stream Destination → MediaRecorder → Download (.webm)