Web Audio API · Canvas API · Vanilla JS

TX-Z / Dual Deck Browser Sampler

I wanted to build a DJ instrument that lives entirely in the browser. Near-zero latency, client-side onset detection, and a brutalist interface that feels like physical hardware.


How It Works

The technical decisions behind each feature

01 | AUDIO ENGINE

Why I Ditched the Audio Tag

My first attempt used a standard <audio> element. It worked, but felt wrong. Every time I hit a pad, there was a tiny delay before the sound played. Maybe 50 milliseconds. Doesn't sound like much, but when you're trying to hit a beat, it's the difference between feeling like you're playing an instrument versus fighting with software.

The problem is that <audio> was designed for streaming music, not for musical performance. When you seek to a new position, the browser has to decode that chunk of audio first. That takes time.

So I switched to the Web Audio API. The trick is to decode the entire track upfront into an AudioBuffer. Now when you hit a pad, the browser doesn't need to decode anything. It just starts reading from memory. The response feels instant because it basically is.
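The decode-once pattern can be sketched in a few lines. These function names (`loadTrack`, `playFrom`, `decodedBytes`) are illustrative, not the project's actual API:

```javascript
// Decode the whole file up front: one slow step at load time.
async function loadTrack(ctx, url) {
  const response = await fetch(url);
  const bytes = await response.arrayBuffer();
  return await ctx.decodeAudioData(bytes); // full track -> AudioBuffer in RAM
}

// Triggering a pad is then just a memory read: create a throwaway
// one-shot source node and start it at the slice offset.
// No decoding, no seek delay.
function playFrom(ctx, buffer, offsetSec) {
  const src = ctx.createBufferSource();
  src.buffer = buffer;
  src.connect(ctx.destination);
  src.start(0, offsetSec);
  return src; // keep a handle so the pad can be stopped or retriggered
}

// The memory tradeoff: decoded audio is Float32, 4 bytes per sample.
// At 44.1kHz that's roughly 10.6MB per minute per channel.
function decodedBytes(seconds, sampleRate = 44100, channels = 1) {
  return seconds * sampleRate * channels * 4;
}
```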

The tradeoff is memory. A 5-minute track takes about 50MB of RAM. But for an instrument that needs to feel responsive, it's worth it.

Latency
< 3ms
Sample Rate
44.1kHz
Context
Single, shared
Memory
~10MB/min
AudioBufferSourceNode
BiquadFilter (Low-pass)
AnalyserNode (FFT)
GainNode (Crossfade)
GainNode (Trim)
Master → Speakers
02 | WAVEFORM DISPLAY

Drawing Without Killing Performance

The naive approach to drawing a waveform is to read the audio data every frame and draw it. I tried this. It worked for about 3 seconds before the UI started stuttering.

The issue is that a 3-minute track at 44.1kHz has about 8 million samples per channel. Reading and processing that every 16 milliseconds (60fps) is way too much work.

Instead, I pre-compute a simplified dataset when the track loads. I chunk the audio into ~400 blocks, calculate the peak amplitude for each block, and store that. Now the canvas just draws 400 bars instead of millions of samples.
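The pre-computation step is a straightforward peak-per-block pass. A sketch of the idea (`computePeaks` is a hypothetical name); this runs once per track load, never per frame:

```javascript
// Downsample millions of samples into ~400 peak values for drawing.
function computePeaks(samples, numBlocks = 400) {
  const blockSize = Math.ceil(samples.length / numBlocks);
  const peaks = new Float32Array(numBlocks);
  for (let b = 0; b < numBlocks; b++) {
    let peak = 0;
    const start = b * blockSize;
    const end = Math.min(start + blockSize, samples.length);
    for (let i = start; i < end; i++) {
      const v = Math.abs(samples[i]); // peak magnitude, sign doesn't matter
      if (v > peak) peak = v;
    }
    peaks[b] = peak;
  }
  return peaks;
}
```

The per-frame draw loop then strokes 400 bars from this array plus one playhead line, which Canvas 2D handles at 60fps without breaking a sweat.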

The only thing that updates each frame is the playhead position. One red line moving across the screen. Everything else is static until you load a new track.

Frame Rate
60 fps
Dataset
400 points
Realtime
Playhead only
Renderer
Canvas 2D
03 | ONSET SLICING

Finding the Drum Hits Automatically

The lazy way to map 16 pads is to divide the track into 16 equal chunks. I tried this first. It's terrible. You end up landing in the middle of a word, between drum hits, nowhere musically useful.

What I actually want is to land on the attacks: the moment a kick hits, a snare cracks, a vocal starts. These are called transients, and detecting them is a whole field of audio research.

I implemented a simplified version: chunk the audio into 20ms frames, calculate the energy (loudness) of each frame, then look for sudden jumps in energy. A big jump usually means something percussive happened. I filter out duplicates by requiring at least 80ms between detected hits.

The algorithm finds the 16 strongest transients and maps them to pads in order. Now when you hit pad 1, you get the first kick. Pad 3 might be a snare. The track becomes playable without any manual slicing.
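The energy-jump detector described above can be sketched as two pure functions. Names and the spike threshold (`ratio`) are my illustrative choices; the frame/hop/gap values come from the spec block below:

```javascript
// Sliding-window energy: sum of squared samples per 20ms frame, 10ms hop.
function frameEnergies(samples, sampleRate, frameMs = 20, hopMs = 10) {
  const frame = Math.floor(sampleRate * frameMs / 1000);
  const hop = Math.floor(sampleRate * hopMs / 1000);
  const energies = [];
  for (let start = 0; start + frame <= samples.length; start += hop) {
    let e = 0;
    for (let i = start; i < start + frame; i++) e += samples[i] * samples[i];
    energies.push(e);
  }
  return energies;
}

// A transient = a frame whose energy jumps well above the previous frame.
// The 80ms minimum gap stops one drum hit from being detected twice.
function detectOnsets(energies, hopMs = 10, minGapMs = 80, ratio = 2.0) {
  const onsets = [];
  let lastMs = -Infinity;
  for (let i = 1; i < energies.length; i++) {
    const tMs = i * hopMs;
    if (energies[i] > energies[i - 1] * ratio + 1e-6 && tMs - lastMs >= minGapMs) {
      onsets.push({ timeMs: tMs, strength: energies[i] });
      lastMs = tMs;
    }
  }
  // Keep the 16 strongest, then restore chronological order for the pads.
  return onsets
    .sort((a, b) => b.strength - a.strength)
    .slice(0, 16)
    .sort((a, b) => a.timeMs - b.timeMs)
    .map(o => o.timeMs);
}
```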

Frame Size
20ms
Hop
10ms
Min Gap
80ms
Output
16 slices
Detected transients (energy spikes) → mapped to performance pads 1–8
04 | JOG WHEEL

Making a Circle Feel Like Vinyl

On a real CDJ, the jog wheel does two things: you can spin it to scrub through the track, and when the track is playing, the platter spins to show you it's moving.

My first implementation animated the entire wheel element. This caused a weird problem: when I dragged to seek, the rotation transform fought with my drag position calculations, making the wheel jump around.

The fix was to separate concerns. The outer wheel stays still and handles drag events. Inside it, there's a "rotor" element that spins independently when the track is playing. The red marker is attached to the rotor, so it spins with the music but doesn't interfere with seeking.

Dragging left/right maps to seeking backward/forward in the track. The sensitivity is tuned so it feels roughly like a real platter. Not too fast, not too sluggish.
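The drag-to-seek mapping is just the sensitivity constant from the spec table plus a clamp. A minimal sketch (`seekFromDrag` is a hypothetical name):

```javascript
// 0.012 seconds of track per pixel of horizontal drag.
const SENSITIVITY = 0.012;

function seekFromDrag(currentTimeSec, deltaPx, durationSec) {
  const target = currentTimeSec + deltaPx * SENSITIVITY;
  // Clamp so a long drag can't scrub past either end of the track.
  return Math.min(Math.max(target, 0), durationSec);
}
```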

Seek Sensitivity
0.012s/px
Rotor
Independent
Spin Speed
30 RPM
Input
Mouse drag
05 | CROSSFADER

Why Linear Fading Sounds Wrong

The obvious way to crossfade is linear: as you slide from A to B, deck A goes from 100% to 0% while deck B goes from 0% to 100%. Simple math.

But it sounds wrong. When both decks are at 50%, the mix sounds quieter than either deck alone. This is because volume perception is logarithmic, not linear. Two signals at 50% don't add up to 100%. They add up to about 70%.

The fix is an "equal power" curve. Instead of linear interpolation, I use sine and cosine. At the center position, both decks output at ~0.707 (which is √0.5). Mathematically, this keeps the total energy constant across the entire sweep.

The result: smooth transitions with no volume dip in the middle.
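The whole curve is two trig calls. In the real app these values would be written to the two decks' `GainNode.gain` params; here it's just the math:

```javascript
// Equal-power crossfade. x is the fader position:
// 0 = full deck A, 1 = full deck B.
function crossfadeGains(x) {
  return {
    a: Math.cos(x * Math.PI / 2), // deck A: 1 at left, 0 at right
    b: Math.sin(x * Math.PI / 2), // deck B: 0 at left, 1 at right
  };
}
```

At every position, a² + b² = 1, which is exactly the "constant total energy" property: no dip at the center, where both gains sit at √0.5 ≈ 0.707.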

Curve
Equal Power
A @ Center
0.707
B @ Center
0.707
Range
0.0 → 1.0
DECK A ↔ DECK B
Deck A = cos(x × π/2)
Deck B = sin(x × π/2)
06 | SYNTH KEYBOARD

Four Waveforms, One Octave, Zero Latency

Sometimes you want to layer a tone on top of what you're playing. I added a monophonic synthesizer that generates tones through Web Audio oscillators. One octave of keys, mapped to your keyboard from Z to /.

The four waveforms follow the same order as the sampler's oscillator buttons: SIN, SQR, SAW, TRI. Each one has a distinct character that works differently in a mix.

SIN (Sine) is the purest tone. No overtones, just the fundamental frequency. Good for sub-bass drops or clean melodic lines that sit underneath a busy track without competing for space.

SQR (Square) is the harshest. It adds only odd harmonics, which gives it that retro, 8-bit, hollow buzzing quality. Cuts through a dense mix easily, but can be fatiguing, so I use it mostly for short stabs.

SAW (Sawtooth) contains all harmonics, odd and even. It's the richest, buzziest waveform. Sounds closest to a real synthesizer lead. Good for melodic lines where you want presence and warmth.

TRI (Triangle) is between sine and square. Soft, mellow, slightly nasal. Works well for pads or background tones where you want some harmonic content without dominating the mix.

The keyboard layout matches a real piano: white keys for naturals (C through B), black keys for sharps. Everything routes through the same master output as the decks, so your synth notes get captured if you're recording.
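A monophonic oscillator synth boils down to very little code. The pitch math is standard equal temperament; the `noteOn`/`noteOff` shape below is illustrative, not the project's actual code:

```javascript
// A4 = MIDI note 69 = 440Hz; each semitone is a factor of 2^(1/12).
function midiToFreq(note) {
  return 440 * Math.pow(2, (note - 69) / 12);
}

let activeOsc = null;

// type is one of 'sine' | 'square' | 'sawtooth' | 'triangle',
// matching the SIN / SQR / SAW / TRI buttons.
function noteOn(ctx, masterGain, midiNote, type) {
  if (activeOsc) activeOsc.stop(); // monophonic: a new note cuts the old one
  const osc = ctx.createOscillator();
  osc.type = type;
  osc.frequency.value = midiToFreq(midiNote);
  osc.connect(masterGain); // same master bus as the decks -> gets recorded
  osc.start();
  activeOsc = osc;
}

function noteOff() {
  if (activeOsc) { activeOsc.stop(); activeOsc = null; }
}
```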

Type
Monophonic
Range
1 Octave
Waveforms
4 Types
Keys
Z to /
White keys: naturals | Black keys: sharps
07 | VISUAL DESIGN

Looking Like Hardware, Not Software

Early versions looked like a "cool synth app": neon colors, glowing effects, rounded corners. It felt like a toy. I wanted something that felt like actual equipment you'd trust on stage.

I studied the interfaces of gear I respect: Teenage Engineering's OP-1, Elektron's Digitakt, Pioneer's CDJs. They share a common language: matte surfaces, hard edges, restrained color, typography that's functional rather than decorative.

The constraint I set: monochrome base (black, gray, white only), maximum two accent colors. Red marks Deck A and primary actions. Amber marks Deck B and secondary states. That's it. No gradients on controls, no glows, no RGB gaming aesthetics.

JetBrains Mono became the typeface because it's technical but readable, and the tabular figures prevent the timecode displays from jittering as numbers change.

Typeface
JetBrains Mono
Border Radius
0px
Base
#040404
Accents
2 colors only
Deck A | #F90022
Deck B | #FFB300
Base | #040404
Surface | #0E0E0E
08 | DISPLAY MODES

Four Ways to See the Same Sound

Audio is invisible. That's a problem when you're performing. You need to see what's happening in the sound so you can make decisions about what to do next. Each display mode gives you a different lens on the same audio data, and each one is useful for different tasks.

WAV (Waveform) shows the pre-computed amplitude over time. This is your bird's-eye view of the track. You can see where the drops are, where it gets quiet, where the build-ups happen. I use this for navigation: finding the right moment to jump to or loop around.

FFT (Spectrum) runs a Fast Fourier Transform on the live audio and renders it as a frequency bar chart. Low frequencies on the left, highs on the right. This is how you see whether the bass is clashing between two tracks, or whether the EQ needs adjusting before a transition.

OSC (Oscilloscope) draws the raw waveform in real time. Unlike WAV, which shows the whole track, this shows the actual signal shape right now. It's the most visceral mode. You can literally watch the wave change shape when you sweep the filter or switch oscillator types on the synth.

SEQ (Sequencer) divides the canvas into 16 numbered regions, one per pad. It highlights the current region as the track plays and shows loop boundaries when they're active. This is your structural view. It tells you where you are in relation to your slicing grid, which matters when you're building patterns from onset-detected hits.

Together, these four modes map the core dimensions of audio: time-domain overview (WAV), frequency content (FFT), instantaneous signal (OSC), and musical structure (SEQ). You're always one click away from the view that answers your current question.
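For the FFT mode specifically, an AnalyserNode hands you magnitude bins each frame, and mapping a bin index back to a frequency is simple arithmetic. A sketch (the fftSize of 2048 is an assumption, not necessarily what TX-Z uses):

```javascript
// Bin index -> frequency in Hz: bins span 0 to sampleRate/2 linearly.
function binToFreq(bin, sampleRate = 44100, fftSize = 2048) {
  return bin * sampleRate / fftSize;
}

// Per-frame draw loop (browser-only): read live magnitudes and
// draw one bar per bin, lows on the left, highs on the right.
function drawSpectrum(analyser, canvasCtx, width, height) {
  const bins = new Uint8Array(analyser.frequencyBinCount);
  analyser.getByteFrequencyData(bins); // 0..255 magnitudes
  const barW = width / bins.length;
  canvasCtx.clearRect(0, 0, width, height);
  for (let i = 0; i < bins.length; i++) {
    const barH = (bins[i] / 255) * height;
    canvasCtx.fillRect(i * barW, height - barH, barW, barH);
  }
}
```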

WAV
Navigation
FFT
Mix clarity
OSC
Signal shape
SEQ
Structure
09 | RECORDING

Capturing Everything You Hear

An instrument isn't much use if you can't record what you play. I wanted recording to capture exactly what comes out of the speakers: both decks mixed together, with all your crossfader moves, EQ sweeps, and cue jumps included.

The Web Audio API makes this possible with MediaStreamDestination. I route the master output to both the speakers and a stream destination. The MediaRecorder API then captures that stream in real time.

When you hit stop, it packages everything into a blob and triggers a download. The file format is whatever the browser's MediaRecorder can encode natively: typically WebM in Chrome, OGG in Firefox, MP4 in Safari. Not ideal for every use case, but it avoids shipping additional encoding libraries.
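The routing plus capture fits in one function. A sketch under the assumptions above (names like `startRecording` and `pickMime` are mine, and `pickMime` takes an injectable support check so the fallback logic is testable):

```javascript
// Route the master bus into a MediaStreamDestination and record the stream.
function startRecording(ctx, masterGain) {
  const dest = ctx.createMediaStreamDestination();
  masterGain.connect(dest); // master still also feeds ctx.destination
  const rec = new MediaRecorder(dest.stream, { mimeType: pickMime() });
  const chunks = [];
  rec.ondataavailable = (e) => chunks.push(e.data);
  rec.onstop = () => {
    // Package the captured chunks and trigger a download.
    const blob = new Blob(chunks, { type: rec.mimeType });
    const a = document.createElement('a');
    a.href = URL.createObjectURL(blob);
    a.download = 'tx-z-mix.webm';
    a.click();
  };
  rec.start();
  return rec; // call rec.stop() to finish and download
}

// Prefer WebM, fall back to whatever this browser can actually encode.
function pickMime(isSupported = (t) => MediaRecorder.isTypeSupported(t)) {
  const candidates = ['audio/webm', 'audio/ogg', 'audio/mp4'];
  return candidates.find(isSupported) || '';
}
```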

The button says "REC MIX" instead of just "REC" to make it clear you're recording the output, not your microphone.

API
MediaRecorder
Format
WebM / OGG
Shortcut
Shift + R
Output
Auto-download
REC MIX
Master Output → Stream Destination → MediaRecorder → Download (.webm)