NULLPUNKT

2026-03-02 . Project / Firmware / Hardware . Low-end analyzer for Eurorack

Prototype Project Firmware Hardware KARON

Dedicated low-end analysis tool for Eurorack. Mono compatibility, phase coherence and cancellation metrics for kick+bass -- filtered to the frequency band that matters, displayed without menus, with the audio signal passing through unmodified. Firmware updates via KARON.

What it is

NULLPUNKT answers the questions that matter when producing techno, house and other bass-heavy music in real time: Is my sub-bass range mono-compatible? Are kick and bass cancelling each other? What is the phase relationship between my basslines? Is my sidechain ducking working correctly?

It is not an oscilloscope or a spectrum analyzer. NULLPUNKT filters the input signal to the selected frequency band and displays only the metrics relevant to that range -- directly, without menus. The audio signal passes through unmodified (THRU).

Status: Hardware V1 complete. Firmware V1 ready -- all analysis modes, display views and KARON audio update are working.

Hardware

Spec	Detail
Format	Eurorack, 8HP
MCU	STM32F030RCT6 -- Cortex-M0, 48 MHz, no FPU
Display	128x64 OLED (SSD1306 or SH1106), I2C 400 kHz
ADC	12-bit, 48 kHz dual-channel (TIM3-triggered, DMA circular)
Inputs	CH1 In (Thru), CH2 In (Thru), Trigger In (Thru)
Controls	PEC11R rotary encoder with push button, mode button (SPST)
Indicator	RGB LED (sigma-delta PDM engine via TIM14 IRQ)
Power	Eurorack +/-12V via 2x8 shrouded header, MCP1703A LDO to 3.3V
Debug	ARM JTAG/SWD 10-pin, 7 test points
Update	Firmware via KARON audio bootloader

Signal path

Both audio channels share the same analog topology. Each input is AC-coupled through a 1 uF capacitor with 1M input impedance, blocking any DC offset from upstream modules. The signal then enters one half of an NE5532 dual op-amp configured as a unity-gain buffer with a VMID bias point generated from a 10k/10k precision resistor divider. This shifts the bipolar Eurorack signal (+/-10V) into the MCU's 0-3.3V ADC input range.

After the op-amp, an RC anti-aliasing filter (3.3k + 2.2nF, cutoff ~22 kHz) rolls off content above the Nyquist frequency before the ADC samples at 48 kHz. 1N4148W clamping diodes protect the ADC inputs from voltage excursions beyond the supply rails -- Eurorack signals can swing well beyond what the STM32 tolerates.

The trigger path is different. It does not need waveform fidelity -- only fast, clean edges. The incoming CV/gate signal passes through a 1 uF AC coupling capacitor, then a BAT54S Schottky diode pair clamps it to the 0-3.3V range. A BC817 NPN transistor with a 22k base resistor inverts the signal, producing a clean active-low digital edge on PA2. A 10k pull-up to 3.3V ensures a defined idle state. The firmware debounces the input in software with a 5 ms window on top of the hardware RC filtering already present on the board.

All three inputs (CH1, CH2, trigger) have normalized THRU jacks that pass the original signal through unmodified. The analysis path never touches the audio output.

ADC and DMA

The internal HSI oscillator (8 MHz) is multiplied by the PLL to produce 48 MHz SYSCLK. Flash latency is set to 1 wait state (required above 24 MHz on STM32F0). ADC1 is clocked from a dedicated 14 MHz HSI14 oscillator, independent of the system clock, to avoid jitter from bus contention.

TIM3 generates a TRGO update event at exactly 48 kHz. The ARR value is computed dynamically from the system clock: ARR = (SYSCLK / 48000) - 1, so the sample rate stays correct regardless of PLL configuration (ARR = 499 at 24 MHz, 999 at 48 MHz, 1499 at 72 MHz). Each TRGO trigger fires an ADC1 scan sequence that converts both channels (PA0 and PA1) in succession with 13.5 cycle sampling time. Before the ADC is enabled, the firmware runs a hardware calibration cycle (LL_ADC_StartCalibration) and waits for the ADRDY flag -- this compensates for internal reference voltage and offset errors that would otherwise add a few LSBs of systematic error to every conversion.

DMA1 Channel 1 transfers the results into a circular interleaved buffer of 512 bytes (128 sample pairs, [CH0, CH1, CH0, CH1, ...]).

De-interleaving happens in the DMA interrupt handler, not in the main loop. The DMA fires two interrupts per cycle: Half-Transfer (HT) when the first half of the buffer is complete, and Transfer-Complete (TC) when the second half is done. In each ISR, the firmware loops over the "safe" half (the one DMA is not currently writing to) and splits the interleaved pairs into separate buffer_ch1[] and buffer_ch2[] arrays: buffer_ch1[i] = dma_buf[i*2], buffer_ch2[i] = dma_buf[i*2+1]. This guarantees that the main loop always reads coherent, non-torn data without needing to disable DMA or manage explicit double-buffering. The DMA channel runs at NVIC priority 1 (above the display and encoder interrupts) to ensure no sample pairs are missed.

The application reads frames by computing the current DMA write position from the CNDTR register and extracting the 128 most recent samples from the circular buffer. At 48 kHz and 128 samples, each analysis frame covers approximately 2.7 ms of audio.

DC blocker

The raw ADC samples are centered around 2048 (half of the 12-bit range), but the analog front-end is never perfectly centered. Even a few LSBs of DC offset would corrupt every metric downstream -- MID/SIDE derivation, energy calculations, all of it. The DC blocker is a leaky integrator that tracks the DC component and subtracts it sample by sample, running independently on each channel.

The accumulator tracks the DC offset multiplied by 1024:

$$\mathrm{acc} = \mathrm{acc} - (\mathrm{acc} \gg 10) + x_n$$ $$x_n = x_n - (\mathrm{acc} \gg 10)$$

The right-shift by 10 gives a time constant of $2^{10} = 1024$ samples. At 48 kHz that is approximately 21 ms, which translates to a cutoff frequency of about $f_c = \frac{48000}{2\pi \cdot 1024} \approx 7.5\,\text{Hz}$ -- well below any musical content but fast enough to track thermal drift in the op-amp bias point. The accumulator converges to the true DC offset times 1024, so shifting right by 10 extracts the estimate for subtraction.

IIR lowpass filter

After DC removal, a 1-pole IIR lowpass filter restricts the analysis to the selected frequency band. The filter only affects the analysis path -- the audio THRU output is never filtered. The difference equation is:

$$y_n = y_{n-1} + \alpha \cdot (x_n - y_{n-1})$$

where $\alpha$ is the filter coefficient, precomputed for each preset from the continuous-time prototype:

$$\alpha = 1 - e^{-2\pi f_c \,/\, f_s}$$

The coefficient is stored as a Q15 fixed-point integer: $\alpha_{Q15} = \text{round}(\alpha \cdot 32768)$. Filter state is stored as Q8 (multiplied by 256) to preserve sub-LSB movement. This matters: at the SUB preset, $\alpha_{Q15} = 426$. A unit input difference would produce a state change of $426 \gg 15 = 0$ in plain integer math -- the filter would be completely dead. With Q8 state, the same difference produces $(426 \times 256) \gg 15 = 3$, which is enough for the filter to track correctly. The Q8 shift is removed when reading the output.

Preset	Cutoff	$\alpha_{Q15}$	Use case
SUB 100HZ	100 Hz	426	Club subwoofer zone, kick/bass mono safety
LOW 160HZ	160 Hz	679	Low-end mud, lower midrange buildup
BASS 250HZ	250 Hz	1054	Body, room, frequencies that tend to clutter
BYPASS	--	--	Full-bandwidth analysis, filter bypassed entirely

BYPASS does not set the coefficient to 1.0 (32767). That would overflow: $32767 \times$ a worst-case Q8 input can exceed INT32_MAX. Instead, BYPASS skips the filter entirely and passes the DC-blocked sample through unchanged. Preset changes take effect immediately with no click or discontinuity -- the filter state is not reset, avoiding a hard transient that would be worse than the gradual transition.

MID/SIDE derivation

After filtering, each sample pair is split into MID and SIDE components. This is the standard M/S encoding used in broadcast and mastering -- not something invented for this module:

$$\text{MID} = \frac{\text{CH1} + \text{CH2}}{2} \qquad \text{SIDE} = \frac{\text{CH1} - \text{CH2}}{2}$$

MID represents what both channels have in common (mono content). SIDE represents the difference (stereo content). A perfectly mono signal produces $\text{SIDE} = 0$. A polarity-inverted pair produces $\text{MID} = 0$. When the two channels drift out of phase, energy moves from MID into SIDE -- which is exactly what NULLPUNKT measures. All three metrics (width, cancellation, correlation) are different ways of quantifying this energy distribution.

The division by 2 prevents overflow in the subsequent squaring operations. With 12-bit ADC values centered at zero, the worst case after DC removal is +/-2047. After the division, MID and SIDE are at most +/-1023, so the squared sum over 128 samples is at most $128 \times 1023^2 \approx 134\text{M}$, well within INT32_MAX (2147M).

Width

Width is the ratio of SIDE energy to MID energy, expressed as a percentage. It answers the question: how much of the signal is stereo content versus mono content?

$$W = \frac{E_{\text{SIDE}}}{E_{\text{MID}} + 1} \times 100$$

where $E = \frac{1}{N}\sum x_i^2$ is the mean squared energy over the frame. The +1 in the denominator prevents division by zero when the input is silent.

A width of 0% means the two channels are identical (pure mono). A width of 100% means all energy is in the difference signal -- nothing survives a mono fold. In practice, anything above 50% in the sub range is a problem. The RGB LED switches from green to yellow at that threshold ($\text{SIGNAL\_ANALYSIS\_WIDTH\_RISK\_THR} = 50$).

Cancellation

Cancellation measures how much signal is lost when the two channels are summed to mono. This is the question every mix engineer asks when checking club compatibility: if I fold this to mono, how much disappears?

$$C = 100 - \frac{2 \cdot \text{RMS}_{\text{MID}}}{\text{RMS}_{\text{CH1}} + \text{RMS}_{\text{CH2}}} \times 100$$

The denominator uses per-channel RMS rather than the RMS of the sum. This normalizes against level differences: if CH1 is 6 dB louder than CH2, the sum is dominated by CH1 and little cancellation occurs regardless of phase -- the formula accounts for that. The factor of 2 in the numerator compensates for the MID division by 2.

A cancellation of 0% means no signal is lost on mono fold (identical channels or one channel silent). 100% means total cancellation (exact polarity inversion). RMS values are computed via a Newton-Raphson integer square root that converges in 4-5 steps for 32-bit inputs. The initial guess is derived from the bit-width of the input: the algorithm finds the highest set bit, halves its position, and starts the iteration from $2^{\lfloor \log_2(n)/2 \rfloor + 1}$.

Correlation

Correlation is the Pearson coefficient, scaled to the range $-100$ to $+100$. It is the most informative single number about the phase relationship between two signals. Unlike width and cancellation, correlation is fully amplitude-normalized -- level differences between CH1 and CH2 do not affect the result.

$$r = \frac{\displaystyle\sum_{i=1}^{N} \text{CH1}_i \cdot \text{CH2}_i}{\sqrt{\displaystyle\sum_{i=1}^{N} \text{CH1}_i^2 \;\cdot\; \displaystyle\sum_{i=1}^{N} \text{CH2}_i^2}} \times 100$$

The cross-correlation sum in the numerator can exceed INT32_MAX when multiplied by 100 (worst case: $128 \times 2047^2 \times 100 \approx 54\text{G}$), so the computation uses int64 arithmetic. The denominator requires a 64-bit integer square root because the product of two 32-bit energy sums can reach $2^{58}$. The result fits in 32 bits because $\sqrt{2^{58}} = 2^{29}$. Rounding uses the standard integer technique: add half the divisor before dividing, with sign correction for negative values.

What the numbers mean in practice: $+100$ is identical signals (perfect mono). $0$ is uncorrelated (independent signals, like two different instruments panned center). $-100$ is exact polarity inversion (total cancellation when summed). For sub-bass in club music, anything below $-20$ is a warning and anything below $-50$ triggers a POLARITY! alert on the display.

Metric smoothing

Raw per-frame metrics are fed through a leaky integrator before display. Without smoothing, a 50 Hz signal would produce wildly flickering readings: the 2.7 ms analysis window captures only a fraction of the 20 ms period, so the frame might catch a zero-crossing where all values momentarily drop to zero.

The smoother uses the same structure as the DC blocker, but operates at frame rate (30 Hz) instead of audio rate (48 kHz):

$$s_n = s_{n-1} - (s_{n-1} \gg K) + (x_n \ll 8) \qquad \text{output} = s_n \gg 8$$

With $K = 2$, the time constant is $2^K = 4$ frames, approximately 133 ms at 30 fps. The Q8 state (shift left by 8 on input, shift right by 8 on output) prevents truncation at small metric values -- same reason as the IIR filter state. All three output metrics (width, cancellation, correlation) are smoothed independently. The result is stable readings that respond to actual changes in the mix within about 150 ms but do not flicker on every zero-crossing of a sub-bass signal.

Integer square root

Both cancellation and correlation need square roots. On a Cortex-M0 with no FPU and no hardware divide instruction, sqrtf() would compile to a massive software library call. NULLPUNKT uses a Newton-Raphson integer square root instead -- a tight loop that converges in 4-5 iterations for 32-bit inputs:

$$x_{n+1} = \frac{x_n + \lfloor N / x_n \rfloor}{2}$$

The initial guess is derived from the bit-width of the input: find the highest set bit via a shift loop, halve its position, and start from $2^{\lfloor \log_2(N)/2 \rfloor + 1}$. Iteration stops when $x_{n+1} \geq x_n$ (the sequence is monotonically decreasing toward $\lfloor\sqrt{N}\rfloor$ once it undershoots).

For the correlation denominator, where the product of two 32-bit energy sums can reach $2^{58}$, a 64-bit variant is used. The algorithm is identical but operates on uint64_t values. The result always fits in 32 bits because $\sqrt{2^{58}} = 2^{29}$.

Display views

Three views, cycled by rotating the encoder:

Goniometer

Lissajous XY plot with phosphor persistence. SIDE is mapped to X, MID to Y -- the standard orientation where a mono signal produces a vertical line, a fully out-of-phase signal produces a horizontal line and a stereo signal produces an ellipse or cloud. A ring buffer holds the last 4 frames (512 points total, stored as int8_t pairs clamped to $-127..+127$ after a right-shift by 2); all points are rendered every frame to produce phosphor-style persistence like a professional PC analyzer.

Dual meter

Width (top) and Cancellation (bottom) as horizontal bar meters, each 76 px wide with a numeric percentage readout.

Phase correlation

Single large bar spanning $-100$ to $+100$ with a center marker at 0. Grows right for positive correlation, left for negative. Displays a POLARITY! warning when correlation drops below $-50$.

Display driver

The SSD1306 (default) or SH1106 OLED is driven over I2C at 400 kHz via PB8/PB9 (AF1, open-drain with pull-ups). The bus abstraction supports both I2C and SPI at compile time via HW_OLED_BUS_MODE; the V1 hardware uses I2C. Each I2C transaction sends a control byte (0x00 for commands, 0x40 for data) followed by the payload, using I2C AUTOEND mode so the STOP condition is generated automatically by the peripheral. The bus layer waits for the BUSY flag to clear before each transaction -- since the OLED is the only I2C device on the bus, a simple busy-wait is sufficient.

The init sequence configures page addressing mode (0x02), COM output scan remap (0xC8), segment remap (0xA1, column 127 maps to SEG0), 1/64 multiplex ratio, charge pump enabled (0x8D 0x14), and contrast at maximum (0xFF). For SH1106 displays, an SSD1306_X_OFFSET constant shifts the column start address to compensate for the 132-column internal RAM (the SH1106 has 2 invisible columns on each side that the SSD1306 does not).

The framebuffer is 128x64 pixels = 1 KB, organized in 8 horizontal pages of 128 bytes each. To minimize bus time, a dirty-page tracking system uses an 8-bit bitmask where each bit corresponds to one page. Only pages with the dirty bit set are transferred. In a typical frame, 7 of 8 pages are dirty (the top status bar is usually static), saving about 3 ms per frame.

A full 8-page transfer takes ~25 ms at 400 kHz; with dirty-page skipping the typical transfer is ~22 ms, leaving ~11 ms of CPU headroom per 33 ms frame interval (30 fps).

The font renderer uses a 5x7 bitmap font stored as column bytes (bit 0 = top pixel, bit 6 = bottom pixel), covering ASCII 0x20 (space) to 0x5A (Z) -- just enough for the metric readouts and status labels. Each character cell is 6 pixels wide (5 glyph columns + 1 pixel gap). When the Y-coordinate is page-aligned (Y mod 8 = 0), glyph column bytes are OR'd directly into the framebuffer in a tight 5-iteration loop without any per-pixel shifting -- the fast path that handles all status bar and metric text. The slow path (arbitrary Y) falls back to individual DrawPixel calls for each set bit.

A 2x scaled variant renders each font pixel as a 2x2 block (12 px cell width, 14 px row height, 10 characters per line) for the large metric readouts. Horizontal and vertical line primitives, a faux-bold renderer (draw string twice offset by 1 pixel), and a full-framebuffer bitmap blit (DrawBitmap, 1024 bytes via memcpy) round out the graphics API.

At startup, the Oscaria logo is faded in using the SSD1306 hardware contrast register (0x81), stepped from 0 to 255 over 24 increments across ~480 ms, followed by a brief invert-blink accent.

Controls

Action	Function
Encoder rotate	Cycle display views
Encoder press	Select preset (SUB -> LOW -> BASS -> BYPASS -> SUB)
Mode button short	Freeze/Hold -- pauses analysis, last frame stays on screen
Mode button long	RGB LED on/off

The PEC11R rotary encoder (PB4/PB5 for quadrature A/B, PB6 for push button) is decoded entirely in hardware interrupts. EXTI lines 4 and 5 fire on both rising and falling edges; the ISR reads both pins, builds a 4-bit index from (old_state << 2) | new_state and looks up the direction in a 16-entry quadrature table that maps every valid state transition to +1, -1 or 0 (invalid/no movement). This gives one raw tick per edge, four per detent.

A signed 16-bit accumulator in the ISR collects raw ticks with INT16 clamping to prevent wrap-around on fast spins. The application layer reads and atomically clears this accumulator (IRQs disabled for the read-and-zero), then feeds the raw ticks into a substep accumulator that divides by 4 (ticks per detent) using a while-loop rather than integer division -- this preserves the remainder across calls so partial detents are never lost. The final UI delta is clamped to $[-127, +127]$ and returned as int8_t.

The push button on PB6 uses a separate EXTI with internal pull-up (no external pull-up present on this line) and latches a one-shot press event on the falling edge. The ISR checks the pin state after clearing the EXTI flag to distinguish press from release.

The mode button (PB7, active low with external 10k pull-up) is polled in the main loop rather than interrupt-driven. Short press toggles freeze/hold; long press toggles the RGB LED.

Trigger input

Accepts a CV/gate signal (active low via the BC817 inverter on PA2, configured as floating input since the external R14 pull-up holds the line high at idle). The firmware polls the pin each main-loop iteration and runs a three-variable debounce state machine: trig_candidate holds the last sampled level, trig_stable_since records when that level was first seen, and trig_prev stores the last accepted stable state. If the raw reading differs from the candidate, the candidate is updated and the timestamp restarts. If the candidate has been stable for 5 ms (checked via SystemTime_GetDelta), it is accepted and a falling-edge event (1 to 0 transition) sets the one-shot trigger flag.

On each confirmed trigger event, analysis is automatically frozen for 200 ms and [TRG] is shown in the top bar. Re-triggers if a new pulse arrives before the window expires. Designed for use with a kick drum gate to inspect the exact moment of impact.

RGB LED

The RGB LED (PC10/PC11/PC12) is driven by a sigma-delta PDM engine running in the TIM14 update interrupt at 20 kHz (prescaler and ARR computed dynamically from SystemCoreClock, same approach as the ADC timer). Each color channel has a 16-bit accumulator; on every tick, the target brightness (0-255) is added to the accumulator. If the result is $\geq 256$, the pin goes high and 256 is subtracted; otherwise the pin goes low. Over time, the duty cycle converges to $\text{target}/256$, producing flicker-free dimming at 256 brightness levels without requiring PWM timer channels or AF pin mappings.

All target values pass through a Scale8 function that applies a global brightness: (x * scale + 127) / 255, with rounding. The default global brightness is 64 out of 255 (25%), keeping the LED visible without being blinding in a dark studio.

A 1 ms animation scheduler runs inside the same ISR using a Bresenham-style accumulator: ms_acc += 1000; if (ms_acc >= tick_hz) { ms_acc -= tick_hz; Anim1ms(); }. This derives exact millisecond timing from the 20 kHz interrupt rate without requiring a separate timer. In idle mode, the LED cycles through a full HSV hue rotation over 8 seconds (1536 steps, 6 segments of 256 values each, integer-only -- no floating-point in the HSV conversion). The hue position is tracked as a 16.16 fixed-point accumulator with a per-millisecond step precomputed as $(1536 \ll 16) / \text{period\_ms}$. When analysis is active, the application overrides the color based on mono risk.

Color	Meaning
Green	Sub range is mono-compatible (width < 50%, correlation > -20)
Yellow	Getting wide -- width above 50% or correlation below -20
Red	Negative correlation -- phase problem, below -50

Typical workflows

Club check: is my sub mono?

Route a stereo bus from your DAW to CH1/CH2. Select SUB 100HZ. The goniometer should show a narrow vertical line and correlation should be near +100. If correlation fluctuates or the goniometer opens into an ellipse, there is a phase problem -- typically from EQ phase shift, compressor latency or poorly tuned samples.

Kick/bass tuning

Kick mono to CH1, bass mono to CH2. Select SUB 100HZ. Play both simultaneously. Correlation shows the sub-range phase relationship between the two signals, independent of level differences. Switch to BYPASS to check whether a phase issue extends into the broader frequency range.

Sidechain check

Kick on CH1, bass on CH2. When the sidechain ducking engages, the bass channel gets quieter -- the goniometer trace shrinks along the CH2 axis. Use the trigger input for automatic freeze on the kick downbeat.

Firmware

Three layers: hardware abstraction (hw/), drivers (drivers/) and application logic (app/). No vendor HAL -- all register-level, using only the STM32 LL (Low-Layer) headers for register definitions. The main loop runs at 30 Hz, locked to the display refresh cycle: capture frame, run analysis, update display, poll inputs. Everything fits comfortably within the 33 ms budget.

The clock tree is configured early in the boot sequence: HSI (8 MHz) is enabled and stable, then flash latency is set to 1 wait state before switching to the higher clock (the STM32F0 requires 1 wait state for SYSCLK above 24 MHz; setting it afterwards would cause hard faults). Only then is the PLL configured (HSI * 6 = 48 MHz), enabled, and switched in as system clock source. AHB and APB1 both run undivided at 48 MHz. SystemCoreClock is updated to 48000000 so that all timing-dependent peripherals (SysTick, ADC timer, LED PDM timer) automatically derive correct rates.

The timebase is a simple SysTick interrupt at 1 kHz (configured via LL_Init1msTick from SystemCoreClock, using HCLK as clock source). Each tick increments a volatile 32-bit counter. All timing in the firmware -- delays, debounce windows, trigger hold duration, animation scheduling -- uses this counter via SystemTime_GetDelta(start), which computes elapsed time by unsigned subtraction and therefore handles the 32-bit wrap correctly (~49 days).

Memory

Build	RAM	Flash
Debug (-O0)	9.3 KB (28%)	28 KB (11%)
Release (-Os)	9.3 KB (28%)	16 KB (6%)

Plenty of headroom in both configurations. Flash size at release is roughly half of debug due to -Os optimization. The STM32F030RCT6 has 32 KB RAM and 256 KB flash.