KARON

2026-03-02 . Project / Firmware . A/B audio bootloader

A/B bootloader for STM32. Receives firmware updates as audio through the existing input -- no USB, no UART, no exposed programming header needed. Named after Charon, the ferryman who carries souls across the river Styx. KARON carries firmware across audio.

The problem

Eurorack modules are small, sealed and installed once. No USB port on the front panel, no debug header you can reach without pulling the module, no network. Firmware still ships with bugs and people still want updates.

But every module already has an audio input, an analog front-end and a Microcontroller Unit (MCU) with an Analog-to-Digital Converter (ADC). KARON uses that existing path: encode the firmware image as a WAV file, play it into the module from a phone or laptop, done.

The constraints: it can't brick the module, even if the cable gets pulled mid-transfer. It has to work with uncalibrated analog front-ends, cheap phone Digital-to-Analog Converters (DACs) and internal oscillators that are off by 1%. And it has to fit in 16 KB of flash on a small MCU with no floating-point hardware.

How an update works

From the user's perspective, a firmware update is five steps. No software to install, no drivers, no special cables. Just a WAV file and something that can play audio.

The complete update flow from the user's perspective. One button, one audio cable, one LED to watch.

The LED color tells you what's happening. The exact colors depend on the module's hardware, but the pattern is always the same: waiting for audio, receiving, done (success or fail). If the update fails for any reason -- bad audio, cable pulled, file corrupted -- the module reboots into the old firmware. Nothing is lost.

What an update sounds like

A real firmware update, played into the module's audio input. Preamble first, then the calibration tone, then data packets. This is the same file the module receives through its 3.5 mm jack.

A/B slot design

KARON never writes to the slot that's currently running. Active firmware lives in one slot (say A), the incoming update goes into the other (B). The bootloader only marks B as active after the full image is received, verified and committed.

Corrupted transfer, power loss, bad image -- doesn't matter. The old slot is still there, still bootable. At no point during the update are both slots in an inconsistent state.

After flashing, the new image is marked "pending" with a try counter (default: 2). The application has to explicitly confirm that it booted successfully. If it crashes or hangs before that confirmation, the bootloader decrements the counter on the next boot. Two failures, automatic rollback to the last confirmed slot. A bad firmware update can't permanently brick a module.

Address	Region	Size
0x08000000	Bootloader (KARON)	16 KB
0x08004000	Slot A (header + application)	118 KB
0x08021800	Slot B (header + application)	118 KB
0x0803F000	Boot control page 0	2 KB
0x0803F800	Boot control page 1	2 KB

Boot control log

The boot control log tracks which slot is active, how many boot attempts are left, the image Cyclic Redundancy Check (CRC) -- a checksum that detects corrupted data -- and a monotonically increasing sequence number. Each record is 32 bytes. Two 2 KB flash pages alternate for wear leveling -- when one fills up, the latest record gets compacted to the other page and the full one is erased.

The CRC32 is always the last field written. If power dies mid-write, the record is incomplete and its CRC won't match. On the next boot, KARON walks the log, skips anything with a bad CRC, and uses the last valid record. A state transition is either fully committed or it never happened.
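The log walk described above can be sketched roughly like this. Only the 32-byte record size and the "CRC32 written last" rule come from the text; the field layout, the names `boot_record_t`, `boot_record_seal` and `find_last_valid`, and the choice of the standard reflected CRC-32 polynomial are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-byte record layout (assumed, not KARON's actual one). */
typedef struct {
    uint32_t seq;          /* monotonically increasing sequence number */
    uint8_t  active_slot;  /* 0 = slot A, 1 = slot B */
    uint8_t  tries_left;   /* remaining boot attempts */
    uint8_t  confirmed;    /* 1 once the application has confirmed */
    uint8_t  reserved0;
    uint32_t image_crc;    /* CRC of the image in the active slot */
    uint8_t  reserved[16];
    uint32_t crc32;        /* covers the 28 bytes above, written last */
} boot_record_t;

_Static_assert(sizeof(boot_record_t) == 32, "record must be 32 bytes");

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320, assumed variant). */
uint32_t crc32_bytes(const uint8_t *p, size_t n)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < n; i++) {
        crc ^= p[i];
        for (int b = 0; b < 8; b++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Fill in the trailing CRC; on real flash this would be the final write. */
void boot_record_seal(boot_record_t *r)
{
    r->crc32 = crc32_bytes((const uint8_t *)r, offsetof(boot_record_t, crc32));
}

/* Walk the log, skip records with a bad CRC, return the newest valid one
   (or NULL if nothing valid was found). */
const boot_record_t *find_last_valid(const boot_record_t *log, size_t count)
{
    const boot_record_t *best = NULL;
    for (size_t i = 0; i < count; i++) {
        const boot_record_t *r = &log[i];
        uint32_t crc = crc32_bytes((const uint8_t *)r,
                                   offsetof(boot_record_t, crc32));
        if (crc != r->crc32)
            continue;                      /* torn or erased record */
        if (best == NULL || r->seq > best->seq)
            best = r;
    }
    return best;
}
```

Picking the highest sequence number instead of the physically last record is one way to handle the two alternating pages with the same loop.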

The confirmation call -- the one the application makes to say "I booted fine, keep this slot" -- runs entirely from RAM with interrupts off, so the CPU never fetches instructions from flash while the confirmation record is being written.

Normal boot

The bootloader runs every time the module powers on -- not just during updates. Most of the time, there's no update happening. The bootloader just needs to figure out which slot to boot and get out of the way as fast as possible.

Normal boot path. No update button held -- KARON checks the log, validates the image, jumps to the application.

The whole sequence takes milliseconds. Read the boot control log, find the last valid record, check CRC, check the try counter, remap the interrupt vectors, jump. The user never notices the bootloader is there.

If the active slot's CRC doesn't match (corrupted image) or the try counter has hit zero (app crashed twice without confirming), KARON falls back to the other slot automatically. If both slots are bad -- which should only happen on a brand-new chip with nothing flashed yet -- the bootloader stays in update mode and waits for audio.
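A minimal sketch of that fallback decision, assuming the log record and the two image CRC checks have already been evaluated. The type and function names (`boot_state_t`, `choose_boot_target`) are hypothetical, and decrementing the try counter is left out.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    BOOT_SLOT_A = 0,
    BOOT_SLOT_B = 1,
    BOOT_UPDATE_MODE = 2   /* nothing bootable: stay in KARON, wait for audio */
} boot_target_t;

/* State taken from the last valid boot control record (assumed fields). */
typedef struct {
    uint8_t active_slot;   /* 0 = A, 1 = B */
    uint8_t tries_left;    /* remaining attempts for a pending image */
    bool    confirmed;     /* application has confirmed this slot */
} boot_state_t;

boot_target_t choose_boot_target(const boot_state_t *st,
                                 bool slot_a_ok, bool slot_b_ok)
{
    const bool ok[2] = { slot_a_ok, slot_b_ok };
    uint8_t active = st->active_slot & 1u;
    uint8_t other  = active ^ 1u;

    /* A pending image that has used up its attempts counts as failed. */
    bool exhausted = !st->confirmed && st->tries_left == 0;

    if (ok[active] && !exhausted)
        return (boot_target_t)active;
    if (ok[other])
        return (boot_target_t)other;   /* automatic fallback */
    return BOOT_UPDATE_MODE;           /* both slots bad */
}
```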

What can go wrong

Every failure mode ends the same way: the module boots into working firmware. That's the whole point of the A/B design. Here's what happens in each case:

Three failure scenarios. Same outcome: the module boots into working firmware.

Why FSK

Frequency-Shift Keying (FSK) encodes data as different audio frequencies. Each symbol is a short tone, and the receiver figures out which tone is playing. That's it. No amplitude measurement (which would break with different gain stages and volume settings), no phase tracking (which would need a much better clock and a cleaner signal path). Just: which frequency has the most energy right now?

KARON uses four tones, so each symbol carries 2 bits (a "dibit": $\log_2 4 = 2$). Four combinations cover everything: 00, 01, 10, 11. At 200 symbols per second, that's 400 raw bits per second. Slow, but this is a firmware update that happens a few times per year -- speed is irrelevant, reliability is everything.

4-FSK: each symbol is a tone at one of four frequencies. Vertical position = frequency, horizontal = time.

The four frequencies are 2400, 3200, 4000 and 4800 Hz. $\Delta f = 800\,\text{Hz}$ spacing between them -- that's $4\times$ the symbol rate, which keeps the tones cleanly separated (orthogonal) even without any fancy filtering. The band sits at 2.4 to 4.8 kHz: above mains hum and DC coupling problems, below where cheap DACs and phone speakers start to roll off.
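On the encoder side, the dibit-to-tone mapping might look like the sketch below. The text only fixes the four frequencies and that a dibit of 00 is f0; the most-significant-dibit-first order, the mapping of f0 to the lowest frequency, and the function names are assumptions.

```c
#include <stdint.h>

/* f0..f3 in Hz. Assumes f0 is the lowest tone. */
static const uint16_t tone_hz[4] = { 2400, 3200, 4000, 4800 };

/* Split one byte into four dibits, most significant dibit first
   (assumed order). */
void byte_to_dibits(uint8_t byte, uint8_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)((byte >> (6 - 2 * i)) & 0x3u);
}

/* Tone frequency used to transmit one dibit. */
uint16_t dibit_to_hz(uint8_t dibit)
{
    return tone_hz[dibit & 0x3u];
}
```

With this order, the byte 0x1B (binary 00 01 10 11) becomes the tone sequence f0, f1, f2, f3.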

The symbol rate follows from the tone spacing: for four tones to be orthogonal, the spacing has to be a whole-number multiple of the symbol rate. 800 Hz spacing at four times the symbol rate gives 200 symbols per second. At 48 kHz, the sample rate that the Goertzel coefficients and window sizes in the next section correspond to, each symbol occupies 240 samples -- enough for the Goertzel filter to get a clean measurement, and slow enough that the whole system runs comfortably even on a small MCU. The effective data rate after framing overhead is low. A full firmware image takes a few minutes. Nobody said it was fast.

Goertzel demodulation

The receiver needs to figure out which of the four tones is playing in each symbol window. One way: run a full Fast Fourier Transform (FFT), get all frequency bins, pick the four you care about. But an FFT computes power at every frequency -- hundreds of bins -- when we only need four. Wasted work on a small MCU.

Goertzel is an algorithm that computes power at a single frequency. You feed it N audio samples and a coefficient that encodes which frequency you're looking for. It runs a tight loop -- one multiply, one subtract, one add per sample -- and at the end spits out how much energy was at that frequency. Run it four times (once per tone), compare the four results, loudest one wins. That's the decoded dibit.

Inner loop -- runs once per sample:

$$s_n = c \cdot s_{n-1} - s_{n-2} + x_n$$

Coefficient -- one constant per frequency:

$$c = 2\cos\!\left(\frac{2\pi k}{N}\right) \qquad k = \text{freq bin},\; N = \text{window size}$$

After $N$ samples -- compute power:

$$P = s_{N-1}^2 + s_{N-2}^2 - c \cdot s_{N-1} \cdot s_{N-2}$$

For KARON's four tones at $N = 180$ samples, the coefficients (scaled to integers for fixed-point math) are 243, 234, 222 and 207. Those four numbers are the only magic constants in the entire demodulator.

Why N=180 and not the full symbol period? The outer 30 samples on each side are guard bands -- they're skipped. If the receiver's clock is slightly off, the analysis window drifts relative to the real symbol boundaries. Without guard bands, the window would eventually straddle two different symbols and mix their tones. With 30 samples of margin on each side, up to 30 samples of accumulated drift are absorbed. Measured worst case over a full transfer: about 18. Plenty of room.

"Scaled to integers" means KARON doesn't use floating point at all. The MCU has no floating-point hardware, so every float operation would compile to a slow software routine. Instead, the coefficients are multiplied by 128 (Q7 fixed-point: $c_{Q7} = \text{round}(c \cdot 128)$) so all the math stays in regular integer arithmetic. The first version used Q14 (scale factor $2^{14} = 16384$) for better precision, but that pushed some intermediates past 32 bits. This MCU has no 64-bit multiply either -- that compiled to a library call, about 35 cycles per multiply. Way too slow. Q7 keeps everything in 32 bits. One multiply per sample per tone, single cycle. The Q7 version runs roughly ten times faster than Q14 and finishes well within the symbol budget. Precision loss vs Q14: under 0.2%, completely irrelevant when the correct tone has $10\times$ to $100\times$ the power of the others.

Decision logic after all four tones are evaluated: pick the loudest one. No threshold, no signal quality check -- just argmax. Works because the tones are orthogonal: when f2 is transmitting, the Goertzel output for f0, f1 and f3 is near zero. Only fails when the window is badly misaligned or external noise sits exactly on one of the four frequencies.
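Put together, the Q7 Goertzel and the argmax decision could be sketched as follows. The coefficients 243, 234, 222 and 207 and $N = 180$ come from the text (they match $k = 9, 12, 15, 18$ at 48 kHz). Everything else is an assumption: the function names, 12-bit signed input samples, and the extra `POWER_SHIFT` that keeps the squared terms inside 32 bits. The right shifts rely on arithmetic shifting of negative values, which GCC and Clang provide.

```c
#include <stdint.h>

#define GOERTZEL_N   180  /* analysis window in samples */
#define Q7_SHIFT     7    /* coefficients are scaled by 128 */
#define POWER_SHIFT  6    /* assumed headroom before squaring */

/* round(2*cos(2*pi*k/N) * 128) for the four tones. */
static const int32_t coeff_q7[4] = { 243, 234, 222, 207 };

/* Power of one tone over GOERTZEL_N samples, 32-bit integer math only.
   Assumes samples within roughly +/-2048 (12-bit ADC, DC removed). */
int32_t goertzel_power_q7(const int16_t *x, int32_t c_q7)
{
    int32_t s1 = 0, s2 = 0;
    for (int n = 0; n < GOERTZEL_N; n++) {
        /* s[n] = c*s[n-1] - s[n-2] + x[n], with c in Q7 */
        int32_t s0 = ((c_q7 * s1) >> Q7_SHIFT) - s2 + x[n];
        s2 = s1;
        s1 = s0;
    }
    /* Scale down so the squares below cannot overflow 32 bits. */
    int32_t a = s1 >> POWER_SHIFT;
    int32_t b = s2 >> POWER_SHIFT;
    /* P = s1^2 + s2^2 - c*s1*s2 */
    return a * a + b * b - ((c_q7 * a) >> Q7_SHIFT) * b;
}

/* Decide which of the four tones is loudest: plain argmax. */
int decode_dibit(const int16_t *x)
{
    int best = 0;
    int32_t best_p = goertzel_power_q7(x, coeff_q7[0]);
    for (int t = 1; t < 4; t++) {
        int32_t p = goertzel_power_q7(x, coeff_q7[t]);
        if (p > best_p) {
            best_p = p;
            best = t;
        }
    }
    return best;
}
```

The caller is expected to pass a pointer to the 180 samples inside the guard bands, not the start of the full symbol period.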

Packet framing

Each packet has a sync word, a small header, variable-length payload and a CRC. Sync (0xC33C) is found by a 16-bit sliding window that shifts in one dibit at a time -- no byte alignment needed, so the receiver locks on regardless of where in the stream it starts listening.
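The sliding-window sync search is small enough to show in full. This is a sketch, not KARON's source: `sync_detector_t` and `sync_push` are made-up names, and it assumes dibits enter the window most significant first.

```c
#include <stdbool.h>
#include <stdint.h>

#define SYNC_WORD 0xC33Cu

typedef struct {
    uint16_t window;   /* the last eight dibits received */
} sync_detector_t;

/* Shift one decoded dibit into the window. Returns true when the last
   eight dibits spell the sync word, wherever the stream started. */
bool sync_push(sync_detector_t *d, uint8_t dibit)
{
    d->window = (uint16_t)((d->window << 2) | (dibit & 0x3u));
    return d->window == SYNC_WORD;
}
```

Under that bit order, 0xC33C arrives as the dibit sequence 3, 0, 0, 3, 0, 3, 3, 0.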

Every packet is sent twice, separated by 10-symbol gaps (silence, encoded as f0). The receiver tracks the last accepted sequence number and drops duplicates. Why the redundancy: raw demodulation has a residual error of about 1 phantom symbol per 2000. Rare per packet, but over 500+ packets a few will fail CRC. Sending twice makes it very unlikely that both copies of the same packet are bad.

The gaps also prevent cascade failures. If a demod error corrupts the length field, the receiver tries to read more symbols than the packet actually contains. Without a gap, that overread eats the sync word of the next packet, and now two packets are lost instead of one. The f0 gap absorbs the overread -- the receiver hits silence instead of the next sync -- and recovery is clean.

WAV structure. Receiver state shown below each phase.

Field	Bytes	Notes
sync	2	0xC33C -- sliding window detection
type	1	HEADER (0x01), DATA (0x02), END (0x03)
seq	2	packet sequence number
len	1	payload length (max 252)
payload	0-252	scrambled application data
crc	2	CRC16-CCITT over header + scrambled payload
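A packet check along the lines of the table might look like this. The table says CRC16-CCITT but not which variant, so the sketch assumes polynomial 0x1021 with initial value 0xFFFF. It also assumes the CRC covers type, seq and len (not the sync word) plus the payload, and that the CRC is stored high byte first. `packet_crc_ok` is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PAYLOAD 252

/* CRC16-CCITT, polynomial 0x1021, initial value 0xFFFF (assumed variant). */
uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (int b = 0; b < 8; b++)
            crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ 0x1021u)
                                  : (uint16_t)(crc << 1);
    }
    return crc;
}

/* pkt points just past the sync word: type(1) seq(2) len(1) payload crc(2).
   The payload is still scrambled at this point. */
bool packet_crc_ok(const uint8_t *pkt, size_t pkt_len)
{
    if (pkt_len < 6)
        return false;                       /* shorter than header + crc */
    size_t payload_len = pkt[3];
    if (payload_len > MAX_PAYLOAD || pkt_len != 4 + payload_len + 2)
        return false;                       /* length field inconsistent */
    uint16_t want = (uint16_t)((pkt[pkt_len - 2] << 8) | pkt[pkt_len - 1]);
    return crc16_ccitt(pkt, pkt_len - 2) == want;
}
```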

XOR scrambling

XOR is the simplest reversible operation in computing. Take a data byte and a random-looking byte, XOR them together, and the result looks random. XOR again with the same byte, and you get the original data back. No keys, no complex state -- just a bitwise toggle that undoes itself.

XOR scrambling is self-inverse: same sequence applied twice recovers the original.

Why bother? Look at the "data" row: 00 00 00 20 08 00. That's the start of a typical ARM vector table -- nearly all zeros. Encoded as FSK, those zeros become a long run of the same tone (f0), with a rare blip of a different tone in between. The problem: an isolated tone surrounded by f0 produces a weak Goertzel peak that's easy to miss if the analysis window isn't perfectly centered.

After XOR with the pseudo-random sequence: E1 AC 77 28 6A FB. Looks random. Encoded as FSK, the four tones are roughly evenly distributed -- no more pathological runs, and the Goertzel has a clean signal on every symbol.

KARON generates the random-looking sequence with a 16-bit Linear Feedback Shift Register (LFSR) -- essentially a tiny state machine that produces a long pseudo-random bit stream from a seed value. The seed resets at the start of every packet, so the transmitter and receiver always generate the same sequence. The LFSR produces one byte at a time by shifting bits and XORing a few of them together -- cheap to compute, completely deterministic, and the same function works in both directions.

One detail that matters: the CRC is computed over the scrambled bytes, not the original data. If a transmission error flips a bit, the CRC catches it before the descrambler ever runs. Corrupted packets are dropped right at the CRC check, and the redundant copy takes over.
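For illustration, here is a scrambler built on a 16-bit LFSR. The text does not give KARON's polynomial or seed, so this sketch borrows a common maximal-length choice (Galois form, taps 0xB400) and the seed 0xACE1. It will not reproduce the byte values quoted above, only the mechanism.

```c
#include <stddef.h>
#include <stdint.h>

#define LFSR_SEED 0xACE1u  /* assumed seed; must be non-zero */
#define LFSR_TAPS 0xB400u  /* x^16 + x^14 + x^13 + x^11 + 1 (assumed) */

/* Clock the LFSR eight times and collect the output bits into a byte. */
static uint8_t lfsr_next_byte(uint16_t *state)
{
    uint8_t out = 0;
    for (int i = 0; i < 8; i++) {
        uint16_t lsb = *state & 1u;
        *state >>= 1;
        if (lsb)
            *state ^= LFSR_TAPS;
        out = (uint8_t)((out << 1) | lsb);
    }
    return out;
}

/* Scramble or descramble one packet payload in place. The seed resets
   on every call, so both ends generate the same sequence. */
void scramble(uint8_t *buf, size_t len)
{
    uint16_t state = LFSR_SEED;
    for (size_t i = 0; i < len; i++)
        buf[i] ^= lfsr_next_byte(&state);
}
```

Calling `scramble` twice on the same buffer returns the original bytes, which is why the receiver needs no separate descrambler.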

Clock drift and calibration

The transmitter (a PC, phone, or audio player) and the receiver (the MCU's internal oscillator) never run at exactly the same speed. The MCU thinks it's sampling at the nominal rate, but the actual rate might be off by half a percent or more. That difference is tiny per sample, but it adds up. After a few thousand symbols, the Goertzel window has drifted so far that it's looking at the wrong part of the signal.

Imagine reading a book where every page is shifted a tiny bit to the right. By page 10 you're reading half of one page and half of the next. That's what clock drift does to symbol boundaries.

Exaggerated: the MCU's analysis windows gradually shift relative to the actual symbol boundaries.

After the preamble, the WAV carries a calibration tone: 200 consecutive symbols of the same tone. The receiver measures how many samples that run actually took -- comparing the expected count (200 symbols times the nominal samples per symbol) against what it observed. The difference is the total drift over the calibration window. Divide by 200 and you have the drift per symbol.

Finding the exact boundaries of the calibration run is tricky because the Direct Memory Access (DMA) buffer doesn't line up with symbol edges. KARON uses a sliding mini-Goertzel ($N = 120$ samples) that scans across two adjacent buffers in 2-sample steps, comparing tone power at each position. Where the power flips from "calibration tone" to "not calibration tone" is the boundary. All positions are tracked in fixed-point (scaled by $2^8 = 256$) to keep sub-sample precision without floating point.

Two values come out: the drift rate (how far off each symbol is, used to continuously adjust the window position during the rest of the transfer) and the phase offset (how far the window is from the true symbol boundary right now, corrected once immediately after calibration).

Bresenham remainder tracking

The drift rate comes from dividing the total measured drift by the number of calibration symbols. Integer division. And integer division throws away the remainder.

Say the total drift over 200 symbols is 102 (in fixed-point units). $\lfloor 102 / 200 \rfloor = 0$, remainder $102$. The integer result says "zero drift per symbol" -- which is wrong. That remainder is real error, and it accumulates: by the end of a long transfer with thousands of symbols, the uncorrected truncation eats a significant chunk of the guard band. All from a rounding error.

Without correction, truncation error grows linearly (red). Bresenham keeps it bounded (green).

Bresenham solved the same problem in 1962 for drawing straight lines on a pixel grid: how do you distribute a fractional step evenly across many integer steps? His answer was an accumulator. Each step, you add the remainder to a running total. When the total reaches or exceeds the divisor, you subtract the divisor and apply one extra unit of correction. The result is a sawtooth that never exceeds one unit of error.

In KARON: each symbol, add the remainder (102) to an accumulator. When it hits 200, subtract 200 and nudge the window position by one extra unit. The correction distributes evenly across the stream instead of piling up at the end. Total error stays below one sample over the entire transfer. Cost: two integer additions and one comparison per symbol -- essentially nothing.
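The accumulator fits in a few lines. This is a sketch using the numbers from the example (total drift 102 over 200 symbols). The struct and function names are invented, and it assumes a non-negative drift; a negative one would need the same logic with the sign handled separately.

```c
#include <stdint.h>

typedef struct {
    int32_t step;       /* whole units of correction per symbol */
    int32_t remainder;  /* what integer division threw away */
    int32_t divisor;    /* number of calibration symbols (200) */
    int32_t acc;        /* running remainder accumulator */
} drift_tracker_t;

/* Split the measured total drift into quotient and remainder. */
void drift_init(drift_tracker_t *d, int32_t total_drift, int32_t symbols)
{
    d->step = total_drift / symbols;
    d->remainder = total_drift % symbols;
    d->divisor = symbols;
    d->acc = 0;
}

/* Correction to apply to the window position for the next symbol. */
int32_t drift_next(drift_tracker_t *d)
{
    int32_t corr = d->step;
    d->acc += d->remainder;
    if (d->acc >= d->divisor) {   /* a whole unit has built up */
        d->acc -= d->divisor;
        corr += 1;
    }
    return corr;
}
```

With 102 over 200, `drift_next` returns 1 on roughly every second call, and the corrections add up to exactly 102 after 200 symbols instead of 0.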

Safe flash writes

On most microcontrollers, the CPU fetches instructions from flash over the same internal bus that the flash controller uses for write and erase operations. While a write is in progress, the bus is busy -- the CPU can't fetch its next instruction and stalls until the operation completes. That stall can be several milliseconds. For a real-time demodulator processing audio samples on a tight deadline, a multi-millisecond pause in the middle of a symbol means missed data.

KARON avoids this by only writing during the inter-packet gaps. Three consecutive f0 symbols means "inside a gap, not a data packet" -- time to flush pending data to flash. The gaps are 10 symbols long, the actual write takes a fraction of that. During active demod, received data just sits in a RAM buffer.
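The gap test itself is a tiny run-length counter. In this sketch the names are hypothetical, and it assumes the caller only consults the result between packets, since scrambled payload data can contain three f0 symbols in a row by chance.

```c
#include <stdbool.h>
#include <stdint.h>

#define GAP_RUN 3   /* consecutive f0 symbols that signal a gap */

typedef struct {
    uint8_t f0_run;   /* current run of f0 symbols, saturating at GAP_RUN */
} gap_detector_t;

/* Feed one decoded dibit. Returns true while the stream looks like an
   inter-packet gap, i.e. a safe moment to flush the RAM buffer to flash. */
bool gap_push(gap_detector_t *g, uint8_t dibit)
{
    if (dibit == 0) {
        if (g->f0_run < GAP_RUN)
            g->f0_run++;
    } else {
        g->f0_run = 0;
    }
    return g->f0_run >= GAP_RUN;
}
```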

The write routine and the timer interrupt handler are placed in a RAM-executable section of the linker script. Code in RAM doesn't need the flash bus, so it keeps running even while a flash operation is in progress.

Bootloader to application handoff

The target MCU (Cortex-M0) has no way to relocate its interrupt vector table in hardware. KARON works around this by copying the application's vector table into RAM and then remapping RAM to address zero. After the remap, the CPU fetches interrupt vectors from RAM (the app's table) instead of flash (the bootloader's table).

Before jumping to the application, the bootloader puts the core into a clean state: stop the system timer, clear any pending timer interrupt, disable interrupts globally, remap the vector table. The application then re-initializes its own clocks and peripherals, re-enables interrupts and confirms itself as stable.

Getting the order wrong is painful. If the system timer is still running when the bootloader disables interrupts, a pending tick fires the instant the app turns interrupts back on -- but the app's handler isn't set up yet, so the CPU falls into a default infinite loop. Looks like a hardware lockup. The debugger shows a valid program counter, everything seems fine, until you check the interrupt status register and find the core stuck in an exception. Took longer to find than to fix.

Why this exists

There are faster ways to update firmware. A simple serial connection (UART -- Universal Asynchronous Receiver-Transmitter) would be hundreds of times faster. USB or a debug probe would be orders of magnitude beyond that. If raw transfer speed is the goal, 4-FSK over audio is a terrible choice.

KARON exists because the goal was never speed. It was: can I build a complete, production-grade bootloader -- A/B slots, rollback, CRC verification, clock compensation, error recovery -- that works over the single worst transport layer I could think of? Audio through an uncalibrated analog front-end, with no handshake, no backchannel, no shared clock and no guarantee that the receiving hardware was designed for this at all.

Every hard problem in the project came from that constraint. Clock drift wouldn't matter over UART. Guard bands wouldn't exist if the clocks were synchronized. Scrambling wouldn't be needed if the bitrate were high enough that pattern sensitivity didn't matter. The Bresenham accumulator only exists because integer division has a remainder and the transport is slow enough for that remainder to accumulate into a real problem. Every section on this page is a consequence of choosing the hardest possible path on purpose.

It's a learning project. A deep one. Goertzel filters, fixed-point arithmetic, DMA double buffering, flash wear leveling, power-safe state machines, clock recovery, Bresenham accumulators, post-mortem debugging, linker scripts for RAM-executable code -- all in 16 KB on a chip with no Floating-Point Unit (FPU) and no barrel shifter. The kind of project where you learn more from the problems than from the solutions.

And honestly -- hearing your firmware come out of a speaker is just cool.