Debugging phantom symbol insertions in a 4-FSK audio bootloader on STM32. Root cause analysis across 9 sessions, 15+ eliminated hypotheses and a 3-line fix. A deep dive into Goertzel filters, clock drift, DMA timing and the kind of bugs that only appear at scale.
Hunting down a phantom
The setup
KARON is an A/B bootloader for STM32F030RC that receives firmware updates through audio. A bootloader is a small program that runs before the main firmware. Its job is to decide whether to start the existing firmware or accept a new one. KARON does this over audio: encode firmware as a 4-FSK (Frequency-Shift Keying -- a method of encoding data as different audio tones) modulated WAV file, play it into the module's Analog-to-Digital Converter (ADC), demodulate on-chip, write the new image to flash memory. No UART (Universal Asynchronous Receiver-Transmitter -- a serial connection), no USB, no programmer. Just audio through a 3.5 mm cable.
The system runs on a Cortex-M0 with 256 KB flash and 32 KB RAM. No Floating-Point Unit (FPU) -- every calculation uses integer math. No hardware Vector Table Offset Register (VTOR) -- the chip cannot relocate its interrupt table, so KARON copies it into RAM manually. No external crystal -- the internal oscillator (HSI) runs at 8 MHz +/-1%, multiplied via a Phase-Locked Loop (PLL) to 48 MHz. That +/-1% tolerance means the chip's idea of "48,000 samples per second" can differ from the PC's actual sample rate by several hundred samples per second. This mismatch turns out to be the entire story.
Four tones (2400, 3200, 4000, 4800 Hz) carry two bits each. At 200 symbols per second the raw data rate is 400 bits per second -- slow enough that a full firmware image takes a few minutes. Each packet is protected by a Cyclic Redundancy Check (CRC16 -- a checksum that detects corrupted data). The WAV file starts with a preamble, followed by a calibration tone, then the actual data packets separated by short gaps. For the full protocol, see the KARON lab notes.
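The dibit-to-tone mapping above can be sketched in a few lines of Python. This is an illustrative model only -- the MSB-first dibit order and all names here are assumptions, not KARON's actual modulator:

```python
import math

TONES_HZ = [2400, 3200, 4000, 4800]   # f0..f3, one tone per dibit value
FS = 48000                            # sample rate
SYMBOL_SAMPLES = 240                  # 48000 / 200 symbols per second

def modulate(data: bytes) -> list:
    """Encode each byte as four dibits, each dibit as one 5 ms tone burst."""
    samples = []
    for byte in data:
        for shift in (6, 4, 2, 0):               # MSB-first dibits (assumed order)
            freq = TONES_HZ[(byte >> shift) & 0b11]
            for n in range(SYMBOL_SAMPLES):
                samples.append(math.sin(2 * math.pi * freq * n / FS))
    return samples

wave = modulate(b"\xE4")   # dibits 3,2,1,0 -> one burst each of f3, f2, f1, f0
```

One byte becomes four symbol bursts of 240 samples each; at 200 symbols per second that is exactly the 400 bits per second quoted above.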
Everything worked -- until it did not.
The symptom: sporadic CRC failures
During testing, the bootloader would frequently fail to receive all packets. CRC errors appeared seemingly at random. Sometimes an update would succeed, sometimes not. Even short test images (a few hundred bytes, producing WAV files of roughly 30 seconds) had problems. Failure rate was unpredictable -- the same WAV file could succeed on one attempt and fail on the next.
Think of it like a fax machine that sometimes garbles page 3 of a 5-page document, but works fine if you send it again. Same document, same machine, different result.
Initial assumption was noise. Various band-aids were applied: packet repetitions (send each DATA packet twice so the receiver can pick whichever copy arrives intact), inter-packet gaps (10 silence symbols between packets), data whitening (an LFSR scrambler -- a Linear Feedback Shift Register that XORs the data with a pseudo-random sequence to break up repetitive byte patterns) and a confidence filter in the tone detector. None solved it. The CRC failures persisted.
sync[9]->sync[10] = 3120 dibits = 3 x 1040 -- three entire packets consumed by one phantom.
The fix -- 10 silence symbols between packets (50 ms) -- absorbs the over-read so the next sync word stays intact.
Phase 1: confidence gating -- the fix that made everything worse
KARON detects tones using a Goertzel filter -- an algorithm that measures how much energy a signal contains at one specific frequency. Run it four times (once per tone) and pick the strongest. The result is one of four symbols, each carrying two bits.
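The detector is the textbook Goertzel recurrence. A minimal floating-point Python sketch (the firmware's version is integer-only, but the structure is the same):

```python
import math

def goertzel_power(samples, freq, fs):
    """Signal energy at one target frequency via the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / fs)
    s1 = s2 = 0.0
    for x in samples:
        s1, s2 = x + coeff * s1 - s2, s1      # the two-tap state update
    return s1 * s1 + s2 * s2 - coeff * s1 * s2  # power at the target bin

FS, N = 48000, 240
tone = [math.sin(2 * math.pi * 3200 * n / FS) for n in range(N)]
powers = {f: goertzel_power(tone, f, FS) for f in (2400, 3200, 4000, 4800)}
best = max(powers, key=powers.get)   # argmax over the four bins -> 3200
```

Run four times per analysis window, take the argmax, emit two bits.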
The first fix seemed obvious: if the Goertzel cannot clearly distinguish which tone is
present, reject the symbol as noise rather than guessing. After computing all four bins,
compare the best power to the second-best. If the ratio is less than 2x, return
NO_TONE.
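As a sketch, the gating logic amounts to the following (names are illustrative; the firmware returns a dedicated NO_TONE code rather than None):

```python
NO_TONE = None  # sentinel for "reject as noise"

def classify(powers, ratio=2.0):
    """Pick the strongest of the four Goertzel bins, or reject as NO_TONE
    when the best bin is not at least `ratio` times the runner-up."""
    ranked = sorted(range(len(powers)), key=powers.__getitem__, reverse=True)
    best, second = ranked[0], ranked[1]
    if powers[best] < ratio * powers[second]:
        return NO_TONE          # ambiguous: two bins carry similar energy
    return best

# classify([100, 40, 5, 5]) -> 0 ; classify([100, 60, 5, 5]) -> NO_TONE
```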
Result with GOERTZEL_CONFIDENCE_RATIO=2: zero DATA packets received.
Complete failure.
The post-mortem buffer told the story. In the DATA region, 15-70% of symbols were being
rejected as NO_TONE. A single DATA packet requires 520 consecutive valid
symbols to assemble correctly. The probability of getting 520 in a row at a 20% rejection
rate: 0.80^520 ~ 10^-50 -- a decimal point followed by 49 zeros before the first
nonzero digit. Statistically impossible.
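The arithmetic behind that number is a one-liner:

```python
# Chance that 520 consecutive symbols all survive a 20% per-symbol rejection rate
p_packet = 0.80 ** 520   # ~4e-51 -- the gate can essentially never pass a full packet
```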
The symbols that looked "uncertain" were at tone boundaries where the analysis window straddled two different tones. The Goertzel was doing its job -- reporting that two frequencies had similar energy -- because the window genuinely contained both. Rejecting them did not help. It destroyed everything. Confidence gating was permanently shelved.
Phase 2: the debugging tool was the problem
SEGGER J-Link RTT (Real-Time Transfer) seemed like the right tool -- it prints debug messages over the chip's debug wire (Serial Wire Debug, or SWD) without needing a serial port. Minimal overhead, supposedly. Except RTT streaming requires constant traffic on the SWD bus, which shares the chip's internal data highway (the AHB -- Advanced High-performance Bus) with the Direct Memory Access (DMA) controller. DMA is the hardware that shuttles audio samples from the ADC into RAM without bothering the CPU. On the STM32F030, the SWD debug interface has hardwired priority over all other bus masters including DMA.
Imagine a two-lane road where the debug probe has a permanent blue-light siren. Every time it wants to read or write, DMA has to pull over and wait. The audio samples still arrive at the ADC -- they just do not make it into RAM on time.
RTT captures showed a 20.9% noise rate -- but the noise organized itself into 15 distinct
bursts of roughly 10 consecutive NO_TONE symbols, perfectly correlated with
2048-byte RTT buffer fill events. Between bursts: 0.46% noise. The DMA priority register
was irrelevant -- DMA_CCR priority bits only arbitrate between DMA channels,
not between DMA and the debug port. SWD always wins.
Solution: a RAM-based post-mortem debug buffer (roughly 16.5 KB, enabled via a
compile-time flag POST_MORTEM_DEBUG). Record everything during reception --
symbols, Goertzel powers, diagnostic events -- then read it all out via SWD
once after the transfer completes. Zero bus traffic during audio. The host
finds the buffer by scanning RAM for a magic number (0xDB600001), the same technique
J-Link uses to locate its own RTT control block.
~ 16.5 KB total -- half of the 32 KB RAM. All hooks compile to no-ops without POST_MORTEM_DEBUG.
A Python script does the heavy lifting: symbol histograms, inter-sync interval analysis, power ratio plots, automated WAV comparison.
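The magic-number scan the host performs can be sketched like this (illustrative, not the actual tooling; assumes a raw little-endian RAM dump and 4-byte alignment):

```python
import struct

MAGIC = 0xDB600001   # post-mortem buffer signature

def find_buffer(ram: bytes) -> int:
    """Offset of the magic word in a raw RAM dump (4-byte aligned scan),
    or -1 if absent. Cortex-M0 is little-endian, hence '<I'."""
    needle = struct.pack("<I", MAGIC)
    for off in range(0, len(ram) - 3, 4):
        if ram[off:off + 4] == needle:
            return off
    return -1

# Toy dump: 256 zero bytes with the control block planted at offset 128.
ram = bytearray(256)
ram[128:132] = struct.pack("<I", MAGIC)
offset = find_buffer(bytes(ram))   # -> 128
```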
Phase 3: the 1050/1051 pattern
With clean post-mortem data, a clear pattern emerged. The receiver counts how many symbols arrive between two consecutive sync words (a known bit pattern that marks the start of each packet). That count was either exactly 1050 or exactly 1051:
| Interval | Symbols | CRC | Note |
|---|---|---|---|
| sync[8] -> [9] | 1050 | CRC_OK | |
| sync[9] -> [10] | 1051 | CRC_FAIL | <- phantom |
| sync[10] -> [11] | 1050 | CRC_OK | |
| sync[11] -> [12] | 1051 | CRC_FAIL | <- phantom |
1050 is correct. Here's the breakdown:
8 (sync) + 16 (header) + 1008 (252-byte payload, 4 symbols per byte) + 8 (CRC16) + 10 (gap) = 1050 symbols per DATA interval (sync to sync).
Every +1 phantom meant one extra symbol inserted somewhere in the stream. Each symbol carries two bits. The receiver assembles four consecutive symbols (8 bits) into one byte. If an extra symbol appears in the middle of a packet, every byte after that point reads two bits from the correct byte plus two bits from the next one -- like reading a book where someone inserted a single extra letter on page 5. Every word after that point is shifted and unreadable. Even a single phantom anywhere in 1050 symbols guarantees a CRC failure.
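A toy model makes the corruption mechanism concrete (MSB-first dibit packing is an assumption here, but any fixed order shows the same effect):

```python
def assemble(dibits):
    """Pack dibits into bytes, four per byte, MSB-first (illustrative order)."""
    out = []
    for i in range(0, len(dibits) - len(dibits) % 4, 4):
        b = 0
        for d in dibits[i:i + 4]:
            b = (b << 2) | d
        out.append(b)
    return out

clean = [3, 0, 1, 2] * 6               # 24 dibits -> six identical 0xC6 bytes
phantom = clean[:9] + [0] + clean[9:]  # one phantom dibit inserted mid-stream
good, bad = assemble(clean), assemble(phantom)
# Bytes before the insertion point still match; every byte after it is shifted.
```

One inserted dibit, and every downstream byte straddles two source bytes -- exactly the guaranteed-CRC-failure scenario described above.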
The phantom was not noise. The symbol buffer showed noise=0 for every DATA
interval -- every demodulated symbol was a valid tone (f0-f3). Something was inserting a
real-looking symbol that did not exist in the WAV file.
The alternating pattern
With dual packet repetitions, the pattern was perfectly alternating:
| Copy | Seq | Symbols | CRC | Note |
|---|---|---|---|---|
| copy 1 | seq=0 | 1050 | CRC_OK | |
| copy 2 | seq=0 | 1051 | CRC_FAIL | <- phantom |
| copy 1 | seq=1 | 1050 | CRC_OK | |
| copy 2 | seq=1 | 1051 | CRC_FAIL | <- phantom |
Every copy 1 had 1050 symbols. Every copy 2 had 1051. Over 14 consecutive packets, the probability of this being random at a 1/3900 phantom rate: approximately 10^-45. Not random.
Phase 4: systematic elimination
Flash-write timing
The alternating pattern suggested timing. Each DATA packet is sent twice. When copy 1 arrives with a valid CRC, the firmware processes it and queues a flash write. When copy 2 arrives (CRC failed, phantom inside), nothing happens. Flash programming stalls the CPU -- during a write, the CPU cannot fetch its own instructions from flash because the flash controller is busy. DMA keeps filling the audio buffer in the background, so when the CPU resumes, it finds roughly one extra symbol's worth of unprocessed samples waiting.
Test: disabled flash writing entirely. Result: much worse -- only 1/14 packets received instead of 12/14. With flash writing disabled, a safety guard rejects every subsequent DATA packet after the first because the previous data was never written out. But even the packets before the guard kicked in still had phantoms. Flash writing was not the cause.
DMA race conditions
Four dedicated diagnostic counters checked hardware registers on every audio processing call. Each counter targets a specific failure mode: did the ADC produce samples faster than DMA could transfer them (overrun)? Did DMA encounter a bus error (transfer error)? Did the software accidentally process the same buffer region twice (half-buffer guard)? How many unprocessed samples piled up at peak load?
Zero hardware anomalies. Peak lag of 625 samples means 1423 samples of margin -- nowhere near a wrap-around or torn-read scenario. All four hardware hypotheses eliminated at once.
Deferred ProcessPacket -- breaking the alternation
The perfectly alternating pattern was the strongest clue. What was different between processing a packet with valid CRC and one with failed CRC?
On CRC OK: the firmware runs the full packet handler -- copies 252 bytes to a flash buffer, updates the state machine, increments counters. Estimated time: 100-200 us (microseconds). On CRC FAIL: increments an error counter and does nothing else. Time: roughly 1 us.
That 200 us gap matters. During those 200 us the audio processing loop does not run. DMA keeps filling the buffer with fresh samples. When the loop resumes, it finds a slightly different alignment between the analysis window and the incoming signal.
Fix: move the packet handler out of the CRC check path entirely. On CRC OK, just copy the packet into a holding buffer and set a flag. The actual processing happens later in the main loop, outside the time-critical audio path. Now both code paths take the same time inside the sample processing loop.
Strict period-2 alternation
Alternation broken, rate unchanged
The perfect alternation broke -- proving that ProcessPacket timing had determined which copy got the phantom. But the phantom rate stayed the same: 4/6 packets had phantoms (vs. 3/6 before). ProcessPacket timing was a correlate, not a cause.
The flash-EMI hypothesis
This one gets its own section because it was the most convincing hypothesis and took the most work to kill.
The theory: when the chip writes data to its internal flash memory, it needs a high voltage. A small charge pump circuit inside the chip generates that voltage, drawing sharp current spikes from the power supply. Those spikes create Electromagnetic Interference (EMI) -- electrical noise that can leak into the analog input. If the noise hits during a Goertzel analysis window, the detector might see tone-like energy where there is only silence, producing a phantom symbol.
The physics checked out. The STM32F030's flash programming draws 10-50 mA current spikes from the internal supply rail. The STM32F030 does have a separate analog supply pin (VDDA), but on the test hardware it was tied directly to VDD without dedicated filtering -- the board was never designed for precision analog work. A 50 mA charge pump transient on VDD would couple straight through to the ADC reference, causing 10-20 mV of supply bounce. On a 12-bit ADC with a 3.3V reference that translates to 1-2 LSB (Least Significant Bits -- the smallest unit the ADC can resolve) of error injected into every conversion during the flash write.
Diagnostic: gap-gated flash writing -- only program flash when the receiver is in an inter-packet gap (3 or more consecutive silence symbols), never during active tone reception. Plus: log the global symbol counter at flash write start to correlate phantom positions with flash write timing.
| Interval | Symbols | CRC | Flash? | Note |
|---|---|---|---|---|
| sync[8] -> [9] | 1050 | CRC_OK | FLUSH @3277 | |
| sync[9] -> [10] | 1051 | CRC_FAIL | -- | <- phantom, no flash |
| sync[10] -> [11] | 1051 | CRC_FAIL | -- | <- phantom, no flash |
| sync[11] -> [12] | 1050 | CRC_OK | FLUSH @6428 | |
| sync[12] -> [13] | 1051 | CRC_FAIL | -- | <- phantom, no flash |
| sync[13] -> [14] | 1051 | CRC_FAIL | -- | <- phantom, no flash |
Insert/skip analysis showed no correlation (18 and 19 inserts at both 1050 and 1051 intervals). Bresenham simulation across all 200 remainder values could not produce the observed pattern. Python Goertzel demodulation of the WAV confirmed all six DATA intervals had exactly 1050 symbols, copies bitwise identical.
Gap-gated flash writing was kept anyway -- there is no reason to program flash during active tone reception when there is a 50 ms gap sitting right there.
Phase 5: root cause -- Goertzel window phase drift
The breakthrough came from a different angle: simulating the firmware's packet assembly logic in Python using the real post-mortem data. A script loaded the raw symbol stream from the debug dump and fed each two-bit symbol through an exact replica of the firmware's state machine -- same sync detection, same byte assembly, same state transitions.
The simulation reproduced the exact phantom pattern. That meant the symbols in the debug dump already contained the phantoms -- the problem was in the Goertzel detector itself, not in the packet assembly logic.
Two phantom mechanisms
Detailed analysis of the bit-level trace at packet boundaries revealed two distinct mechanisms:
Mechanism 1 -- length-byte corruption: Each packet contains a byte that tells the receiver how long the payload is. At symbol 4330, the Goertzel misdetected one two-bit symbol in the length field, turning 0xFC (decimal 252) into 0xFF (decimal 255). The firmware then read 3 extra bytes (12 symbols) from the gap region before concluding the packet -- consuming part of the silence that was supposed to separate it from the next packet.
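The corruption is a single-dibit event, which is easy to check:

```python
# 0xFC (252) and 0xFF (255) differ in exactly one dibit: the final 00 vs 11,
# i.e. a single f0 -> f3 misdetection in the length field.
good_len, bad_len = 0xFC, 0xFF

def dibits(b):
    """Split a byte into four dibits, MSB-first."""
    return [(b >> s) & 0b11 for s in (6, 4, 2, 0)]

flipped = sum(a != b for a, b in zip(dibits(good_len), dibits(bad_len)))
extra_bytes = bad_len - good_len      # firmware reads 3 bytes too many...
extra_symbols = extra_bytes * 4       # ...which is 12 symbols eaten from the gap
```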
Mechanism 2 -- extra symbol at tone transition: At the end of a DATA packet, the audio changes from scrambled data tones back to f0 (silence/gap). If the Goertzel analysis window straddles this transition -- half of it seeing the last data tone, half seeing f0 -- the mixed signal can produce an extra f0 detection. A symbol that does not exist in the WAV file.
The shared root cause
Both mechanisms trace to the same underlying problem: Goertzel window phase drift.
The STM32's internal oscillator and the PC's audio clock are never perfectly aligned. The chip thinks it is sampling at exactly 48,000 Hz, but the real rate might be 48,028 Hz or 47,970 Hz. The difference is small -- a few hundred parts per million -- but it accumulates. Over a 1050-symbol DATA packet, the analysis window drifts by roughly 18 samples relative to the true symbol boundaries.
Think of two people reading the same sheet music at very slightly different tempos. At first they are in sync. After a few hundred bars, one is a fraction of a beat ahead of the other. The Bresenham drift compensation tries to correct this with occasional skip/insert adjustments (like one player occasionally holding a note slightly longer), but it cannot prevent the analysis window from gradually sliding. At tone transitions -- where one tone ends and another begins -- the window ends up straddling both, producing a mixed reading.
Phase 6: the fix
plen clamp (Mechanism 1)
Clamp the payload length to MAX_PAYLOAD_SIZE. Even if the Goertzel misdetects the
length byte (252 -> 255), the firmware will not over-read into the gap. Four lines, zero overhead.
The N=200 disaster -- spectral leakage
First attempt at guard bands: shrink the analysis window from 240 samples to 200, leaving 20 guard samples on each side (just barely above the 18-sample drift).
Total failure. Every single DATA packet failed CRC. Power ratios dropped from the normal 10-100x range to 1.0-1.2x. The Goertzel was guessing randomly.
Root cause: fractional bin alignment. The Goertzel algorithm is tuned to a specific frequency by choosing a bin index k = freq x N / fs. For clean detection, k must be an exact integer -- otherwise the algorithm's energy "leaks" across all bins, like trying to tune a radio to a frequency between two stations and hearing both at once. This is called spectral leakage.
| Tone | Freq | N=240 (original) | N=200 (attempted) |
|---|---|---|---|
| f0 | 2400 Hz | k = 12.0 OK | k = 10.0 OK |
| f1 | 3200 Hz | k = 16.0 OK | k = 13.33 X |
| f2 | 4000 Hz | k = 20.0 OK | k = 16.67 X |
| f3 | 4800 Hz | k = 24.0 OK | k = 20.0 OK |
f1 and f2 land between integer bins. Energy leaks into all four bins roughly equally, argmax picks at random. Post-mortem confirmed it: power dumps showed f0 and f3 with nearly equal power (ratio 1.0x).
N=180 -- exact bin alignment
The constraint: N must yield exact integer k-values for all four frequencies. k = freq x N / fs is integer for all four tones when N is a multiple of fs / GCD(f0, f1, f2, f3). GCD(2400, 3200, 4000, 4800) = 800. fs / 800 = 48000 / 800 = 60. N must be a multiple of 60. Valid choices: 240 (original), 180, 120, 60.
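That divisibility argument can be checked mechanically:

```python
from math import gcd
from functools import reduce

FS = 48000
TONES = [2400, 3200, 4000, 4800]   # f0..f3

def bins(n):
    """Goertzel bin index k = freq x N / fs for each tone at window length n."""
    return [f * n / FS for f in TONES]

g = reduce(gcd, TONES)      # 800 Hz
step = FS // g              # N must be a multiple of 60
valid = [n for n in range(step, 241, step) if all(k.is_integer() for k in bins(n))]
# valid -> [60, 120, 180, 240]; bins(200) contains 13.33 and 16.67
```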
| Tone | Freq | N=180 | Q7 coeff |
|---|---|---|---|
| f0 | 2400 Hz | k = 9 OK | 243 |
| f1 | 3200 Hz | k = 12 OK | 234 |
| f2 | 4000 Hz | k = 15 OK | 222 |
| f3 | 4800 Hz | k = 18 OK | 207 |
The Goertzel coefficients stay identical -- 2 x cos(2 x pi x freq / fs) depends on freq/fs,
not N, when k/N is constant (9/180 = 12/240 = 1/20). Q7 values: [243, 234, 222, 207].
Trade-off: ~1.2 dB less SNR from the shorter window. With typical on-frequency power ratios of 10-100x, this is negligible.
Implementation
Three files, three changes: GOERTZEL_WINDOW=180 and GOERTZEL_GUARD=30
in goertzel.h, the Goertzel loop bound changed from SYMBOL_SAMPLES to
GOERTZEL_WINDOW in goertzel.c and the Goertzel input pointer offset by
GOERTZEL_GUARD samples in transport_audio.c.
Sidebar: the Q7 detour
The Goertzel algorithm needs to multiply coefficients by running state values on every sample. The original code used Q14 fixed-point coefficients -- numbers stored as integers but treated as fractions with 14 bits after the decimal point. High precision, but on the Cortex-M0 (a chip with no hardware support for large multiplies) every multiplication became a call to a library function that does 64-bit math in software. Roughly 35 clock cycles per call, roughly 200 bytes of code, and it runs in the hot path: 4 tones x 180 samples = 720 multiplies per symbol.
Switching to Q7 (7 bits after the decimal point, coefficients scaled by 128 instead of 16384) keeps every multiplication within the range of a single 32-bit integer. One hardware MUL instruction, 1 cycle. The trade-off is precision -- but the error compared to Q14 is less than 0.2%, far below what matters when typical power ratios are 10-100x. This was not directly related to the phantom bug, but the headroom it freed up made the guard band approach practical. At 5.8 ms per symbol the time budget was already blown; at 0.5 ms there is room to breathe.
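The Q7 values quoted above and the sub-0.2% error claim can be reproduced in a few lines:

```python
import math

FS = 48000
TONES = [2400, 3200, 4000, 4800]

def coeff(freq):
    """Exact Goertzel coefficient 2*cos(2*pi*freq/fs) -- depends only on freq/fs."""
    return 2.0 * math.cos(2.0 * math.pi * freq / FS)

q7 = [round(coeff(f) * 128) for f in TONES]    # scale by 2**7 and round
rel_err = max(abs(c / 128 - coeff(f)) / coeff(f) for c, f in zip(q7, TONES))
# q7 -> [243, 234, 222, 207]; worst relative error just under 0.2% (at f0)
```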
Sidebar: the calibration saga
Getting clock drift calibration right was its own multi-session adventure. The WAV contains 200 consecutive f3 symbols after the preamble. The firmware measures the exact sample count between f0->f3 and f3->f0 transitions using a cross-buffer boundary sweep.
Attempt 1 -- f1 instead of f3: Using f1 (3200 Hz) for calibration. The f0<->f1 spacing is only 800 Hz = 1 bin at the sweep window's resolution. Could not distinguish f0 from f1 -> random boundary positions -> 74.8% match rate. Fix: switch to f3 (4800 Hz), where f0<->f3 = 2400 Hz = 6 bins.
Attempt 2 -- single-buffer dead zone:
If the tone transition falls within 60 samples of a buffer edge, the sweep cannot find it--
start_x256=0 in ~50% of phase alignments.
Fix: cross-buffer sweep operating on two adjacent 240-sample buffers (480 virtual samples).
Attempt 3 -- wrong sign:
drift = expected - measured instead of measured - expected.
Negative drift -> insert instead of skip -> made things worse.
At adaptive drift=-11, only 2/6 DATA packets succeeded; at drift=0, 19/22 succeeded.
Attempt 4 -- phase offset ignored:
Calibration measured cal_start_x256 = 9340 (36.5 samples offset) but never applied
a phase correction. The Goertzel window was permanently 15% off from true symbol boundaries.
The mystery of why a hardcoded drift=+36 worked perfectly: the frequent skips
(every 7 symbols) accidentally compensated the 36.5-sample phase offset. It was not drift
correction -- it was unintentional phase correction.
Attempt 5 -- near-miss early-exit:
start_x256=-30647, only 73 x256 units (0.29 samples) above the threshold.
Power interpolation barely missed the boundary. Fix: threshold with 4-sample margin.
Attempt 6 -- integer division truncation:
drift_total / CAL_EXPECTED loses the remainder.
With drift_total=2702 and CAL_EXPECTED=200: quotient=13, remainder=102
-> 0.51 x256/symbol error -> 12 samples at stream end.
Fix: Bresenham remainder tracking -- error now <=0.7 samples over the entire stream.
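With the numbers above, the remainder-tracking fix looks roughly like this (illustrative sketch, not the firmware code):

```python
# Numbers from the attempt above: drift_total = 2702 x256 units over 200 cal symbols.
# Plain integer division keeps only 13 and silently drops the 102/200 remainder.
drift_total, cal_expected = 2702, 200

def per_symbol_adjustments(n):
    """Yield an x256 adjustment per symbol, Bresenham-style: carry the remainder
    in an error accumulator and emit base+1 whenever it overflows."""
    base, rem = divmod(drift_total, cal_expected)   # 13, remainder 102
    err = 0
    for _ in range(n):
        err += rem
        if err >= cal_expected:
            err -= cal_expected
            yield base + 1
        else:
            yield base

adjust = list(per_symbol_adjustments(cal_expected))
total = sum(adjust)   # 13*98 + 14*102 = 2702 -- nothing truncated away
```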
Results
With N=180 guard bands active, short test images (the same ones that previously failed unpredictably) transfer reliably. Longer WAV files of 2-3 minutes also succeed. Phantom symbols are eliminated. Every packet interval shows the correct 1050-symbol count.
The guard band makes the system tolerant of clock drift up to +/-30 samples -- well beyond the typical 18 samples accumulated over a full packet.
Eliminated hypotheses
15+ distinct hypotheses were tested and eliminated over 9 sessions before the root cause was found: flash EMI, DMA race conditions, ProcessPacket timing, torn reads, WAV file errors, sync state machine bugs, clock drift miscalculation and more.
Takeaways
Observation bias is real. The J-Link RTT streaming introduced 20.9% noise through AHB bus contention. The measurement tool was the disease.
Statistical impossibility is a signal. When a pattern has a 10^-45 probability of occurring randomly, it is not random. The alternating 1050/1051 pattern pointed directly at something deterministic in software.
The obvious fix can backfire. Confidence gating sounds like good engineering -- reject uncertain measurements. But when the "uncertain" measurements are correct readings of boundary-straddling windows, rejecting them destroys reception entirely.
Spectral constraints are unforgiving. N=200 seemed reasonable (20-sample guard) but puts two of four frequencies on fractional Goertzel bins. The resulting spectral leakage turned a working detector into a random number generator. Integer constraints on N are non-negotiable.
Work from the data, not the hypothesis. Over the course of the investigation, 15+ hypotheses were tested and eliminated: flash EMI, DMA race conditions, ProcessPacket timing, torn reads, WAV file errors, sync state machine bugs, clock drift miscalculation. Each hypothesis had a plausible mechanism. The data eliminated all of them until only the true root cause remained.
© Oscaria Audio. Berlin, Germany.