Section 1
Bit-Plane Recursion — Decay Profile
Recursive bit-plane decomposition: each layer operates on flag positions from the previous. Halts when flags = 0.
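The recursion this profile visualizes can be sketched in a few lines of Python. This is a minimal illustration that always clears the LSB, not the shipped code (the real encoder uses adaptive bit selection per layer):

```python
def decay_profile(data):
    """Recursive LSB-clearing decomposition: each layer operates on the
    flag positions (indices of odd values) from the previous layer."""
    layers = []
    layer = list(data)
    while layer:
        flags = [i for i, v in enumerate(layer) if v & 1]  # odd positions
        layers.append({
            "size": len(layer),
            "odd_ratio": len(flags) / len(layer),
        })
        if not flags:          # halt condition: no flags left
            break
        layer = flags          # next layer recurses on the flag positions
    return layers

# A ramp 0..255 starts at odd ratio 0.5 and decays to a flag-free layer.
profile = decay_profile(range(256))
```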
Layer-by-layer breakdown
Odd ratio per layer
Layer size cascade
Structural Complexity Map
opt-in — runs chunked analysis across the file
Splits the data into fixed-size chunks and runs per-chunk structural analysis. Reveals where structure lives inside heterogeneous files — different regions of a GGUF model, video container, or mixed binary will show distinct fingerprint classes. For homogeneous files the map is uniform.
Cost: ~16–25 analysis passes on the loaded data. Takes 1–3 seconds.
Section 1a — Encoder feature
Stride Detection
Conditional entropy H(X_i | X_{i−k}) for k ∈ {1, 2, 3, 4, 6, 8, 12, 16}. The encoder selects the k with the lowest conditional entropy before each remap.
What this measures
The encoder measures inter-symbol periodicity at lag k. If knowing X_{i−k} significantly reduces uncertainty about X_i, the data has stride-k structure the encoder can exploit. Stride detection is independent of bit-plane structure — a file can score well on bit-plane metrics but also have strong stride correlation.
Stride confidence = (H(k=1) − H(k_winner)) / H(k=1). Near 0% means no useful stride structure. 20%+ means the encoder will find meaningful stride correlation.
Note: Analysis is performed on up to 200,000 bytes (the file analysis cap). The encoder applies its own 1 MB cap internally; for files larger than 200 KB, stride results may not reflect full-file periodicity.
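The conditional-entropy computation can be sketched with a joint frequency table, the same method the C encoder is described as using (helper names here are illustrative, not the encoder's):

```python
import math
from collections import Counter

def cond_entropy(data, k):
    """H(X_i | X_{i-k}) in bits/symbol, from a joint frequency table."""
    pairs = Counter(zip(data[:-k], data[k:]))   # (X_{i-k}, X_i) counts
    ctx = Counter(data[:-k])                    # context counts
    n = len(data) - k
    h = 0.0
    for (a, b), c in pairs.items():
        h -= (c / n) * math.log2(c / ctx[a])    # -p(a,b) log2 p(b|a)
    return h

def best_stride(data, ks=(1, 2, 3, 4, 6, 8, 12, 16)):
    """Winner = lowest conditional entropy; confidence as defined above."""
    hs = {k: cond_entropy(data, k) for k in ks}
    k_win = min(hs, key=hs.get)
    confidence = (hs[1] - hs[k_win]) / hs[1] if hs[1] > 0 else 0.0
    return k_win, confidence
```

A period-4 pattern like `[0, 0, 1, 1]` repeated gives near-zero entropy at k = 2 and confidence near 100%.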
Conditional entropy by stride k — lower = more periodicity
Stride table
Section 1b
Bit Distribution
Fraction of values with each bit set, per layer. Bit 0 = LSB. Uniform = ~0.5 for random data.
Reading this view
The swept bit starts near its natural set-frequency. After the L0 bit-clearing pass, it collapses to 0.0 in the aligned layer — that's the structural claim. The Bit column in the Decay Profile shows which bit was targeted at each layer.
Section 4
Structure Probe — mod-N Alignment
Fraction of values already aligned to mod-N, before and after a single even-alignment pass.
What this measures
Even-alignment increases mod-N alignment because rounding odd values down promotes divisibility. This quantifies how much latent mod-N structure is unlocked for free by a single bit-plane pass.
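The probe can be sketched as follows (illustrative code; `even_pass` here rounds odd values down by clearing the LSB, mirroring the even-alignment pass described above):

```python
def even_pass(data):
    """Even-alignment: round odd values down by clearing bit 0."""
    return [v & ~1 for v in data]

def mod_alignment(data, n):
    """Fraction of values already divisible by n."""
    return sum(1 for v in data if v % n == 0) / len(data)

data = list(range(1, 101))                 # 1..100
before = mod_alignment(data, 4)            # only multiples of 4 count
after = mod_alignment(even_pass(data), 4)  # 4k and 4k+1 both align now
```

For this ramp, a single even-alignment pass doubles the mod-4 alignment: values of the form 4k+1 round down onto the 4k boundary.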
mod-N structure: raw vs post-even-alignment
Spec §2.1 — The central open question
Entropy Analysis
Does AIM bit-plane recursion genuinely reduce total entropy, or does it relocate it? Shannon entropy in vs. out across all streams.
How this works
Raw data has H bits/byte of Shannon entropy. After bit-plane decomposition, the aligned stream and each flag layer's bitset carry their own entropy. If H(aligned) + H(flags) < H(raw), entropy was genuinely reduced — the data had structure the bit-plane pass could separate. If it equals or exceeds the input, the decomposition relocated entropy without reducing it. Both are real findings.
Flag entropy model: each flag layer is treated as a bitset of N bits with K set. Entropy = N × H(K/N), where H is the binary entropy function. This measures the information content of knowing which positions were flagged.
This page's entropy model uses fixed bit 0 (LSB clearing) as a controlled target to isolate its entropy contribution — one of eight possible targets. The Decay Profile and encoder use adaptive bit selection (sparsest bit per layer), which is why halt depths there may differ. The table below extends the fixed-bit-0 analysis to all 8 positions for comparison.
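The flag-layer cost under this model can be sketched directly from the formula N × H(K/N):

```python
import math

def binary_entropy(p):
    """H(p) in bits for a Bernoulli(p) source."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def flag_layer_bits(n, k):
    """Model cost of a flag layer: N bits with K set -> N * H(K/N) bits."""
    return n * binary_entropy(k / n)

# 1000 positions with only 50 flagged cost far less than a flat
# 1000-bit bitset; at K = N/2 the model charges the full N bits.
sparse_cost = flag_layer_bits(1000, 50)
dense_cost = flag_layer_bits(1000, 500)
```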
All-bit AIM target comparison — entropy outcome (lower output = better)
Sweep winner = fewest total flags · Entropy winner = lowest total output entropy
Bit 0 / LSB — stream entropy breakdown
Bit 0 / LSB — per-layer flag entropy
Spec §2.1b — Mechanistic explanation
Per-Bit Entropy Profile
Binary entropy for each bit position 0–7, before and after the L0 bit-plane pass. Reveals whether LSB was genuinely carrying less information than higher bits.
The core hypothesis
In structured data, the LSB (bit 0) carries disproportionately less entropy than higher-order bits because value clustering means the LSB is quantization noise on top of a smoother signal. The bit-plane pass strips that noise and encodes it separately. For uniform random data, every bit carries identical entropy (~1.0 bit) — the decomposition can only relocate, not reduce. The asymmetry in this profile is the mechanistic explanation of AIM's structural claim.
Reading the chart: A bit near 1.0 is carrying maximum entropy (set in ~50% of values). A bit near 0.0 is nearly deterministic — almost always set or almost always clear. Bits far from 1.0 are the cheap ones: they cost little to encode separately, and separating them exposes the smoother structure underneath.
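The per-bit profile itself is cheap to compute. A sketch (the entropy of bit b is the binary entropy of its set fraction, as described above):

```python
import math

def per_bit_entropy(data):
    """Binary entropy of each bit position 0-7 (bit 0 = LSB)."""
    n = len(data)
    out = []
    for b in range(8):
        p = sum((v >> b) & 1 for v in data) / n   # set fraction of bit b
        if p in (0.0, 1.0):
            h = 0.0                               # deterministic bit
        else:
            h = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
        out.append(h)
    return out

# A ramp 0..255 sets every bit in exactly half the values: all bits
# carry the maximum 1.0 bit. A constant byte carries zero everywhere.
profile = per_bit_entropy(list(range(256)))
```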
Cross-reference: The Bit Clearing Sweep shows which bit position produces the fewest total flags. The Entropy Analysis shows which produces the lowest total output entropy. This page explains why they answer slightly different questions — fewer flags ≠ lower entropy if the aligned stream picks up entropy from the clearing operation.
Raw data — bit entropy by position
After L0 even-alignment — aligned stream
Delta — entropy change per bit after L0
Negative = bit became cheaper to encode. Positive = bit became more expensive.
LSB analysis — the structural claim
Spec §1.2 / §2.4 — Practical compression test
Compression Benchmark
Does AIM decomposition make data more compressible under gzip? Raw gzip vs AIM+gzip across all 8 bit targets. Five analysis modes: blob, generic split, targeted split, targeted + final gzip, and predicted halt output.
Five modes, one question: does AIM decomposition make data more compressible?
Note: modes 1–4 simulate full-depth recursion. The actual encoder halts early via HALT_ANS_STRIDE for structured data — see the Predicted Halt section below and the Decay Profile page for the estimated cutoff.
Blob — aligned bytes + all flag layers concatenated, single gzip. Baseline: forces gzip to handle heterogeneous content in one pass. (Full depth.)
Generic split — each stream independently gzip'd, sizes summed. Better, but applies gzip uniformly to every layer regardless of size — tiny layers get hit with header overhead that exceeds their content. (Full depth.)
Targeted split — per-layer optimal selection: bitset-raw, delta-raw, bitset+gzip, delta+gzip, picks smallest. Skips gzip for layers below ~32 bytes. This is what the three-stream architecture delivers. (Full depth.)
Targeted + final gzip — outer gzip applied to the full targeted-encoded concatenation. (Full depth.)
Predicted halt (Mode 5) — estimated output size if HALT_ANS_STRIDE fires at the predicted depth. Uses the HALT predictor from the Decay Profile page. This is the most realistic estimate of what the actual encoder would produce for structured data.
Ratio < 1.0 = AIM wins.
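Modes 1 and 2 can be sketched with stdlib gzip. The function names below are illustrative, and the real benchmark additionally models the targeted modes and a ~32-byte per-member overhead constant:

```python
import gzip

def mode1_blob(aligned: bytes, flag_layers: list) -> int:
    """Mode 1: single gzip over aligned bytes + all flag bitsets."""
    blob = aligned + b"".join(flag_layers)
    return len(gzip.compress(blob, compresslevel=9))

def mode2_split(aligned: bytes, flag_layers: list) -> int:
    """Mode 2: each stream gzipped independently, sizes summed. Tiny
    layers pay per-member gzip header overhead, which the targeted
    mode (mode 3) avoids by skipping gzip below ~32 bytes."""
    streams = [aligned] + flag_layers
    return sum(len(gzip.compress(s, compresslevel=9)) for s in streams)
```

With several tiny flag layers, mode 2's per-stream headers can exceed the content they wrap, which is exactly the overhead the targeted split is designed to avoid.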
All-bit results — gzip ratio vs raw (lower = better, <1.0 = AIM wins)
Single blob ratio — all bits
aligned + flags concatenated → gzip
Generic split ratio — all bits
gzip(aligned) + gzip(flags) independently, summed
Targeted split ratio — all bits ★
per-layer: bitset-raw / delta-raw / bitset+gzip / delta+gzip — picks smallest
Targeted + final gzip — all bits
outer gzip over all optimally-encoded streams concatenated — tests for residual cross-stream redundancy
Mode 5 — Predicted halt output size ★
Estimated encoder output if HALT_ANS_STRIDE fires at predicted depth
Stream size breakdown — best bit target
Reading these results
Spec: Byte-Alignment Invariant
Invariant Check
For datasets whose length is a multiple of 8, the parity of byte 0 deterministically predicts the terminal halt value.
The invariant
When dataset length is a multiple of 8, the bit-plane recursion tree is perfectly symmetric at every depth — no remainder elements to break parity propagation. This means: even byte 0 → terminal halt at 0 (complete structural collapse); odd byte 0 → terminal halt at 1 (irreducible LSB). This is deterministic, not probabilistic. It is a zero-cost integrity check: compute the prediction before reconstruction, compare after. If they disagree, reconstruction failed.
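The prediction side of the check is a one-liner (a sketch; the "actual" terminal value comes from running the decoder, which is not reproduced here):

```python
def predict_terminal(data):
    """Byte-alignment invariant: for factor-of-8 lengths, the parity of
    byte 0 predicts the terminal halt value (0 = complete collapse,
    1 = irreducible LSB). Returns None when not applicable."""
    if len(data) % 8 != 0:
        return None          # invariant only defined for factor-of-8 data
    return data[0] & 1       # even byte 0 -> 0, odd byte 0 -> 1
```

Usage: compute `predict_terminal(data)` before reconstruction, compare with the decoder's terminal value after; a mismatch signals a failed reconstruction (within the reliability scope described below).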
Why it matters beyond integrity: It tells you something real about the data. A dataset that predicts and delivers terminal 0 has had its arithmetic structure completely resolved — every layer found structure all the way down. Terminal 1 means an irreducible identity remains — the last LSB the transform couldn't absorb.
Reliability scope (clarification from aim_core_v3.py SPEC DELTA §8):
The invariant holds deterministically only when the full symmetric recursion tree plays out — primarily deep-decay datasets with an even byte 0 (e.g. Gradient: byte_0=0, always predicts and delivers terminal 0). For other cases:
- Random data: prediction holds ~50% of the time (coin flip, not deterministic).
- Rapid-collapse data (Prime Gaps: halts at depth 1): terminal value is 0 regardless of byte_0 parity because the deep symmetric tree never develops.
Spec §3.2 — Reference profile library
Structural Fingerprint
Your data's decay profile scored against six canonical reference classes. The fingerprint is two-dimensional: halt depth + bit distribution shape.
What the fingerprint captures
Two datasets with identical Shannon entropy can have completely different fingerprints. Random noise and natural language both run 13 layers — but their bit distributions at L0 cleanly separate them (random: flat ~0.5; language: clustered at bits 5–6 from ASCII 32–127). The fingerprint is the structural class signature: not what the data means, but what kind of mathematical object it is at the byte level.
Practical use beyond compression: An unknown binary file that fingerprints as "structured / rapid collapse" is likely a numerical sequence or table. "Language / ASCII" means text-like encoding. "Uniform noise" means the data is either truly random or already compressed/encrypted (which looks the same to this instrument). "Oscillating deep" is a gradient or ramp. These classifications work without knowing the file's domain, format, or meaning.
Your fingerprint
What it suggests
Reference library — similarity scores
Scored by halt depth (60%) + L0 odd ratio (40%)
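The two-dimensional score can be sketched as a weighted similarity matching the 60/40 split above. The distance normalizations (dividing depth distance by a maximum depth, taking odd-ratio distance directly) are assumptions for illustration, not the tool's exact formula:

```python
def fingerprint_similarity(halt_depth, l0_odd_ratio,
                           ref_depth, ref_ratio, max_depth=13):
    """Similarity in [0, 1]: 60% halt-depth proximity + 40% L0
    odd-ratio proximity. Normalizations are illustrative assumptions."""
    depth_sim = 1.0 - abs(halt_depth - ref_depth) / max_depth
    ratio_sim = 1.0 - abs(l0_odd_ratio - ref_ratio)
    return 0.6 * depth_sim + 0.4 * ratio_sim

# Identical profiles score 1.0; a rapid-collapse profile scores low
# against a deep noise reference.
score = fingerprint_similarity(13, 0.5, 13, 0.5)
```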
Section 2
Linear Chain Sweep
Alternates even_pass and mod-N on flag positions. Measures total flags across all layers — lower is better.
Interpretation
Single even-alignment is the baseline. Each mod-N column shows the total flag count when mod-N is interleaved. A value lower than the single-pass baseline would be a genuine win; in practice, chaining consistently produces more flags than single even-alignment alone.
Why chaining fails: Flag positions from an even-alignment pass are indices in a position space — they carry no reason to exhibit modular structure. A mod-N pass on those positions generates flag ratios of 0.75–0.93 at every layer, spreading entropy across more layers without reducing it. The mod-N operation is looking for periodic structure in a list of index values that have none. This is not a failure of mod-N as an operation — it's a failure of target selection. The Seer step (Part 1 of the AIM formula) would reject this application: studying position indices through a modular lens reveals no coherent relationship to the target.
Total flag positions across all layers (lower = better)
Section 1c
Bit Clearing Sweep
Runs the AIM bit-plane operation independently for each of the 8 bit positions. Reveals which bit has the most latent structure — and whether even-alignment (bit 0) was actually the best target for this data.
How to read this
Each bit position defines a different structural target. Clearing bit b means: if a value has that bit set, subtract 2^b and record the position as a flag. This is identical to what AIM does for bit 0 (subtract 1 from odd values), generalized to any power of two.
Fewer total flags = more structure. If a bit is already clear in most values, few positions get flagged and the recursion terminates quickly. That means the data is naturally aligned to that bit's boundary — a genuine structural property. The sweep finds which power-of-two boundary the data is most aligned to, without assuming it's always 2 (even-alignment).
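A sketch of the generalized clearing pass and the sweep's flag total (illustrative code; the sweep runs this recursion once per bit and compares totals):

```python
def clear_bit_pass(data, b):
    """Clear bit b: subtract 2**b from values with that bit set,
    recording the flagged positions."""
    mask = 1 << b
    flags = [i for i, v in enumerate(data) if v & mask]
    aligned = [v & ~mask for v in data]
    return aligned, flags

def total_flags(data, b):
    """Total flag count across the full recursion on flag positions."""
    total = 0
    layer = list(data)
    while layer:
        _aligned, flags = clear_bit_pass(layer, b)
        total += len(flags)
        if not flags:
            break
        layer = flags            # recurse on the flag positions
    return total
```

Data already aligned to a bit's boundary produces zero flags for that target, which is exactly what the sweep is looking for.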
What each bit targets
Total flags per target bit — full recursive decay (lower = more structure)
Flag count comparison — all 8 bits
Per-bit decay profile — L0 flag ratio
L0 = fraction of raw values with target bit set
Section 3
Disconnected Chain — AIM Tree
Applies mod-N to even-aligned values (not flag positions). Both branches documented independently for lossless reconstruction.
AIM Tree vs linear chain
A disconnected chain splits the decomposition into two branches: Branch A recurses on even-pass flag positions; Branch B recurses on mod-N positions from aligned values. Both are lossless. The cost is the sum of both branch totals.
Experiment V1 — JS↔Python discrepancy
Entropy Models: Browser vs. Python
Two ways to price the flag layer. The browser tool uses Model A (L0-only flat bitset). Python's
compute_all_bit_entropies() uses Model B (full bit-plane recursion on flag positions). They can disagree on verdict.
Key discrepancy: The browser always uses the L0 flat-bitset flag cost — H(K/N)×N bits for one layer. Python recurses on flag positions across all layers, which is cheaper whenever flags have positional structure. For Prime Gaps bit 4, Python's Model B confirms genuine reduction; the browser's Model A also agrees here. For random noise bit 0, the two models should now agree since v11 fixed the random noise generator to use Mulberry32 (see JS↔C Encoder Delta page).
Experiment N1 — O(N) entropy winner prediction
Entropy Winner Predictor
Can the entropy winner (bit with lowest total output entropy) be predicted from a cheap O(N) pre-pass without running the full decomposition?
Predictor formula (from §N1): For each bit b, split the data into two sub-distributions: values with bit b set (cleared to aligned values) and values with bit b clear (unchanged). Predicted net = p×H_set + (1−p)×H_clear + H(p) − H(raw). If predicted_net < −raw_H×0.01, predict "reduction". The winner is the most negative predicted_net.
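The pre-pass can be sketched directly from the formula (helper names are illustrative; each call is a single linear scan plus histogram entropies):

```python
import math
from collections import Counter

def shannon(values):
    """Shannon entropy in bits/symbol of a value sequence."""
    n = len(values)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def binary_entropy(p):
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def predicted_net(data, b):
    """p*H_set + (1-p)*H_clear + H(p) - H(raw) for target bit b."""
    mask = 1 << b
    set_vals = [v & ~mask for v in data if v & mask]   # cleared to aligned
    clear_vals = [v for v in data if not v & mask]     # unchanged
    p = len(set_vals) / len(data)
    return (p * shannon(set_vals) + (1 - p) * shannon(clear_vals)
            + binary_entropy(p) - shannon(data))

def predict_winner(data):
    """Winner = most negative predicted net across all 8 bits."""
    nets = {b: predicted_net(data, b) for b in range(8)}
    return min(nets, key=nets.get)
```

For a uniform ramp 0..255 every bit predicts net zero (splitting a flat distribution buys nothing), which matches the relocation verdict expected for structureless data.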
Reference — Implementation transparency
JS Analyzer ↔ C Encoder (v34) Delta
Every documented difference between this browser analysis tool and the shipped C encoder (aim_v34.c). Python (aim_core_v3.py) is a secondary reference that predates the C implementation. These entries are not bugs — they are deliberate or unavoidable divergences with known impact on results.
RESOLVED
Random Noise: LCG vs. Mulberry32 (fixed in v11)
v9 (historical): genRandomNoise used an LCG: s = (s×1664525 + 1013904223) >>> 0, seed=42. The increment 1013904223 is odd, so every output alternated parity. All odd values landed at even-indexed positions, meaning L1 received all-even position indices and halted immediately — halt depth ≈ 1 instead of the correct ~13.
v11 (current): genRandomNoise now uses Mulberry32 seeded at 42: a small, fast PRNG producing statistically uniform output with no parity artifact. Halt depth now matches Python's Mersenne Twister at ~13 layers.
Status: Fixed. The Random Noise demo now produces the correct noise fingerprint (~13 layers). This entry is retained for historical transparency.
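The parity artifact is easy to demonstrate with a Python port of the v9 LCG (a sketch for illustration, not the analyzer's actual generator code):

```python
def lcg_bytes(n, seed=42):
    """The v9 LCG: with an odd multiplier and odd increment, the 32-bit
    state (and therefore its low byte) alternates parity every step."""
    s = seed
    out = []
    for _ in range(n):
        s = (s * 1664525 + 1013904223) & 0xFFFFFFFF
        out.append(s & 0xFF)
    return out

vals = lcg_bytes(16)
parities = [v & 1 for v in vals]
# Parities strictly alternate, so all odd values land at positions of one
# fixed parity and the recursion on flag positions halts almost at once.
```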
MODERATE
Entropy Flag Model: L0-only flat bitset vs. full recursion
JS Model A (browser): Flag layer cost = H(K/N) × N bits. This is a single-layer flat bitset — it treats flag storage as one flat N-bit structure with K bits set.
Python Model B: compute_all_bit_entropies() recurses on flag positions using REAL, summing bitset entropy across all recursive layers: Σ H(K_l/N_l) × N_l. This is cheaper whenever flag positions have positional structure.
Impact on results: Model B always produces equal or lower total output bits than Model A. Verdict (reduction/relocation/increase) can differ for borderline cases. For Prime Gaps bit 4, both agree on "reduction". For Fibonacci bit 0, both agree on "increase". Near-threshold datasets may show different verdicts.
Threshold: Both use ±1% of raw bits as the reduction/increase boundary (explicit in JS, confirmed in the Python port).
MINOR
max_depth: C encoder max_depth=8 is a mathematical consequence, not a tunable parameter
C encoder (v34): max_depth = 8, hardcoded. Recursion terminates naturally when the symbol range reaches [0,1] (only two distinct values possible). Since each byte has 8 bits, the recursive even/odd split can produce at most 8 meaningful layers before the symbol space collapses. This is a mathematical property of the transform, not a configurable limit.
JS Analyzer & Python v3: Use max_depth = 40 as a safety limit on the analysis loop. In practice the encoder's natural halt at depth 8 means layers beyond 8 are never produced; the analysis loop exits on empty flag lists long before 40.
Impact: No behavioral difference for well-structured data. The demo UI now labels the max_depth=8 constraint as a C encoder property, not a parameter. Noise-class data can produce spurious layers 9–13 in the JS simulator because it does not replicate the exact encoder halt conditions; treat layers beyond 8 as a simulator artifact for noise inputs.
MINOR
Gzip compression level: browser ≈ 6, Python defaults 9
Browser (CompressionStream): Uses the browser's built-in CompressionStream API, which typically applies gzip level 6. Results vary by browser engine.
Python stdlib gzip: Defaults to compresslevel=9 (maximum). Pass gz_level=6 to compute_compression_benchmark() or run_all() to reproduce browser numbers more closely.
Impact: Python compression ratios will be equal or slightly better than browser ratios. The ~32-byte GZIP_OVERHEAD constant is shared and correct for both.
MODERATE
Byte-alignment invariant: narrower reliability than spec implies
Spec claim: For factor-of-8 datasets, parity of byte 0 deterministically predicts the terminal halt value.
Empirical clarification: The invariant holds reliably only when the terminal value is structurally determined by initial parity — primarily datasets that halt cleanly (odd_count=0) with an even byte 0 (e.g. Gradient: always byte_0=0). For random data it holds ~50% of the time (a coin flip, not deterministic). For datasets that collapse very quickly (Prime Gaps halts at depth 1 with a single odd value), the terminal value is 0 regardless of byte_0 parity, because the deep symmetric-tree propagation never occurs.
Impact: The invariant check in the browser UI correctly shows "not applicable" for non-factor-of-8 data and "prediction held" for qualifying cases. The claimed determinism is real for the subset of datasets where the full symmetric recursion tree plays out (deep decay, even byte_0). Not a reliable integrity check for noise or rapid-collapse data.
MINOR
Entropy sampling: browser samples ≤500 aligned values for H estimation at L1+
Browser: For layers beyond L0, Shannon entropy of the aligned stream is estimated from the first 500 values to maintain UI responsiveness. The flag-layer bitset entropy (the compression-relevant quantity) is computed exactly.
Python: Computes Shannon entropy of aligned streams exactly at all layers, with no sampling limit.
Impact: The aln_h value shown per layer may be an approximation in the browser for large inputs. The compression verdict (reduction/relocation/increase) is not affected — it uses bitset entropy, not aligned-stream entropy.
MINOR
mod_pass() absent from spec Python; present in JS and Python v3
Spec v19: Mentions modular alignment in chaining experiments but never provides a standalone Python implementation.
JS & Python v3: modPass(data, N) / mod_pass(data, N) is a first-class function used by the linear chain sweep and disconnected chain analysis.
Impact: No behavioral difference — the chaining experiments work correctly. This is a documentation gap, not a discrepancy in results.
MINOR
Powers of 2: JS uses Math.pow() float; Python uses int **
JS: Math.pow(2, i%8) — returns a float (1.0, 2.0, 4.0, …, 128.0). Masking with & 0xFF is applied to extract the byte value.
Python: 2 ** (i % 8) — integer result (1, 2, 4, …, 128).
Impact: No difference. Both produce the same 8-cycle sequence: [1, 2, 4, 8, 16, 32, 64, 128]. The spec notes this explicitly (SPEC DELTA in gen_powers_of_two).
MODERATE
HALT_ANS_STRIDE: encoder halts early; analyzer models full depth
C encoder (v34): At each recursion depth, the encoder computes the cost of rANS-encoding the aligned stream directly (the "ANS stride" path) and compares it to the cost of continuing recursion. If ANS encoding wins, recursion halts early. This is HALT_ANS_STRIDE in aim_v34.c. For structured data, the encoder typically halts 2–4 layers before the natural flag-empty terminus.
JS Analyzer (v11): The Decay Profile now shows a predicted HALT_ANS_STRIDE depth (annotated on the chart), computed from an approximate cost model: ANS stride cost ≈ H(k_winner) × N_d bits vs. subtree cost. Layers beyond the predicted halt are shown dimmed as structural information only — the encoder would not process them.
Impact on results: Compression size estimates for modes 1–4 in the Compression Benchmark assume full-depth recursion. The predicted halt mode (Mode 5) corrects this for structured data. The predictor may be off by ±1 depth for borderline cases. See the Compression Benchmark page for the predicted-halt estimate.
MODERATE
Stride detection: encoder measures H(X_i | X_{i-k}); analyzer now replicates this
C encoder (v34): For each stride k in {1, 2, 3, 4, 6, 8, 12, 16}, the encoder computes conditional entropy H(X_i | X_{i-k}) using a joint frequency table, and selects the k that minimizes this entropy. This stride value informs both the HALT_ANS_STRIDE cost estimate and the ANS encoding schedule.
JS Analyzer (v11+): The Stride Detection page replicates this computation using the same joint-frequency-table method, capped at min(1 MB, file size) to match encoder behavior. v12 adds the C encoder's STRIDE_GAIN_THRESH = 0.05 bits/byte: if the gain is below this threshold, stride is suppressed (kWinner reset to 1, confidence 0). Previously any gain was reported, over-reporting stride significance for marginal cases.
Status: Threshold applied in v12. Stride detection now matches C encoder behavior, including the suppression threshold.
RESOLVED
caim mode: byte-accurate estimate added in v12
C encoder (v34): The encoder runs two passes in parallel: (a) the recursive bit-plane decomposition path, and (b) a caim mode that sweeps all 8 bits (selecting the least-set bit each step), concatenates all bitsets into one blob, then gzips the blob and the aligned stream. It takes whichever produces the smaller output.
JS Analyzer (v12): computeCaimEstimate() now replicates the C caim_encode() algorithm exactly: 8-bit adaptive sweep, bitset concatenation, gzip of the flags blob plus gzip of the aligned stream. The byte-accurate comparison appears on the Structural Fingerprint page (caim vs. recursive verdict card) and the Compression Benchmark page.
Residual gap: Browser CompressionStream uses gzip level ~6; C uses level 9. The JS caim estimate is slightly conservative — actual caim output will be marginally smaller. This is noted in the display. All size comparisons in the UI account for this bias direction.
RESOLVED
Bit sweep: encoder selects sparsest bit; analyzer now replicates this (new in v12)
C encoder (v34): At each recursion depth d, sweep() counts set values for bits 0 through (7−d) and selects the bit with the fewest set values. This adaptive selection minimizes flag count per layer and is the core of the AIM recursive algorithm.
JS Analyzer (v11 — historical): Always used evenPass(), which cleared bit 0 (LSB) at every depth. This caused halt depth to be massively overestimated for structured data (e.g., Prime Gaps would show 13 layers instead of 1–2).
JS Analyzer (v12 — current): realTransform() now uses sweepPass() with the same adaptive selection as the C encoder: at depth d, it considers bits 0..(7−d) and picks the sparsest non-zero bit. Maximum depth is correctly capped at 8.
Status: Fixed in v12. Decay profiles, fingerprint classification, and the halt predictor all now reflect the correct sweep-based behavior. The evenPass() function is retained only for the chain experiments and entropy models, which are separate analytical paths.
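The adaptive selection can be sketched as follows (a simplified model of the described sweep() behavior, not the C source):

```python
def sweep_pass(layer, depth):
    """At depth d, consider bits 0..(7-d); pick the sparsest non-zero
    bit, clear it, and return (aligned, flags, chosen_bit)."""
    candidates = range(8 - depth)
    counts = {b: sum(1 for v in layer if v & (1 << b)) for b in candidates}
    live = {b: c for b, c in counts.items() if c > 0}
    if not live:
        return layer, [], None          # nothing to clear: natural halt
    b = min(live, key=live.get)         # sparsest set bit wins
    mask = 1 << b
    flags = [i for i, v in enumerate(layer) if v & mask]
    aligned = [v & ~mask for v in layer]
    return aligned, flags, b
```

For a layer like `[1, 1, 1, 4]`, bit 2 is set in only one value while bit 0 is set in three, so the sweep targets bit 2 first; fixed LSB clearing would have flagged three positions instead of one.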
INFORMATIONAL
Decode overflow bug (historical, resolved in v25): GGUF ~87.8% result affected
Prior to v25: File offset arithmetic used 32-bit unsigned integers (u32). For files larger than ~4 GB, offset values silently truncated on overflow. The encoder appeared to succeed, but the decoder read incorrect byte ranges, producing a decoded file that was structurally wrong — not a valid decompression of the input.
v34 (current): All file offsets use 64-bit unsigned integers (u64). Large-file handling is correct.
Impact on published results: The ~87.8% compression ratio reported for a large GGUF file was produced by a v16 encoder/decoder pair subject to this overflow bug. The decoded output reflected data loss from the offset truncation, not genuine compression. This result should not be cited as a validated AIM benchmark. Current encoder versions have not been re-benchmarked on the same GGUF at this time.
LOW
DR3L / symbol-mapping: ⊕ operator is more general than bit-clear
Paper formulation: The original AIM paper's ⊕ operator is defined as a general symbol-space mapping — not specifically bit-clear (even-alignment). Any invertible mapping from a symbol alphabet to a sub-alphabet qualifies.
DR3L experiment: dreal_minimal_Bits.py explored mapping decimal digit sequences (0–9) into a reduced symbol space using arbitrary bijections. DR3L demonstrated that the symbol space is a free variable: one can choose a mapping that exploits domain-specific structure rather than always using bit-clear. The experiment found 2.17× expansion for natural decimal sequences — the mapping did not exploit structure efficiently enough to beat the raw representation.
Relevance: The shipped encoder uses bit-clear as its ⊕ operator because it is fast, invertible, and works universally on byte data. DR3L showed that alternate symbol mappings are possible but require domain knowledge to be beneficial. Not a current gap in the analyzer — included for research context.