{"filename":"agent_20260505_1413.md","content":"# Agent Report — Window-Dependence of Orientation Score on Real Market Data\n**Date**: 2026-05-05 14:13\n**Piano**: 2\n**Tension explored**: First real-market test of the D-ND orientation pipeline\n\n## Claim Under Test\n\nThe D-ND ordered-vs-shuffle orientation score, which passed synthetic calibration in cycle 1, detects genuine regime structure in real equity and crypto markets — not just window-placement artifacts.\n\n## Question\n\nWhen the orientation score is applied to real market data across different assets and window lengths, does the signal persist (real regime) or collapse (window-placement artifact)?\n\n## Experiment Design\n\nFour runs of `exp_regime_shift.py --from-market` on real OHLCV data:\n\n1. **SPY 1y** — S&P 500 ETF, 252 daily obs (2025-05-05 to 2026-05-05)\n2. **SPY 2y** — same asset, doubled window, 501 obs (2024-05-06 to 2026-05-05)\n3. **QQQ 1y** — Nasdaq-100 ETF, 252 daily obs (2025-05-05 to 2026-05-05)\n4. **BTC 365d** — Bitcoin via CoinGecko, 366 daily obs (2025-05-06 to 2026-05-05)\n\nEach run: 128 shuffle surrogates, same-distribution null baseline. Naive baselines: VaR 95% and realized volatility. Promotion threshold: effect_z >= 3.0 and ordered > shuffle_mean.\n\nPrior art context: this test is analogous to a Bai-Perron change-point test (does the midpoint coincide with a structural break?) but measured via orientation preservation under shuffle rather than parameter estimation. 
Hamilton Markov-Switching would detect regimes regardless of window placement — our score does not, because it is a fixed-split metric.\n\n## Results\n\n| Asset | Window | n | ordered | shuffle_mean | shuffle_std | effect_z | VaR_95 | realized_vol | cassini_residue | verdict |\n|-------|--------|--:|--------:|-------------:|------------:|---------:|-------:|-------------:|----------------:|---------|\n| SPY | 1y | 251 | 5.91e-06 | 1.29e-06 | 1.54e-06 | **2.998** | -0.0129 | 0.124 | 0.00934 | NO_DELTA |\n| SPY | 2y | 500 | 1.11e-06 | 2.02e-06 | 2.44e-06 | **-0.373** | -0.0158 | 0.166 | 0.00433 | NO_DELTA |\n| QQQ | 1y | 251 | 1.29e-05 | 2.45e-06 | 3.11e-06 | **3.349** | -0.0181 | 0.162 | 0.01011 | DND_DELTA |\n| BTC | 365d | 365 | 3.94e-05 | 1.23e-05 | 1.24e-05 | **2.192** | -0.0364 | 0.356 | 0.00287 | NO_DELTA |\n\nData provenance:\n- SPY/QQQ: `yfinance`, auto_adjust=True, retrieved 2026-05-05T14:01-14:15 UTC\n- BTC: CoinGecko free tier (close-only, OHL=close proxy), retrieved 2026-05-05T14:01 UTC\n\n## Key Findings\n\n### 1. QQQ 1y crosses 3-sigma; SPY 1y is borderline; SPY 2y collapses\n\nQQQ 1y (effect_z=3.35) is the first DND_DELTA on real market data. SPY 1y at 2.998 is within rounding distance of the threshold. But SPY 2y at -0.37 shows the score is **below** the shuffle mean when the window doubles.\n\n### 2. The orientation score is window-placement-dependent (structural limitation)\n\nThe `orientation_score()` function splits the series at its midpoint and measures mean/vol/transition gaps between left and right halves. This means it answers: \"Is there a regime boundary near the temporal midpoint of this window?\" — not \"Does this window contain a regime shift?\"\n\n- SPY 1y midpoint: ~late Oct 2025. If a real vol regime change occurred near that date, the score captures it.\n- SPY 2y midpoint: ~late Oct 2024. 
Different location, different regime context: the signal reverses (effect_z = -0.373, below shuffle mean).\n\nThis is not a bug in the pipeline; it is a structural property of the fixed-split design. The score is a **one-sided** detector: if it fires, the midpoint coincides with a structural break, but if it does not fire, a regime shift may still exist elsewhere in the window. Firing is therefore not a **necessary** condition for a regime shift to be present.\n\n### 3. BTC has higher absolute orientation but also higher variance\n\nBTC's ordered score (3.94e-05) is about 6.7x larger than SPY 1y's (5.91e-06), but its shuffle spread is larger still (shuffle_std 1.24e-05 vs 1.54e-06, about 8x). The effect_z normalizes this correctly: BTC at 2.19 does not separate from its own null. Bitcoin's higher volatility (realized_vol=0.356 vs SPY 1y's 0.124) creates a wider shuffle distribution that absorbs more of the ordered signal.\n\n### 4. The QQQ DND_DELTA is not promotable as a regime finding\n\nPer protocol, a single-window DND_DELTA is necessary but not sufficient for promotion. We would need DND_DELTA on a second independent window (e.g., QQQ 6mo, QQQ 2y, or a different era) to rule out coincidental midpoint placement. This cycle does not promote it as a regime finding; it promotes the **window-dependence constraint** instead.\n\n### 5. Cassini residue does not discriminate\n\nCassini delta (ordered minus shuffle mean) is small and sign-inconsistent across assets: -0.0042 (SPY 1y), -0.0018 (SPY 2y), -0.0046 (QQQ), +0.0006 (BTC). There is no interpretable pattern; the Cassini diagnostic shows no discriminating power at daily frequency on these instruments.\n\n## Verdict\n\n**NO_DELTA (pipeline constraint promoted, not regime finding).**\n\n**The Claim Under Test was falsified**: the orientation score does not detect genuine regime structure across assets and windows; it detects midpoint-coincidence artifacts. SPY 1y vs 2y (same asset, different midpoint, effect_z collapses from 2.998 to -0.373) is the falsifying evidence. 
The new claim that emerged is: the score is a midpoint-boundary detector, not a regime detector, and requires a sliding-split redesign.\n\nThe D-ND orientation pipeline successfully ran on real market data for the first time. The single DND_DELTA (QQQ 1y, effect_z=3.35) does not survive the multi-window requirement: same-asset different-window (SPY 1y vs 2y) shows the score is midpoint-placement-dependent, not regime-intrinsic.\n\n**Promotable constraint**: The fixed-split `orientation_score()` detects \"regime boundary at window midpoint\", not \"regime boundary anywhere in window.\" Future cycles must either:\n- (a) Scan across split points (sliding midpoint) to find the maximum separation, with Bonferroni correction for multiple comparisons, or\n- (b) Replace fixed-split with a rolling-window orientation that reports the location of maximum ordered-vs-shuffle separation.\n\nThis is the natural next tension for the pipeline.\n\n## Bicono della scoperta\n\n- **Due radici**:\n  - Root 1 (negative): The fixed-split design conflates \"regime exists\" with \"regime boundary happens to be near the midpoint\" — a positional artifact, not a structural detector.\n  - Root 2 (positive): The pipeline end-to-end works on real market data (fetch, cache, data_card, shuffle null, verdict). The infrastructure for cycle 2+ is validated.\n\n- **Singolare**: The orientation score is a midpoint detector, not a regime detector. This is one constraint that applies to all windows, all assets, all frequencies. It cannot be resolved by running more assets — it requires changing the score function itself.\n\n- **Invariante di passaggio**: What survives from cycle 1 to cycle 2: the ordered-vs-shuffle protocol is sound (correct null, correct normalization). 
What does not survive: the assumption that fixed-split orientation is sufficient for regime detection.\n\n- **Campo di possibilita'**: The sliding-split variant opens a new axis: instead of one effect_z per window, a profile of effect_z(t) across all possible split points. The maximum of this profile locates the regime boundary; its significance (after multiple-comparison correction) determines whether the regime is real. This is the next experiment.\n\n## Files\n- Report: `data/finance/reports/agent_20260505_1413.md`\n- SPY cache: `data/finance/market_cache/yfinance_spy_1y_now_1d_*.json`\n- QQQ cache: `data/finance/market_cache/yfinance_qqq_1y_now_1d_*.json`\n- BTC cache: `data/finance/market_cache/coingecko_bitcoin_*.json`\n- Tool: `domains/finance/tools/exp_regime_shift.py`\n- Data source: `domains/finance/tools/market_data.py`\n","title":"Agent Report — Window-Dependence of Orientation Score on Real Market Data","verdict":"","bicono":{"roots":"- Root 1 (negative): The fixed-split design conflates \"regime exists\" with \"regime boundary happens to be near the midpoint\" — a positional artifact, not a structural detector.\n  - Root 2 (positive): The pipeline end-to-end works on real market data (fetch, cache, data_card, shuffle null, verdict). The infrastructure for cycle 2+ is validated.","singular":"The orientation score is a midpoint detector, not a regime detector. This is one constraint that applies to all windows, all assets, all frequencies. It cannot be resolved by running more assets — it requires changing the score function itself.","invariant":"What survives from cycle 1 to cycle 2: the ordered-vs-shuffle protocol is sound (correct null, correct normalization). What does not survive: the assumption that fixed-split orientation is sufficient for regime detection.","field":"The sliding-split variant opens a new axis: instead of one effect_z per window, a profile of effect_z(t) across all possible split points. 
The maximum of this profile locates the regime boundary; its significance (after multiple-comparison correction) determines whether the regime is real. This is the next experiment."},"size":7769,"mtime":"2026-05-05T14:18:09.067955+00:00"}