Validation

We use robustness and surrogate tests suited to heterogeneous records (PADM2M long, CALS10k short, GGF100k differing). A classical train/holdout split isn’t appropriate for such records; instead, we rely on five groups of checks:

1. Internal robustness (within-record)
   - Leave-one-signal-out: spikes that persist are robust.
   - Downsample/coarsen cadence: spikes that survive scale changes are credible.
   - Window sensitivity: vary the window length by ±25–50% and track timing drift bands.

2. Surrogate/negative controls
   - IAAFT surrogates preserve spectrum and distribution; spike rates and magnitudes should drop toward baseline.
   - Block-shuffle preserves local structure while breaking long-range alignment.
   - Full time-scramble as a strong negative control.
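As an illustration of the block-shuffle control, here is a minimal standard-library sketch. It is not taken from surrogates.py; all function names are hypothetical, and `q95_abs_diff` is only a stand-in statistic, since this section does not define how ΔZ is computed. IAAFT surrogates need an FFT and are not shown.

```python
import math
import random


def block_shuffle(values, block_len, rng=None):
    """Permute contiguous blocks of `values`; order inside each block is kept."""
    if block_len < 1:
        raise ValueError("block_len must be >= 1")
    if rng is None:
        rng = random.Random()
    blocks = [values[i:i + block_len] for i in range(0, len(values), block_len)]
    rng.shuffle(blocks)
    return [x for block in blocks for x in block]


def quantile(xs, q):
    """Nearest-rank quantile of a non-empty sequence, for 0 < q <= 1."""
    ordered = sorted(xs)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def q95_abs_diff(values):
    """Stand-in statistic (an assumption, NOT the ΔZ definition):
    the 95th percentile of absolute first differences."""
    return quantile([abs(b - a) for a, b in zip(values, values[1:])], 0.95)


def surrogate_baseline(values, statistic, block_len, n_surrogates=200, seed=0):
    """Evaluate `statistic` on `n_surrogates` block-shuffled copies of `values`."""
    rng = random.Random(seed)
    return [statistic(block_shuffle(values, block_len, rng))
            for _ in range(n_surrogates)]


def fraction_at_or_above(baseline, real_value):
    """Share of surrogate statistics that reach or exceed the real-data value."""
    return sum(b >= real_value for b in baseline) / len(baseline)
```

Setting `block_len=1` reduces this to the full time-scramble, so one helper can serve both negative controls. A small `fraction_at_or_above` would mean the real-data statistic sits in the upper tail of the surrogate distribution.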

3. Age-model uncertainty
   - Monte Carlo “wiggle”: perturb timestamps and recompute ΔZ; report spike retention rate and timing uncertainty.
   - Coarse bin alignment (e.g., 500–1000 yr) should preserve broad spike clusters.
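The age-wiggle step could be sketched as follows. This is an illustration under stated assumptions rather than the contents of age_wiggle.py: the Gaussian error model, the re-sorting of perturbed ages, the matching tolerance, and the `detect_spikes(ages, values)` callable (which should return a list of spike times) are all hypothetical choices.

```python
import random
import statistics


def wiggle_ages(ages, sigma_yr, rng):
    """Add Gaussian noise to each age, then sort so ages stay monotonic.
    Assumption: independent errors with one shared sigma."""
    return sorted(a + rng.gauss(0.0, sigma_yr) for a in ages)


def match_spikes(reference, candidate, tol_yr):
    """For each reference spike time, return the offset to the nearest
    candidate spike if it lies within tol_yr, else None.
    Note: two reference spikes may match the same candidate."""
    offsets = []
    for t in reference:
        if not candidate:
            offsets.append(None)
            continue
        nearest = min(candidate, key=lambda c: abs(c - t))
        d = nearest - t
        offsets.append(d if abs(d) <= tol_yr else None)
    return offsets


def age_wiggle_summary(ages, values, detect_spikes, sigma_yr, tol_yr,
                       n_draws=200, seed=0):
    """Monte Carlo over perturbed age models; values keep their original order."""
    rng = random.Random(seed)
    reference = detect_spikes(ages, values)
    if not reference:
        return {"retention_rate": None, "median_abs_shift_yr": None}
    kept = 0
    shifts = []
    for _ in range(n_draws):
        wiggled = wiggle_ages(ages, sigma_yr, rng)
        offsets = match_spikes(reference, detect_spikes(wiggled, values), tol_yr)
        matched = [abs(d) for d in offsets if d is not None]
        kept += len(matched)
        shifts.extend(matched)
    return {
        "retention_rate": kept / (n_draws * len(reference)),
        "median_abs_shift_yr": statistics.median(shifts) if shifts else None,
    }
```

The two returned numbers correspond to the spike retention rate and the timing uncertainty named above; choosing `tol_yr` equal to one bin width would make the output directly comparable to a bin-based threshold.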

4. Cross-product consistency (no fitting)
   - Minimal harmonization (coarse cadence + shared band-pass) across overlapping spans; report descriptive coincidence only.

5. Event overlays (descriptive)
   - Overlay independent bands strictly for visualization; compute coincidence vs. block-shift baselines for empirical p-values.
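One way to obtain an empirical p-value from a shift baseline is sketched below. It reads "block-shift" as a circular shift of all spike times by a random offset, which is an assumption; overlay_stats.py may define it differently. Function names, the `(lo, hi)` band format, and the add-one p-value correction are illustrative choices.

```python
import random


def count_coincidences(spike_times, bands):
    """Number of spikes that fall inside at least one (lo, hi) event band."""
    return sum(any(lo <= t <= hi for lo, hi in bands) for t in spike_times)


def circular_shift(spike_times, offset, t_min, t_max):
    """Shift every spike by `offset`, wrapping around the record span."""
    span = t_max - t_min
    return [t_min + (t - t_min + offset) % span for t in spike_times]


def shift_p_value(spike_times, bands, t_min, t_max,
                  n_shifts=1000, min_shift=0.0, seed=0):
    """Empirical p-value: how often a randomly shifted spike series
    coincides with the bands at least as often as the real one."""
    span = t_max - t_min
    if span <= 0 or not 0 <= min_shift < span / 2:
        raise ValueError("need t_max > t_min and 0 <= min_shift < span / 2")
    rng = random.Random(seed)
    observed = count_coincidences(spike_times, bands)
    hits = 0
    for _ in range(n_shifts):
        # Offsets near 0 or near `span` barely move the series; min_shift excludes them.
        offset = rng.uniform(min_shift, span - min_shift)
        shifted = circular_shift(spike_times, offset, t_min, t_max)
        if count_coincidences(shifted, bands) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_shifts + 1)
```

Shifting the whole series preserves the spacing between spikes, so the baseline reflects chance alignment with the bands rather than a change in spike clustering.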

Acceptance criteria (alpha)

- Robustness: ≥60% of prominent PADM2M spikes persist under leave-one-out and ±33% window variation, with ≤10% timing drift (relative to window length).
- Surrogates: spike rate at least 40% lower, and top-quantile magnitude at least 30% lower, in surrogates than in the real data.
- Age-model: ≥50% spike retention; median timing shift less than one bin width.
- Cross-product: observed overlaps exceed 95% of block-shift baselines.
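These thresholds can be expressed as a small pass/fail check. The sketch below is illustrative only: the metric keys are hypothetical and do not reflect the schema of padm2m_validation.json, and the surrogate thresholds are read as "at least 40% / 30% lower than the real data", which is an interpretation of the criteria above.

```python
def relative_change(real, surrogate):
    """Signed change of the surrogate value relative to the real one (-0.4 = 40% lower)."""
    return (surrogate - real) / real


def check_acceptance(m):
    """Map each alpha criterion to True/False. All keys in `m` are hypothetical names."""
    return {
        "robustness_persistence": m["persist_frac"] >= 0.60,
        "robustness_drift": m["timing_drift_frac"] <= 0.10,
        "surrogate_rate": relative_change(m["real_rate"], m["surr_rate"]) <= -0.40,
        "surrogate_magnitude": relative_change(m["real_q_mag"], m["surr_q_mag"]) <= -0.30,
        "age_retention": m["age_retention"] >= 0.50,
        "age_shift": m["median_shift_yr"] < m["bin_width_yr"],
        "cross_product": m["frac_baselines_exceeded"] >= 0.95,
    }


# Made-up example values, for illustration only (not results).
example = {
    "persist_frac": 0.70, "timing_drift_frac": 0.05,
    "real_rate": 10.0, "surr_rate": 5.0,
    "real_q_mag": 4.0, "surr_q_mag": 2.0,
    "age_retention": 0.60, "median_shift_yr": 200.0, "bin_width_yr": 500.0,
    "frac_baselines_exceeded": 0.97,
}
results = check_acceptance(example)
all_passed = all(results.values())
```

Returning one boolean per criterion, instead of a single verdict, keeps it visible which check failed when a run does not meet the alpha thresholds.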

Artifacts

Code (validation utilities)

- run_padm2m_validation.py
- surrogates.py
- robustness.py
- age_wiggle.py
- overlay_stats.py

Plan

- docs/VALIDATION_PLAN.md

Data JSON

- assets/data/validation/padm2m_validation.json

Figures

- /figures/validation/padm2m_window_sensitivity.png
- /figures/validation/padm2m_surrogate_q95.png
- /figures/validation/padm2m_downsample_overlay.png

PADM2M snapshots

Window sensitivity: ΔZ vs. varying windows; stability suggests robustness. (Figure: overlaid ΔZ curves for several smoothing/rolling windows; base curve in black.)
Real vs. IAAFT surrogates (Q95 of ΔZ): real exceeds the surrogate baseline. (Figure: histogram of surrogate ΔZ Q95 values with a vertical line for the real-data Q95.)
Downsampling robustness: broad spikes persist under coarsening. (Figure: base ΔZ overlaid with coarsened-cadence versions to show stability.)

JSON summary: padm2m_validation.json.

Safe summary

Given heterogeneous datasets, we use robustness and surrogate tests instead of classical holdouts. ΔZ spikes persist under signal dropouts and parameter variation and are rarer in surrogates that preserve autocorrelation. Coincidence with independent event bands exceeds baselines from block-shift tests. These results support ΔZ as a regime-change detector; they do not establish mechanism or prediction.