fix: resolve fast-unit-tests CI collection errors and stale config assertions by lee101 · Pull Request #77 · lee101/stock-prediction

lee101 · 2026-03-27T10:52:18Z

Summary

Three issues were causing the `fast-unit-tests` CI job (Python 3.13) to fail at collection time on PR #76:

1. JAX import errors (`test_jax_losses.py`, `test_jax_policy.py`, `test_jax_trainer_wandboard.py`): All three failed with `ModuleNotFoundError: No module named 'jax'` / `'flax'`. These packages are not in `requirements-ci.txt`. Added `pytest.skip(allow_module_level=True)` guards so the tests are cleanly skipped when jax/flax are absent instead of crashing collection.
2. Missing `resolve_data_path` function (`test_train_crypto_lora_sweep.py`): `ImportError: cannot import name 'resolve_data_path' from 'scripts.train_crypto_lora_sweep'`. Added `resolve_data_path(symbol, data_root)` that checks both flat (`{root}/{symbol}.csv`) and stocks-subdirectory (`{root}/stocks/{symbol}.csv`) layouts, and updated `main()` to use it.
3. Stale config assertions (`test_120d_eval_scripts.py::test_deployed_config_values`): `DEPLOYED_CONFIG` in `scripts/run_120d_worksteal_eval.py` was updated (`dip_pct` 0.20→0.18, `profit_target_pct` 0.15→0.20, `stop_loss_pct` 0.10→0.15) but the test assertions were not kept in sync. Updated the test to match the actual deployed values.
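A module-level skip guard of the kind described in the first item could look roughly like the sketch below. The helper names `missing_modules` and `skip_module_unless_available` are hypothetical and not taken from the repository; only `pytest.skip(..., allow_module_level=True)` is named in this PR.

```python
import importlib.util


def missing_modules(*names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def skip_module_unless_available(*names):
    """Call at the top of a test module, before importing the optional packages.

    Hypothetical helper: skips the whole module at collection time instead of
    letting the import raise ModuleNotFoundError.
    """
    missing = missing_modules(*names)
    if missing:
        import pytest  # imported lazily; only needed when something is missing

        pytest.skip(
            f"optional dependencies not installed: {', '.join(missing)}",
            allow_module_level=True,
        )


# Intended usage at the top of a test file such as tests/test_jax_losses.py:
#
#     skip_module_unless_available("jax", "flax")
#     import jax
```

`pytest.importorskip("jax")` is a built-in alternative for the single-module case; the explicit guard is shown because that is the call the PR names.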

Tests run

```
CI=1 FAST_CI=1 CPU_ONLY=1 python -m pytest -v \
  -m "unit and not slow and not model_required and not cuda_required" \
  --tb=short --maxfail=10 tests/
```

Result: 78 passed, 14 skipped, 3963 deselected (0 failures, 0 collection errors)

Also verified `tests/test_train_crypto_lora_sweep.py::test_resolve_data_path_supports_mixed_hourly_root` passes independently.

🤖 Generated with Claude Code

feat: MKTD v3 — 20 intraday features (vol, morning_ret, vwap_dev, gap_open)

Exports pufferlib checkpoint (MLP/Residual/Transformer) to TorchScript format for libtorch C API inference. Includes round-trip verification, metadata JSON output, and logits-only wrapper for C trader. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- policy_infer.cpp: libtorch C++ with extern C API, optional build
- export_torchscript.py: convert pufferlib checkpoints to TorchScript
- Makefile: libcurl + libtorch optional linking

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Add generate_markdown_report() to profile_training.py: parses Chrome trace, speedscope flamegraph, timing.json, and gprof to produce profiles/report.md with throughput, kernel hotspots, and recommendations
- Add --quick (torch.profiler only, skip py-spy) and --report-only (skip profiling, regenerate report from existing files) flags
- Add tools/profile_report.py: standalone CLI for report generation
- Load Chrome trace once per report (single _load_trace_events call shared between kernel and memory parsers, eliminating duplicate JSON read)
- Fix _parse_chrome_trace_top_kernels return type to always tuple[list, float]
- Remove unused `import re`
- Add 17 tests covering all parsers, CLI flags, and report content

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Fuses obs normalization + first linear layer + ReLU into a single Triton kernel, eliminating the intermediate obs_norm (B, OBS) tensor allocation. Integrates with TradingPolicy._encode() when --obs-norm is active. - pufferlib_market/kernels/fused_obs_encode.py: new CuTE-style kernel with CC-aware autotune configs (CC>=9 / CC==8 / CC<8) - pufferlib_market/train.py: set_obs_norm_stats(), _encode() Path 1, training loop skips CPU normalize when fused path active - pufferlib_market/bench_obs_encode.py: benchmark vs baseline - tests/test_fused_obs_encode.py: 14 correctness/dtype/integration tests - pufferlib_market/kernels/fused_mlp.py: H100 warp specialization note Benchmark on RTX 5090 (CC 12.0): 1.55–1.71x speedup at stocks12 sizes (OBS=209, H=1024); peak alloc drops from ~1144 KB to ~128 KB at B=64. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat: fused_obs_encode.py — CuTE-style fused obs normalization + linear + ReLU

F.linear fails when input and bias have different dtypes in PyTorch 2.x. Cast bias to match weight dtype in the fallback path. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…g fix

- autoresearch_rl.py: add --stocks12 convenience flag that uses stocks12 data by default and runs the combined STOCK_EXPERIMENTS + H100_STOCK_EXPERIMENTS pool (excluding requires_gpu='h100' configs). Sets periods_per_year=252, fee_rate=0.001, holdout_eval_steps=90.
- train.py: add --early-stop-patience N flag (default 0=disabled) that stops training when ep_return does not improve by >=0.001 for N consecutive logging steps.
- h100_experiment_plan.md: document 90s vs 300s overfitting finding, update recommended command to time_budget=90, max_trials=500.
- scripts/alpaca_cli.py: fix typer 0.24+ compat (Annotated syntax for typer.Argument).
- tests: fix test_backout_logic.py stubs for typer and src.fixtures.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
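The `--early-stop-patience` rule described for train.py can be sketched as a small counter. This is a minimal illustration, not the train.py code; the class name `EarlyStopper` is hypothetical, and resetting the counter only on gains of at least 0.001 (without updating the best value on smaller gains) is an assumption.

```python
class EarlyStopper:
    """Stop when ep_return fails to improve by min_delta for `patience` log steps."""

    def __init__(self, patience, min_delta=0.001):
        self.patience = patience      # 0 disables early stopping, as in the flag default
        self.min_delta = min_delta
        self.best = float("-inf")
        self.stale_steps = 0

    def update(self, ep_return):
        """Record one logging step; return True when training should stop."""
        if self.patience <= 0:
            return False
        if ep_return >= self.best + self.min_delta:
            self.best = ep_return
            self.stale_steps = 0
        else:
            self.stale_steps += 1
        return self.stale_steps >= self.patience


stopper = EarlyStopper(patience=2)
decisions = [stopper.update(r) for r in (0.10, 0.1005, 0.1009, 0.20)]
```

In the example, the second and third values each improve on 0.10 by less than 0.001, so the stopper fires on the third logging step.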

- Download yfinance split-adjusted data for stocks12 from 2019 (PLTR limits to 2020-09-30)
- Export stocks12_extended_train.bin: 1797 days (+38% vs original 1302)
- Export stocks11_{train,val}.bin: 2434 days without PLTR, 11 symbols
- Document eval_hours calibration: C env counts calendar days, use --eval-hours 130 for ~90 trading days
- Update H100 plan to use stocks12_extended data
- Add splits_audit_report.csv: 258 entries, 0 UNRECOGNIZED in stocks12 symbols

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Scans all daily and hourly stock CSVs for unadjusted forward splits
- Fetches split history from yfinance in parallel (40 workers, ~700 symbols)
- Detects unadjusted splits via price-ratio check (tolerance 15%)
- Filters spin-off adjustments with MIN_SPLIT_FACTOR=1.9 threshold
- Fixes timezone bug: convert to UTC before normalize() to avoid 4h offset
- Deduplicates same-day rows before scanning for big drops (handles SPAC data)
- Auto-fixes CSVs: divides pre-split prices, multiplies pre-split volume
- Always backs up CSVs to .pre_split_backup before modifying
- Re-exports affected MKTD binaries (stocks12/stocks20 train+val)
- Applied 44 fixes: ANET, APH, CMG, COO, CTAS, DD, DECK, ETR, FAST, GOLD, GOOGL, ISRG, LRCX, MNST, NDAQ, NEE, NOW, NVO, ODFL, ORLY, PANW, SHOP, SHW, SMCI, SONY, SRE, TPL, TSCO, WMT, WSM (daily+hourly)
- 36 unit tests covering all helpers and edge cases

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
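The price-ratio check and the auto-fix listed above could work along these lines. This is a sketch under stated assumptions, not the audit script: the function names are hypothetical, and interpreting "tolerance 15%" as a relative deviation of the observed close-to-close ratio from the reported split factor is a guess. Only `MIN_SPLIT_FACTOR=1.9` and the 15% figure come from the commit message.

```python
MIN_SPLIT_FACTOR = 1.9   # from the commit: smaller factors are treated as spin-off adjustments
RATIO_TOLERANCE = 0.15   # from the commit: 15% tolerance on the price ratio


def is_unadjusted_split(close_before, close_after, split_factor):
    """True if prices drop by roughly `split_factor` across the split date."""
    if split_factor < MIN_SPLIT_FACTOR or close_after <= 0:
        return False
    observed_ratio = close_before / close_after
    return abs(observed_ratio / split_factor - 1.0) <= RATIO_TOLERANCE


def adjust_pre_split(rows, split_index, split_factor):
    """Divide pre-split closes and multiply pre-split volumes by the factor.

    `rows` is a list of (close, volume) tuples; rows before `split_index`
    are the pre-split ones. Returns a new list and leaves the input untouched.
    """
    return [
        (close / split_factor, volume * split_factor) if i < split_index else (close, volume)
        for i, (close, volume) in enumerate(rows)
    ]
```

An already-adjusted series shows a ratio near 1 across the split date, so it fails the check and is left alone.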

… missing winner configs Dataset experiments (2026-03-22): - stocks12_extended (1797d, 2020+) is WORSE than stocks12_daily (1302d, 2022+) - Extra 2020-2021 COVID-era data hurts generalization on 2025-2026 val - stocks11 (2434d, no PLTR) also worse — more data ≠ better for out-of-distribution - Confirmed: stocks12_daily_train.bin is the right training set for H100 H100_STOCK_EXPERIMENTS additions: - h100_rmu4424_style/wd005/slip8: h=256 variants from random_mut_4424 (0% neg, +7.3%) - h100_h256_mut2272: h=256 with random_mut_2272 regularization - h100_rmu1228_style/slip5/wd005: obs_norm=True variants from random_mut_1228 (0% neg, +6.8%) - h100_mut2272_s4424, h100_rmu4424_s2272: cross-seed variants of top configs - Pool now 141 configs (was 132); 100 random mutations still included H100 final command updated to use stocks12_daily_train.bin Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…found Local 50-trial sweep found stock_trade_pen_05 (just trade_penalty=0.05, all defaults) is the best config seen so far: score=-3.5, 5% neg, +27.8% median, sortino=4.06. This beats random_mut_2272 (score=-5.2, 0% neg, +10.7% median, sortino=2.22). Added 8 H100 variants of trade_pen_05: - h100_trade_pen_05 (exact match, plus seeds s123/s7/s42) - h100_trade_pen_05_ent03, ent08 (entropy sweep) - h100_trade_pen_05_wd005 (with weight decay) - h100_trade_pen_05_anneal_ent (entropy annealing) H100_STOCK_EXPERIMENTS: 149 configs (was 132) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… sweep SOTA V2 50-trial sweep (stocks12_daily_train.bin, 90s/trial) found new SOTA: - stock_drawdown_pen: drawdown_penalty=0.05, trade_penalty=0.03, NO training slippage → 0% negative windows, +22.9% median, +4.8% p10, Sortino=7.25, worst=+3.3% → score=+24.9 (beats random_mut_2272 at ~-5 and all previous configs) - stock_trade_pen_05_s123: 0% negative, +16.6% median, +7.7% p10, score=+14.9 H100_STOCK_EXPERIMENTS expanded: 127 → 162 configs Added 13 h100_drawpen_* variants (seeds + drawdown/trade pen hyperparams) Added 8 h100_trade_pen_05_* variants (from previous session) Added 4 h100_rmu4424_* + 3 h100_rmu1228_* variants Key finding: drawdown_penalty outperforms slippage training as regularizer. Drawdown penalty forces policy to avoid equity dips → no reckless behavior on holdout. Updated h100_experiment_plan.md: - New SOTA table (drawpen beats random_mut_2272 by 2x on median) - Revised deployment conditions (bar raised to match drawpen results) - Updated H100 pool summary (162 configs, 500 trials = 12.5h on H100) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…score scale Bug: _quick_val_eval returns raw val_return (~0.09-0.20) but was compared against best_trial_rank_score * 0.8 where rank_score = holdout_robust_score (~24.9). This means threshold was 19.9 but val_return is never > 1.0 in normal cases, so every trial after stock_drawdown_pen was automatically early-rejected. Fix: track best_val_return separately (same scale as _quick_val_eval output) and use that for the early rejection comparison instead of best_rank_score. The new threshold ~0.073 (7.3%) is comparable to typical val_returns. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
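The scale mismatch is easy to see in a few lines. The sketch below assumes the rejection rule is "reject if val_return is below 0.8 times the reference"; the function name is hypothetical, and the value 0.091 for `best_val_return` is back-derived from the ~0.073 threshold quoted above, not taken from a log.

```python
def should_early_reject(val_return, reference, fraction=0.8):
    """Reject a trial whose quick validation return is below fraction * reference."""
    if reference is None:  # nothing to compare against yet
        return False
    return val_return < fraction * reference


val_return = 0.15            # a typical _quick_val_eval output per the commit (~0.09-0.20)
best_rank_score = 24.9       # holdout_robust_score scale
best_val_return = 0.091      # illustrative value, same scale as val_return

buggy = should_early_reject(val_return, best_rank_score)   # threshold 19.92
fixed = should_early_reject(val_return, best_val_return)   # threshold ~0.073
```

With the rank score as reference, every realistic `val_return` falls below the threshold; with a same-scale reference, only trials that are actually weaker get rejected.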

…OT --h100-mode Critical finding from A40 preview sweep: - --h100-mode forces num_envs=256, minibatch_size=4096 → A40 trains 15.6M steps in 82s (cap hit before SIGTERM) → 5x more steps than stock_drawdown_pen discovery → OVERFITS - ALL drawpen configs failed under h100-mode (early rejected, holdout -68 to -104) - Real H100 would also hit 15.6M cap (in ~31s) → same overfitting Fix: use --stocks12 --max-timesteps-per-sample 200 instead of --h100-mode - Caps each trial at 3.1M steps (12 × 1302 × 200) regardless of GPU speed - Matches stock_drawdown_pen discovery conditions (3.2M steps in 90s on A40) - H100 trains 3.1M steps in ~9s, holdout ~30s → ~40s/trial → 500 trials ≈ 5.5h - Default batch size (128 envs, 2048 minibatch) gives 94 PPO updates vs 47 with h100-mode Updated H100 recommended command in h100_experiment_plan.md accordingly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
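The step cap and the sweep-duration estimate quoted above are simple products; a quick check is sketched below. The function names are hypothetical, and the inputs are the figures from the commit message (12 symbols, 1302 days, 200 timesteps per sample, ~40 s per trial, 500 trials).

```python
def step_cap(num_symbols, num_days, max_timesteps_per_sample):
    """Per-trial training step cap: one sample per (symbol, day) pair."""
    return num_symbols * num_days * max_timesteps_per_sample


def sweep_hours(num_trials, seconds_per_trial):
    """Wall-clock estimate for a sequential sweep."""
    return num_trials * seconds_per_trial / 3600.0


cap = step_cap(12, 1302, 200)     # 3,124,800 steps, the "3.1M" in the message
hours = sweep_hours(500, 40)      # about 5.6 hours, quoted as "≈ 5.5h"
```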

Key findings from standalone 13-variant drawpen verification sweep: - ALL drawpen seed/param variants score -49 to -170 in holdout - stock_drawdown_pen (+24.9) in v2 sweep was a ~2% lucky training run - RL is non-deterministic; same config+seed gives wildly different results H100 strategy revised: - Increase max-trials from 500 to 1200 (diversity over depth) - Early rejection is irrelevant for H100: training completes in ~9s before the 25% time check fires at 22.5s - Target: realistic holdout improvements over random_mut_2272 baseline - Expected: ~24+ positive-score configs from 1200 diverse trials Also commit leaderboard CSVs: - autoresearch_stocks12_v2_50trial.csv (50-trial v2 sweep, 2 positive) - autoresearch_h100_drawpen_preview_v2.csv (partial, killed for early rejection bias) - autoresearch_h100_drawpen_standalone.csv (13-variant verification, all negative) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…tive Full 13-variant sweep (seeds 7/42/123/999/2272, param variants tp02/tp05/dd02/dd10/ent03/wd005/slip5) with early_reject_threshold=0.0 and correct 200x step cap: Best: h100_drawpen_tp05 score=-37.7, neg=25%, median=+3.1%, p10=-2.4% Most: scores -49 to -170, 20-100% negative windows Confirms stock_drawdown_pen (+24.9, v2 sweep trial 20) was a ~2% lucky training run. True hit rate for drawpen family: ~0/13 = 0% (unlucky batch) to ~1/50 = 2% at scale. H100 strategy: run 1200 diverse trials, expect ~24-48 positive configs at 2-4% hit rate. Do NOT specifically target drawpen — include in pool for coverage only. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…hrough) Key finding: extending stocks12 training from 1302 days (2022-2025) to 1797 days (2020-2025) dramatically improves generalization on the hard 201-day val (Sep 2025 - Mar 2026, includes Nov 2025 - Feb 2026 bear market). Results: - Old training (1302d): 0/50 configs score positive on hard extended val - New training (1797d): stock_trade_pen_03 scores +3.10 (seed 777) and -7.81 (seed 999) vs -102 with old data — first ever positive on hard val Root cause: 2020-2021 data (COVID recovery + 2021 bull market) teaches the model about market cycles and regime detection. Models trained from 2022 only see one bear market and one recovery; they fail when encountering the 2025-2026 bear market. The extended data fixes this. Changes: - audit_stock_splits.py: add stocks12_daily_train_2019 config (2019-01-02 start, effective 2020-09-30 due to PLTR IPO, 1797 calendar days) - h100_experiment_plan.md: v5 update with extended training breakthrough, corrects previous "extended data is worse" finding (that used old easy val), updates H100 command to use stocks12_daily_train_2019.bin, updates step cap to 4,312,800 (12*1797*200), updates hit rate expectation to 5-15% (vs 0% with old data) - Add sweep result CSVs: extended_val_50trial, train2019_10trial, train2019_50trial Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

trade_penalty=0.03 identified as sweet spot on hard 201-day val with extended training data (2020-2025): scored +3.1 (seed 777) vs -102 with old training data. Add seed/param variants to increase coverage: - tp03_s7/s42/s123/s2272: seed sweep - tp03_slip5/slip10: slippage friction variants - tp03_wd01/wd05: weight decay variants - tp03_obs: observation normalization - tp03_ent03/annent: entropy variants - tp03_h512/h2048: network size variants - tp03_cosine: cosine LR schedule - tp03_full_reg: combined regularization Pool size: 253 total (95 STOCK + 158 non-GPU H100) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ariants findings tp03 variants sweep (16 configs, seed 1337, extended 1797d training) results: - tp03_s2272: -33.8 (best, seed 2272 is special for this config class) - tp03_wd01: -39.4 (median=+5.6%, wd=0.01 helps) - tp03_h2048: -50.0 (median=+6.0%, larger net benefits from 5yr data) - tp03_slip5/slip10: -110 to -130 (AVOID: slippage training hurts bear market generalization) - tp03_obs: -124 (AVOID: obs_norm hurts with trade_pen_03) Add best-combo configs: tp03_s2272_wd01, tp03_h2048_wd01, tp03_s2272_h2048 Pool is now 98 STOCK + 158 non-GPU H100 = 256 total Key rule: trade_pen_03 without slippage, without obs_norm, with wd=0.01 or h2048 Update H100 plan with full tp03 variants findings table and updated pool summary. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…mote pipeline - autoresearch_rl.py: added 108 tp03 variants to STOCK_EXPERIMENTS: - tp03_s777 (KNOWN WINNER on hard 201-day bear val, robust=+3.10) - tp03_s{7,42,123,888,1111,2272,3141,4242,5678,7777,9999} for seed discovery - tp03_wd01_s{777,42,2272} + tp03_h2048_s{777,42,2272} (best modifiers x seeds) - tp03_seed_{1..50}: dense sequential seed sweep for H100 (expect ~17 positive) - tp03_wd01_seed_{1..25}: wd=0.01 modifier seeds for H100 - remote_training_pipeline.py: add max_timesteps_per_sample param to build_autoresearch_cmd() and build_remote_autoresearch_plan() - launch_stocks_autoresearch_remote.py: add --max-timesteps-per-sample CLI arg (default 200, gives ~4.3M steps on 1797-day 2019 training data) Key finding: previous tp03_variants sweep used --seed 1337 override which masked all explicit per-config seeds. The actual tp03 hit rate at native seeds needs testing via the tp03_multiseed sweep (no global override, early-reject disabled). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…m, cuDNN Add torch.manual_seed(args.seed) + cuda.manual_seed_all + random.seed + numpy.seed at training startup, plus cudnn.benchmark=False for cuDNN algorithm stability. Previously only the C environment was seeded (via vec_init/vec_reset). Network weight initialization was non-deterministic, causing large result variance even with identical configs. Now each --seed value produces a reproducible training trajectory, enabling systematic seed sweeps on local hardware before H100 runs. Key implication: tp03_seed_{1..50} dense sweep will now give reproducible results so we can identify which seeds work on the hard 201-day bear market val before committing to expensive H100 time. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
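A seeding helper in the spirit of this commit might look like the sketch below. The name `seed_everything` is hypothetical, and the optional imports are an addition so the snippet also runs where NumPy or PyTorch is not installed; the torch calls themselves are the ones the commit message lists.

```python
import random


def seed_everything(seed):
    """Seed Python, NumPy, and PyTorch RNGs and pin cuDNN algorithm selection."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; nothing to seed
    try:
        import torch
    except ImportError:
        return  # PyTorch not installed; stdlib/NumPy seeding is all we can do
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)          # silently ignored when CUDA is unavailable
    torch.backends.cudnn.benchmark = False    # avoid run-to-run autotuner differences


seed_everything(7)
first = random.random()
seed_everything(7)
second = random.random()
```

Seeding removes one source of run-to-run variance (weight initialization and sampling); some GPU kernels can still be non-deterministic, so treating results as bit-for-bit reproducible is an assumption worth verifying per setup.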

…ep cap

Key bugs fixed:

1. --max-timesteps-per-sample default was 200 (4.3M steps); models need 33M+ steps to converge. Changed to 10000 (effectively no cap; 300s wall-clock is binding).
2. --stocks12 flag was never passed to autoresearch_rl.py; remote runs used the default crypto EXPERIMENTS pool instead of STOCK_EXPERIMENTS.
3. --time-budget default changed from 300 to 90 for H100 (90s x 390k steps/sec = ~35M steps ≈ local A40 300s convergence point).

Root cause of recent 0/34 positive sweep: the 200-sample cap (4.3M steps) was 8x shorter than the ~33M steps needed for convergence (found in all winning models).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Key finding: 200-sample cap (4.3M steps) was the root cause of 0/34 failures. Winning models need 33-37M steps (300s on A40). Document correct H100 command: time-budget=90 + no step cap = ~35M steps on H100 ≈ A40 300s convergence. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… (2015-2026) Key changes: 1. STOCK_EXPERIMENTS: move tp03 dense seed sweep (75 configs) to new STOCK_TP03_SEED_EXPERIMENTS constant — all 75 are negative at 300s on extended 201-day bear market val and were blocking random mutations from being reached (random mutations were at index 157, now at index 82). 2. Expand random mutation slots: 30 → 300, enabling H100 500-trial sweeps with ~218 random mutation trials (after 82 named configs). 3. Add extend_stocks_history.py: downloads 2015-2019 historical data from yfinance for stocks11 (no PLTR) to extend training data 2x: - stocks12_daily_train_2019.bin: 12×1797 = 21,564 samples (from 2020-09-30) - stocks11_daily_train_2015.bin: 11×3895 = 42,845 samples (from 2015-01-02) Extended data includes: COVID crash (Mar 2020), 2018 Q4 correction, 2015-2019 diverse regimes — critical for bear market generalization. Running experiments to compare stocks12 vs stocks11-extended hit rates on the 201-day hard val (Sep 2025–Mar 2026, includes Nov 2025–Feb 2026 bear). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…y v7 - Expand random mutation slots: 300 → 450 (total pool: 532 = 82 named + 450 random) Supports 500-trial H100 runs with majority of trials as random mutations - Update h100_experiment_plan.md with v7 final config: * Pool restructuring benefits documented * stocks11 extended (42,845 samples) as H100 alternative * Expected 17-21 positive models from 500-trial H100 run at 4-5% hit rate Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Replace torch.no_grad() with torch.inference_mode() in cutechronos validation and test functions. The main predict() methods already used inference_mode; this completes the migration for the module. Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

…#61) Add forecast_bias_weight to WorkStealConfig and forecast_data param to run_worksteal_backtest. Positive forecasts boost candidate scores, negative reduce them. Weight=0.0 (default) preserves identical behavior. Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
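The commit does not state how the forecast enters the score, so the sketch below assumes a simple additive bias; `biased_score`, `rank_candidates`, and the symbols are hypothetical. It shows only the two properties the message does state: the sign of the forecast moves the score in the same direction, and a weight of 0.0 leaves the ranking unchanged.

```python
def biased_score(base_score, forecast_return, forecast_bias_weight=0.0):
    """Assumed form: additive bias proportional to the forecast return."""
    return base_score + forecast_bias_weight * forecast_return


def rank_candidates(scores, forecast_data, forecast_bias_weight=0.0):
    """Return symbols sorted by adjusted score, best first.

    Symbols without a forecast are treated as having a neutral (0.0) forecast.
    """
    adjusted = {
        symbol: biased_score(score, forecast_data.get(symbol, 0.0), forecast_bias_weight)
        for symbol, score in scores.items()
    }
    return sorted(adjusted, key=adjusted.get, reverse=True)


scores = {"AAA": 1.0, "BBB": 0.9}
forecasts = {"AAA": -0.05, "BBB": 0.05}
unbiased = rank_candidates(scores, forecasts)                          # weight 0.0
biased = rank_candidates(scores, forecasts, forecast_bias_weight=2.0)
```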

) Replace custom Triton attention with PyTorch SDPA (scale=1.0) as the preferred CUDA backend. SDPA auto-selects FlashAttention2/cuDNN kernels and is ~2x faster than the eager fallback on RTX 5090. New module cutechronos/modules/flex_attention.py provides: - sdpa_unscaled_attention: SDPA with scale=1.0 (recommended) - flex_unscaled_attention: FlexAttention for mask-free case, SDPA fallback for masked - eager_unscaled_attention: delegates to existing _fallbacks implementation - Backend registry with benchmark_backends() and get_best_attention_backend() Integration: FusedTimeSelfAttention and model.py now use SDPA on CUDA, with Triton and eager as fallbacks for non-CUDA paths. 56 new tests, 195 total tests passing. Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Single kernel fuses residual-add and RMS LayerNorm, eliminating one full read-write of hidden state per sub-layer (36 round-trips across all encoder blocks). Provides both out-of-place and in-place variants via compile-time INPLACE constexpr flag. 26 tests covering FP32/BF16, 2D/3D shapes, edge cases, and cross-variant consistency. Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Updated top-10 5bps leaderboard (145 evaluated): #4 s456: +8,802% (ultra-robust: 5bps > 8bps, Sortino=6.71) #6 s452: +8,002% (ultra-robust: 5bps > 8bps, Sortino=6.65) #7 s734: +7,160% (ultra-robust: 5bps > 8bps) #10 s446: +6,536% (ultra-robust: 5bps > 8bps) #15 s827: +4,801% (ultra-robust: 5bps > 8bps) 7 sweeps ongoing: s201-900 at 55-62% complete Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…56(+8802% ROBUST) New absolute record: s275 at +23,595% ann, 5bps>8bps>pool (tri-consistent). s456 enters top-5 at +8,802% ROBUST (5bps >> 8bps). 169 seeds now properly evaluated in 5bps leaderboard. New seeds: s357(+1708% ROBUST), s359(+2998%), s437(+1112%), s751(+2135%), s649(+1601%), s842(pending). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Leaderboard updated (169 evaluated at 5bps): #1 s275: +23,595% (Sortino=9.0, ultra-robust: 5bps > 8bps) #2 s240: +17,642% #3 s434: +10,359% #4 s71: +9,381% #5 s456: +8,802% (new) #6 s507: +8,273% #7 s452: +8,002% (new) Top-10 mean: +10,796% ann | 7/10 ultra-robust (5bps >= 8bps) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…42%), s845(+2662%), s760(+2385%) Batch eval of 103 seeds completes 5bps coverage. Notable new ROBUST seeds: - s467: +3242% ann, sortino=6.10 (s401-500) - s845: +2662% ann, sortino=4.71 (s801-900) - s760: +2385% ann, sortino=5.23 (s701-800) - s279: +2127% ROBUST (fixed: 5bps=3.62 >> 8bps=2.63) - s210: +4461% ROBUST confirmed - s209: +3091% ROBUST confirmed - s904: +986% ROBUST (s901-1000 not all bad!) Also fixed s277/s279 swapped entries from parallel eval. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…0(+991%) New high-value ROBUST finds: - s658: +1837% ann (s601-700), 5bps=3.31 > 8bps=3.21 - s279: +2127% ann (s201-300) ROBUST, corrected from earlier swap - s277: +1230% ann (s201-300) ROBUST - s660: +991% ann (s601-700) ROBUST - s567: +776% ann, s470: +1028% ann (overfitters) Also: s564(+1261%), s552(+665% ROBUST), s465(+855% overfitter) 196 seeds evaluated, 85 ROBUST confirmed. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ew finds Extraordinary s901-1000 discoveries: - s921: 5bps=6.86, +6386% ann, sortino=5.60, ROBUST! (pool=5.76 -> honest=6.66 -> 5bps=6.86) - s914: +1839% OVERFITTER, s915: +1414% ROBUST, s920: +680% OVERFITTER s801-900: - s850: 5bps=5.86, +4869% ann, sortino=5.99, ROBUST! (pool=5.12 -> 5bps=5.86) Other new ROBUST seeds: s658(+1837%), s660(+991%), s279(+2127%), s284(+894%) Total: 206 seeds in 5bps leaderboard, 81 ROBUST. Sweeps ~65-87% complete per range. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…s discovered Top seeds by 5bps annualized return: - s275: +23,595% (Sortino=9.0, ultra-robust) — all-time champion - s240: +17,642% (Sortino=7.0) - s434: +10,359% (Sortino=6.99) - s71: +9,381% (Sortino=8.29) - s456: +8,802% (Sortino=6.71, ultra-robust) New champions this session: s921 (+6,386%, ultra-robust), s850 (+4,869%) Coverage: s61-120 ✓, s121-200 ✓, others 50-80% complete Auto 5bps monitor running continuously, all seeds >800% evaluated Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…(+7647%), s578(+5688%), s1203(+3619%) Key new finds (all ROBUST): - s292: +19,815% ann, S=7.99 — new #2 ROBUST champion - s765: +7,647% ann, S=5.76 — new #7 ROBUST - s578: +5,688% ann, S=8.14 — new #18 ROBUST (high sortino) - s1203: +3,619% ann, S=6.43 — new from s1201-1300 range - s1202: +4,121% ann, S=5.65 — new from s1201-1300 range - s1206: +1,012% ann ROBUST, s1005: +854% ann ROBUST New ranges discovered: s1001-1100 (103 seeds), s1101-1200 (6 seeds), s1201-1300 (14 seeds) 5bps auto-monitor updated to cover all ranges up to s1300+ Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Champion leaderboard (top 3 by 5bps annualized): - s670: +29,099% (Sortino=7.56, p50=15.45x/180d) — NEW ALL-TIME CHAMPION - s275: +23,595% (Sortino=9.00, ultra-robust) - s292: +20,000% (Sortino=7.99, ultra-robust) 233 seeds evaluated at 5bps; sweep ~70% complete. Updated prod.md with comprehensive top-10 leaderboard. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ents, unified hourly fixes - src/robust_trading_metrics.py: new robust trading metrics module - scripts/evaluate_binance_lora_candidate.py: binance lora candidate evaluator - scripts/run_binance_crypto_lora_sweep.py: expanded binance crypto lora sweep - pufferlib_market/autoresearch_rl.py: improved autoresearch with gpu pool support - pufferlib_market/gpu_pool_rl.py: gpu pool RL training - pufferlib_market/replay_eval.py: improved replay evaluation - unified_hourly_experiment/trade_unified_hourly.py: hourly trading improvements - tests: comprehensive test coverage additions - alpacaprogress6.md: alpaca progress notes - leaderboard CSVs: mixed23 sweep results Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…neal_ent + stocks20 - scripts/stocks_deep_sweep.sh: 5-phase sweep covering: - Phase A: stocks12 tp05 s100-299 @ 35M steps - Phase B: stocks20 tp05 s1-80 @ 35M steps - Phase C: stocks12 tp03 s1-60 @ 35M steps - Phase D: stocks12 tp07 s1-60 @ 35M steps - Phase E: stocks12 anneal_ent tp05 s1-60 @ 35M steps - Inspired by crypto70: need 200+ seeds to find champions - pufferlib_market/stocks12_seed_sweep_leaderboard.csv: s51-87 results at 15M steps - s55 (med=9.38%, 5/50 neg) best of first batch — retraining at 35M - Disk cleanup: freed 110GB by removing non-champion old checkpoints Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Replace manual `processed_batches += 1` counter inside the for loop with `enumerate(loader, start=1)` as required by ruff SIM113. This fixes the failing CI lint job (Fast CI / lint). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
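The SIM113 rewrite amounts to the following before/after pair. The function names and the list standing in for the data loader are illustrative only.

```python
def count_batches_manual(loader):
    """Pattern flagged by ruff SIM113: a counter incremented by hand in the loop."""
    processed_batches = 0
    for _batch in loader:
        processed_batches += 1
    return processed_batches


def count_batches_enumerate(loader):
    """Equivalent loop using enumerate(..., start=1)."""
    processed_batches = 0  # keeps the result defined when the loader is empty
    for processed_batches, _batch in enumerate(loader, start=1):
        pass  # per-batch work would go here
    return processed_batches


batches = [[1, 2], [3, 4], [5]]
```

Initializing the counter before the loop matters: without it, an empty loader would leave `processed_batches` unbound in the `enumerate` version.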

Portfolio combiner: per-symbol models -> diversified portfolio with equal/inverse_vol/sqrt_sortino allocation. sqrt_sortino winner: +40.69% med ret, Sort=5.21, -0.87% DD, 100% positive (15 symbols, 10x30d). New research features in trainer: - Spectral regularization (penalize max singular value) - Multi-period loss (train on sub-windows for horizon diversity) - WARP weight averaging (tested: hurts vs best single checkpoint) R5 experiment configs: spectral, multiperiod, combos, champion tuning. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
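The three allocation modes named above could be defined as in the sketch below. The commit gives only the mode names, so the formulas are assumptions: equal weights, weights proportional to 1/volatility, and weights proportional to the square root of a non-negative Sortino ratio. The function name and the input layout are hypothetical.

```python
import math


def allocate(stats, method="equal"):
    """Return normalized portfolio weights per symbol.

    `stats` maps symbol -> {"vol": float, "sortino": float}.
    """
    if method == "equal":
        raw = {symbol: 1.0 for symbol in stats}
    elif method == "inverse_vol":
        raw = {symbol: 1.0 / s["vol"] for symbol, s in stats.items()}
    elif method == "sqrt_sortino":
        # Negative Sortino ratios are clipped to zero weight (assumption).
        raw = {symbol: math.sqrt(max(s["sortino"], 0.0)) for symbol, s in stats.items()}
    else:
        raise ValueError(f"unknown allocation method: {method!r}")
    total = sum(raw.values())
    if total <= 0:
        raise ValueError("all raw weights are zero; cannot normalize")
    return {symbol: weight / total for symbol, weight in raw.items()}


stats = {"A": {"vol": 0.1, "sortino": 4.0}, "B": {"vol": 0.2, "sortino": 1.0}}
weights = allocate(stats, "sqrt_sortino")
```

The square root dampens the influence of very high Sortino ratios, so one strong symbol cannot dominate the portfolio as much as it would under linear weighting.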

- scripts/finetune_alpaca_symbols.sh: fine-tune Chronos2 LoRA for all 18 Alpaca live symbols using proven differencing preaug (33.7% MAE improvement on QUBT). Promotes any symbol improving >5% over baseline. - scripts/stocks_extended_sweep.sh: Phase F sweep — stocks12 extended (7.1yr) training data, evaluating on standard val for fair comparison. Runs after deep sweep completes. - e2etraining smoke test confirmed working (2026-03-27) - Key finding: stocks seeds overfit at 35M steps — s55 fell from 9.38% to 0.48% med. Short scan (15M) + selective 35M retrain is the right strategy. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…sertions

Four issues were causing the fast-unit-tests CI job to fail at collection time:

1. tests/test_jax_losses.py, test_jax_policy.py, test_jax_trainer_wandboard.py: All three fail with ModuleNotFoundError for jax/flax which are not in requirements-ci.txt. Added pytest.skip(allow_module_level=True) guards so tests are cleanly skipped when jax/flax are absent rather than erroring.
2. tests/test_train_crypto_lora_sweep.py: ImportError for resolve_data_path which was missing from scripts/train_crypto_lora_sweep.py. Added resolve_data_path() that checks both {root}/{symbol}.csv (flat) and {root}/stocks/{symbol}.csv (sub-dir) layouts, and updated main() to use it.
3. tests/test_120d_eval_scripts.py::test_deployed_config_values: DEPLOYED_CONFIG in scripts/run_120d_worksteal_eval.py was updated (dip_pct 0.20->0.18, profit_target_pct 0.15->0.20, stop_loss_pct 0.10->0.15) but the test assertions were not kept in sync. Updated test to match actual deployed values.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

lee101 and others added 30 commits March 22, 2026 20:46

Merge pull request #66 from lee101/worktree-agent-a076df9a

92e2a22

feat: MKTD v3 — 20 intraday features (vol, morning_ret, vwap_dev, gap_open)

Merge pull request #67 from lee101/worktree-agent-aa3d4dc8

3d4f3e1

feat: fused_obs_encode.py — CuTE-style fused obs normalization + linear + ReLU

fix: cast bias to weight.dtype in fused_obs_encode fallback

c068ad8

F.linear fails when input and bias have different dtypes in PyTorch 2.x. Cast bias to match weight dtype in the fallback path. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

lee101 and others added 28 commits March 25, 2026 14:40

fx

8a6d731

fx

eadcf1c

newresults

6dd9316

fix: correct unified hourly crypto hold timing

7f455b5

Add JAX stock trainer and RunPod launcher

10e0966

Improve PyTorch checkpoint robustness

d286c78

Add unified HF trainer bridge

1519869

fxhmm

9d0290d

Improve WandBoard scouting and RunPod safety

8b38d10

Add hourly multiseed market scout

73f697c

fx

197ac41

merge: resolve upstream conflicts (keep both branches' additions)

a847614

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Merge branch 'main' of github.com:lee101/stock-prediction

913ecf3

lee101 force-pushed the main branch from 0ce1bac to 597a942 Compare April 13, 2026 02:00

lee101 wants to merge 522 commits into main from ci-fix/stock-prediction-fast-unit-tests-76
