fix: resolve CI fast-unit-tests collection errors (jax skip + resolve_data_path) #76
lee101 wants to merge 516 commits into
New experiments: robust_reg_tp005_ent at seeds 42/7/123, h1536_robust_ent, h2048_robust_ent for A40 sweep. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add ctrader sim parity verification
… --multi-period-eval to autoresearch_rl
- evaluate_fast.py: add multi_period_eval() that evaluates across multiple window sizes (default 5/15/30/60/90 days) and returns a smoothness_score (weighted average of p10_sortino, with shorter windows weighted more to penalise single-spike wins). Module-level _SMOOTHNESS_WEIGHTS constant (5=3, 15=2, 30=2, 60=1, 90=1). CLI gains --multi-windows and --n-windows-per-size flags.
- autoresearch_rl.py: add --multi-period-eval flag to run_trial; when set, calls multi_period_eval() in-process after training and writes smooth_score + per-window p10_sortino columns to the leaderboard CSV. smooth_score is now the top-priority rank metric in select_rank_score(). Also adds --multi-period-windows, --multi-period-n-per-size, --multi-period-slippage-bps flags and smooth_score as a valid --rank-metric choice.
- tests/test_multi_period_eval.py: 7 tests covering signature, defaults, CLI flag presence, error-path behaviour, and autoresearch help output.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
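The smoothness score described in this commit can be sketched as a plain weighted average. This is a minimal illustration, not the code in evaluate_fast.py: the function name `smoothness_score` and the dict-based input are assumptions, and only the weight values are taken from the commit message.

```python
# Window length in days -> weight. Values are the ones quoted in the commit
# message; the dict layout itself is an assumption for illustration.
SMOOTHNESS_WEIGHTS = {5: 3.0, 15: 2.0, 30: 2.0, 60: 1.0, 90: 1.0}


def smoothness_score(p10_sortino_by_window, weights=SMOOTHNESS_WEIGHTS):
    """Weighted average of per-window p10 Sortino values.

    Windows that have no weight are ignored. Returns 0.0 when no weighted
    window is present, so callers never divide by zero.
    """
    total = 0.0
    weight_sum = 0.0
    for window_days, p10 in p10_sortino_by_window.items():
        w = weights.get(window_days, 0.0)
        total += w * p10
        weight_sum += w
    return total / weight_sum if weight_sum else 0.0
```

Because the 5-day window carries three times the weight of the 90-day window, a policy that only looks good over one long window is pulled down by weak short-window results.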
Implement Binance REST API client in C with libcurl
Introduces src/split_monitor.py to detect recent forward splits via yfinance and force-close any held positions before the policy observes distorted price data (stale pre-split entry_price causes fake losses). Integrates into execute_stock_signals in the unified orchestrator: checks held symbols once per cycle, logs any split event to logs/split_events.log, and adds affected symbols to trail_exit_syms so no new orders are placed on them in the same cycle. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
feat: MKTD v3 — 20 intraday features (vol, morning_ret, vwap_dev, gap_open)
Exports pufferlib checkpoint (MLP/Residual/Transformer) to TorchScript format for libtorch C API inference. Includes round-trip verification, metadata JSON output, and logits-only wrapper for C trader. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- policy_infer.cpp: libtorch C++ with extern C API, optional build
- export_torchscript.py: convert pufferlib checkpoints to TorchScript
- Makefile: libcurl + libtorch optional linking
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add generate_markdown_report() to profile_training.py: parses Chrome trace, speedscope flamegraph, timing.json, and gprof to produce profiles/report.md with throughput, kernel hotspots, and recommendations
- Add --quick (torch.profiler only, skip py-spy) and --report-only (skip profiling, regenerate report from existing files) flags
- Add tools/profile_report.py: standalone CLI for report generation
- Load Chrome trace once per report (single _load_trace_events call shared between kernel and memory parsers, eliminating duplicate JSON read)
- Fix _parse_chrome_trace_top_kernels return type to always tuple[list, float]
- Remove unused `import re`
- Add 17 tests covering all parsers, CLI flags, and report content
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Fuses obs normalization + first linear layer + ReLU into a single Triton kernel, eliminating the intermediate obs_norm (B, OBS) tensor allocation. Integrates with TradingPolicy._encode() when --obs-norm is active.
- pufferlib_market/kernels/fused_obs_encode.py: new CuTE-style kernel with CC-aware autotune configs (CC>=9 / CC==8 / CC<8)
- pufferlib_market/train.py: set_obs_norm_stats(), _encode() Path 1, training loop skips CPU normalize when fused path active
- pufferlib_market/bench_obs_encode.py: benchmark vs baseline
- tests/test_fused_obs_encode.py: 14 correctness/dtype/integration tests
- pufferlib_market/kernels/fused_mlp.py: H100 warp specialization note
Benchmark on RTX 5090 (CC 12.0): 1.55–1.71x speedup at stocks12 sizes (OBS=209, H=1024); peak alloc drops from ~1144 KB to ~128 KB at B=64.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
feat: fused_obs_encode.py — CuTE-style fused obs normalization + linear + ReLU
F.linear fails when input and bias have different dtypes in PyTorch 2.x. Cast bias to match weight dtype in the fallback path. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…g fix
- autoresearch_rl.py: add --stocks12 convenience flag that uses stocks12 data by default and runs the combined STOCK_EXPERIMENTS + H100_STOCK_EXPERIMENTS pool (excluding requires_gpu='h100' configs). Sets periods_per_year=252, fee_rate=0.001, holdout_eval_steps=90.
- train.py: add --early-stop-patience N flag (default 0=disabled) that stops training when ep_return does not improve by >=0.001 for N consecutive logging steps.
- h100_experiment_plan.md: document 90s vs 300s overfitting finding, update recommended command to time_budget=90, max_trials=500.
- scripts/alpaca_cli.py: fix typer 0.24+ compat (Annotated syntax for typer.Argument).
- tests: fix test_backout_logic.py stubs for typer and src.fixtures.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
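The --early-stop-patience behaviour described above can be sketched as a small counter. This is an illustrative sketch only: the class name `EarlyStopper` and its method are hypothetical, and only the 0.001 improvement threshold and the "0 disables" convention come from the commit message.

```python
class EarlyStopper:
    """Stop when ep_return fails to improve by min_delta for `patience` steps.

    Hypothetical helper for illustration; patience <= 0 disables stopping.
    """

    def __init__(self, patience, min_delta=0.001):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("-inf")
        self.stale_steps = 0

    def update(self, ep_return):
        """Record one logging step and return True if training should stop."""
        if self.patience <= 0:
            return False
        if ep_return >= self.best + self.min_delta:
            # Meaningful improvement: remember it and reset the counter.
            self.best = ep_return
            self.stale_steps = 0
        else:
            self.stale_steps += 1
        return self.stale_steps >= self.patience
```

In a training loop, `update()` would be called once per logging step with the latest ep_return, and the loop would break on the first True.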
- Download yfinance split-adjusted data for stocks12 from 2019 (PLTR limits to 2020-09-30)
- Export stocks12_extended_train.bin: 1797 days (+38% vs original 1302)
- Export stocks11_{train,val}.bin: 2434 days without PLTR, 11 symbols
- Document eval_hours calibration: C env counts calendar days, use --eval-hours 130 for ~90 trading days
- Update H100 plan to use stocks12_extended data
- Add splits_audit_report.csv: 258 entries, 0 UNRECOGNIZED in stocks12 symbols
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Scans all daily and hourly stock CSVs for unadjusted forward splits
- Fetches split history from yfinance in parallel (40 workers, ~700 symbols)
- Detects unadjusted splits via price-ratio check (tolerance 15%)
- Filters spin-off adjustments with MIN_SPLIT_FACTOR=1.9 threshold
- Fixes timezone bug: convert to UTC before normalize() to avoid 4h offset
- Deduplicates same-day rows before scanning for big drops (handles SPAC data)
- Auto-fixes CSVs: divides pre-split prices, multiplies pre-split volume
- Always backs up CSVs to .pre_split_backup before modifying
- Re-exports affected MKTD binaries (stocks12/stocks20 train+val)
- Applied 44 fixes: ANET, APH, CMG, COO, CTAS, DD, DECK, ETR, FAST, GOLD, GOOGL, ISRG, LRCX, MNST, NDAQ, NEE, NOW, NVO, ODFL, ORLY, PANW, SHOP, SHW, SMCI, SONY, SRE, TPL, TSCO, WMT, WSM (daily+hourly)
- 36 unit tests covering all helpers and edge cases
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
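A rough sketch of the price-ratio check and the CSV fix described above, assuming the check compares the close before and after the split date against the reported split factor. The function names and the exact comparison are assumptions; only the 15% tolerance, the 1.9 minimum factor, and the divide-price/multiply-volume rule come from the commit message.

```python
MIN_SPLIT_FACTOR = 1.9   # smaller factors are treated as spin-off adjustments
RATIO_TOLERANCE = 0.15   # relative tolerance for the price-ratio check


def looks_unadjusted(close_before, close_after, split_factor,
                     tolerance=RATIO_TOLERANCE, min_factor=MIN_SPLIT_FACTOR):
    """Return True if the price drop across the split date matches the factor.

    If the stored data were already split-adjusted, close_before / close_after
    would be close to 1 rather than close to split_factor.
    """
    if split_factor < min_factor or close_before <= 0 or close_after <= 0:
        return False
    observed_ratio = close_before / close_after
    return abs(observed_ratio - split_factor) <= tolerance * split_factor


def adjust_pre_split(price, volume, split_factor):
    """Rescale one pre-split row: divide the price, multiply the volume."""
    return price / split_factor, volume * split_factor
```

For a 4-for-1 split, a close of 400 followed by 101 gives a ratio near 4 and is flagged, while 100 followed by 101 is treated as already adjusted.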
… missing winner configs Dataset experiments (2026-03-22): - stocks12_extended (1797d, 2020+) is WORSE than stocks12_daily (1302d, 2022+) - Extra 2020-2021 COVID-era data hurts generalization on 2025-2026 val - stocks11 (2434d, no PLTR) also worse — more data ≠ better for out-of-distribution - Confirmed: stocks12_daily_train.bin is the right training set for H100 H100_STOCK_EXPERIMENTS additions: - h100_rmu4424_style/wd005/slip8: h=256 variants from random_mut_4424 (0% neg, +7.3%) - h100_h256_mut2272: h=256 with random_mut_2272 regularization - h100_rmu1228_style/slip5/wd005: obs_norm=True variants from random_mut_1228 (0% neg, +6.8%) - h100_mut2272_s4424, h100_rmu4424_s2272: cross-seed variants of top configs - Pool now 141 configs (was 132); 100 random mutations still included H100 final command updated to use stocks12_daily_train.bin Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…found Local 50-trial sweep found stock_trade_pen_05 (just trade_penalty=0.05, all defaults) is the best config seen so far: score=-3.5, 5% neg, +27.8% median, sortino=4.06. This beats random_mut_2272 (score=-5.2, 0% neg, +10.7% median, sortino=2.22). Added 8 H100 variants of trade_pen_05: - h100_trade_pen_05 (exact match, plus seeds s123/s7/s42) - h100_trade_pen_05_ent03, ent08 (entropy sweep) - h100_trade_pen_05_wd005 (with weight decay) - h100_trade_pen_05_anneal_ent (entropy annealing) H100_STOCK_EXPERIMENTS: 149 configs (was 132) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… sweep SOTA V2 50-trial sweep (stocks12_daily_train.bin, 90s/trial) found new SOTA: - stock_drawdown_pen: drawdown_penalty=0.05, trade_penalty=0.03, NO training slippage → 0% negative windows, +22.9% median, +4.8% p10, Sortino=7.25, worst=+3.3% → score=+24.9 (beats random_mut_2272 at ~-5 and all previous configs) - stock_trade_pen_05_s123: 0% negative, +16.6% median, +7.7% p10, score=+14.9 H100_STOCK_EXPERIMENTS expanded: 127 → 162 configs Added 13 h100_drawpen_* variants (seeds + drawdown/trade pen hyperparams) Added 8 h100_trade_pen_05_* variants (from previous session) Added 4 h100_rmu4424_* + 3 h100_rmu1228_* variants Key finding: drawdown_penalty outperforms slippage training as regularizer. Drawdown penalty forces policy to avoid equity dips → no reckless behavior on holdout. Updated h100_experiment_plan.md: - New SOTA table (drawpen beats random_mut_2272 by 2x on median) - Revised deployment conditions (bar raised to match drawpen results) - Updated H100 pool summary (162 configs, 500 trials = 12.5h on H100) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…score scale
Bug: _quick_val_eval returns raw val_return (~0.09-0.20) but was compared against best_trial_rank_score * 0.8, where rank_score = holdout_robust_score (~24.9). This means the threshold was 19.9, but val_return is never > 1.0 in normal cases, so every trial after stock_drawdown_pen was automatically early-rejected.
Fix: track best_val_return separately (same scale as _quick_val_eval output) and use that for the early rejection comparison instead of best_rank_score. The new threshold of ~0.073 (7.3%) is comparable to typical val_returns.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
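The scale mismatch is easy to see with numbers. The sketch below assumes a simple "reject if below 80% of the best so far" rule; `should_early_reject` is a hypothetical helper, and the best_val_return of 0.091 is an assumed value chosen so that the threshold lands near the ~0.073 quoted in the commit.

```python
def should_early_reject(val_return, best_so_far, keep_fraction=0.8):
    """Reject a trial whose quick val_return is below a fraction of the best.

    best_so_far must be on the same scale as val_return for this to be
    meaningful; None means no reference trial exists yet.
    """
    if best_so_far is None:
        return False
    return val_return < best_so_far * keep_fraction


typical_val_return = 0.15

# Buggy comparison: the reference is a holdout_robust_score (~24.9), so the
# threshold is ~19.9 and every realistic val_return is rejected.
rejected_with_rank_score = should_early_reject(typical_val_return, 24.9)

# Fixed comparison: the reference is a val_return (assumed 0.091 here), so the
# threshold is ~0.073 and a 15% val_return survives.
rejected_with_val_return = should_early_reject(typical_val_return, 0.091)
```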
…OT --h100-mode
Critical finding from A40 preview sweep:
- --h100-mode forces num_envs=256, minibatch_size=4096 → A40 trains 15.6M steps in 82s (cap hit before SIGTERM) → 5x more steps than stock_drawdown_pen discovery → OVERFITS
- ALL drawpen configs failed under h100-mode (early rejected, holdout -68 to -104)
- Real H100 would also hit 15.6M cap (in ~31s) → same overfitting
Fix: use --stocks12 --max-timesteps-per-sample 200 instead of --h100-mode
- Caps each trial at 3.1M steps (12 × 1302 × 200) regardless of GPU speed
- Matches stock_drawdown_pen discovery conditions (3.2M steps in 90s on A40)
- H100 trains 3.1M steps in ~9s, holdout ~30s → ~40s/trial → 500 trials ≈ 5.5h
- Default batch size (128 envs, 2048 minibatch) gives 94 PPO updates vs 47 with h100-mode
Updated H100 recommended command in h100_experiment_plan.md accordingly.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
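The step-cap arithmetic quoted in this commit (symbols × days × max timesteps per sample) can be checked directly. The helper name `step_cap` is an assumption for illustration; the formula and the input values are the ones given in the commit messages.

```python
def step_cap(num_symbols, num_days, max_timesteps_per_sample):
    """Total training-step cap implied by --max-timesteps-per-sample."""
    return num_symbols * num_days * max_timesteps_per_sample


# stocks12 on the 1302-day training set with a cap of 200 per sample.
cap_1302d = step_cap(12, 1302, 200)

# The same cap on the extended 1797-day training set.
cap_1797d = step_cap(12, 1797, 200)
```

These evaluate to 3,124,800 and 4,312,800 steps, which match the "3.1M" figure here and the 4,312,800 figure quoted for the extended data.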
Key findings from standalone 13-variant drawpen verification sweep: - ALL drawpen seed/param variants score -49 to -170 in holdout - stock_drawdown_pen (+24.9) in v2 sweep was a ~2% lucky training run - RL is non-deterministic; same config+seed gives wildly different results H100 strategy revised: - Increase max-trials from 500 to 1200 (diversity over depth) - Early rejection is irrelevant for H100: training completes in ~9s before the 25% time check fires at 22.5s - Target: realistic holdout improvements over random_mut_2272 baseline - Expected: ~24+ positive-score configs from 1200 diverse trials Also commit leaderboard CSVs: - autoresearch_stocks12_v2_50trial.csv (50-trial v2 sweep, 2 positive) - autoresearch_h100_drawpen_preview_v2.csv (partial, killed for early rejection bias) - autoresearch_h100_drawpen_standalone.csv (13-variant verification, all negative) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…tive Full 13-variant sweep (seeds 7/42/123/999/2272, param variants tp02/tp05/dd02/dd10/ent03/wd005/slip5) with early_reject_threshold=0.0 and correct 200x step cap: Best: h100_drawpen_tp05 score=-37.7, neg=25%, median=+3.1%, p10=-2.4% Most: scores -49 to -170, 20-100% negative windows Confirms stock_drawdown_pen (+24.9, v2 sweep trial 20) was a ~2% lucky training run. True hit rate for drawpen family: ~0/13 = 0% (unlucky batch) to ~1/50 = 2% at scale. H100 strategy: run 1200 diverse trials, expect ~24-48 positive configs at 2-4% hit rate. Do NOT specifically target drawpen — include in pool for coverage only. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…hrough) Key finding: extending stocks12 training from 1302 days (2022-2025) to 1797 days (2020-2025) dramatically improves generalization on the hard 201-day val (Sep 2025 - Mar 2026, includes Nov 2025 - Feb 2026 bear market). Results: - Old training (1302d): 0/50 configs score positive on hard extended val - New training (1797d): stock_trade_pen_03 scores +3.10 (seed 777) and -7.81 (seed 999) vs -102 with old data — first ever positive on hard val Root cause: 2020-2021 data (COVID recovery + 2021 bull market) teaches the model about market cycles and regime detection. Models trained from 2022 only see one bear market and one recovery; they fail when encountering the 2025-2026 bear market. The extended data fixes this. Changes: - audit_stock_splits.py: add stocks12_daily_train_2019 config (2019-01-02 start, effective 2020-09-30 due to PLTR IPO, 1797 calendar days) - h100_experiment_plan.md: v5 update with extended training breakthrough, corrects previous "extended data is worse" finding (that used old easy val), updates H100 command to use stocks12_daily_train_2019.bin, updates step cap to 4,312,800 (12*1797*200), updates hit rate expectation to 5-15% (vs 0% with old data) - Add sweep result CSVs: extended_val_50trial, train2019_10trial, train2019_50trial Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
trade_penalty=0.03 identified as sweet spot on hard 201-day val with extended training data (2020-2025): scored +3.1 (seed 777) vs -102 with old training data. Add seed/param variants to increase coverage: - tp03_s7/s42/s123/s2272: seed sweep - tp03_slip5/slip10: slippage friction variants - tp03_wd01/wd05: weight decay variants - tp03_obs: observation normalization - tp03_ent03/annent: entropy variants - tp03_h512/h2048: network size variants - tp03_cosine: cosine LR schedule - tp03_full_reg: combined regularization Pool size: 253 total (95 STOCK + 158 non-GPU H100) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ariants findings tp03 variants sweep (16 configs, seed 1337, extended 1797d training) results: - tp03_s2272: -33.8 (best, seed 2272 is special for this config class) - tp03_wd01: -39.4 (median=+5.6%, wd=0.01 helps) - tp03_h2048: -50.0 (median=+6.0%, larger net benefits from 5yr data) - tp03_slip5/slip10: -110 to -130 (AVOID: slippage training hurts bear market generalization) - tp03_obs: -124 (AVOID: obs_norm hurts with trade_pen_03) Add best-combo configs: tp03_s2272_wd01, tp03_h2048_wd01, tp03_s2272_h2048 Pool is now 98 STOCK + 158 non-GPU H100 = 256 total Key rule: trade_pen_03 without slippage, without obs_norm, with wd=0.01 or h2048 Update H100 plan with full tp03 variants findings table and updated pool summary. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…mote pipeline
- autoresearch_rl.py: added 108 tp03 variants to STOCK_EXPERIMENTS:
- tp03_s777 (KNOWN WINNER on hard 201-day bear val, robust=+3.10)
- tp03_s{7,42,123,888,1111,2272,3141,4242,5678,7777,9999} for seed discovery
- tp03_wd01_s{777,42,2272} + tp03_h2048_s{777,42,2272} (best modifiers x seeds)
- tp03_seed_{1..50}: dense sequential seed sweep for H100 (expect ~17 positive)
- tp03_wd01_seed_{1..25}: wd=0.01 modifier seeds for H100
- remote_training_pipeline.py: add max_timesteps_per_sample param to
build_autoresearch_cmd() and build_remote_autoresearch_plan()
- launch_stocks_autoresearch_remote.py: add --max-timesteps-per-sample CLI arg
(default 200, gives ~4.3M steps on 1797-day 2019 training data)
Key finding: previous tp03_variants sweep used --seed 1337 override which masked
all explicit per-config seeds. The actual tp03 hit rate at native seeds needs
testing via the tp03_multiseed sweep (no global override, early-reject disabled).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…m, cuDNN
Add torch.manual_seed(args.seed) + cuda.manual_seed_all + random.seed + numpy.seed
at training startup, plus cudnn.benchmark=False for cuDNN algorithm stability.
Previously only the C environment was seeded (via vec_init/vec_reset). Network
weight initialization was non-deterministic, causing large result variance even
with identical configs. Now each --seed value produces a reproducible training
trajectory, enabling systematic seed sweeps on local hardware before H100 runs.
Key implication: tp03_seed_{1..50} dense sweep will now give reproducible results
so we can identify which seeds work on the hard 201-day bear market val before
committing to expensive H100 time.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
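A minimal sketch of the seeding described in this commit, assuming it is wrapped in one helper. The name `seed_everything` is hypothetical, and numpy and torch are imported lazily here only so the sketch also runs where they are not installed; train.py itself may be organised differently.

```python
import random


def seed_everything(seed: int) -> None:
    """Seed Python, NumPy, and PyTorch RNGs and pin cuDNN algorithm choice."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # numpy not installed; nothing to seed
    try:
        import torch
    except ImportError:
        return  # torch not installed; Python/NumPy seeding is all we can do
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # safe to call without a GPU
    # Disable cuDNN autotuning so the same algorithms are picked each run.
    torch.backends.cudnn.benchmark = False
```

Calling `seed_everything(args.seed)` once at startup, before the network is constructed, is what makes weight initialization repeatable for a given seed.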
…ep cap
Key bugs fixed:
1. --max-timesteps-per-sample default was 200 (4.3M steps), but models need 33M+ steps to converge. Changed to 10000 (effectively no cap; 300s wall-clock is binding).
2. --stocks12 flag was never passed to autoresearch_rl.py, so remote runs used the default crypto EXPERIMENTS pool instead of STOCK_EXPERIMENTS.
3. --time-budget default changed from 300 to 90 for H100 (90s x 390k steps/sec = ~35M steps ≈ local A40 300s convergence point).
Root cause of recent 0/34 positive sweep: the 200-sample cap (4.3M steps) was 8x shorter than the ~33M steps needed for convergence (found in all winning models).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Key finding: 200-sample cap (4.3M steps) was the root cause of 0/34 failures. Winning models need 33-37M steps (300s on A40). Document correct H100 command: time-budget=90 + no step cap = ~35M steps on H100 ≈ A40 300s convergence. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…9+s623 More 5bps verified entries: - s446: +6536% (VERY ROBUST) - already committed - s431: +1588% (robust, pool≈honest≈5bps) - s428: +1028% (VERY ROBUST, 5bps>>8bps>>pool) - s413: +979% (ROBUST, escalating cascade pool→honest→5bps) - s430: +672% (reliable, 3% drop) - s627: +1510% (pool=0.53→+1510%) - s623: +1151% (ROBUST, escalating cascade) - s617: +833% (ROBUST, 5bps>>8bps>>pool) - s620: +692% (ROBUST) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…+1241%) Confirmed entries from prior sessions + new find s832: - s832: +1163% ann (new s801-900 find, saved as champion) - s820: +1548% ann (ROBUST, confirmed from memory) - s817: +1532% ann (VERY ROBUST, 5bps>>8bps>>pool escalating) - s816: +1241% ann (ROBUST, confirmed from memory) - s744: +844% ann (ROBUST, pool=1.46→honest=2.05→5bps=2.02) - s623: +1151% ann (ROBUST, escalating cascade) - s617: +833% ann (ROBUST), s620: +692% (ROBUST) - s431: +1588% ann, s428: +1028%, s413: +979%, s430: +672% - s627: +1510% ann (pool=0.53→5bps=+1510%) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…%),s834(+1917%) More major finds: - s452: +8002% ann (VERY ROBUST! 5bps>>8bps) NEW #5 all-time - s747: +2905% ann (s701-800 strong find) - s543: +1636% ann (pool=3.55≈honest=3.53) - s834: +1917% ann (pool=2.04→honest=3.52→5bps=3.40) - s835: +912% ann (VERY ROBUST, 5bps>>8bps) - s832: +1163% ann - s450: +619% ann Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…+1185%) More strong finds from continuing sweep monitoring: - s455: +1975% ann (ROBUST! 5bps>8bps>pool, s401-500 escalating) - s646: +2486% ann (VERY ROBUST! 5bps=3.97 vs 8bps=3.33, s601-700) - s837: +1312% ann (ROBUST, s801-900) - s349: +1185% ann (ROBUST, tri-consistent, s301-400) - s645: +862% ann (ROBUST, pool=0.69→+862%) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Updated top-10 5bps leaderboard (145 evaluated): #4 s456: +8,802% (ultra-robust: 5bps > 8bps, Sortino=6.71) #6 s452: +8,002% (ultra-robust: 5bps > 8bps, Sortino=6.65) #7 s734: +7,160% (ultra-robust: 5bps > 8bps) #10 s446: +6,536% (ultra-robust: 5bps > 8bps) #15 s827: +4,801% (ultra-robust: 5bps > 8bps) 7 sweeps ongoing: s201-900 at 55-62% complete Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…56(+8802% ROBUST) New absolute record: s275 at +23,595% ann, 5bps>8bps>pool (tri-consistent). s456 enters top-5 at +8,802% ROBUST (5bps >> 8bps). 169 seeds now properly evaluated in 5bps leaderboard. New seeds: s357(+1708% ROBUST), s359(+2998%), s437(+1112%), s751(+2135%), s649(+1601%), s842(pending). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Leaderboard updated (169 evaluated at 5bps): #1 s275: +23,595% (Sortino=9.0, ultra-robust: 5bps > 8bps) #2 s240: +17,642% #3 s434: +10,359% #4 s71: +9,381% #5 s456: +8,802% (new) #6 s507: +8,273% #7 s452: +8,002% (new) Top-10 mean: +10,796% ann | 7/10 ultra-robust (5bps >= 8bps) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…42%), s845(+2662%), s760(+2385%) Batch eval of 103 seeds completes 5bps coverage. Notable new ROBUST seeds: - s467: +3242% ann, sortino=6.10 (s401-500) - s845: +2662% ann, sortino=4.71 (s801-900) - s760: +2385% ann, sortino=5.23 (s701-800) - s279: +2127% ROBUST (fixed: 5bps=3.62 >> 8bps=2.63) - s210: +4461% ROBUST confirmed - s209: +3091% ROBUST confirmed - s904: +986% ROBUST (s901-1000 not all bad!) Also fixed s277/s279 swapped entries from parallel eval. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…0(+991%) New high-value ROBUST finds: - s658: +1837% ann (s601-700), 5bps=3.31 > 8bps=3.21 - s279: +2127% ann (s201-300) ROBUST, corrected from earlier swap - s277: +1230% ann (s201-300) ROBUST - s660: +991% ann (s601-700) ROBUST - s567: +776% ann, s470: +1028% ann (overfitters) Also: s564(+1261%), s552(+665% ROBUST), s465(+855% overfitter) 196 seeds evaluated, 85 ROBUST confirmed. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ew finds Extraordinary s901-1000 discoveries: - s921: 5bps=6.86, +6386% ann, sortino=5.60, ROBUST! (pool=5.76 -> honest=6.66 -> 5bps=6.86) - s914: +1839% OVERFITTER, s915: +1414% ROBUST, s920: +680% OVERFITTER s801-900: - s850: 5bps=5.86, +4869% ann, sortino=5.99, ROBUST! (pool=5.12 -> 5bps=5.86) Other new ROBUST seeds: s658(+1837%), s660(+991%), s279(+2127%), s284(+894%) Total: 206 seeds in 5bps leaderboard, 81 ROBUST. Sweeps ~65-87% complete per range. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…s discovered Top seeds by 5bps annualized return: - s275: +23,595% (Sortino=9.0, ultra-robust) — all-time champion - s240: +17,642% (Sortino=7.0) - s434: +10,359% (Sortino=6.99) - s71: +9,381% (Sortino=8.29) - s456: +8,802% (Sortino=6.71, ultra-robust) New champions this session: s921 (+6,386%, ultra-robust), s850 (+4,869%) Coverage: s61-120 ✓, s121-200 ✓, others 50-80% complete Auto 5bps monitor running continuously, all seeds >800% evaluated Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…(+7647%), s578(+5688%), s1203(+3619%) Key new finds (all ROBUST): - s292: +19,815% ann, S=7.99 — new #2 ROBUST champion - s765: +7,647% ann, S=5.76 — new #7 ROBUST - s578: +5,688% ann, S=8.14 — new #18 ROBUST (high sortino) - s1203: +3,619% ann, S=6.43 — new from s1201-1300 range - s1202: +4,121% ann, S=5.65 — new from s1201-1300 range - s1206: +1,012% ann ROBUST, s1005: +854% ann ROBUST New ranges discovered: s1001-1100 (103 seeds), s1101-1200 (6 seeds), s1201-1300 (14 seeds) 5bps auto-monitor updated to cover all ranges up to s1300+ Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Champion leaderboard (top 3 by 5bps annualized): - s670: +29,099% (Sortino=7.56, p50=15.45x/180d) — NEW ALL-TIME CHAMPION - s275: +23,595% (Sortino=9.00, ultra-robust) - s292: +20,000% (Sortino=7.99, ultra-robust) 233 seeds evaluated at 5bps; sweep ~70% complete. Updated prod.md with comprehensive top-10 leaderboard. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ents, unified hourly fixes - src/robust_trading_metrics.py: new robust trading metrics module - scripts/evaluate_binance_lora_candidate.py: binance lora candidate evaluator - scripts/run_binance_crypto_lora_sweep.py: expanded binance crypto lora sweep - pufferlib_market/autoresearch_rl.py: improved autoresearch with gpu pool support - pufferlib_market/gpu_pool_rl.py: gpu pool RL training - pufferlib_market/replay_eval.py: improved replay evaluation - unified_hourly_experiment/trade_unified_hourly.py: hourly trading improvements - tests: comprehensive test coverage additions - alpacaprogress6.md: alpaca progress notes - leaderboard CSVs: mixed23 sweep results Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…neal_ent + stocks20 - scripts/stocks_deep_sweep.sh: 5-phase sweep covering: - Phase A: stocks12 tp05 s100-299 @ 35M steps - Phase B: stocks20 tp05 s1-80 @ 35M steps - Phase C: stocks12 tp03 s1-60 @ 35M steps - Phase D: stocks12 tp07 s1-60 @ 35M steps - Phase E: stocks12 anneal_ent tp05 s1-60 @ 35M steps - Inspired by crypto70: need 200+ seeds to find champions - pufferlib_market/stocks12_seed_sweep_leaderboard.csv: s51-87 results at 15M steps - s55 (med=9.38%, 5/50 neg) best of first batch — retraining at 35M - Disk cleanup: freed 110GB by removing non-champion old checkpoints Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace manual `processed_batches += 1` counter inside the for loop with `enumerate(loader, start=1)` as required by ruff SIM113. This fixes the failing CI lint job (Fast CI / lint). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
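The SIM113 fix amounts to letting `enumerate` own the counter. A small self-contained sketch, with `count_batches` and the list-of-lists loader as stand-ins for the real training loop:

```python
def count_batches(loader):
    """Iterate a loader and return how many batches were processed."""
    processed_batches = 0  # stays 0 if the loader yields nothing
    for processed_batches, batch in enumerate(loader, start=1):
        _ = len(batch)  # placeholder for the real per-batch work
    return processed_batches
```

With `start=1` the loop variable already equals the number of batches seen so far, so no separate `+= 1` is needed inside the loop body.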
Two fixes for collection errors that caused pytest to be interrupted before unit tests ran:
1. Add `resolve_data_path()` to `scripts/train_crypto_lora_sweep.py`. `tests/test_train_crypto_lora_sweep.py` imported this function at module level but it did not exist, causing an ImportError during collection that triggered --maxfail=10 before any tests executed.
2. Skip jax test files in `pytest_ignore_collect` when `jax` is not installed. `test_jax_losses.py`, `test_jax_policy.py`, and `test_jax_trainer_wandboard.py` import from `binanceneural.jax_*` modules that require jax/flax, which are not included in requirements-ci.txt. This caused 3 more collection errors.
Together these collection errors (4+) exceeded the --maxfail=10 limit set in the CI fast-unit-tests step, causing all unit tests to be skipped and the job to fail with exit code 2 (interrupted).
Verified all 86 unit tests (marked `unit and not slow and not model_required and not cuda_required`) pass locally.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
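For orientation, a lookup with the behaviour the PR describes (search stocks/ and crypto/ under the data root, then fall back to the root) could look roughly like the sketch below. The `<symbol>.csv` file naming and the return type are assumptions; the actual function in scripts/train_crypto_lora_sweep.py may differ.

```python
import tempfile
from pathlib import Path


def resolve_data_path(symbol, data_root):
    """Return the CSV path for `symbol`, preferring stocks/ then crypto/."""
    root = Path(data_root)
    filename = f"{symbol}.csv"  # assumed naming scheme, for illustration only
    for subdir in ("stocks", "crypto"):
        candidate = root / subdir / filename
        if candidate.exists():
            return candidate
    # Nothing found in the subdirectories: fall back to the root itself.
    return root / filename


# Usage: a mixed root where only crypto/BTCUSD.csv exists.
with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp)
    (demo_root / "crypto").mkdir()
    (demo_root / "crypto" / "BTCUSD.csv").write_text("timestamp,close\n")
    found_rel = resolve_data_path("BTCUSD", demo_root).relative_to(demo_root).as_posix()
    fallback_rel = resolve_data_path("AAPL", demo_root).relative_to(demo_root).as_posix()
```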
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 18565f1330
    "tests/test_jax_losses.py",
    "tests/test_jax_policy.py",
    "tests/test_jax_trainer_wandboard.py",
} and not _module_available("jax"):
Guard JAX test collection on flax availability too
The new collection skip only checks jax, but these three test files import modules that also import flax at module import time (for example binanceneural/jax_losses.py and binanceneural/jax_policy.py). In environments where jax is installed but flax is not, pytest will still try to collect these tests and fail with import errors, so the intended CI collection fix is incomplete for that dependency combination.
Summary
Fixes the failing Fast CI (GitHub Runners) / fast-unit-tests job that was exiting with code 2 (interrupted) before any unit tests ran.
Root cause: pytest collection errors were accumulating and triggering --maxfail=10 before any @pytest.mark.unit tests could execute.
Two collection errors were fixed:
- tests/test_train_crypto_lora_sweep.py: imported resolve_data_path from scripts.train_crypto_lora_sweep at module level, but the function didn't exist → ImportError during collection. Added the missing resolve_data_path(symbol, data_root) function that searches stocks/ and crypto/ subdirectories before falling back to the root.
- tests/test_jax_losses.py, tests/test_jax_policy.py, tests/test_jax_trainer_wandboard.py: these import from binanceneural.jax_* modules that require jax/flax, which are not in requirements-ci.txt. Added skip logic in pytest_ignore_collect (matching the existing pattern for pufferlib) so these files are skipped when jax is not installed.
Test plan
- tests/test_train_crypto_lora_sweep.py::test_resolve_data_path_supports_mixed_hourly_root passes
- All 86 unit tests (-m "unit and not slow and not model_required and not cuda_required") pass locally
🤖 Generated with Claude Code