feat: survey-aware power analysis (SurveyPowerConfig + deff) #292
Connect survey infrastructure to power module:
- SurveyPowerConfig dataclass for simulation-based survey power
- survey_config param on simulate_power/mde/sample_size swaps DGP to generate_survey_did_data and injects SurveyDesign into fit()
- deff param on closed-form PowerAnalysis methods and convenience functions
- 9 estimators supported, 3 blocked with clear error (factor model DGPs)
- DGP truth (Kish DEFF, realized ICC) reported in simulation results
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
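For readers unfamiliar with the "Kish DEFF" reported in the simulation results, here is a minimal sketch of Kish's approximate design effect from unequal weighting, `n * sum(w^2) / (sum(w))^2`. The function name `kish_deff` is hypothetical, and this PR does not show how the library computes the value, so treat this as an illustration of the standard formula only.

```python
from typing import Sequence


def kish_deff(weights: Sequence[float]) -> float:
    """Kish's approximate design effect due to unequal weighting.

    Hypothetical helper for illustration; not the library's implementation.
    """
    n = len(weights)
    if n == 0:
        raise ValueError("weights must be non-empty")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # n * sum(w^2) / (sum(w))^2 equals 1.0 when all weights are equal
    return n * sum(w * w for w in weights) / (total * total)


equal = kish_deff([1.0, 1.0, 1.0, 1.0])
unequal = kish_deff([1.0, 3.0])
```

Equal weights give a design effect of 1; any variation in the weights pushes it above 1.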
…sign
- Add REGISTRY.md notes for analytical deff parameter (variance/sample-size inflation formulas, deff vs rho distinction) and survey_config simulation path (supported estimators, mutual exclusivity, protected keys)
- Cache SurveyDesign in SurveyPowerConfig._build_survey_design() to avoid rebuilding per simulation iteration
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
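The REGISTRY.md formulas are not reproduced in this conversation. As a rough sketch of what a multiplicative design-effect inflation usually means, the example below assumes the variance is multiplied by `deff` (so the standard error scales by `sqrt(deff)`) and the required sample size is multiplied by `deff`. The function names and the two-sided normal approximation are assumptions for illustration, not the `PowerAnalysis` API.

```python
import math
from statistics import NormalDist


def power_with_deff(effect: float, se_srs: float, deff: float = 1.0,
                    alpha: float = 0.05) -> float:
    """Two-sided z-test power with the SRS standard error inflated by sqrt(deff).

    Hypothetical helper; assumes Var_design = deff * Var_srs.
    """
    se = se_srs * math.sqrt(deff)
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    z = abs(effect) / se
    # Probability of rejecting in either tail
    return nd.cdf(z - z_crit) + nd.cdf(-z - z_crit)


def inflate_sample_size(n_srs: int, deff: float = 1.0) -> int:
    """Sample size under the design, assuming n_design = ceil(deff * n_srs)."""
    return math.ceil(n_srs * deff)


baseline = power_with_deff(effect=1.0, se_srs=0.5)
clustered = power_with_deff(effect=1.0, se_srs=0.5, deff=2.0)
n_needed = inflate_sample_size(100, deff=1.5)
```

Under these assumptions a design effect above 1 always lowers power at a fixed sample size and raises the sample size needed for a fixed power.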
|
Overall Assessment I found three unmitigated Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
…tronger tests
- Fix P1: basic/TWFE/MultiPeriod survey adapters now derive `ever_treated` (time-invariant group indicator) instead of using the DGP's post-only `treated` column, which caused rank-deficient design matrices
- Fix P1: MultiPeriodDiD adapter now passes `unit="unit"` for proper time-varying-treatment validation
- Fix P1: simulate_sample_size auto-bracketing starts from max(min_n, abs_min) and clamps early-return to abs_min, enforcing min_viable_n contract
- Fix P2: strengthen tests to assert finite mean_estimate/mean_se (not just 0 <= power <= 1), add large-effect floor regression test
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
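The first fix can be illustrated with a small sketch. A post-only `treated` column equals the group indicator times the post indicator, so using it as the group term makes the interaction column collinear with it. Deriving a time-invariant `ever_treated` flag per unit avoids that. The row layout (`unit`, `period`, `treated` keys) and the helper name are assumptions for illustration; the actual adapters are not shown in this conversation.

```python
from typing import Any, Dict, List


def add_ever_treated(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Attach a time-invariant group indicator to each unit-period row.

    Hypothetical helper: a unit is 'ever treated' if treated == 1 in any period.
    """
    treated_units = {row["unit"] for row in rows if row["treated"] == 1}
    return [
        {**row, "ever_treated": int(row["unit"] in treated_units)}
        for row in rows
    ]


panel = [
    {"unit": 1, "period": 0, "treated": 0},  # pre-period row of a treated unit
    {"unit": 1, "period": 1, "treated": 1},
    {"unit": 2, "period": 0, "treated": 0},  # never-treated unit
    {"unit": 2, "period": 1, "treated": 0},
]
flagged = add_ever_treated(panel)
```

Unit 1 is flagged in both periods, including its pre-period row, which is what a group indicator needs.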
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ⛔ Blocker Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
…et, docstrings
- P0: reject heterogeneous_te_by_strata=True in survey power validation (DGP population_att diverges from input treatment_effect, making bias/coverage/RMSE metrics misleading)
- P1: fix simulate_sample_size auto-bracketing hi to respect abs_min (hi = max(2*lo, abs_min, 100) prevents probing below survey floor)
- P2: add survey_config parameter to simulate_power/mde/sample_size docstrings with constraints and supported estimators
- P2: add regression tests for heterogeneous_te rejection and large-floor (n_strata=10, psu_per_stratum=10, min_viable_n=200) auto-bracketing branch
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
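The bracketing rule quoted in these commits can be sketched as follows. Only the two expressions `max(min_n, abs_min)` and `max(2*lo, abs_min, 100)` come from the commit messages; the function name and its standalone form are hypothetical, and the real `simulate_sample_size` does more than this.

```python
from typing import Tuple


def initial_bracket(min_n: int, abs_min: int) -> Tuple[int, int]:
    """Starting (lo, hi) for the sample-size search.

    Hypothetical helper: abs_min stands for the survey floor (min_viable_n),
    so neither bound may probe below it.
    """
    lo = max(min_n, abs_min)
    hi = max(2 * lo, abs_min, 100)
    return lo, hi


large_floor = initial_bracket(min_n=20, abs_min=200)
small_floor = initial_bracket(min_n=10, abs_min=5)
```

With a large floor both bounds sit at or above it; with a small floor the upper bound falls back to the constant 100.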
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ⛔ Blocker Executive Summary
Methodology
Code Quality
No new findings in the changed code.
Performance
No new findings in the changed code.
Maintainability
No new findings in the changed code.
Tech Debt
No mitigating
Security
No new findings in the changed code.
Documentation/Tests
Path to Approval
|
…_config
P0: reject data_generator_kwargs with te_covariate_interaction != 0.0 or covariate_effects when survey_config is active. These DGP params make the realized population ATT diverge from the scalar treatment_effect input, which would misstate bias/coverage/RMSE metrics in simulation power.
Add regression tests for both rejection paths.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment Highest unmitigated finding: Executive Summary
Methodology
Re-review note: the earlier P0 around survey DGP truth mismatch from covariate-interaction heterogeneity is resolved by the new explicit rejection path (diff_diff/power.py:L1916-L1945, tests/test_power.py:L2398-L2423).
Code Quality
No new findings in the changed code.
Performance
No new findings in the changed code.
Maintainability
No new findings in the changed code.
Tech Debt
No existing
Security
No new findings in the changed code.
Documentation/Tests
Path to Approval
|
…fects block
- P1: reject panel=False in data_generator_kwargs for panel-only estimators (TWFE, SA, Imputation, TwoStage, Stacked, Efficient); only CS supports repeated cross-sections under survey_config
- P2: remove over-broad covariate_effects rejection (it affects baseline outcomes and ICC calibration, not realized population_att)
- Add regression tests: panel=False rejected for TWFE, allowed for CS(panel=False)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment Highest unmitigated finding: the prior Executive Summary
Methodology Re-review status: the prior
Code Quality
No new findings in the changed code beyond the panel-alignment validation gap above.
Performance
No new findings in the changed code.
Maintainability
No new findings in the changed code.
Tech Debt
No
Security
No new findings in the changed code.
Documentation/Tests
Path to Approval
|
P1: validate that CallawaySantAnna.panel matches the survey DGP's panel flag in both directions:
- CS(panel=True) + DGP panel=False -> rejected
- CS(panel=False) + default DGP (panel=True) -> rejected
- Panel check now runs even with empty data_generator_kwargs
Add regression tests for both misaligned CS combinations.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
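The two-directional check described above might look roughly like the sketch below. The function name, signature, and error text are hypothetical. The only behavior taken from the commit message is that the DGP's panel flag defaults to True, that a mismatch in either direction is rejected, and that the check runs even when `data_generator_kwargs` is empty.

```python
from typing import Any, Dict, Optional


def check_panel_alignment(
    estimator_panel: bool,
    data_generator_kwargs: Optional[Dict[str, Any]] = None,
) -> None:
    """Reject an estimator whose panel setting disagrees with the DGP's.

    Hypothetical helper; the DGP default of panel=True follows the commit message.
    """
    dgp_panel = (data_generator_kwargs or {}).get("panel", True)
    if estimator_panel != dgp_panel:
        raise ValueError(
            f"Estimator panel={estimator_panel} does not match "
            f"survey DGP panel={dgp_panel}"
        )


# Aligned combinations pass silently
check_panel_alignment(True)
check_panel_alignment(False, {"panel": False})
```

Reading the flag with a default, instead of checking only when the key is present, is what makes the empty-kwargs case fail for an estimator configured with `panel=False`.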
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment Highest unmitigated finding: Severity Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
… conflicts, docs
- P1: reject estimator_kwargs["survey_design"] when survey_config is set (prevents silently overwriting injected design; use SurveyPowerConfig.survey_design)
- P2: add icc/psu_re_sd and weight_cv/weight_variation mutual-exclusion validation to SurveyPowerConfig.__post_init__ (mirrors DGP validation)
- P3: update REGISTRY.md survey_config note to document panel=False coupling with CallawaySantAnna and estimator_kwargs survey_design restriction
- Add regression tests for all three guards
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
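A minimal sketch of mutual-exclusion validation in a dataclass `__post_init__` follows. The class name, the `None` defaults, and the field types are assumptions: the real `SurveyPowerConfig` has more fields and may use different defaults to signal "not set". Only the two exclusive pairs (icc/psu_re_sd and weight_cv/weight_variation) come from the commit message.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SurveyConfigSketch:
    """Hypothetical stand-in for SurveyPowerConfig; None means 'not provided'."""

    icc: Optional[float] = None
    psu_re_sd: Optional[float] = None
    weight_cv: Optional[float] = None
    weight_variation: Optional[str] = None

    def __post_init__(self) -> None:
        # Each pair gives two ways of specifying the same quantity, so allow one only
        if self.icc is not None and self.psu_re_sd is not None:
            raise ValueError("Specify at most one of icc and psu_re_sd")
        if self.weight_cv is not None and self.weight_variation is not None:
            raise ValueError("Specify at most one of weight_cv and weight_variation")


ok = SurveyConfigSketch(icc=0.05, weight_cv=0.3)
```

Validating at construction time means a conflicting configuration fails before any simulation iterations run.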
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good No unmitigated P0/P1 findings remain. The prior re-review blockers are addressed: Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Validation note: I could not run |
- Reject non-finite deff (NaN, inf) in _validate_deff()
- Add weight_cv finiteness and psu_period_factor non-negativity checks to SurveyPowerConfig.__post_init__ (mirrors DGP validation)
- Add regression tests for all new edge cases
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
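A finiteness guard of the kind described can be sketched as below. The function here is a hypothetical standalone version, not the library's `_validate_deff()`; the positivity check in particular is an assumption, since the commit message only mentions rejecting NaN and inf.

```python
import math


def validate_deff(deff: float) -> float:
    """Return deff as a float, rejecting NaN, infinities, and non-positive values.

    Hypothetical helper; the deff > 0 rule is an assumption for illustration.
    """
    value = float(deff)
    if not math.isfinite(value):
        raise ValueError(f"deff must be finite, got {deff!r}")
    if value <= 0:
        raise ValueError(f"deff must be positive, got {deff!r}")
    return value


checked = validate_deff(1.5)
```

The explicit `math.isfinite` test matters because NaN compares false against every bound, so a plain range check such as `deff <= 0` would let it through.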
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
- P1: reject control_group='not_yet_treated' and clean_control='strict' with survey_config (require multi-cohort DGP that survey path doesn't support)
- P1: reject strata_sizes in data_generator_kwargs for simulate_sample_size (sum(strata_sizes) == n_units but n_units varies during bisection)
- Update REGISTRY.md survey-power note to document these restrictions
- Add regression tests for all three rejection paths
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
- P1: reject control_group='last_cohort' (EfficientDiD) with survey_config (needs multi-cohort DGP, same as not_yet_treated)
- P2: add psu_re_sd and fpc_per_stratum finiteness validation to SurveyPowerConfig.__post_init__
- Update REGISTRY.md to list last_cohort alongside other restrictions
- Add regression tests for last_cohort rejection and scalar validation
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment Needs changes. The prior re-review blockers around survey-path control-group rejection and Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
- P1: fix _snap_n() to honor floor even when grid_step==1 (non-DDD), so explicit n_range with survey_config is clamped to min_viable_n
- Add regression test for n_range=(10, 200) with min_viable_n=80
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
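The clamping behavior can be sketched as follows. This is a hypothetical reimplementation, not the library's `_snap_n()`: the signature and the round-up-to-grid rule are assumptions. The point taken from the commit is that the floor applies even when `grid_step == 1`.

```python
def _round_up(n: int, step: int) -> int:
    """Smallest multiple of step that is >= n (identity when step == 1)."""
    return -(-n // step) * step


def snap_n(n: int, grid_step: int = 1, floor: int = 1) -> int:
    """Snap n to the sample-size grid without going below the floor.

    Hypothetical helper; the floor is applied unconditionally, so it also
    holds on the grid_step == 1 path.
    """
    return max(_round_up(n, grid_step), _round_up(floor, grid_step))


# Mirrors the regression case: n_range=(10, 200) with min_viable_n=80
lo = snap_n(10, grid_step=1, floor=80)
hi = snap_n(200, grid_step=1, floor=80)
```

Here the lower end of the requested range is raised to the floor while the upper end is left unchanged.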
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good. The prior re-review blocker on survey Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
Summary
- `SurveyPowerConfig` dataclass for simulation-based survey power analysis
- `survey_config` parameter on `simulate_power`, `simulate_mde`, `simulate_sample_size`: swaps DGP to `generate_survey_did_data` and injects `SurveyDesign` into estimator `fit()`
- `deff` parameter on closed-form `PowerAnalysis` methods and convenience functions (`compute_power`, `compute_mde`, `compute_sample_size`)
- `deff` and `survey_config` methodology in REGISTRY.md PowerAnalysis section
Methodology references (required if estimator / math changes)
- `deff` is a simple multiplicative inflation (not decomposed into clustering/weighting/stratification components); the simulation path handles the full complexity, and the closed-form path is deliberately approximate
Validation
- `tests/test_power.py`: 32 new tests in `TestSurveyPower` class (simulation smoke tests for all 9 estimators, rejection tests for 3 unsupported, metadata/DEFF verification, closed-form deff correctness, validation edge cases)
Security / privacy
Generated with Claude Code