Merged
10 changes: 10 additions & 0 deletions CHANGELOG.md
@@ -21,6 +21,16 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- **`twowayfeweights()`** — standalone helper function for the TWFE decomposition diagnostic (Theorem 1 of de Chaisemartin & D'Haultfœuille 2020), available without instantiating the full estimator. Returns a `TWFEWeightsResult` with per-cell weights, fraction negative, `sigma_fe`, and `beta_fe`.
- **`generate_reversible_did_data()`** — new generator in `diff_diff.prep` producing reversible-treatment panel data for testing and tutorials. Patterns: `single_switch` (default, A5-safe), `joiners_only`, `leavers_only`, `mixed_single_switch`, `random`, `cycles`, `marketing`. Returns columns `group`, `period`, `treatment`, `outcome`, `true_effect`, `d_lag`, `switcher_type`.
- **REGISTRY.md `## ChaisemartinDHaultfoeuille` section** — single canonical source for dCDH methodology, equations, edge cases, and all documented deviations from the R `DIDmultiplegtDYN` reference implementation. Cites the AER 2020 paper and the dynamic companion paper (NBER WP 29873) by reference; primary papers are upstream sources, not in-repo files.
- **Phase 2: Multi-horizon event study for `ChaisemartinDHaultfoeuille`** — adds `L_max` parameter to `fit()` for computing `DID_l` at horizons `l = 1, ..., L_max` using the per-group building block from Equation 3 of the dynamic companion paper. Ships:
  - Per-horizon point estimates and cohort-recentered analytical SE
  - Dynamic placebos `DID^{pl}_l` with dual eligibility condition (Web Appendix Section 1.1)
  - Normalized estimator `DID^n_l = DID_l / delta^D_l` (Section 3.2)
  - Cost-benefit aggregate `delta` (Section 3.3, Lemma 4) — becomes `overall_att` when `L_max > 1`
  - Sup-t simultaneous confidence bands via multiplier bootstrap
  - `plot_event_study()` integration with `<50%` switcher warning for far horizons
  - `to_dataframe(level="event_study")` and `to_dataframe(level="normalized")` output
  - Per-horizon bootstrap with bootstrap SE/CI/p-value propagation to `event_study_effects`
  - `L_max=None` (default) preserves exact Phase 1 behavior
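
The sup-t bands listed above can be illustrated with a minimal, standard-library sketch. This is not the library's implementation: the function name `sup_t_bands`, the per-group influence matrix, the Rademacher multipliers, and all numbers are assumptions chosen for illustration. It shows the general technique only: perturb the per-group influence values with one random multiplier per group, record the maximum absolute t-statistic across horizons for each draw, and use the `(1 - alpha)` quantile of those maxima as a common critical value.

```python
import math
import random


def sup_t_bands(estimates, influence, n_draws=2000, alpha=0.05, seed=0):
    """Return (critical_value, bands) for simultaneous sup-t bands.

    estimates: per-horizon point estimates (hypothetical inputs).
    influence: influence[g][h] is group g's influence value for horizon
               index h, assumed to be centered (illustrative structure).
    """
    rng = random.Random(seed)
    n_groups = len(influence)
    n_horizons = len(estimates)

    # Plug-in standard error per horizon: sqrt(sum_g psi_{g,h}^2) / G.
    se = [
        math.sqrt(sum(row[h] ** 2 for row in influence)) / n_groups
        for h in range(n_horizons)
    ]

    max_abs_t = []
    for _ in range(n_draws):
        # One Rademacher multiplier per group, shared across horizons so the
        # cross-horizon dependence of the estimates is preserved.
        weights = [rng.choice((-1.0, 1.0)) for _ in range(n_groups)]
        draw_max = 0.0
        for h in range(n_horizons):
            perturbation = sum(
                w * row[h] for w, row in zip(weights, influence)
            ) / n_groups
            draw_max = max(draw_max, abs(perturbation) / se[h])
        max_abs_t.append(draw_max)

    # Critical value: empirical (1 - alpha) quantile of the max |t| statistic.
    max_abs_t.sort()
    critical_value = max_abs_t[math.ceil((1 - alpha) * n_draws) - 1]
    bands = [
        (est - critical_value * s, est + critical_value * s)
        for est, s in zip(estimates, se)
    ]
    return critical_value, bands


# Hypothetical inputs: 60 groups, 4 horizons, made-up point estimates.
data_rng = random.Random(42)
influence = [[data_rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(60)]
estimates = [0.8, 1.1, 1.3, 1.2]

critical_value, bands = sup_t_bands(estimates, influence)
print(f"sup-t critical value: {critical_value:.2f} (pointwise normal: 1.96)")
for horizon, (lo, hi) in enumerate(bands, start=1):
    print(f"  l={horizon}: [{lo:.3f}, {hi:.3f}]")
```

Because the critical value is taken from the distribution of the maximum over horizons, it exceeds the pointwise value of about 1.96, so the resulting bands are wider than per-horizon intervals and cover the whole event-study path jointly.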

## [3.0.1] - 2026-04-07

34 changes: 30 additions & 4 deletions README.md
@@ -1157,7 +1157,7 @@ EfficientDiD(

`ChaisemartinDHaultfoeuille` (alias `DCDH`) is the only library estimator that handles **non-absorbing (reversible) treatments** — treatment can switch on AND off over time. This is the natural fit for marketing campaigns, seasonal promotions, and on/off policy cycles.

Phase 1 ships the contemporaneous-switch estimator `DID_M` from the AER 2020 paper, which is mathematically identical to `DID_1` (horizon `l = 1`) of the dynamic companion paper (NBER WP 29873). Phase 2 will add multi-horizon event-study output `DID_l` for `l > 1` on the same class; Phase 3 will add covariate adjustment.
Ships `DID_M` (= `DID_1` at horizon `l = 1`) plus the full multi-horizon event study `DID_l` for `l = 1..L_max` via the `L_max` parameter. Phase 3 will add covariate adjustment.

```python
from diff_diff import ChaisemartinDHaultfoeuille
@@ -1205,14 +1205,40 @@ ChaisemartinDHaultfoeuille(

| Field | Description |
|-------|-------------|
| `overall_att`, `overall_se`, `overall_conf_int` | `DID_M` and inference (cohort-recentered analytical SE by default; multiplier-bootstrap percentile inference when `n_bootstrap > 0`) |
| `overall_att`, `overall_se`, `overall_conf_int` | `DID_M` when `L_max=None`; cost-benefit `delta` when `L_max > 1` (delta-method SE from per-horizon SEs) |
| `joiners_att`, `leavers_att` | Decomposition into the joiners (`DID_+`) and leavers (`DID_-`) views |
| `placebo_effect` | Single-lag placebo (`DID_M^pl`) point estimate |
| `per_period_effects` | Per-period decomposition with explicit A11-violation flags |
| `twfe_weights`, `twfe_fraction_negative`, `twfe_sigma_fe`, `twfe_beta_fe` | Theorem 1 decomposition diagnostic |
| `n_groups_dropped_crossers`, `n_groups_dropped_singleton_baseline` | Filter counts (multi-switch groups dropped before estimation; singleton-baseline groups excluded from variance) |
| `n_groups_dropped_never_switching` | Backwards-compatibility metadata. Never-switching groups participate in the variance via stable-control roles; this field is no longer a filter count. |

**Multi-horizon event study** (Phase 2 - pass `L_max` to `fit()`):

```python
results = est.fit(data, outcome="outcome", group="group",
                  time="period", treatment="treatment", L_max=5)

# Per-horizon effects with analytical SE
for horizon in sorted(results.event_study_effects):
    e = results.event_study_effects[horizon]
    print(f" l={horizon}: DID_l={e['effect']:.3f} (SE={e['se']:.3f})")

# Cost-benefit delta (becomes overall_att when L_max > 1)
print(f"Cost-benefit delta: {results.cost_benefit_delta['delta']:.3f}")

# Normalized effects: DID^n_l = DID_l / l (for binary treatment)
for horizon in sorted(results.normalized_effects):
    print(f" DID^n_{horizon} = {results.normalized_effects[horizon]['effect']:.3f}")

# Event study DataFrame (includes placebos as negative horizons)
df = results.to_dataframe("event_study")

# Plot (integrates with plot_event_study)
from diff_diff import plot_event_study
plot_event_study(results)
```
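
The normalization in the example above can be checked by hand. The sketch below does not call `diff_diff`; the function name `normalize_effects` and the input numbers are hypothetical. It only applies the binary-treatment rule from the comment in the snippet, `DID^n_l = DID_l / l`.

```python
def normalize_effects(did_by_horizon):
    """Return DID^n_l = DID_l / l per horizon l (binary-treatment case)."""
    normalized = {}
    for horizon, effect in sorted(did_by_horizon.items()):
        if horizon < 1:
            raise ValueError("horizons must be positive integers")
        normalized[horizon] = effect / horizon
    return normalized


# Hypothetical per-horizon estimates, not output from any real fit.
did_by_horizon = {1: 0.50, 2: 1.20, 3: 1.80}
normalized = normalize_effects(did_by_horizon)

for horizon, value in normalized.items():
    print(f"DID^n_{horizon} = {value:.3f}")
```

Read this way, a flat normalized path with a growing `DID_l` path suggests that the effect accumulates roughly in proportion to the number of treated periods.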

**Standalone TWFE decomposition diagnostic** (without fitting the full estimator):

```python
@@ -1226,13 +1252,13 @@ print(f"Fraction of negative weights: {diagnostic.fraction_negative:.3f}")
print(f"sigma_fe (sign-flipping threshold): {diagnostic.sigma_fe:.3f}")
```

> **Note:** The Phase 1 placebo SE is intentionally `NaN` with a warning. The dynamic companion paper Section 3.7.3 derives the cohort-recentered analytical variance for `DID_l` only — not for the placebo `DID_M^pl`. Phase 2 will add multiplier-bootstrap support for the placebo via the dynamic paper's machinery. Until then, the placebo point estimate is meaningful but its inference fields are NaN-consistent (and `results.placebo_se`, `results.placebo_p_value`, etc. remain `NaN` even when `n_bootstrap > 0`).
> **Note:** Placebo SE is `NaN` for both the single-lag `DID_M^pl` and the dynamic placebos `DID^{pl}_l`. The point estimates are meaningful for visual pre-trends inspection; formal placebo inference (influence-function derivation) is deferred to a follow-up. See `REGISTRY.md` for the full contract.

> **Note:** By default (`drop_larger_lower=True`), the estimator drops groups whose treatment switches more than once before estimation. This matches R `DIDmultiplegtDYN`'s default and is required for the analytical variance formula to be consistent with the point estimate. Each drop emits an explicit warning.

> **Note:** Phase 1 requires panels with a **balanced baseline** (every group observed at the first global period) and **no interior period gaps**. Late-entry groups (missing the baseline) raise `ValueError`; interior-gap groups are dropped with a warning; terminally-missing groups (early exit / right-censoring) are retained and contribute from their observed periods only. This is a documented deviation from R `DIDmultiplegtDYN`, which supports unbalanced panels — see [`docs/methodology/REGISTRY.md`](docs/methodology/REGISTRY.md) for the rationale, the defensive guards that make terminal missingness safe, and workarounds for unbalanced inputs.
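
To see in advance how a panel would fall under the rules in the note above, the following standard-library sketch classifies groups by their observed periods. It is an illustration under stated assumptions, not the estimator's validation code: the function name `classify_panel_groups`, the category labels, and the toy data are all hypothetical.

```python
def classify_panel_groups(observed_periods, all_periods):
    """Classify each group by its pattern of observed periods.

    observed_periods: dict mapping group id -> set of observed periods.
    all_periods: iterable of every period in the panel.
    """
    ordered = sorted(all_periods)
    first_period, last_period = ordered[0], ordered[-1]
    labels = {}
    for group, periods in observed_periods.items():
        if first_period not in periods:
            # Missing the first global period (would raise ValueError).
            labels[group] = "late_entry"
            continue
        last_seen = max(periods)
        expected = {p for p in ordered if p <= last_seen}
        if expected - periods:
            # A period is missing before the group's last observation
            # (such groups are dropped with a warning).
            labels[group] = "interior_gap"
        elif last_seen < last_period:
            # Early exit / right-censoring (such groups are retained).
            labels[group] = "terminal_missing"
        else:
            labels[group] = "complete"
    return labels


# Toy panel with periods 1..5 and one group per case.
observed = {
    "A": {1, 2, 3, 4, 5},
    "B": {2, 3, 4, 5},
    "C": {1, 2, 4, 5},
    "D": {1, 2, 3},
}
labels = classify_panel_groups(observed, range(1, 6))
for group, label in sorted(labels.items()):
    print(group, label)
```

Running a check like this before `fit()` makes it clear which groups would trigger the `ValueError`, which would be dropped, and which would contribute only from their observed periods.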

> **Note:** Survey design (`survey_design`), event-study aggregation (`aggregate`), covariate adjustment (`controls`), and HonestDiD integration (`honest_did`) are not yet supported. They raise `NotImplementedError` with phase pointers - see [`ROADMAP.md`](ROADMAP.md) for the full multi-phase rollout.
> **Note:** Survey design (`survey_design`), covariate adjustment (`controls`), group-specific linear trends (`trends_linear`), and HonestDiD integration (`honest_did`) are not yet supported. They raise `NotImplementedError` with phase pointers - see [`ROADMAP.md`](ROADMAP.md) for the Phase 3 rollout.

### Triple Difference (DDD)

18 changes: 9 additions & 9 deletions ROADMAP.md
@@ -148,18 +148,18 @@ The dynamic companion paper subsumes the AER 2020 paper: `DID_1 = DID_M`. The si

### Phase 2: Dynamic event study (multiple horizons)

*Goal: Add `aggregate="event_study"` mode to the same class. Loops the Phase 1 machinery over horizons `l = 1, ..., L`. No API breakage from Phase 1. No new tutorial - the comprehensive tutorial waits for Phase 3.*
*Goal: Add multi-horizon event study to the same class via the `L_max` parameter. Loops the Phase 1 machinery over horizons `l = 1, ..., L`. No API breakage from Phase 1. No new tutorial - the comprehensive tutorial waits for Phase 3.*

| Item | Priority | Status |
|------|----------|--------|
| **2a.** Multi-horizon `DID_l` via the cohort framework, with horizon parameter `L_max` | HIGH | Not started |
| **2b.** Multi-horizon analytical SE (same plug-in formula looped over horizons) | HIGH | Not started |
| **2c.** Dynamic placebos `DID^{pl}_l` for pre-trends testing (Web Appendix Section 1.1 of dynamic paper) | HIGH | Not started |
| **2d.** Normalized estimator `DID^n_l` (Section 3.2 of dynamic paper) | MEDIUM | Not started |
| **2e.** Cost-benefit aggregate `delta` (Section 3.3 of dynamic paper, Lemma 4) | MEDIUM | Not started |
| **2f.** Simultaneous (sup-t) confidence bands for event study plots | MEDIUM | Not started |
| **2g.** `plot_event_study()` integration; `< 50%`-of-switchers warning for far horizons | MEDIUM | Not started |
| **2h.** Parity tests vs `did_multiplegt_dyn` for multi-horizon designs | HIGH | Not started |
| **2a.** Multi-horizon `DID_l` via per-group `DID_{g,l}` building block, with `L_max` parameter | HIGH | Shipped |
| **2b.** Multi-horizon analytical SE (cohort-recentered plug-in per horizon) | HIGH | Shipped |
| **2c.** Dynamic placebos `DID^{pl}_l` for pre-trends testing (Web Appendix Section 1.1 of dynamic paper) | HIGH | Shipped (point estimates; SE deferred) |
| **2d.** Normalized estimator `DID^n_l` (Section 3.2 of dynamic paper) | MEDIUM | Shipped |
| **2e.** Cost-benefit aggregate `delta` (Section 3.3 of dynamic paper, Lemma 4) | MEDIUM | Shipped |
| **2f.** Simultaneous (sup-t) confidence bands for event study plots | MEDIUM | Shipped |
| **2g.** `plot_event_study()` integration; `< 50%`-of-switchers warning for far horizons | MEDIUM | Shipped |
| **2h.** Parity tests vs `did_multiplegt_dyn` for multi-horizon designs | HIGH | Shipped (point estimates; SE/placebo parity deferred) |

### Phase 3: Covariates, extensions, and tutorial

101 changes: 101 additions & 0 deletions benchmarks/R/generate_dcdh_dynr_test_values.R
@@ -287,6 +287,107 @@ scenarios$hand_calculable_worked_example <- list(
  results = extract_dcdh_l1(res5)
)

# ---------------------------------------------------------------------------
# Phase 2: Multi-horizon scenarios (effects > 1)
# ---------------------------------------------------------------------------

# Helper: extract multi-horizon results from did_multiplegt_dyn output
extract_dcdh_multi <- function(res, n_effects, n_placebos = 0) {
  effects <- res$results$Effects
  if (is.null(effects)) {
    stop("did_multiplegt_dyn returned no Effects; check the input data")
  }

  out <- list(effects = list(), placebos = list())

  for (i in seq_len(min(n_effects, nrow(effects)))) {
    out$effects[[as.character(i)]] <- list(
      overall_att = as.numeric(effects[i, "Estimate"]),
      overall_se = as.numeric(effects[i, "SE"]),
      overall_ci_lo = as.numeric(effects[i, "LB CI"]),
      overall_ci_hi = as.numeric(effects[i, "UB CI"]),
      n_switchers = as.numeric(effects[i, "N"])
    )
  }

  placebos <- res$results$Placebos
  if (!is.null(placebos) && n_placebos > 0) {
    for (i in seq_len(min(n_placebos, nrow(placebos)))) {
      out$placebos[[as.character(i)]] <- list(
        effect = as.numeric(placebos[i, "Estimate"]),
        se = as.numeric(placebos[i, "SE"]),
        ci_lo = as.numeric(placebos[i, "LB CI"]),
        ci_hi = as.numeric(placebos[i, "UB CI"])
      )
    }
  }

  out
}

# Scenario 6: joiners_only multi-horizon (L_max=3, placebo=3)
# Uses n_periods=8 to give enough room for 3 positive + 3 placebo horizons
cat(" Scenario 6: joiners_only_multi_horizon\n")
d6 <- gen_reversible(n_groups = N_GOLDEN, n_periods = 8,
                     pattern = "joiners_only", seed = 106)
res6 <- did_multiplegt_dyn(
  df = d6, outcome = "outcome", group = "group", time = "period",
  treatment = "treatment", effects = 3, placebo = 3, ci_level = 95
)
scenarios$joiners_only_multi_horizon <- list(
  data = export_data(d6),
  params = list(pattern = "joiners_only", n_groups = N_GOLDEN, n_periods = 8,
                seed = 106, effects = 3, placebo = 3, ci_level = 95),
  results = extract_dcdh_multi(res6, n_effects = 3, n_placebos = 3)
)

# Scenario 7: leavers_only multi-horizon (L_max=3, placebo=3)
cat(" Scenario 7: leavers_only_multi_horizon\n")
d7 <- gen_reversible(n_groups = N_GOLDEN, n_periods = 8,
                     pattern = "leavers_only", seed = 107)
res7 <- did_multiplegt_dyn(
  df = d7, outcome = "outcome", group = "group", time = "period",
  treatment = "treatment", effects = 3, placebo = 3, ci_level = 95
)
scenarios$leavers_only_multi_horizon <- list(
  data = export_data(d7),
  params = list(pattern = "leavers_only", n_groups = N_GOLDEN, n_periods = 8,
                seed = 107, effects = 3, placebo = 3, ci_level = 95),
  results = extract_dcdh_multi(res7, n_effects = 3, n_placebos = 3)
)

# Scenario 8: mixed_single_switch multi-horizon (L_max=5, placebo=4)
# Uses n_periods=10 for far horizons
cat(" Scenario 8: mixed_single_switch_multi_horizon\n")
d8 <- gen_reversible(n_groups = N_GOLDEN, n_periods = 10,
                     pattern = "mixed_single_switch", seed = 108)
res8 <- did_multiplegt_dyn(
  df = d8, outcome = "outcome", group = "group", time = "period",
  treatment = "treatment", effects = 5, placebo = 4, ci_level = 95
)
scenarios$mixed_single_switch_multi_horizon <- list(
  data = export_data(d8),
  params = list(pattern = "mixed_single_switch", n_groups = N_GOLDEN, n_periods = 10,
                seed = 108, effects = 5, placebo = 4, ci_level = 95),
  results = extract_dcdh_multi(res8, n_effects = 5, n_placebos = 4)
)

# Scenario 9: joiners_only long panel multi-horizon (L_max=5, placebo=5)
# Uses n_periods=12 and n_groups=80 for thorough coverage
cat(" Scenario 9: joiners_only_long_multi_horizon\n")
d9 <- gen_reversible(n_groups = N_GOLDEN, n_periods = 12,
                     pattern = "joiners_only", seed = 109)
res9 <- did_multiplegt_dyn(
  df = d9, outcome = "outcome", group = "group", time = "period",
  treatment = "treatment", effects = 5, placebo = 5, ci_level = 95
)
scenarios$joiners_only_long_multi_horizon <- list(
  data = export_data(d9),
  params = list(pattern = "joiners_only", n_groups = N_GOLDEN, n_periods = 12,
                seed = 109, effects = 5, placebo = 5, ci_level = 95),
  results = extract_dcdh_multi(res9, n_effects = 5, n_placebos = 5)
)

# ---------------------------------------------------------------------------
# Write output
# ---------------------------------------------------------------------------