Methodology
Every number in this report traces to a named source and a documented method. State election administrators, federal reviewers, foundation officers, and academic reviewers will find the trail below.
The question and the approach
The report answers two layered questions:
- Diagnostic. Why don't young people vote — by race, gender, and age — and what can change, at what cost, to close the gap?
- Strategic (the Second Mile thesis). The participation-equity movement has built voter registration in priority cells. It has not yet built the vote-completion infrastructure for registered young Black and Hispanic adults aged 21-32. Which interventions close that second-mile gap? See Finding 4, Finding 5, and Recommendations.
The seven findings below address question 1; the Second Mile thesis is the strategic frame they serve.
Approach. Pair one large probability survey (1.64 million Census Current Population Survey respondents, 2000-2024) with voter-file-validated state turnout benchmarks (McDonald VEP), state election-law and redistricting-institution data (NCSL plus commission-adoption events), and a peer-reviewed intervention catalog with effect sizes and per-contact costs.
Descriptive, not causal. Where the report makes causal claims — specifically around independent-redistricting-commission adoption — it uses within-state pre/post comparison with the identifying assumptions stated.
Scope. U.S. federal general elections, November 2000-2024 (13 cycles), 50 states plus DC, all four major racial and ethnic groups. Not in scope: primaries, off-year elections, sub-state analysis, district-level individual-voter analysis, partisan realignment.
Findings index
The seven empirical findings produced by this methodology. Every numeric claim on each page traces back to a source and method documented below.
- Finding 1 — The persistent gap — CPS 13-cycle senior-youth gap; race × gender within-youth spread
- Finding 2 — The midterm amplifier — cycle-type sensitivity (presidential vs midterm)
- Finding 3 — Who didn't register, and why — CPS VOYNOTREG, access barriers dominate
- Finding 4 — Registered, didn't vote, and why — CPS VOWHYNOT, the Second Mile diagnostic (50% logistical, 24% engagement, 9% access)
- Finding 5 — Method preference: the ROI spine — CPS VOTEHOW × state availability; revealed-not-stated preference
- Finding 6 — State policy as the lever — NCSL × CPS calibrated to VEP; same-day registration + mail expansion
- Finding 7 — Institutional structure — redistricting method × competitiveness × youth turnout
- Recommendations — cell-level intervention ROI catalog (23 interventions × effect range × cost range × evidence quality)
Deep-dive interactive explorers: State Youth-Gap Explorer, Race & Demographics.
The five sources that drive every ROI claim
1. CPS November Voting Supplement (via IPUMS)
- What it is: Census Bureau biennial survey asking adult citizens whether they voted, with full demographic breakouts. IPUMS harmonizes variable codings across all 13 cycles.
- Our extract: 1.64 M respondents, 2000-2024.
- How we measure turnout: per Census P20 methodology — only an explicit "voted yes" (VOTED=2) counts as a voter; all other codes (didn't vote, refused, don't know, no response, not-in-universe) count as non-voters. The denominator is the full citizen adult population. This is the method Census uses in its own published Voting and Registration tables.
- How we measure why people didn't vote: two CPS variables (VOWHYNOT, VOYNOTREG) ask non-voters and non-registrants for their reason directly. We harmonize individual codes into five policy-relevant categories:
- engagement — not interested, vote wouldn't matter, didn't like candidates
- access — registration problems, missed deadlines, didn't know how
- logistical — too busy, out of town, forgot, transportation, weather
- personal — illness, ineligible, language
- other — residual
- Headline finding from this diagnostic: across 2000-2024, youth 18-29 non-voters cite logistical barriers 50% of the time, engagement concerns 24%, access barriers 9%. The dominant causes are policy-addressable.
- Self-report bias: CPS over-reports voting by 3-8 pp vs voter-file validation. This is disclosed and calibrated against McDonald VEP (source 2).
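The two recodes above can be sketched in Python. The VOTED=2 rule is stated in the text; the reason-code labels below are hypothetical placeholders for illustration — actual IPUMS code values vary by cycle and must be checked against the codebook.

```python
# Sketch of the two CPS recodes described above (assumptions flagged inline).
# VOTED=2 ("voted yes") is the only code counted as a voter (Census P20 method).
# The reason labels below are HYPOTHETICAL placeholders, not real IPUMS codes.

REASON_CATEGORY = {
    "not_interested": "engagement",
    "vote_wouldnt_matter": "engagement",
    "didnt_like_candidates": "engagement",
    "registration_problem": "access",
    "missed_deadline": "access",
    "didnt_know_how": "access",
    "too_busy": "logistical",
    "out_of_town": "logistical",
    "forgot": "logistical",
    "transportation": "logistical",
    "bad_weather": "logistical",
    "illness": "personal",
    "ineligible": "personal",
    "language": "personal",
}

def is_voter(voted_code: int) -> bool:
    """Census P20 rule: only an explicit 'voted yes' counts; refusals,
    don't-knows, no-response, and not-in-universe are all non-voters."""
    return voted_code == 2  # VOTED=2 per the IPUMS CPS coding cited above

def reason_category(reason: str) -> str:
    # Anything not explicitly mapped falls into the residual bucket.
    return REASON_CATEGORY.get(reason, "other")
```

The asymmetry is deliberate: ambiguous turnout responses are counted as non-voting (conservative for rates), while ambiguous reasons fall into "other" rather than inflating any policy category.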
2. McDonald United States Elections Project (VEP turnout)
- What it is: state-by-state turnout as a share of the voting-eligible population — citizens 18+, excluding ineligible felons, plus eligible overseas voters. This is the voter-file-validated benchmark Census itself cites.
- Coverage: 1980-2022, all 50 states (v1.2, October 2024, hosted at the University of Florida Election Lab).
- 2024 extension: state totals from MIT Election Data Science Lab (MEDSL) official certified returns ÷ ACS CVAP denominators.
- 2020 backfill: MEDSL moved its 2020 dataset behind a paywall; we re-sourced state margins from FEC-certified state totals and cross-checked against four well-known anchor margins (GA +0.24 pt D, FL +3.36 pt R, AZ +0.31 pt D, WY +43 pt R) — all within 0.02 pt of expected.
- How we use it: every state-level turnout claim is compared to the matching VEP figure. Where the CPS−VEP residual exceeds 3 pp, we disclose it. Across the 2020 and 2022 cycles for all 50 states plus DC (102 state-years), the CPS mean residual is +2.6 pp.
- Hur-Achen adjusted alternative: we also publish turnout_pct_adj — a Hur & Achen (2013) state-level post-stratification that matches CPS rates to VEP by construction. At the cohort='all' state level the match is exact. Published alongside the Census-P20 value; neither is "the right answer" in all contexts.
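A minimal sketch of the two-cell post-stratification in the spirit of Hur & Achen (2013): rescale voter and non-voter weights so the weighted state turnout rate equals the VEP benchmark by construction, holding total weight fixed. This is an illustrative simplification, not the project's exact implementation.

```python
def hur_achen_adjust(weights, voted, vep_rate):
    """Two-cell post-stratification sketch (assumed simplification of the
    Hur-Achen adjustment described above).
    weights: survey weights; voted: booleans; vep_rate: VEP benchmark in [0, 1].
    Returns adjusted weights whose weighted turnout rate equals vep_rate."""
    total = sum(weights)
    voter_w = sum(w for w, v in zip(weights, voted) if v)
    cps_rate = voter_w / total  # unadjusted weighted CPS turnout rate
    adjusted = []
    for w, v in zip(weights, voted):
        if v:
            # Shrink voter weights when CPS over-reports (cps_rate > vep_rate).
            adjusted.append(w * vep_rate / cps_rate)
        else:
            # Inflate non-voter weights correspondingly; total weight is preserved.
            adjusted.append(w * (1 - vep_rate) / (1 - cps_rate))
    return adjusted
```

Because the rescaling is exact by construction, the match to VEP is not evidence of accuracy — which is why both the adjusted and the Census-P20 values are published side by side.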
3. NCSL state election law tracker
- What it is: the National Conference of State Legislatures' authoritative compilation of state election-administration laws. We coded six policies directly relevant to youth turnout and registration: Automatic Voter Registration, Same-Day Registration, Online Voter Registration, pre-registration for 16-/17-year-olds, no-excuse absentee voting, and universal vote-by-mail.
- Coverage: 155 state-policy rows, 6 policies, 48 states + DC, with effective dates and source URLs. Effective dates backfilled from state statute citations or NCSL historical summaries. Where a policy has been in continuous effect since before our 2000 window, we use a conservative "1990" default and flag it in the notes.
- How we use it: per state per year, we know whether each policy was in effect. The sum is policy_score (0-6). Correlations with state youth turnout are suggestive, not causal. For the two policy reforms where the literature supports causal attribution (same-day registration, mail-ballot expansion), we cite peer-reviewed estimates directly.
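The per-state-per-year score can be sketched as a simple count over effective dates. The policy abbreviations and the input shape below are illustrative, not the actual NCSL coding file.

```python
# Sketch of policy_score (0-6) as described above. Policy keys are
# illustrative abbreviations for the six coded policies.
POLICIES = ["AVR", "SDR", "OVR", "prereg_16_17", "no_excuse_absentee", "universal_VBM"]

def policy_score(effective_years: dict, year: int) -> int:
    """Count how many of the six policies were in effect in `year`.
    effective_years maps policy name -> first year in effect (None if never).
    Policies predating the 2000 window carry the conservative 1990 default
    noted in the text, so they count in every analysis year."""
    return sum(
        1
        for p in POLICIES
        if effective_years.get(p) is not None and effective_years[p] <= year
    )
```

For example, a state with same-day registration since before the window (coded 1990) and online registration effective 2016 scores 1 in 2012 and 2 in 2020.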
4. Independent redistricting commission adoption events
- What it is: the canonical list of U.S. states that moved congressional redistricting authority from the legislature to a commission via ballot initiative or constitutional amendment: Arizona 2000, California 2010, Colorado 2018, Michigan 2018, New York 2014, Virginia 2020, Utah 2018, New Mexico 2021, Ohio 2018.
- How we use it: natural-experiment design. We compute state turnout in the first redistricting cycle after the commission became effective, and compare against the same state's turnout before the event plus matched comparison states. This is the closest we get to a causal claim in the structural analysis.
- Framing: we describe the institutional change (legislature-drawn → commission-drawn), never its partisan consequences. "District non-competitiveness correlates with depressed turnout" is the language; "gerrymandering suppresses votes" is not. The finding is equally valid regardless of which party benefits.
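The within-state pre/post design can be sketched as a difference-in-differences: the adopting state's turnout change across the adoption event, minus the mean change in matched comparison states over the same cycles. The two-year pre-period spacing below is an assumption for illustration; the actual analysis uses the last pre-commission redistricting cycle.

```python
def commission_effect(turnout, state, first_commission_cycle, controls):
    """Difference-in-differences sketch of the pre/post design described above.
    turnout: dict mapping (state, year) -> turnout rate;
    first_commission_cycle: first federal cycle with commission-drawn maps;
    controls: matched comparison states (assumed already selected).
    The pre-period offset of one cycle (-2 years) is a simplifying assumption."""
    pre = first_commission_cycle - 2
    treated_change = turnout[(state, first_commission_cycle)] - turnout[(state, pre)]
    control_change = sum(
        turnout[(c, first_commission_cycle)] - turnout[(c, pre)] for c in controls
    ) / len(controls)
    # The identifying assumption: absent adoption, the treated state would have
    # moved in parallel with its matched controls.
    return treated_change - control_change
```

The subtraction of the control-state change is what separates the commission effect from cycle-level swings that hit every state at once.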
5. Intervention effect-size catalog
- What it is: 23 specific election interventions across six categories — policy reform, administrative improvement, programmatic GOTV, civic infrastructure, institutional reform, and (explicitly) failed-or-overclaimed. Primary source: Gerber & Green, Get Out The Vote (4th ed., Brookings 2019), supplemented by Brennan Center cost-benefit analyses and peer-reviewed follow-ups.
- Each row contains: intervention, effect-size range (low-high, with CI where available), youth-specific multiplier where documented, per-contact cost, per-marginal-voter cost, evidence-quality rating, and time horizon.
- Ranges not point estimates. GOTV literature has genuine uncertainty, and this report is more credible for acknowledging it than pretending otherwise. A peer-to-peer text campaign might move turnout 0.5-1.5 pp per contact at $40-$100 per marginal voter — not "1 pp at $70/voter" with spurious precision.
- Includes a "do-not-fund" section. We name interventions where the evidence does not support the claims: celebrity "go vote" campaigns, untargeted digital ads, generic SMS broadcasts without peer-to-peer, and brand-awareness GOTV. Credibility here depends on being willing to say what doesn't work.
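The range arithmetic behind per-marginal-voter cost is worth making explicit: contacting 100 people at an effect of e pp yields roughly e marginal voters, so cost per marginal voter is 100 × cost-per-contact ÷ e. A sketch that preserves the range rather than collapsing it to a point estimate:

```python
def cost_per_marginal_voter(cost_per_contact, effect_low_pp, effect_high_pp):
    """Convert a per-contact cost and an effect-size RANGE (percentage points
    per contact) into a cost-per-marginal-voter range. Illustrative arithmetic,
    not the catalog's exact formula; ignores contact rates and targeting."""
    # Weakest plausible effect implies the highest cost per marginal voter.
    high_cost = 100 * cost_per_contact / effect_low_pp
    # Strongest plausible effect implies the lowest cost per marginal voter.
    low_cost = 100 * cost_per_contact / effect_high_pp
    return (low_cost, high_cost)
```

A hypothetical $0.50-per-contact text program with a 0.5-1.5 pp effect range lands at roughly $33-$100 per marginal voter — consistent with the order of magnitude quoted above, and a reminder that the effect-size uncertainty, not the contact cost, dominates the ROI spread.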
Standard errors and confidence
Every published rate carries a 95% confidence interval. We use Census Bureau generalized variance function parameters (CPS November 2022 technical documentation, Tables 7-11):
SE(p) = sqrt(b × p × (100 − p) / y)
where p is the rate, y the weighted base, b the Census-published parameter for the relevant geography (state, region, division, national). This is the same method Census uses for its published P20 tables. CPS does not publish replicate weights for the November voter supplement, so Fay's BRR is unavailable — the generalized variance function is the defensible alternative.
Suppression rule: cells with fewer than 400 unweighted respondents are flagged "indicative only" or suppressed. For youth 18-29 in smaller states, this matters; we do not publish a single-decimal-point turnout rate for a 145-person cell.
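The generalized variance function and the suppression rule above combine into a short publication gate. The `b` value in the example below is a placeholder; the real parameters come from the cited CPS technical documentation tables.

```python
import math

def gvf_se(p: float, y: float, b: float) -> float:
    """Generalized variance function SE per the formula above:
    SE(p) = sqrt(b * p * (100 - p) / y), with p in percent, y the weighted
    base, and b the Census-published parameter for the geography."""
    return math.sqrt(b * p * (100.0 - p) / y)

def publish_rate(p, y, b, n_unweighted, z=1.96):
    """Attach a 95% CI and apply the n<400 suppression rule from the text.
    Field names here are illustrative, not the pipeline's actual schema."""
    se = gvf_se(p, y, b)
    return {
        "rate": p,
        "ci": (p - z * se, p + z * se),
        "flag": "indicative only" if n_unweighted < 400 else "ok",
    }
```

With the placeholder b=4 and a weighted base of 10,000, a 50% rate carries an SE of 1.0 pp; a 145-respondent youth cell in a small state would be flagged regardless of how tight its nominal interval looks.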
What we tried that didn't deliver
Credible methodology includes admitting the paths that didn't pan out.
- ANES Cumulative Data File as a cell-level turnout cross-validator. For the single well-anchored cell (2020 White 18-29), CPS and ANES agreed within 0.6 pp — a tight validation. But for race-disaggregated youth cells, ANES sample sizes (n=56 for Black youth 2020; n=79 for Hispanic youth 2020) produce noisy rates that don't correlate meaningfully with CPS at the cell level (r = −0.16 across 73 cells). ANES remains valuable for national-level attitudinal trends (efficacy, trust, political interest) but is not a reliable turnout cross-validator for the cohorts we care about. We disclose this rather than paper over it.
- EAC EAVS state registration rates vs CPS registration. CPS runs ~22 pp lower than EAVS roll totals in 2020-2022. This is a known phenomenon — EAVS rolls include inactive records, federal-employee registrations (DC reports 124% of CVAP), and pre-removal lag — not a CPS problem. For citizen-population rate analysis we use CPS; for roll-health analysis we use EAVS. We don't mix them.
- Harvard IOP pre-election "definitely vote" as a turnout indicator. IOP intent correlates strongly with actual turnout (Pearson r = 0.944 across five cycles), but stated intent overstates actual turnout by 5-9 pp in recent cycles. We use IOP as a directional leading indicator only, not as an absolute turnout estimate.
Nonpartisan framing commitment
Institutional findings — redistricting, commission adoption, state policy variation — are expressed in terms of competitiveness and governance structure, not partisan advantage. This is a condition of 501(c)(3) compatibility and a necessary posture for a report pitched at Secretaries of State from both parties, federal administrators, and funders across the civic-philanthropy spectrum.
- "District non-competitiveness correlates with depressed turnout, particularly among voters 18-29" — yes.
- "State redistricting reform through independent commission adoption correlates with turnout increases in the first post-adoption cycle" — yes.
- "Gerrymandering suppresses votes" — no.
- "Party X benefits from these reforms" — no.
The empirical findings are equally valid regardless of which party benefits. The framing is deliberate.
Reproducibility
Every number can be regenerated from the code and data in the project repository:
- Pipeline: scripts/process_cps.py, scripts/build_state_year_panel.py, scripts/aggregate_eavs.py, scripts/build_warehouse.py
- Build-time precompute (Sprint 5+): scripts/precompute_findings.py plus a declarative manifest at scripts/precompute/manifest.py. Every chart on every finding page reads its data from JSON precomputed against the parquet sources at build time, eliminating any browser-side SQL execution. JSON files live under src/data/findings/{slug}/.
- Validation harness: scripts/validate_panel.py — eight checks, all passing.
- Render-test harness: scripts/test-render.mjs — Playwright headless smoke across every demo-path route; asserts SVG count + zero console errors.
- Cross-validation outputs: data/external/cross_validation/{V2,V3,V4}_findings.md
- Full data provenance: DATA_INVENTORY.md
All source data is publicly available and free. Raw IPUMS and ANES microdata require free user registration at cps.ipums.org and electionstudies.org respectively; otherwise unrestricted.
Interactive cell exploration. Free-range multi-dimensional cell exploration (arbitrary age × race × region × education selection) ships in Phase 1 of the platform roadmap. The v1 demo focuses on the cell-targeted findings indexed above; intermediate-depth interactive exploration is available now through State Youth-Gap Explorer and Race & Demographics.
Funding and affiliations
Predictive Pace operates as movement infrastructure for the participation-equity ecosystem — open-methodology, foundation-funded, free-tier-by-default cell-level intelligence for civic 501(c)(3)s, Secretaries of State, foundations, and academic partners working to close the participation gap. We succeed when the orgs we power achieve cell-level lifts they couldn't measure or target without us.
This v1 report is produced as pro bono public-interest research with no external sponsor at time of publication. A 501(c)(3) entity restructure is in progress on a 60-90 day timeline; until completed, work is delivered through Predictive Pace LLC. Should scope later be extended under a funded engagement — state Secretary of State office, foundation strategic-infrastructure grant, academic partnership, or movement-partner co-brand — that relationship will be disclosed here and in the commit history.
Comparable mission-infrastructure model precedents include ProPublica, OpenElections, Pol.is / The Computational Democracy Project, Bridgespan Group, and Results for America — foundation-funded public-interest infrastructure organizations sustained by grants and earned revenue rather than subscription SaaS.
Methodology open-source commitment. The intervention catalog, cell-targeting framework, suppression rules, and measurement methodology are published under a permissive license alongside the code. Defensibility comes from integration, delivery, and intelligence — not from closed-source IP.
Quick reference — what to use when
| Question | Primary source | Notes |
|---|---|---|
| Turnout in [state] in [year]? | McDonald VEP | CPS usable with +3-8 pp over-report caveat |
| Youth turnout in [state] in [year]? | CPS (18-29 weighted) | Harvard IOP for intent, not actual |
| Why didn't young people vote? | CPS VOWHYNOT diagnostic | 50% logistical, 24% engagement, 9% access |
| Does [state policy] increase turnout? | NCSL × VEP, within-state pre/post | Intervention catalog for meta-analytic effect |
| Cost per marginal voter of [intervention]? | Intervention catalog | Peer-reviewed range with CI; no point estimates |
| Political efficacy among 18-29? | ANES + Harvard IOP + Pew | Attitudinal only; not turnout |
| 2020 presidential margin in [state]? | FEC-certified 2020 backfill | Cross-checked vs published anchor margins |
| Citizens 18+ in [state]? | ACS CVAP 2019-2023 | Single-vintage; match-year denominators a v2 task |
Anything not on this table is either not in this report or requires an explicit methodological disclosure in the section that uses it.
Last updated: 2026-05-11. This methodology page reflects the analytical state after Sprint 4 (seven cell-level findings + recommendations) and Sprint 5 Day 2 (build-time precomputation for demo reliability — every numeric claim on every finding page now traces to deterministic JSON regenerable from the manifest at scripts/precompute/manifest.py). Subsequent refinements (additional CVAP vintages, broader race-disaggregated validation, updated effect-size literature, Phase 1 interactive cell-explorer) will be noted in commit history and reflected here as they land.