A statistical technique — Iterative Proportional Fitting (IPF) — that adjusts the weight assigned to each record in a sample so that the aggregate distribution across multiple demographic dimensions matches known population totals. Used in synthetic persona analysis to ensure the persona set reflects actual US population distributions rather than any bias introduced by the generation process.
A persona-based product validation analysis is only as good as the demographic representativeness of its personas. If your 250 personas skew toward college-educated urban professionals in their 20s and 30s, every segment score will reflect that cohort's experience of the problem — not the experience of the population at large.
IPF weighting fixes this without requiring you to regenerate the sample. Instead of discarding personas that don't match target proportions, IPF assigns each persona a weight that makes the weighted aggregate match known Census marginals across multiple dimensions simultaneously: age, income, education, occupational category, family structure, geographic region.
The result: when you calculate the average PSF score for a segment, you're calculating it over a sample that, in aggregate, looks like the US population. High-income personas in that segment don't dominate the average just because the generation process produced more of them. Each persona's contribution is scaled by how representative that type of person is in the real population.
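To make the effect on a segment average concrete, here is a minimal sketch of a weighted mean. The function name, scores, and weights are hypothetical illustrations, not values from an actual analysis.

```python
def weighted_mean(scores, weights):
    """Weighted average: each score counts in proportion to its weight."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(s * w for s, w in zip(scores, weights)) / total_weight


# Hypothetical segment: three over-generated personas (weights below 1)
# and one under-generated persona (weight above 1). Illustrative numbers only.
scores = [40.0, 40.0, 40.0, 80.0]
weights = [0.5, 0.5, 0.5, 1.5]

unweighted = sum(scores) / len(scores)      # every persona counts equally
weighted = weighted_mean(scores, weights)   # personas count by their weight

print(unweighted, weighted)
```

In this toy segment the weighted average sits closer to the under-generated persona's score, because that persona stands in for more of the real population.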
This matters most for counterintuitive findings. The segments that score highest on PSF are often not the obvious tech-adjacent demographics — they're nurses on rotating shifts, freelancers with irregular income, single parents managing part-time schedules. Without demographic representativeness, these groups get diluted by over-represented demographics who experience the problem less acutely.
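IPF is an iterative algorithm. It starts with initial weights (often all equal to 1.0) and calibrates one dimension at a time: for each dimension, it rescales the weights so that the weighted share of every category matches the target marginal. Because adjusting one dimension disturbs the others, it cycles through all dimensions repeatedly until every marginal matches within a tolerance.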
The algorithm typically converges in 10–50 iterations for 5–8 calibration dimensions. The final weights are fractional — each persona might have a weight of 0.73 or 1.42 — and are applied when computing segment-level aggregate statistics.
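The iteration can be sketched in a few lines of Python. This is a generic raking loop under stated assumptions, not Modest Idea's implementation: the names `rake` and `weighted_share`, the two toy dimensions, and the target shares are all hypothetical, and the sketch assumes every category in the sample has a target and every target category appears in the sample.

```python
def rake(records, targets, max_iter=50, tol=1e-6):
    """Iterative proportional fitting over categorical dimensions.

    records: list of dicts, e.g. {"age": "young", "region": "urban"}
    targets: {dimension: {category: target share}}; shares sum to 1 per dimension
    Returns one weight per record, normalised to average 1.0.
    """
    n = len(records)
    weights = [1.0] * n  # start from equal weights
    for _ in range(max_iter):
        max_gap = 0.0
        for dim, shares in targets.items():
            # Current weighted total per category on this dimension.
            totals = {}
            for rec, w in zip(records, weights):
                totals[rec[dim]] = totals.get(rec[dim], 0.0) + w
            grand = sum(totals.values())
            # Rescale each record so this dimension's marginal hits its target.
            for i, rec in enumerate(records):
                current_share = totals[rec[dim]] / grand
                target_share = shares[rec[dim]]
                max_gap = max(max_gap, abs(current_share - target_share))
                weights[i] *= target_share / current_share
        if max_gap < tol:  # every marginal was already within tolerance
            break
    scale = n / sum(weights)
    return [w * scale for w in weights]


def weighted_share(records, weights, dim, category):
    """Weighted share of records falling in `category` on dimension `dim`."""
    total = sum(weights)
    return sum(w for r, w in zip(records, weights) if r[dim] == category) / total


# Toy sample that over-generates young urban personas (illustrative only).
records = (
    [{"age": "young", "region": "urban"}] * 3
    + [{"age": "young", "region": "rural"}]
    + [{"age": "old", "region": "urban"}]
    + [{"age": "old", "region": "rural"}]
)
# Hypothetical target marginals, not ACS figures.
targets = {
    "age": {"young": 0.5, "old": 0.5},
    "region": {"urban": 0.6, "rural": 0.4},
}

weights = rake(records, targets)
print([round(w, 2) for w in weights])
```

The over-generated young urban personas end up with weights below 1.0 and the remaining personas with weights above 1.0, while the weights still average 1.0 across the sample.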
Resampling — drawing a fresh set of personas that hit target proportions by construction — is an alternative, but it discards information. IPF preserves every persona while correcting the representativeness problem through weighting. For a 250-persona sample, preserving all records is especially important because segment-level subsets can be small.
In a raw synthetic generation run, high-income professionals with college degrees in urban areas might appear at twice their Census-representative frequency, because training data for LLMs skews toward that demographic's written output. IPF picks up this imbalance one dimension at a time: the weighted totals for the categories those personas fall into exceed the corresponding ACS marginals. Weights for those personas are scaled down to the 0.5–0.7 range, while weights for underrepresented groups (service workers, rural households, households without a college degree) are scaled up to the 1.3–1.8 range.
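The arithmetic behind a single calibration step is simply target share divided by sample share. The sketch below uses a hypothetical helper and made-up shares for one dimension; in a real multi-dimension run, a persona's final weight is the product of such factors accumulated across dimensions and iterations.

```python
def adjustment_factor(sample_share, target_share):
    """Factor that rescales a group's weights so its share matches the target."""
    if sample_share <= 0:
        raise ValueError("group must appear in the sample")
    return target_share / sample_share


# Illustrative shares only, not ACS figures: one group appears at twice
# its target frequency, so everyone else is under-represented.
over = adjustment_factor(sample_share=0.40, target_share=0.20)
under = adjustment_factor(sample_share=0.60, target_share=0.80)

print(round(over, 2), round(under, 2))
```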
When a habit app analysis runs against this weighted sample, the high-PSF segments that emerge are the ones with genuinely acute need in the real population — not the ones most over-represented in the raw generation. The 84-point score for shift workers reflects a population-representative signal, not a sampling artifact.
IPF-weighted sampling (Iterative Proportional Fitting) is a statistical technique that adjusts the weight of each sample record so the aggregate distribution across multiple demographic dimensions matches known population totals. Rather than discarding records that don't fit target proportions, IPF reweights them — preserving the full sample while correcting for imbalance across age, income, education, region, and other variables simultaneously.
A biased sample produces biased PSF scores. If your 250-persona sample over-represents college-educated urban 25-to-35-year-olds, every segment analysis skews toward that demographic's experience of the problem. IPF weighting brings the weighted sample in line with Census marginals, so the segment scores reflect the distribution of real people, not whoever happened to be easiest to generate.
Simple random sampling assumes a large enough pool produces a representative draw. For synthetic personas, the generation process may systematically favor certain demographic combinations. IPF corrects post-hoc by adjusting weights on existing records rather than requiring a new draw. It's a standard technique in survey methodology used by the Census Bureau and academic researchers.
Modest Idea uses target marginals derived from the US Census Bureau's American Community Survey (ACS) Public Use Microdata Sample (PUMS). Target dimensions include age groups, income brackets, educational attainment, occupational category, family structure, and geographic region. IPF iterates across these dimensions until the weighted sample matches ACS marginals within a tolerance threshold.