Census PUMS is the US Census Bureau's Public Use Microdata Sample: individual-level survey responses from the American Community Survey (ACS) that, once weighted, are statistically representative of the US population across age, income, education, occupation, and geography. PUMS provides the demographic ground truth that Modest Idea uses to build synthetic personas.
The fundamental problem with founder-led user research is selection bias. You interview people in your network, and your network looks like you: same education level, same city, similar income, similar professional background. The personas you build from those interviews are accurate for one demographic slice and blind to everyone else.
Census PUMS addresses this by providing a statistically representative sample of the actual US population. The Census Bureau conducts the American Community Survey every year, sending it to roughly 3.5 million addresses. The resulting PUMS files include hundreds of variables per respondent: age, sex, race, ancestry, education, employment status, occupation, industry, income, housing costs, commute time, commute method, family composition, and more.
When Modest Idea samples 250 personas for an analysis, those 250 people look like America, not like a startup founder's LinkedIn network. This is what lets the analysis surface unexpected segments with high problem-solution fit (PSF): night-shift nurses in the Midwest, rural small business owners, recent immigrants navigating a new financial system. These people exist in PUMS data, and their presence in the analysis is what makes the results useful rather than self-confirming.
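Each PUMS record represents a real survey respondent (with identifying information removed) and includes variables such as age, sex, race, educational attainment, employment status, occupation, industry, personal income, housing costs, commute time and method, and the respondent's state and Public Use Microdata Area (PUMA), a geographic unit of at least 100,000 people.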
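To illustrate what a raw record looks like, the sketch below pulls a handful of these variables out of a PUMS person file in CSV form. The column codes (`AGEP`, `SEX`, `SCHL`, `OCCP`, `PINCP`, `JWMNP`, `PWGTP`) are real ACS PUMS person-file names; which columns Modest Idea actually keeps, and the two sample rows, are assumptions invented for the example. Blank cells, which PUMS uses for "not applicable" (a child has no income or commute entry, for instance), become `None`.

```python
import csv
import io

# ACS PUMS person-file codes -> friendlier names (this selection is illustrative).
# Dollar amounts are left unadjusted; real PUMS income figures are normally
# scaled by the ADJINC factor before use.
NUMERIC = {"AGEP": "age", "PINCP": "income", "JWMNP": "commute_min", "PWGTP": "weight"}
# Coded categories stay strings so leading zeros (e.g. SCHL "06") survive.
CATEGORICAL = {"SEX": "sex", "SCHL": "education", "OCCP": "occupation"}


def parse_pums(csv_text):
    """Return one dict per person row; blank (not applicable) cells become None."""
    records = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        rec = {}
        for code, name in NUMERIC.items():
            value = (raw.get(code) or "").strip()
            rec[name] = int(value) if value else None
        for code, name in CATEGORICAL.items():
            value = (raw.get(code) or "").strip()
            rec[name] = value or None
        records.append(rec)
    return records


# Two made-up rows in PUMS layout (not real respondents).
SAMPLE = """SERIALNO,AGEP,SEX,SCHL,OCCP,PINCP,JWMNP,PWGTP
2022HU0000001,38,2,21,3255,68000,40,112
2022HU0000002,9,1,06,,,,95
"""

if __name__ == "__main__":
    for person in parse_pums(SAMPLE):
        print(person)
```

Keeping codes such as `SCHL` and `OCCP` as strings avoids silently turning category labels into numbers that could be averaged or compared by mistake.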
Modest Idea's persona database is built from PUMS records processed through several steps. First, raw ACS records are filtered and cleaned. Then, each record is enriched with personality attributes (OCEAN scores) and behavioral characteristics that are consistent with but not directly derivable from the demographic data. Finally, the enriched personas are stored with vector embeddings that enable semantic search and similarity matching.
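The final step, storing personas alongside embeddings for similarity matching, can be sketched as a record type plus a nearest-neighbor lookup. Everything here is a hypothetical schema: the `Persona` fields, the OCEAN keys, and the tiny three-dimensional embeddings are invented for the example, and a linear scan with cosine similarity stands in for whatever vector index the real database uses.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Persona:
    # Demographics carried over from a cleaned PUMS record.
    age: int
    occupation: str
    income: int
    # Hypothetical enrichment: OCEAN (Big Five) scores on a 0-1 scale.
    ocean: dict = field(default_factory=dict)
    # Hypothetical embedding of the persona's text profile.
    embedding: list = field(default_factory=list)


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(query, personas, k=3):
    """Brute-force nearest neighbors: rank every persona by cosine similarity."""
    ranked = sorted(personas, key=lambda p: cosine(query, p.embedding), reverse=True)
    return ranked[:k]


DB = [
    Persona(38, "nurse", 68000,
            {"openness": 0.6, "conscientiousness": 0.8, "extraversion": 0.4,
             "agreeableness": 0.7, "neuroticism": 0.3},
            [0.9, 0.1, 0.0]),
    Persona(52, "farmer", 41000, embedding=[0.0, 1.0, 0.2]),
    Persona(27, "software developer", 120000, embedding=[0.1, 0.2, 0.9]),
]

if __name__ == "__main__":
    for p in most_similar([1.0, 0.0, 0.0], DB, k=2):
        print(p.occupation)
```

A production store would typically replace the linear scan with an approximate nearest-neighbor index, but the ranking logic is the same.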
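At analysis time, 250 personas are selected from this database using IPF-weighted sampling. IPF (iterative proportional fitting, also called raking) repeatedly rescales record weights until the weighted totals match known US population margins on key demographics; the 250 personas are then drawn using those weights, so the sample closely tracks the population margins and no single group is over-represented.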
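IPF-weighted sampling has two parts: computing weights, then drawing with them. A minimal sketch follows, assuming each persona is tagged with a few categorical demographics and that target margins are given as population shares. `rake` is textbook IPF on unit-level records; `weighted_sample` uses the Efraimidis-Spirakis key method to draw without replacement, favoring high-weight records. The variables, category labels, and choice of sampler are assumptions; Modest Idea's actual margins and sampling procedure are not specified here.

```python
import random
from collections import defaultdict


def rake(records, targets, iters=100, tol=1e-9):
    """Iterative proportional fitting (raking) on unit-level records.

    records: list of dicts mapping variable -> category.
    targets: {variable: {category: population share}}; each inner dict sums
             to 1 and covers every category that appears in `records`.
    Returns one weight per record whose weighted margins match `targets`.
    """
    weights = [1.0] * len(records)
    for _ in range(iters):
        max_change = 0.0
        for var, shares in targets.items():
            total = sum(weights)
            current = defaultdict(float)  # weighted total per category
            for rec, w in zip(records, weights):
                current[rec[var]] += w
            for i, rec in enumerate(records):
                # Scale so this category's weighted share hits its target.
                factor = shares[rec[var]] * total / current[rec[var]]
                max_change = max(max_change, abs(factor - 1.0))
                weights[i] *= factor
        if max_change < tol:  # every margin already matched: converged
            break
    return weights


def weighted_sample(records, weights, k, rng):
    """Draw k records without replacement via Efraimidis-Spirakis keys u**(1/w)."""
    keys = [(rng.random() ** (1.0 / w), i) for i, w in enumerate(weights) if w > 0]
    keys.sort(reverse=True)
    return [records[i] for _, i in keys[:k]]


# Toy pool that over-represents young urban respondents.
POOL = (
    [{"age": "18-34", "region": "urban"}] * 6
    + [{"age": "18-34", "region": "rural"}] * 1
    + [{"age": "35+", "region": "urban"}] * 2
    + [{"age": "35+", "region": "rural"}] * 1
)
TARGETS = {
    "age": {"18-34": 0.4, "35+": 0.6},
    "region": {"urban": 0.7, "rural": 0.3},
}

if __name__ == "__main__":
    w = rake(POOL, TARGETS)
    young = sum(wi for r, wi in zip(POOL, w) if r["age"] == "18-34") / sum(w)
    print(round(young, 6))
    print(weighted_sample(POOL, w, k=3, rng=random.Random(0)))
```

Applied to a persona database, a call like `weighted_sample(personas, rake(personas, us_margins), k=250, rng=random.Random(seed))` (where `us_margins` is a hypothetical target table) would yield the analysis sample. With only 250 draws the realized margins track the targets closely rather than exactly, and raking fails if a target category has no records at all.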
PUMS is demographic data, not behavioral or attitudinal data. It tells you that someone is a 38-year-old nurse earning $68K with a 40-minute commute — but not her personality, her technology habits, or her specific frustrations with existing products. Those dimensions are generated through the enrichment process, drawing on occupational and demographic research. The combination of PUMS demographics and AI-generated personality attributes is what makes synthetic personas both statistically grounded and behaviorally plausible.
PUMS stands for Public Use Microdata Sample. It's a dataset published by the US Census Bureau containing individual-level responses from the American Community Survey (ACS) — the largest ongoing survey of the US population. PUMS records represent real survey respondents (with identifying information removed) and include hundreds of variables: age, sex, race, education, occupation, income, housing, commute, and more.
Census PUMS provides demographic ground truth. When Modest Idea generates synthetic personas, each one starts from a real ACS record — so the distribution of ages, incomes, occupations, and geographies in our persona database reflects actual US population distributions, not the founder's assumptions. This is what makes the analysis statistically grounded rather than invented.
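Reflecting population distributions depends on weighting: each PUMS person record carries a person weight, `PWGTP`, roughly the number of people that respondent stands for, so population shares come from summed weights rather than raw record counts. A minimal sketch with made-up records follows; how Modest Idea combines `PWGTP` with its IPF step is not stated here.

```python
from collections import defaultdict


def weighted_distribution(records, var, weight_key="PWGTP"):
    """Share of the represented population in each category of `var`."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec[var]] += rec[weight_key]
    grand_total = sum(totals.values())
    return {cat: t / grand_total for cat, t in totals.items()}


def unweighted_distribution(records, var):
    """Share of raw records in each category, ignoring weights."""
    counts = defaultdict(int)
    for rec in records:
        counts[rec[var]] += 1
    return {cat: c / len(records) for cat, c in counts.items()}


# Made-up records: one Midwest respondent stands in for as many people
# as the two West respondents combined.
RECORDS = [
    {"region": "Midwest", "PWGTP": 100},
    {"region": "West", "PWGTP": 50},
    {"region": "West", "PWGTP": 50},
]

if __name__ == "__main__":
    print(unweighted_distribution(RECORDS, "region"))
    print(weighted_distribution(RECORDS, "region"))
```

Counting records alone would put the Midwest at one third of this toy population; using the weights puts it at one half.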