Metascience

Sweat the Small Stuff

From immigration to family policy, inaccurate and missing federal data undermine major policy reforms
April 2nd, 2025

This piece was originally published in Commonplace on January 21st, 2025.

Lovers of grand reform proposals have a new champion in Elon Musk's Department of Government Efficiency. On the menu: slashing administrative bloat and over-regulation, cashiering rogue civil servants, and bringing a little Silicon Valley efficiency to legacy institutions. But while these may be worthy reform goals, spare a thought for an unheralded area for reform: federal data collection. While Elon posts on Twitter about his grand plans, U.S. Citizenship and Immigration Services still uses paper records stored in caves. Not every case of federal data management is as egregious, but there are a number of straightforward fixes reformers can make. It’s too hard to collect data from American citizens, even when they want to provide it. The data the federal government does collect is often impossible to use. And there are obvious places where the federal government can do more to collect and distribute useful data.

A focus on federal data collection can solve a few problems simultaneously. Paradoxically, it can reduce the administrative burden that providing the government with information on taxes or permits places on citizens (by allowing cross-referencing and user research). Much of sociology and economics relies on the federal government as the only actor with the bird’s-eye view needed to collect relevant data. And most importantly, it can make the federal government much more responsive to the needs of its citizens.

Collecting better data from citizens

Good-governance wonks like to gripe about the Paperwork Reduction Act. Passed in 1980, the PRA is aimed at reducing the total paperwork burden placed on Americans by creating procedural guardrails that federal agencies must clear before collecting personal information. But the well-intentioned PRA has had a deleterious side effect: it has functionally banned agencies from doing “user research” to better understand the Americans they serve.

In one instance, the Department of Veterans Affairs rolled out a new online portal to enroll in benefits, and later wondered why only eight veterans had used it. It turned out that the VA had accidentally architected the portal so that it could only be accessed by users browsing on Internet Explorer with a particular Adobe suite installed: the exact setup used by VA employees themselves. The portal had been built without talking to users: the agency believed it could not legally ask veterans to test it without running afoul of the PRA. Other agencies have similar horror stories. The net effect is more paperwork and administrative burden, not less.

The Biden administration moved to make user research easier, but the new administration can go further by actively encouraging A/B testing and user research to ensure that federal forms are as easy as possible to fill out. Ernie Tedeschi, former chief economist at Biden’s Council of Economic Advisers, told us he thinks there is more to be done: “I think we should look into new mechanisms for people to fill out flagship government surveys, e.g. apps so that participants can fill out the CPS [Current Population Survey] more easily.”

Improving existing data collection efforts

But it’s not just in its interactions with citizens that the federal government is falling short on data collection. The issue extends to all manner of fields where improving data collection would make policymaking easier.

Take crime. As the Manhattan Institute’s Charles Lehman explains, “our method for collecting data on crime is woefully antiquated.” In 2021, the FBI attempted to migrate from one standard of statistical collection to another, more detailed one. Across the country, that migration often simply didn’t happen: police departments covering more than a third of the country failed to submit data. Some of those departments, like the San Francisco PD, are in cities in the national spotlight for crime and disorder. San Francisco plans on finally submitting crime data to the FBI this year, for the first time since 2021. Lehman recommends Congress take two steps to improve crime reporting: the prosaic step is to link federal public-safety funds to successful data reporting, while the more exciting option is “establishing a national ‘sentinel city’ program to report crime data in near-real time.”

Or look at financial data. Tedeschi pointed out that, although we have a Consumer Expenditure Survey released by the Census Bureau, it relies on people manually filling out diaries of what they buy. Scanner data on what people purchase at grocery stores would do a better job of reflecting how consumer spending responds to inflation: do people buy less, or switch from name brands to store brands? In an earlier conversation with Tedeschi on Statecraft, he also flagged the difficulty of measuring savings with the Federal Reserve’s survey, which is high-quality but comes out only once every three years. Separately, Statecraft interviewee and CHIPS Act implementation head Ronnie Chatterji pointed out that supply chain data is notoriously hard to come by. More frequent input/output tables would help track the changing structure of supply chains for critical economic inputs and head off inflation before it starts.

For a third example, look at immigration, where the U.S. has multiple data blind spots. For instance, better domestic wage data would improve the so-called prevailing-wage determinations in the employment-based immigration system, which the nation relies upon as a proxy for confirming the integrity of U.S. employers hiring foreign-born skilled immigrants (especially those in STEM). The data from the Student and Exchange Visitor Information System (SEVIS), owned by ICE, could provide more insight into innovation and the role of immigrant scientists, technologists, and engineers, many of whom first enter the U.S. as students, scholars, or postdoctoral researchers. Much more of the SEVIS data (absent personal identifying information) should be made public without FOIA requests, so that policymakers can identify trends and policies that would strengthen the innovation ecosystem.

We should also bring back the New Immigrant Survey. After the National Academy of Sciences and the NIH recommended it for decades, it was finally co-funded by multiple government agencies. During its brief existence, it was our single best source of information on many characteristics of immigrants (and their children) across visa types. Sadly, it was allowed to lapse, and information we once had about immigration cohorts (including chain and familial migration and demographic characteristics) is no longer collected.

And as new issues — like family policy — increasingly motivate voters, some of the most valuable numbers we once relied on aren’t collected anymore. Demographer Lyman Stone pointed out to us that the CDC used to collect data on all marriages and divorces, but no longer does. The National Survey of Family Growth, an incredibly detailed survey of fertility and family behavior run by the CDC, is released irregularly, every two to four years. As Stone put it: “Since family policy is an emerging area of interest from many policymakers, expanding the NSFG to make it an annual product would strengthen its usefulness for monitoring American family life.”

Wielding the good data sources

The new administration should also double down on the places where the federal government collects legitimately excellent data. The Contingent Worker Supplement, an addendum to the Current Population Survey, was rerun in 2017 after being skipped for many years, giving policymakers the percentage of the population currently employed in gig work (almost 5%).

Other data sources are extremely politically practical: for instance, HUD’s database on housing permits by city is an excellent way to “name and shame” cities into building more housing. A similar database housed at the Federal Permitting Improvement Council, with comprehensive data on the length of environmental reviews and the timelines of project litigation, could create pressure for federal agencies to improve their permitting outcomes, reducing the huge costs that NEPA and other regulations impose on new development projects. And as our colleague Chris Elmendorf notes in a new paper, “Today, there is no official, annual census of net housing production.” Such a census would give policymakers a far better sense of the realities of American housing than current data sources, which look only at building permits, particularly as housing availability nationwide comes under increased scrutiny.

As David Bernstein has documented in his book Classified: The Untold Story of Racial Classification in America, some of our federal data categories were invented out of whole cloth, the product of interest-group jockeying rather than sober or objective observation. Yet they now shape broad swathes of federal policy. What we measure gives legibility and power to policymakers; choosing to measure something is a value judgment. As a nation, we care about reducing crime, reducing the negative consequences of immigration, supporting families, and keeping inflation low. But we often don’t know the scope of the problem or the best approaches, because we so often fail to measure them.

All our choices to measure or not measure the measurable are indications of what we care about in federal policy. Building better measurement tools or datasets doesn’t excite the base or build social media clout: it feels boring and technocratic. But in the long run, it’s how we know if reform efforts are managing to improve what we care most about.