
The “Metascience 101” Podcast Series

A nine-episode set of interviews that doubles as a crash course in the debates, issues, and ideas driving the modern metascience movement
November 19, 2024

Introducing the series

Scientific and technological progress is upstream of nearly all the prosperity we enjoy. But where does this progress come from, and how can we speed it up? Despite their importance, these questions have received surprisingly little serious study.

These questions are especially crucial to answer since America invests enormous resources into scientific research. The budgets of the NIH and NSF — the top science funders in the U.S. — exceed $50 billion annually. If that money can be spent more effectively, it has profound implications for the future. 

In recent years, an approach that may help us better understand scientific progress has gained prominence: metascience, the scientific study of science itself. The metascience ecosystem investigates fundamental questions through experimentation with novel funding mechanisms, research on scientific structures and incentives, and exploration into new organizational models. 

Yet despite the growing prominence of the field, there hasn’t been a great entry point for those wanting to learn more about it. 

We decided to change that. Last year, IFP brought together some of our closest collaborators to create a podcast series that serves as a beginner-friendly introduction to metascience.

The result? Metascience 101 — a nine-episode crash course in the debates, issues, and ideas driving the modern metascience movement. We investigate why building a genuine “science of science” matters, how research in metascience is translating into real-world policy changes, and how you can get involved. 

Through conversations with leading researchers, metascience practitioners, and funders, including economics professors Heidi Williams and Paul Niehaus, Stripe co-founder Patrick Collison, Arc Institute co-founder Patrick Hsu, Open Philanthropy CEO Alexander Berger, and Professor Tyler Cowen, the series explores fundamental questions about scientific progress. How do we measure scientific advancement? What makes research institutions effective? How can we balance acceleration with safety? These discussions move beyond theory to examine practical reforms and real-world policy changes.

We launched the episodes via the Macroscience newsletter and podcast feed, but have collated them all here in one place. Below we have included all of the episode transcripts, which have been lightly edited for clarity. Each episode title links to that episode’s Substack post, where you can leave comments and reactions. The podcasts are available on Apple Podcasts, Spotify, and Substack.


Episode One: “Introduction”

Caleb Watney: This is the Metascience 101 podcast series. My name is Caleb Watney, I’m the co-founder of the Institute for Progress – a think tank focused on innovation policy in Washington, D.C. Last year, we brought together a group of scientists, economists, technologists, and policy wonks to talk about the state of science in the United States. What’s been going right? What’s been going wrong? And, most importantly — how can we do science better?

We recorded these conversations last year to understand the facts and the open questions in this emerging field. And now we want to make these conversations open to the public so everyone can catch up to the frontier of metascience. We’re partnering with our colleague Tim Hwang and the Macroscience newsletter to bring this series to you. 

We’ll talk about whether scientific progress has been “slowing down” (and whether that’s even a meaningful question), exciting new models for scientific advancement like Focused Research Organizations, how to think about the potential downsides of new scientific discoveries, and how you could make a difference in this emerging field. 

For this first conversation, my friend Dylan Matthews leads a discussion with Professor Heidi Williams and me on the basics of metascience and why anyone should care about this field in the first place.

Welcome to Metascience 101. 

Dylan Matthews: So science is very important. That’s kind of a trite statement, but the more you think about it, the more profound it seems. Much of the world’s prosperity seems to derive from scientific innovation and from translating basic science into technology, and yet we don’t have a lot of conversations about how science is doing and how to do it better.

This series of podcasts is going to talk about ways in which science might be slowing down or falling short, and ways in which we can improve it. 

Maybe just to get a baseline: Caleb, do you want to give us a broad overview of the state of science in the U.S. and the world right now? What grade would you give it? What’s going right, and what limitations could be worked on?

Caleb Watney: For sure. Well, off the top of my head, we’ll go with a B minus. I think the U.S. federal government is the single largest funder of basic scientific research in the entire world. I think that’s both really important for understanding the U.S. context, but also for understanding the global context.

Other countries obviously have their own scientific ecosystems, but the United States is the largest. We’ve got a huge concentration of the world’s top scientists, of the world’s top labs. And so the science that we do is not only affecting Americans, but it really can create innovations that can spill over and positively benefit the rest of the world in terms of new medicines, new energy technologies, all sorts of things. 

Within the U.S., we have a couple of major players. First, there’s the National Science Foundation, which focuses especially on basic scientific research. There’s also the National Institutes of Health, which is more focused on applied biomedical research, though it does some basic research as well. Between the two, we’re looking at around $60 billion every year, and that is continuing to grow.

Obviously, there’s a range of private sector actors that also invest in research and development. We have tax credits to help incentivize private actors to invest in R&D. There are universities that have huge biomedical labs. There are a number of philanthropists that also support science. So there’s a whole ecosystem here.

Dylan Matthews: So Heidi, as an economist, you’ve studied this, and your field is very preoccupied with the ways the government can help or hinder certain industries. Science is a really particular industry here.

What are the cases for heavy government subsidies and intervention of the kind that Caleb was just outlining?

Heidi Williams: Yeah, so a lot of why economists make the case that this is a market where we really want the government to come in and intervene is that we think of new scientific knowledge as a public good.

Suppose I come up with a new idea for developing a drug, but once I bring it to market I have to sell it at cost almost immediately. It took me many millions of dollars to go through the process of learning whether that drug is safe and effective, and to satisfy all of the manufacturing requirements we impose to make sure it’s manufactured safely, and I’m never going to be able to recoup those expenses. And so we put a lot of structures in place that acknowledge that the private market on its own would under-provide research relative to the level we might want as a society, in terms of the value of new innovations kicking in for growth and progress.

And so some of that is through public funding, like Caleb was saying, through the NSF and the NIH. Some of that is through policies that try to shape investment by private firms. He mentioned the R&D tax credit; the patent system is obviously another thing you would point to. But in general, I think there’s a sense that there isn’t just one subsidy that addresses all of the things that need to take place.

A lot of the basic research that happens at universities isn’t even patentable. So it’s not really reasonable to say that patents, which essentially allow you to charge a higher price for some period, are going to solve this underinvestment problem on their own.

And so oftentimes the way that the structure gets characterized is that at universities, there’s basic research that’s generally grant funded or what you might think of as “push funded” – we’re paying for inputs rather than outputs. And then as things get closer and closer to commercialization, they tend to transition out of universities into the private sector where we still might have an interest in subsidizing and encouraging more research, but we tend to do that more through tax credits and patents. 

So the landscape is really intricate: when do things optimally transfer out of the university and into commercial firms, and what are the right policy levers at different points along the way?

I would just say that at a high level, when we try to estimate whether we’re spending enough on research in the economy as a whole, the estimates we have suggest that we should be spending a lot more, in the sense that the private returns to research look much lower than the social returns. So as a society we want to come in and subsidize research beyond what the private market itself provides.

Dylan Matthews: One of the first detailed illustrations I’ve seen of this was a paper that you and some colleagues did on pharmaceuticals. Can you tell us a bit about that and how that affected your thinking on the size of science as a public good and the scale of the problem here?

Heidi Williams: Yeah. Pharmaceuticals is an interesting sector because I think you get a lot of criticism of high drug prices and you get a lot of criticism of what are often referred to as “me too” drugs, which is actually a term that I don’t love.

But the idea is: “Are we getting innovations that are too close to past innovations rather than new breakthrough innovations?”

So there are a lot of people who look at the pharmaceutical industry and worry that we’re getting too much innovation, because they see some drugs get introduced where they feel private firms are making more money than they should relative to the social value of what gets invented.

There are definitely cases like that you can find; that’s different from saying that, in general, we should be getting less health research. And so, when I look at pharmaceutical markets, I tend to see these big swaths where we’re hardly providing any incentives for investment.

And to say that things are uneven just raises the question: what about these areas where we don’t get enough investment? Would it be socially valuable to bring in more?

One of the areas we were interested in was the idea that new drugs often take a long time to bring to market, partly because we require clinical trials to show evidence of safety and efficacy. If a drug gets discovered today and published in Nature as a potential compound, it can be 12 or 16 years before it actually reaches consumers.

But you can actually develop some drugs quite quickly. And so if you think, well, I’m an investor and maybe these are two equally profitable drugs. One of them is going to take two years to come to market, and one is going to take 16 years to come to market. That second one looks much more costly.

It’s compounded by the fact that you have to file your patents on your drugs before you start your clinical trials. So every drug patent basically gets 20 years. If it takes you two years to get from starting your clinical trials to coming to market, you get 20 minus two. So 18 years of patent life.

Whereas if it takes you 18 years to do your clinical trials, you get 20 minus 18 or two years of patent life. And so there’s this whole section of drugs that take longer to develop. We’re providing them with less patent protection, and I look at that and think, maybe we’re getting too few of those drugs.
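To put the arithmetic Heidi describes in equation form (a minimal formalization, where T is the number of years from filing the patent at the start of trials to reaching the market, and 20 years is the statutory patent term):

\[
\text{effective patent life} = 20 - T
\]

So a drug with a two-year trial keeps 18 years of protected sales, while a drug with an 18-year trial keeps only two, even though both received the “same” patent.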

And so we collected a lot of data on cancer drugs, where it’s easy to know – even if we’ve never had a clinical trial on a given type of cancer – how long it would take to develop a drug for that kind of cancer. And we used various statistical tests to try to ask: if we had a different set of incentives, in the sense of a technology that would let us run shorter clinical trials, how many more drugs do we think would get developed for these diseases where we’ve generally had very little innovation?

The basic answer we got was that we would get a lot more innovation. When you tally up the health consequences of those missing drugs, it looks like the number of life-years you would save by fixing that distortion is actually quite large.

We’re kind of saying: I’m happy to acknowledge we get too much drug innovation in certain areas. I’m happy to say there are some “me too” drugs that aren’t adding a lot of value. But there’s this other area where we’re just totally under-investing relative to the life-years we could be saving. And I think those kinds of numbers, even if they’re very approximate, do a lot to call attention to the potential value of aligning incentives in a better way for innovation and science policy.

Dylan Matthews: So I think we have a framework here for how to think about why it’s valuable for the government to be incentivizing scientific research above and beyond what industry would do. But as you’ve both been saying, there are a lot of different ways the government tries to incentivize it, a lot of different ways that the private actors and philanthropies try to do it. 

Maybe one way to think through what this looks like is if you’re starting a PhD program in a science, maybe let’s say you’re a physicist. What does funding look like for you as you enter a PhD program?

What’s the maze you have to navigate through to get funding for your work? First as a grad student, then as a postdoc, then as a professor somewhere or a researcher at some lab?

Heidi Williams: I think oftentimes at our science PhD programs in the U.S., one important thing is that we’re very lucky that right now a lot of international students want to come train at U.S. universities because we’re seen as offering a premier research environment.

In the sciences, you’re generally fully funded as a student. That takes different forms: in my own field of economics, you’re fully funded centrally, so you join a program, have the flexibility to choose who you work with, and can change that at any point. In the sciences, it’s much more common that you come in and very quickly get matched with a faculty mentor to work on a specific grant in their lab. You’re very quickly exposed to what grant funding looks like, because you’re being funded on a specific grant and your project is tied to what the grant is about. You’re immediately fed into that system.

You finish your PhD, and often that involves making progress on a specific set of things related to that grant funding. Then, most of the time in the sciences, you’ll do one or more postdocs after that. Postdocs can be more complicated for international students, who are often not eligible for certain types of positions.

Many labs might have some commercially funded postdocs that are part of industry collaborations. They might have funding for specific projects through the PI, and there are an increasing number of early-career independence postdocs where you actually choose your own project. So your postdoc can basically resemble your graduate experience, in the sense that you’re working on someone else’s grant, or it can take one of these other forms that’s more of a path to independent research.

So eventually, if you continue in the academic pipeline, you’re applying for grants for your own lab. And in the biomedical sciences, for example, if you apply to the NIH, the numbers vary, but it can be 18 months between when you apply for a grant and when you actually get money from it.

That time horizon really shapes the advance planning you need and how you plan for the students you support. If you’re supporting a student on a grant, and they’re, say, five years into their PhD, how long does the grant run? Am I going to be able to support this student for that entire time?

There are these kinds of mismatches between student training lengths and grant lengths, and just portfolio management. Science is meant to be experimental, so what if a project doesn’t work? How do you adjust, and what is the structure for how grants adjust?

When you talk to scientists, it comes across very clearly that it’s a very intricate balance: making sure that you have funding for projects and that the people in your lab are getting paid. It sometimes feels like it comes at the expense of asking, ‘Am I working on the projects that I personally feel are the highest impact?’ Because the resources in my lab are tied to specific commitments that were made and to the time allocation of my students on specific grants.

Dylan Matthews: Yeah, so it sort of seems like as you go into the profession as a PhD student, you’re set on a course of many years where you’re primarily working on other people’s projects. It takes a while to take on independent projects of your own. And once you’re there, you’re very dependent on what funders are available, what they say about your ideas, and which ones they like or don’t like. Is that a fair characterization?

Heidi Williams: Just to highlight one extreme of that, a lot of medical schools and public health schools are what are called “soft money” jobs. Essentially, you’re required to fundraise almost all of your salary in addition to any research costs that you have. And so you get the sense from people that they’re very beholden to what funders are interested in funding, as opposed to the best idea they themselves want to take forward.

I think that’s a lot of the motivation behind some of the recent movement toward more person-specific funding, and behind asking whether we should transition off this “soft money” environment where you need to fundraise and other people are choosing which topics are most important for you.

Caleb Watney: Yeah. I think it’s actually pretty interesting to trace how science has traditionally been funded. Across history, a lot of scientists were funded through a kind of patronage system, which was in some ways connected to the work. Obviously, you would try to choose scientists whose work you thought was socially valuable or potentially even personally valuable. But a lot of it was much closer to person-specific grant funding, which can work quite well in certain cases. It allows scientists a much greater degree of flexibility in deciding which specific subgenre of their research ends up looking the most promising.

I think it’s not an uncommon experience for scientists to start working on something, have a lot of promise about a particular path or avenue of research, start digging in a little bit more, and realize, ah, actually this really isn’t going to work. Under a lot of project-based grants, it can actually be pretty hard to pivot your research from one approach to a different one. And so these person-based funding approaches can be a lot more flexible.

A potential downside, though, is that deciding who counts as a person worth funding could end up biasing the field towards established researchers, researchers with a lot of background credibility.

And so there’s a balance to strike: how do you provide researchers the flexibility they need to pursue the kinds of projects they think have the most value, while also making sure that up-and-coming scientists who may not have as much name recognition or a portfolio of work to draw on can still get funding?

Dylan Matthews: How much of this culture comes out of the rise of big science, and the change in what science is? Over the 20th century, you went from physicists doing things with cathode ray tubes in small rooms to something like the Large Hadron Collider.

I guess that was the 21st century, but the point stands: you’re spending billions of dollars to set up one destination for running various experiments. Is some of this increased complexity just a necessary aftereffect of that shift in what science is?

Heidi Williams: Yeah, so Ben Jones, who’s an economist at Northwestern, has written very thoughtfully on just how ubiquitous the rise in team science has been across fields over time.

I think for a while you could get the sense that maybe this is just physics, or maybe this is just biomedicine, but it’s actually there even in economics and other social sciences, and even in the arts. He has a really nice paper, in a volume published by the National Bureau of Economic Research, pointing out that the structures we use to support science have been essentially static, even though the rise of team science is one of the most important changes to happen to science in our lifetimes.

Similarly, the lengthening of training for students has a lot of nuanced implications for how we support early-career scientists, like what Caleb was saying. And so, in some sense, it’s very natural that the structure of science is going to shift. As Ben would say, the burden of knowledge is changing the frontier of what tools you need.

Do you need people from different disciplines working together in teams? What does it mean to support interdisciplinary work? All of that is changing, and the fact that our structures for funding and supporting science haven’t changed is, I think, itself indicative of why we could be doing better than we are today.

Caleb Watney: One particular example is to think through how different science is today from a century ago. Over the course of a single year, Albert Einstein wrote a series of really cutting-edge physics papers that laid the foundation for much of what we know about theoretical physics today.

And he was just one guy. He had a chalkboard; he had colleagues he was talking to. Funding Albert Einstein’s work during that period would have been extremely cheap. Whereas today, proving the existence of a single particle, the Higgs boson, requires a massive particle accelerator that costs billions and billions of dollars, with thousands of scientists working in close collaboration.

So the structure of science has changed dramatically over the course of the last century. But the way that we fund and structure science through our funding institutions has remained remarkably stable across that time.

Heidi Williams: And also the way that we recognize talent. People still get tenure at universities as individuals, so your work is essentially evaluated as yours personally. A lot of scientists are thinking about whether they will be recognized in some way by their colleagues.

Papers that get published in journals can all be collaborative, but a Nobel Prize is a very individual form of recognition. So in some sense, our structures for how we evaluate people’s work haven’t kept up with the rise of team science either, and I think that’s also an important disconnect.

Dylan Matthews: One area where I wanted to talk through some problems before we start talking about solutions is immigration. We’ve been talking a lot about the rise of team science. Science is a big collaborative endeavor. It stands to reason that frictions in getting people from one place to another where they can collaborate with people productively would have a big effect on that. And, Caleb, I know this is something you’ve worked on a lot recently. Is the U.S. immigration system right now fit for purpose in terms of augmenting our scientific capacity and getting the smartest people to the right U.S. labs?

Caleb Watney: It seems like it’s not, as far as we can tell. Just to take a step back, you can think about the high-level inputs that enable a scientific ecosystem to succeed. You have research funding; we’ve spent a lot of time in this conversation talking through how the NSF and the NIH can pick grants in different ways.

There’s the actual physical infrastructure you need: the cities in which scientists collaborate as well as the lab space, the expensive microscopes, and the particle accelerators.

But maybe the single most important part is the people, the scientists themselves that make it all work.

And if we have a rough intuition that talent is distributed roughly equally across the globe, well, the United States has only about 4% of the global population. That means that, of the scientific geniuses being born all over the world today, only a small share are being born within any one country’s borders.

Suppose you have aspirations as a country to be a scientific superpower, or you really want to maximize the impact of agglomeration effects, which is what economists call it when you get a bunch of really smart people together and their collective work is more productive than the sum of what each could do alone.

If you take agglomeration effects really seriously, that means there are enormous returns to allowing scientists from all over the world to cluster in particularly impactful research hubs. And it seems like, for a variety of policy and historical path-dependency reasons, the United States has ended up being where a lot of the most productive scientific research actually happens.

And you can also see this in surveys of international students: where would they like to go and study, and where do they actually go? A large chunk of students, especially the most promising ones, end up coming to the United States. But our immigration system is really poorly set up to allow a lot of those students to stay here.

It’s a pretty common occurrence that a student comes to do a PhD here, we invest a lot in their research training, and then we have no avenue to allow them to stay in the United States. That seems really counterproductive on the narrow American-interest story, but it’s potentially just as bad for the global advancement of science and new medicines, if we’re preventing the world’s top scientists from being able to cluster together.

Dylan Matthews: I’ve been reading The Making of the Atomic Bomb, which means I’m a guy who brings that up in all conversations now. One of the really striking things is just how many scientists came from small countries. There’s Niels Bohr from Denmark, a tiny country. Ernest Rutherford was from New Zealand and had to move to the United Kingdom to do his work. If they had just been locked in their small countries without peers to work with, it’s shocking to think of all the fruitful collaborations that wouldn’t have happened.

Heidi Williams: Yeah. Both with the rise of team science and with how technologies get commercialized out of universities and into the real world, having the right team is itself incredibly challenging. Then if you add a constraint that you can’t have the best people, because for some reason they can’t get a visa to come work in your lab or at your startup, you’re sort of shooting yourself in the foot. It’s hard enough to find the right people. Having fewer barriers in the way of getting the right team together is something that very naturally matters a lot.

Dylan Matthews: Okay. So let’s talk through some ideas that people have floated as ways to improve the funding process and right-size American policy towards science. Caleb, what are a few funding approaches coming down the pipe that strike you as promising?

Caleb Watney: Right. So it’s maybe worth just taking a second to talk about the current dominant model. Especially at the NSF and NIH, it is sort of this traditional peer review structure. I want to emphasize that this is almost by definition a caricature. There’s a lot of variation across specific institutes, but at a high level, what’s happening is you have scientists who are submitting proposals for promising avenues of research they want to work on. They will create a budget that roughly describes how much that work will cost. They will submit it and then a panel of their peers will grade their research across a number of dimensions: How promising does this seem? How much social impact do we think it could have? How likely is the research to work out? How much of a track record does this particular scientist have?

Then program officers at either NSF or NIH will in some sense collate these proposals and create some sort of rough ranking. In some institutes, there’s a bit more discretion on the program officer’s side to be able to rearrange them from just the pure average.

But at some level, the average opinion of your peers ends up really shaping where your proposal lands in the ranking. Then, ultimately, the grant funding agencies make a determination based on that and pick the proposals that seem the most promising.

As we alluded to earlier, one alternative is a more person-based funding approach, where you choose a particular scientist whose whole line of research seems especially promising, and you give them autonomy and discretion to pick and choose which of their research directions to pursue.

There are some research organizations, like the Howard Hughes Medical Institute, that really specialize in person-based funding. There are also some new models coming up, like the Arc Institute, sort of a collaboration among a few universities in the Bay Area, which is trying to bet on particular biomedical scientists and allow them a lot of freedom to pursue different lines of research. I think these are promising. Just to briefly highlight a few other models.

Heidi Williams: I would also just chime in: Jon Lorsch at the National Institutes of Health actually started a person-specific funding program at the NIH, and he’s very interested in learning what has worked well and what’s novel about it.

It is interesting to see that there is some precedent for this. Similarly, the National Science Foundation does have some career awards, which are person-specific. So it’s not that the government administratively couldn’t do this; it’s just that historically we haven’t.

Caleb Watney: Absolutely. As Heidi pointed out, and as I alluded to earlier, there’s a lot of variation across the federal science agencies. While the bulk of the funding ends up being distributed through these big peer review processes, there are a lot of really small projects and programs trying out different approaches.

Another one that’s been gaining more attention recently is the idea of golden tickets. The basic intuition is that consensus-oriented review processes may have a built-in bias against high-risk, high-reward research. It might be an attribute of genuinely novel research that some reviewers really like it and some think it’s not promising at all, so maybe you should actually be looking for variation across reviewers. One way to select for this is to give each reviewer in the selection process a golden ticket they can use to champion a specific proposal and say: “I want to guarantee that this gets funded, or at least heavily tilt the selection process towards this particular one that I think is really promising.”

While it would probably be a bad idea to fund all science that way, it might be helpful as part of a portfolio of funding approaches. Another approach that people have at least talked about, and that has been tried on a small scale, is a scientific lottery. Here the idea is: how much do we really know about whether our selection mechanism is actually choosing the most promising or most rewarding scientific projects? It might be that, above some minimum quality threshold, we’d be better off just choosing at random, especially once you consider the huge time and grant-paperwork costs involved in the current system.
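To make these mechanisms concrete, here is a toy simulation of the three selection rules just described: consensus ranking, golden tickets, and a lottery above a quality bar. Everything in it (the score distributions, the threshold, the share of “high-variance” proposals) is an invented illustration for intuition, not any agency’s actual process.

```python
import random

random.seed(0)

N_PROPOSALS = 200
N_REVIEWERS = 5
N_FUNDED = 20

# Each proposal gets a score in [0, 10] from each reviewer. "Risky"
# proposals stand in for high-risk, high-reward work: reviewers
# disagree sharply about them (high score variance).
proposals = []
for i in range(N_PROPOSALS):
    risky = random.random() < 0.2
    base = random.uniform(3, 8)
    spread = 3.0 if risky else 0.5
    scores = [min(10, max(0, random.gauss(base, spread))) for _ in range(N_REVIEWERS)]
    proposals.append({"id": i, "risky": risky, "scores": scores})

def fund_by_mean(props, k):
    """Consensus rule: rank proposals by average reviewer score."""
    return sorted(props, key=lambda p: sum(p["scores"]) / N_REVIEWERS, reverse=True)[:k]

def fund_with_golden_tickets(props, k):
    """Each reviewer champions their single highest-scored proposal;
    the remaining slots fall back to the consensus ranking."""
    championed = {max(props, key=lambda p: p["scores"][r])["id"] for r in range(N_REVIEWERS)}
    picked = [p for p in props if p["id"] in championed]
    rest = fund_by_mean([p for p in props if p["id"] not in championed], k - len(picked))
    return picked + rest

def fund_by_lottery(props, k, threshold=5.0):
    """Random draw among proposals above a minimum quality bar."""
    eligible = [p for p in props if sum(p["scores"]) / N_REVIEWERS >= threshold]
    return random.sample(eligible, min(k, len(eligible)))

for rule in (fund_by_mean, fund_with_golden_tickets, fund_by_lottery):
    funded = rule(proposals, N_FUNDED)
    share_risky = sum(p["risky"] for p in funded) / len(funded)
    print(f"{rule.__name__}: {share_risky:.0%} of funded proposals are high-variance")
```

In a run like this, the golden-ticket rule tends to fund more of the high-variance proposals than the consensus rule, which is exactly the effect its proponents are after.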

I think it’s worth highlighting that we really don’t know, in a rigorous way, how well any of these systems work. One of the things I’m most excited about over the next 10 years is building an evidence base that lets us test different funding mechanisms in an iterative way, with clear metrics, decided in advance, for judging whether or not they were successful.

Dylan Matthews: Heidi, I wanted to ask about what we do know about which mechanisms work. Some of your peers in economics, like Pierre Azoulay, have been doing research on how different types of funding mechanisms affect the outputs of science. What have we learned to date? And what are some of the big open questions there, as you see them?

Heidi Williams: Yeah, so just to give one example: Kyle Myers, a junior faculty member at Harvard Business School, had a really nice paper. Sometimes the National Institutes of Health tries to shape the direction of research. Rather than using its normal process, in which academics propose what they want to work on and peers judge whether it’s a good project, it puts out what are called “requests for proposals” in specific areas: we want more research on Alzheimer’s, say, or on a specific area of basic biomedical research. What Kyle looked at, very cleverly, was essentially: what does it cost to get faculty to change their research direction?

And in short, it’s very expensive. For me, that result really shaped how I thought about the question: if we as a society think a given area is neglected, what’s the best and most cost-effective way to remedy that? The normal model people have in mind is the request-for-proposals idea: give senior academics extra money to shift their research into that area.

And I think Kyle’s paper made very clear that you can do that, and they will shift their behavior, but it’s very expensive. Whereas I think of it as: what if you offered PhD or postdoc fellowships to people who haven’t yet chosen their area of specialization? You’re highlighting to them that this is a socially important problem, and you’re giving them the opportunity to decide what to work on, on a more level playing field.

The whole idea behind the NIH wanting to subsidize Alzheimer’s research was that they felt there was too little Alzheimer’s research relative to its social value. So why not subsidize the cost of doing it? But rather than doing that by paying senior faculty to change what they’re doing, which is really hard, you can subsidize new entrants to come in.

And that’s actually, I think, much less expensive. These papers, even in isolation, often highlight pretty fundamental insights that can shape how you think about science as a whole. I will also say that even though I’m a big fan of academic research, which is why I’m in academia, I actually feel like a lot of the low-hanging fruit for improving the productivity of science is much more bread-and-butter process improvement.

And so, one of the problems we highlighted was this idea of really long time lags in funding. For the NIH, maybe that’s 18 months. Even at the NSF, it’s usually six months, although the NSF does have two programs with much shorter turnaround times, closer to two weeks.

But I think it’s been really interesting to see more momentum post-covid around whether we should be doing more “Fast Grants”-style programs. Patrick Collison and Tyler Cowen started an explicit program during covid that guaranteed two-week turnarounds, and often they were actually doing two-day turnarounds in the middle of the pandemic.

I think it’s really important to see covid as having exposed problems in the system, and as having driven the development and piloting of new models that could themselves be incorporated back into the system. The NIH during covid actually had its own rapid grants program, called RADx, which did get grants out the door in about eight weeks.

So administratively, we’ve seen evidence that the agencies can do this when we need them to. But to me, making that the norm rather than the exception needs to be a higher priority than we’re currently putting on it. Otherwise, we can do this in a time of crisis, but the normal system of funding science will remain this very slow process.

Dylan Matthews: Another trend that I’ve noticed in government funding lately is the proliferation of ARPAs. So in the beginning there was DARPA, the Defense Advanced Research Projects Agency. And now there’s ARPA-E for energy, there’s ARPA-H for health. What’s different about that model? What’s distinctive about an ARPA? How have attempts to copy the DARPA model worked so far?

Caleb Watney: I think at the highest level, the ARPA model is characterized by giving really wide scope and autonomy to particular program managers within the agency, who can then, in a much more directed way, push grantees, the technologists and engineers they fund, to work on a very specific problem.

The NSF and NIH model is very much: you apply, we’ll evaluate the proposals, and then we’ll decide what we think is important. ARPA program managers are often working hands-on with people at universities, shaping the research directly and checking in frequently, and they do the same with the private-sector partners they might be working with. Another thing that characterizes ARPAs is their ability to take a coordinated bet on a set of technologies at the same time. Oftentimes, one particular scientific breakthrough or technological tweak may not provide a lot of value by itself; you almost need a whole ecosystem.

You need three or four bets to work at the same time to actually unlock a whole ecosystem of value, and ARPA models, because they have these highly empowered, autonomous program managers, have much more flexibility to pursue that type of strategy.

Heidi Williams: Just to interrupt, it can also be the opposite: there might be four possible solutions to a problem and only one of them can work. If you’re an individual funder funding one grant, you take a bet on which of the four it is.

But in this DARPA kind of portfolio approach, you could actually invest in all four and you’re kind of guaranteed to have the payout at the end, but also maybe the product that you get at the end will be better because you actually pursued all four simultaneously and learned about what didn’t work about the other ones along the way.

Caleb Watney: Absolutely, I think we’re still actually learning a little bit about what makes the ARPA model tick. There’s been a couple of papers or analyses trying to investigate it. Mostly I think we’re starting to see a lot of clones because there’s something about it that I think policymakers have identified that seems to be providing real advances.

I mean, a lot of the technologies that people cite positively from the last 30 years have had a hand from ARPA funding in some way. Everything from GPS satellites to the internet and even mRNA vaccines got some ARPA funding at an early stage that really helped them advance.

Heidi Williams: Yeah, and I think one thing that’s hard is that DARPA tends to be evaluated by the best projects that it funded. That’s a very natural thing, and it’s actually not even a crazy way of evaluating the overall value of the program: if we got mRNA vaccines and a couple of other things out of it, then that initial grant to Moderna that Dan Wattendorf made really mattered a lot.

In some sense that could justify the cost of the whole portfolio. That’s not a crazy approach. But I do feel like, at a systematic level, we’re going to have ARPA-H, the ARPA for health, existing alongside the National Institutes of Health without a great idea of the value-added: which projects are better funded at an ARPA, versus at the high-risk, high-reward programs at the National Institutes of Health?

I think it would be to everyone’s benefit to have sharper thinking on where this model is most productively applied, rather than unbounded enthusiasm based on the idea that we can find some examples that worked well, even if those examples might well justify all of the spending on the portfolio.

Dylan Matthews: One question one might have at this point in the conversation is whether we’re disappearing too far into looking at ourselves in the mirror, navel-gazing a little bit. What’s the advantage of really examining how science works? Are there case studies from the past where we’ve looked at a process of knowledge generation or research, found deficiencies, fixed them, and seen really encouraging results?

Heidi Williams: Yeah, so I think in general, the idea that we can use the scientific method to learn whether something is valuable or the right thing to do has a lot of important precedents. For drugs, just because I’m a scientist and I come up with the idea that you might want to put this chemical compound in your body, we don’t immediately give it to you.

Instead, we do these very carefully constructed trials where we randomize who takes the drug and we compare you to a control group, which is either the available standard of care or just a placebo. And we actually try to rigorously test: is your health better because we gave you the drug?

And that approach, saying that what we need is systematic comparisons to know whether or not we’re doing better, is very natural in medicine and has actually been applied really well in the field of international development. Paul Niehaus, who speaks in some of the episodes of this podcast, often cites the example of his work on cash transfers, which Dylan knows very well because I know you’ve written articles on it too. At some point, people thought that giving people money was just a terrible idea, because they would spend it in terrible ways; instead, we should do direct aid, an in-kind transfer of what people need, like food, rather than money. But the field of development set up a system very similar to clinical trials, where we randomize and compare how good cash transfers are relative to in-kind transfers. Do people benefit more? Is their welfare better?

That evidence really legitimized cash transfers. The energy has actually shifted: I think there’s much more momentum now around cash transfers being the best way of having a social impact and improving people’s lives, relative to in-kind transfers.

And so it’s the idea that we’re going to use evidence as a basis for making decisions, and that this can drive broad-based institutional change that improves the social value of the investments we make. It’s an example I find very inspiring.

We don’t just have to complain about the National Institutes of Health. We don’t just have to say that we don’t know how to do science better. We can do systematic studies, learn what works, and address specific problems. It feels like a real opportunity to make investments that can drive progress on that.

Caleb Watney: Totally agree. I think it’s also worth saying that this is not a crazy idea at an organizational level. Private firms do this all the time. In Silicon Valley, you have a whole range of companies with sophisticated apparatuses for running A/B tests to optimize something as small as an ad placement on a website. If something as socially trivial as that can benefit from finding small efficiencies, knowing they’ll pay off in the long term, how much more could we gain from optimizing our scientific ecosystem? Maybe the single biggest example of how this process works out is the enterprise of science itself. The Scientific Revolution was a revolution for a reason: applying these systematic ways of gathering knowledge, evaluating evidence, and making iterative improvements has been the main way humanity has progressed over the centuries.

And so now applying that to the institutions that fund and incentivize science directly, I think makes all the sense in the world.

Caleb Watney: Thank you for joining us for this first episode of the Metascience 101 podcast series. Next episode, we’ll talk about whether science has been slowing down and how we can measure the pace of breakthrough advancements. 

Subscribe to this podcast feed to follow the rest of the series. You can find more information about the series and the Macroscience newsletter at macroscience.org. You can learn more about the Institute for Progress and our metascience work at ifp.org, and if you have any questions about the series, you can find our contact info there.

Special thanks to our colleagues Matt Esche, Santi Ruiz, and Tim Hwang for their help in producing this series. And thanks to all of our amazing experts who joined us for the workshop.

Episode Two: “Is Science Slowing Down?”

Caleb Watney: Welcome back, listeners! This is the Metascience 101 podcast series. Last episode, we introduced the series with a conversation between Dylan Matthews, Professor Heidi Williams, and me: a 101 intro on how we can do science better. If you missed it, I highly recommend that as a starting point.

For this episode, Alexander Berger, CEO of the foundation Open Philanthropy, is joined by Matt Clancy and Patrick Collison. They discuss whether science is slowing down, how to measure scientific breakthroughs, and the role of institutions in our scientific enterprise.

Alexander Berger: Great. My name is Alexander Berger. I’m co-CEO of Open Philanthropy and I’m here today with Matt Clancy and Patrick Collison. Matt, do you want to introduce yourself?

Matt Clancy: Sure. I’m Matt Clancy. I also work at Open Philanthropy as a research fellow on metascience, which is what we’ll talk about today. I also write a living literature review called New Things Under the Sun, which I’ve been doing for a few years now about academic research on science and innovation.

Alexander Berger: Great. And Patrick?

Patrick Collison: I’m Patrick Collison, co-founder and CEO of Stripe and also co-founder of the Arc Institute, which is a new research nonprofit focused on basic biological discovery. I wrote a piece in 2018 with Michael Nielsen on some of the questions I think we’ll discuss today like the per-capita slowing in science.

Alexander Berger: Yeah, that’s right. We’re going to talk about whether science is slowing down. And Patrick, why don’t we start with you? Could you talk a little bit about that piece with Michael and what you found?

Patrick Collison: Sure. So the title of this episode is “Is Science Slowing Down?” We made the case in that article that science as a whole is not slowing down, but that per-capita science is. People may not appreciate how much the number of people, the number of papers, and many other major metrics connected to science have shifted since the Second World War. There’s been an explosion in the number of practicing scientists, up by a factor of maybe 20x. The amount of federal funding for science is up by a comparable magnitude. The number of papers being published is up enormously. Given this explosion in the denominator, one obvious question is, “Well, how has the numerator changed?”, where the numerator is actual realized scientific discovery. We made the case that it’s very hard to see how realized scientific breakthroughs and progress could be up by, say, a factor of 20, and that therefore the per-capita impact is almost certainly down.

I think it’s an interesting question: what’s going on with the total returns? Maybe the absolute returns or output are up by a factor of two. Some people make the case that they are even in decline. We’re explicitly neutral on that. But I think an important fact about the world today – with significant policy implications – is that the per-capita returns are almost certainly diminishing materially.

Alexander Berger: How did you actually think about whether the amount of progress happening is going up or down? Did you have some metric of scientific progress to look at?

Patrick Collison: Yeah. An unfortunate fact about metascience is that the ground truth that you really care about – important scientific breakthroughs – cannot be objectively measured. Typically, people fall back on various things pertaining to papers and citations.

To try to get a somewhat different cut on this, we decided to survey practicing scientists about their beliefs and their estimation of various Nobel Prize winning breakthroughs. Given plausible beliefs as to the shape of the distribution of breakthroughs, you might expect that Nobel Prize winning work would be getting significantly better through time.

When we looked at the three different fields of chemistry, biology, and physics, it’s a little bit noisy. But after surveying 1,000 or so scientists, their estimation was that breakthroughs were roughly constant in significance. And in the case of physics, in slight decline. 

If we’re working 20x harder and only producing breakthroughs that in scientists’ own regard are of roughly constant or again possibly declining quality, that seems like a significant fact that’s in accordance with this general arithmetic intuition.

Alexander Berger: Matt, you said you do living literature reviews on New Things Under The Sun. What does the rest of the literature say about this question?

Matt Clancy: Yeah. I mean, I totally agree that measuring science is this very difficult, thorny question. I think about it as a patchwork, a dashboard of different indicators, and there are a handful of indicators that say, “No, things are going fine.” That’s things like the number of papers published per scientist, which is pretty constant, or patents, which are roughly constant.

And so just counting papers doesn’t seem very satisfying. When you dig deeper, everything points in the same direction as what Patrick was saying: the average is going way down, probably because the denominator is exploding. Nobel Prizes are one thing, but maybe the Nobel Prize is a niche institution; maybe it’s not representative of broader trends.

But you can look at stuff that reflects broader trends. There’s a bunch of citation metrics, like the probability that a recent paper gets cited, or the share of your citations that go to recently published work. You might think citing recent work reflects a vote of confidence, a signal that the work is important enough for you to build on. These measures have been steadily declining since the Second World War.

The share of citations going to work published in the last 10 years has fallen to levels last seen at the end of the Second World War, when there was not a lot of recent science to cite because everybody was fighting. It’s a large-magnitude change.

Other stuff people have looked at is: what’s your chance of having a paper become one of the most cited papers of all time, in the top 0.1 percent? And that’s also been declining over time. It’s harder and harder to climb into those ranks. The older stuff is just sticking around and maintaining its position for longer. 

Other people have tried to do really sophisticated things with citations, where they look at disruptive papers: “I cite you, but you rendered everybody else obsolete, so I no longer cite the work that came before you.” People have developed a disruption index based on this idea: do you cite a paper in conjunction with its own references, or do you stop citing its references altogether? The probability that a paper gets cited alone, without its antecedents, has also gone down.
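For readers who want the mechanics, here is a minimal sketch of the kind of disruption measure Matt is describing: a simplified version of the CD index from this literature. The function and the toy data are illustrative, not taken from any specific paper’s code.

```python
def cd_index(focal: str, focal_refs: set[str], later_refs: list[set[str]]) -> float:
    """Simplified disruption (CD) index for one focal paper.

    later_refs holds the reference list of each later paper.
    Papers that cite the focal work alone (without its references)
    signal disruption; papers that cite both signal consolidation.
    """
    n_i = n_j = n_k = 0
    for refs in later_refs:
        cites_focal = focal in refs
        cites_ancestors = bool(refs & focal_refs)
        if cites_focal and not cites_ancestors:
            n_i += 1  # cites the focal paper but not its antecedents
        elif cites_focal and cites_ancestors:
            n_j += 1  # cites the focal paper alongside its antecedents
        elif cites_ancestors:
            n_k += 1  # bypasses the focal paper, citing only its antecedents
    total = n_i + n_j + n_k
    return (n_i - n_j) / total if total else 0.0

# Toy example: paper "F" builds on references "A" and "B".
print(cd_index("F", {"A", "B"},
               [{"F"}, {"F"}, {"F", "A"}, {"A", "B"}]))  # 0.25: mildly disruptive
```

The declining average of this quantity across the literature is the “disruption going down” result Matt mentions.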

And then there’s another citation-based metric, involving patents. Maybe all of this is something weird in academia, some norm about how we cite, or a quirk of culture. But inventors are not playing the same game: inventors are just looking for useful knowledge to build technologies on, which they then patent. And inventors are also citing recent academic work at a much lower rate than they used to. In the 1980s, 50% of patent citations to academic work went to recent work; in the most recent data, it’s down to around 20%.

Instead of citations, people have looked at the text of the papers themselves. If you look at the titles of papers, are they referencing new words? Are they bringing new concepts into the scientific corpus? Are they at least combining old words in new combinations? That’s been declining too. People have even looked at the keywords that authors attach to their papers and at how many new keywords authors introduce. That’s all declining as well.

I think for any one of these you could say, “Well, maybe this disruption measure is a little suspect,” or “Maybe the Nobel Prize thing indexes some game that Nobel laureates are playing to award each other stuff.” But when everything points in this consistent direction, I take it as a sign that something is going on: we’re not producing what you would expect from the huge increase in the inputs to science.

Patrick Collison: And just to state something that I think is implicit: metascience can sound like an arcane, self-referential subfield of science without obvious external significance. But I think this really matters. If you think about how our lives are better than those of individuals in the 18th century, so much of that is downstream of progress in science: infectious diseases, semiconductors, what have you.

Today, obviously an object of significant discussion is AI and the various breakthroughs and new models there. When you ask people working on AI why to work on AI, especially given some of the stated concerns, they say things like, “So that we can cure cancer.” Given the risks, they are very reasonably justifying the pursuit of AI on the basis of possible forthcoming scientific discovery.

I think the mechanics, the dynamics, and the prospects for these discoveries are among the most central questions for us as a polity today. Some people implicitly devalue science by treating it as a mechanistic industrial process: dump more money in, and somehow linearly more output or outcomes will ensue. Science is just not that straightforward, as a lot of the data Matt just cited reflects.

Matt Clancy: One more of these facts that does a good job of tying this to real-world impact: there’s a paper that looks at how many years of life are saved per 1,000 journal articles on the same scientific topic, like cancer or heart disease. That, too, has been falling over time. So if you think about how, at the end of the day, we want this to cash out as health gains, we’re not getting the same return we used to.

Alexander Berger: Matt, I mean, that was the same thing I was going to say. A lot of the work that you were citing is about citations or the text of scientific papers. But what about the sort of things in the world, say crop productivity? Like are we investing more in R&D and seeing less output in engineering feats in the world too? Not just in scientific papers themselves?

Matt Clancy: Yeah. I think you see the exact same dynamics when you look at broader technological progress, where you can debate and it’s uncertain: is the absolute rate of technological progress slowing down or speeding up? I’m not sure. But it’s much less debatable that we’re pouring in a lot more R&D effort and not getting a commensurate increase in the pace of technological progress. Like you said, crop yields go up at a linear rate, but we’ve increased our inputs exponentially over decades.

The R&D has gone up orders of magnitude, yet you get about 2.5 more bushels of corn per acre every year. I used to be an ag economist; I know that one well.
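
To spell out the arithmetic behind that contrast (the symbols are illustrative, not estimates from any particular study): linear output against exponentially growing input means the return per unit of R&D falls exponentially, even though absolute progress never slows.

```latex
% Yield grows linearly; R&D input grows exponentially at some rate g > 0:
\[
  Y_t = Y_0 + 2.5\,t, \qquad R_t = R_0\, e^{g t} .
\]
% Output gained per unit of research input then decays exponentially:
\[
  \frac{\Delta Y_t}{R_t} \;=\; \frac{2.5}{R_0}\, e^{-g t} \;\longrightarrow\; 0 .
\]
% Constant absolute progress (2.5 bushels/acre/year) is compatible with
% research productivity falling without bound.
```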

People have looked at machine learning performance on benchmarks. Those are moving incredibly fast, but the number of people working on it and the compute resources going into it are going up even faster. Every industry I know of that people have looked at – which is not a ton, but the ones where you can measure things well – shows the same dynamic.

Alexander Berger: So you both said it’s a little ambiguous whether absolute progress per year is faster or slower than in the past. What’s the best thing written on this? If somebody wants more – “Okay, we get it: per scientist, per dollar invested, we’re not getting the same returns. But overall, are we learning more per year than we used to?” – where should they look?

Matt Clancy: I think the best answer is maybe the great stagnation literature, which Tyler Cowen, who’s here, kicked off. But Bob Gordon also has a famous book, The Rise and Fall of American Growth.

Patrick Collison: Which is one of those books that’s super long and still actually worth reading in full.

Matt Clancy: It’s very focused on the absolute pace and less on the rate of return. He makes the case that TFP growth rates, real GDP, technological progress at a very granular level – none of it is keeping pace with what it was doing from the 1920s to the 1970s.

Patrick Collison: I think FRED, the economics database, is probably the best source on the absolutes. I say that somewhat tongue in cheek, but at some point you do start falling back on GDP and things like that. I can never quite figure out what the right conclusion is. The constancy of the log of U.S. GDP – that’s just a shockingly steady exponential. Now, that’s not GDP per capita, and obviously that denominator has changed a lot. But somehow, if you look at the U.S. as a whole, as a system it has been on this really robust, steady exponential since at least 1870, possibly earlier, although the data gets worse as you go back.

Matt Clancy: I mean, even if things are getting much harder, we’re also trying a lot harder. You wonder if there are feedback effects: once science starts to disappoint, you see the Institute for Progress pop up and say, “We need to push things forward,” and people writing Atlantic articles pointing the problem out. Maybe there’s an endogenous response that tries to perk things up.

Alexander Berger: Could you actually talk a little more about how people model these kinds of dynamics? What we’re seeing is way more investment in science and scientists than in the past, but pretty constant, roughly linear economic growth and other kinds of progress. What model makes sense of that? And what would that model predict for the future of the world?

Matt Clancy: Yeah. So I think the canonical economic models here come from two different economists named Jones. There’s Ben Jones, who has this idea of the burden of knowledge. He has a model of how this plays out, and it generates a lot of stylized facts about innovation that seem to bear out.

The basic idea of this model is that as a field progresses, there’s more to learn. To push the frontier, you have to know more than in the past. It’s almost tautological: if you couldn’t solve the problem before, it’s probably because you didn’t know something you needed to know to solve it. You have to learn the new thing, and usually the new knowledge doesn’t fully displace the old stuff. The burden of knowledge that you have to carry keeps growing. That means people have to spend more time getting trained, and they have to assemble bigger teams of specialists to bring enough knowledge to bear on these problems. That’s one model.

The other is Chad Jones’s, which is a bit more agnostic about what exactly is going on. These are growth models that simply assume generating new ideas through R&D gets harder and harder – the effects Ben Jones pointed out are one explanation for why, but there could be others. He relates everything to the growth rate of scientists and shows that if the share of the economy working on science is constant, you can get constant growth. Notice that a constant share of a growing economy is itself a constantly growing quantity, which matches the stylized facts: we’re unsure whether the pace of technological progress is speeding up or slowing down, but we know for sure that we’re putting a lot more effort into it.
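
For readers who want the formal version: a minimal sketch of the semi-endogenous growth setup Matt is describing, in the spirit of Chad Jones’s models. The functional form below is the standard textbook one, not a quote from the conversation.

```latex
% Ideas A accumulate from research effort S; phi < 1 captures "ideas get harder to find":
\[
  \dot{A}_t \;=\; \delta\, S_t^{\lambda} A_t^{\phi}, \qquad \lambda > 0,\; \phi < 1 .
\]
% If research effort grows exponentially at rate n (e.g., a constant share of a
% growing population), the balanced-growth rate of ideas is
\[
  g_A \;=\; \frac{\lambda\, n}{1 - \phi} .
\]
% Constant growth therefore requires exponentially growing research effort;
% a merely constant level of effort yields growth that peters out.
```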

Alexander Berger: For a long time, as Patrick mentioned, the number of scientists was increasing really rapidly – more and more people able to plug into the frontier of the global economy, thanks to general population growth. As those trends slow and demographics change, does this model imply that we should expect scientific progress to radically slow down or stop?

Matt Clancy: Yeah. So that model does. The number of people is the big thing. If the population growth rate slows, you can offset that by putting a bigger and bigger share of the population into science, but at some point you run out of gas.

Patrick Collison: But that model makes homogeneity assumptions, or uniformity assumptions, about the per-capita productivity of scientists. If you don’t make that assumption, then I think the bad news is that your current scientists are not as productive as you think – certainly not the marginal scientists you’ve been adding. But the good news is that maybe things are not going to deteriorate as much as you fear.

Matt Clancy: I think one way to think about it: if the population is growing and one in a million people is an Einstein, then you’re kind of okay. But if population growth is stagnant and you have to pull a larger and larger share of people into science, you get the Einsteins first, and then you have to go down to the second-tier scientists, whom we won’t name.

Patrick Collison: But I don’t know if you think this is fair – impatient is not the right word, but there’s something I don’t love about this literature, despite having tremendous respect for the Joneses. I mean, they substantially pioneered this field, so on some level we’re riding on their coattails. But what I don’t like about these models is this either implicit or explicit homogeneity assumption. Do you know of any models that explicitly don’t make that assumption, or that model some unevenness there? It just feels to me that, absent strong justification, we ought not to be making that assumption.

Matt Clancy: I mean, I think there is a paper by Erik Brynjolfsson and co-authors about genius being scarce. The thought is, “Well, we’ve got this AI explosion, we should be seeing productivity booms,” and the paper argues that one thing holding us back is that you need a certain rare confluence of skills. They were thinking more in terms of entrepreneurial skill than scientific skill. The question of differences in skill among scientists is something people have thought about more empirically – they document huge disparities in citations, for example.

Patrick Collison: However, we don’t quite connect it to the production models.

Matt Clancy: I think you’re right. Probably somebody has because it’s a big literature, but I’m not aware of any. So it hasn’t percolated to the top.

Patrick Collison: In my view, there are empirical facts that the models don’t permit. One that was very influential for me is the existence of the Cori lab at Washington University in St. Louis. Gerty and Carl Cori, themselves Nobel Prize winners, trained around seven other Nobel Prize winners. If one accepts that the Nobel Prize is a reasonable indicator of intrinsic merit, and not just some contingent thing about your social network, then it feels necessarily true that either there is something intrinsic to the people attracted to this lab – and it was not a huge lab – or there is something at the treatment level happening there that subsequently yields these differential outcomes. Either way, it seems empirically the case that people coming out of the Cori lab were different from the population of other scientists.

I don’t know how to connect that to these stochastic models where all these people just proceed down their independent paths and sometimes happen to collide with another particle, and a discovery exists.

Alexander Berger: Can’t these both be true, though? The abstract models might treat scientists as too fungible, or unduly fungible, but they still capture this fun fact that we’re investing 20 times more. So maybe something changed socially, or our training programs got worse, and now the average scientist is half as good as before.

But it just seems hard for me to explain the phenomenon of 20 times more input with the same output, except by virtue of something that looks like plucking low-hanging fruit. Do you have another story in mind that explains the decline in per-capita output in your picture, Patrick?

Patrick Collison: I think it might be very nonlinear. In systems like science, you often get these nonlinear dynamics. An obvious one to analogize science to – though, I think you can over-extend this analogy – is entrepreneurship. Both are domains where we really care about the behavior of the tails. Until very recently there were almost no successful technology startups coming out of Europe. Now there are a few, but the disparity between Europe and the U.S. was incredibly striking. 

Well, it’s an open question what exactly the reason for that was; the basic existence of the disparity is super striking. Plenty of people were starting companies in Europe. Take a model where we assume a uniform propensity to produce a giant success. Even if you’re kind of pessimistic about Europe – maybe you discount it by 50% – no model like that, with only constant-factor or small constant-factor disparities, can account for the actual realized disparity, which is closer to two orders of magnitude.

I don’t know, but I suspect some multiplicative model where there are, say, five terms that matter. If each of those is a half or a third of the benchmark, then in the counterfactual you end up with this exponential dampening. Wild disparities in heavy-tailed distributions are not abnormal.
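
Worked numbers for the multiplicative story Patrick sketches (purely illustrative): modest per-factor discounts compound into the orders-of-magnitude gaps that a single constant-factor discount cannot produce.

```latex
% Five multiplicative inputs, each discounted to 1/2 or 1/3 of the benchmark level:
\[
  \left(\tfrac{1}{2}\right)^{5} = \tfrac{1}{32} \approx 3\%,
  \qquad
  \left(\tfrac{1}{3}\right)^{5} = \tfrac{1}{243} \approx 0.4\% .
\]
% A 50% discount on one factor predicts half the output; the same discount on
% five interacting factors predicts a shortfall of roughly one and a half to
% two-plus orders of magnitude -- about the size of the realized gap.
```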

Alexander Berger: I’m sympathetic to that point, but doesn’t it route back into your point about the oddly persistent 2% U.S. GDP growth for 170 years, against the picture where the frontier is actually very malleable and extremely movable? I want to invoke the law of large numbers and say that might be true for individuals. But when we look across countries and across GDP growth rates, it just seems like you end up back at this picture of grinding out the marginal gains.

Patrick Collison: If I had to give you a more concrete and specific model that addresses that to some extent: Harriet Zuckerman found – and I’ll slightly misquote this figure – that on the order of 70%, maybe slightly less, of the science Nobel Prizes awarded in the U.S., maybe globally, between 1900 and 1970 went to somebody who trained under someone who received or would go on to receive a Nobel Prize. In its stylized, directional sense, that finding is accurate. Again, this is thought-provoking along the lines of the Cori lab.

But then two, suppose there is some tacit knowledge – some set of practices for doing truly breakthrough work at the frontier – that is not uniformly distributed, and is only disseminated painstakingly, through one-to-one tuition and mentorship over some extended period of time. It may not be true, but if it were, then I think the whole picture largely hangs together. Pigeonhole-principle-wise, you can’t have the influx of new people be sufficiently trained per this definition, so you end up with a meaningful differential in realized productivity.

Alexander Berger: A lot of my view of this is that Nobel Prizes are the results of competitive tournaments, and similarly for hit papers. Outputs that might be only a little better than the replacement-level contribution end up accruing disproportionate wins.

A tech startup that’s 2% more productive, or has 2% higher TFP than another, might just capture the whole market and end up with super-extreme returns. It’s not clear to me that the realization of extreme outcomes shows that the productivity frontier is so much further out. We do see these realizations, but I think we have plausible alternative models that make sense of them.

Patrick Collison: I think that’s super fair. Maybe a big question – one that perhaps we could bring empirical data to bear on – is: how valuable is the non-frontier work that does not win the tournament? What’s the shape of that distribution?

I think there’s some evidence that it’s not super good. A former editor of the BMJ says one should assume that any individual published medical paper is probably wrong – the former editor of the journal, not some external critic. The replication crisis is well known. And in my private conversations with scientists, they simultaneously believe that some amount of fantastic work is being done, yet that not only the median paper but even the 70th-percentile paper is really not that reliable.

I think you’re homing in on a good question: what is the shape of the distribution, in particular setting aside the tournament winners?

Matt Clancy: I want to offer a different perspective. I wonder whether the realized marginal project is different from the ex ante one – before we knew the outcome of the research project, was it an equally valid bet? There’s a paper by Pierre Azoulay and co-authors that looks at what happens when a particular disease area with a particular scientific approach gets a windfall of extra funding from the NIH, based somewhat arbitrarily on how they score things.

Patrick Collison: This paper was interesting and it cuts against some of my intuition.

Matt Clancy: They find that the windfall still generates new patented medical products, which suggests that even the stuff that barely made the cut is useful. A big share of that – maybe half or more – ends up being used in patents for products different from what the research was initially intended for. That speaks to it being very unpredictable.

Most of that value is driven by a small number of high-value wins that we couldn’t have predicted, but it still means there’s value in the marginal applicants to the NIH. That’s my pushback on that.

At the same time, I peer review things, and I read a lot of work where I think, “When was this ever going to be a valuable contribution?” But I don’t know.

Alexander Berger: The other example we’ve talked about a bunch over the years is Karikó’s work on mRNA vaccines. For many years, it was the marginal NIH project in some very deep sense. Then it played a causal role in getting us to the point where we could have COVID vaccines when we did. It could simultaneously be true that hits explain the whole return of science and that their ex-ante unpredictability means the whole enterprise looks pretty good. I think Karikó wasn’t the kind of person who looked ex ante super likely to win that tournament. But it’s a bit tough.

I want to pull back a little from this cross-sectional variation, though, and ask about the time series. Patrick, you were saying, “Look, we’re investing 20 times more than we used to. The U.S. didn’t just become Europe – we still have a pretty entrepreneurial culture, and people still go into science to pursue new breakthroughs.” So what’s your story? What’s driving the slowdown, if it’s not exclusively or primarily this low-hanging-fruit dynamic?

Patrick Collison: I don’t know. The essay we wrote for The Atlantic explicitly proposes this as an important open question. I’ve tried not to affiliate too strongly with any particular causal explanation. The main thing to observe is that the mechanisms undergirding science have changed so much. 

People’s subconscious mental model is that there’s a natural way of pursuing this stuff – a natural way for grants to be distributed, a natural way for the work to get published. Whereas in fact, as you take snapshots across the decades, it starts to look super different. The vivid example here is that when a paper of Einstein’s was sent out for peer review, he was very offended. He wrote to the journal editor asking, “What the hell is going on here? Why was it not just published as submitted?”

Second, we have very substantially professionalized and institutionalized the practice of science over roughly the last 60 years. In broad terms, the NIH’s budget is on the order of 50 billion a year, and the NSF’s is on the order of 15 billion a year – it’s worth knowing that the NIH’s budget is so much larger. The NIH is a post-war creation: so much of the progress we’ve made as a society on infectious diseases and many other conditions happened before the current funding model and mechanisms even existed.

If you go back and read contemporaneous work from people like James Shannon, under whom the NIH budget really grew in the 1950s, you’ll find that Shannon had pretty strong views about the importance of scientific freedom – that scientists should be able to pursue their work without too much concern about what committees or other individuals might think of it. The question certainly comes to mind: to what degree have we managed to adhere to those founding principles, and to what degree are any of these institutional dynamics relevant?

We ran a COVID grant making program during the pandemic called Fast Grants. A significant number of the grant recipients were not themselves virologists. They were drawn from a fairly broad spectrum across the field because so many scientists were compelled to do what they could to help avert the pandemic.

Alexander Berger: Also, they were locked out of their labs otherwise. So if they wanted to work, it was COVID or bust.

Patrick Collison: Exactly. My point is they were drawn from a fairly broad set of fields and institutions, doing all different kinds of work. At the end of Fast Grants, we asked them a question – not about Fast Grants in particular, but about their lives. We asked not whether they wanted more money, but: if they could spend their current funding however they wanted – NIH grants are the bulk of the funding and are restricted to particular projects – how much would their research program change? Four out of five said it would change a great deal.

I don’t know how different things would be if we existed in a world in which that number was one out of five, rather than four out of five. But, it’s certainly thought-provoking. I think the basic question is how much is it about the shape of knowledge and low-hanging fruit? It’s much easier to cure tuberculosis than it is to cure cancer. How much is it about some of these sociological, cultural, or institutional considerations? 

My personal belief is that it’s almost certainly at least 25% institutional, and even if it’s only 25% institutional, it’s worth fixing that. But given how many tools and advantages we can bring to bear today that we could not in 1920, I think you could make the case with a straight face that it’s actually 75% institutional. Within that range I’m agnostic.

Matt Clancy: How much is institutional? Somebody at Open Phil once asked me for my best guess at this. I came to a similar conclusion: there are a lot of structural knowledge problems that are very difficult to solve – maybe AI will turn out to be a way to solve them – but otherwise, the lion’s share of why there’s been this 20x increase in effort without a 20x increase in results comes down to things genuinely getting harder.

The institutional stuff matters, though, and it’s something we can do something about, whereas the other stuff we have to take as a fact of nature. It’s worth investing our efforts in finding better ways to do things. Science is a slightly weird industry: the fact that the number of papers published per person per year hasn’t changed a ton makes it an outlier. Most industries get more productive over time. Most industries improve and get better at doing their job; labor productivity rises.

It’s worth thinking about why we don’t have that kind of process in science. It speaks to science being an unusual economic activity: it’s hard to learn new and better practices, or to identify better ways to do grantmaking or organizational design, because the results are hits-based and take a long time to play out, so they’re hard to observe.

Even now, I think everybody has learned – or decided – that DARPA was a good model, but that’s based on hits that took a very long time to play out, for an organization founded in the Cold War.

Alexander Berger: Well, overall productivity in the economy has gone up by something like 10x in the last 100 years – it has to be more than that – so say people are producing 10 times as many hamburgers per hour. If I took the same hamburger from 100 years ago and made it 10 times faster today, that’s still a decent product. But if you take a 1910 physics paper and publish 10 of them today, that’s not a 10-times-better product. The tournament nature of discovery and the structure of knowledge make it really hard to eke out those marginal productivity gains, because you’re trying to grow a stock, rather than the normal mode of economic activity, where it’s usually about producing some flow.

Matt Clancy: But knowledge also evolves and interacts with our institutions in interesting ways. The burden of knowledge means more teams, and more teams is a different way of doing science. It selects for people who can work as part of a team rather than being very cantankerous; outsiders who challenge all the conventional wisdom don’t have as easy a time in science.

It may also mean that I have to find somebody with a really specific set of skills, so I’m going to collaborate with somebody at a far away university who has that set of skills. Now we’re not going to be able to chat about our project as much as we would have if we were both down the hallway from each other. 

And one more: this torrent of papers – how do you keep track of it all? It leads to the rise of metrics like citations and other quantitative measures. These are better than nothing, but in an ideal world everyone would read all of the papers and decide what the best work is. People don’t have the time to do that, so they use proxies, and that pulls everybody toward reading the same narrow set of papers. That’s maybe another source of groupthink.

Patrick Collison: Two points on these topics. One, quoting from an article that James Shannon, the NIH director, wrote in 1956: “The research project approach can be pernicious if it is administered so that it produces certain specific end products, or if it provides short periods of support without assuring continuity, or if it applies overt or indirect pressure on the investigator to shift his interests to narrowly defined work set by the source of money, or if it imposes financial and scientific accounting in unreasonable detail.” I’ve considered calling myself a Shannonist, in my attitude and my interests.

Matt Clancy: It sounds kinda like that CIA manual for sabotaging organizations.

Patrick Collison: Right – he goes on to describe the importance of scientific freedom, as I mentioned. Second, Matt, on the point you were just making: a very interesting fact is the disparity, the divergence, between papers as they were written in, say, the 1950s and 60s and papers as they are written today. You can go back and read stuff from the 1950s and 60s, and it’s readable – it is clearly written to be comprehended, and there’s often a recognizable narrative. That is obviously not true today. Papers are frequently impenetrable to people even in merely adjacent fields, leave aside lay people; they can be hard to understand if you’re not in that particular domain, and there are lots of metrics that attempt to quantify this.

But I just find myself questioning why that is. Some of it is that maybe the particular things in question are intrinsically more arcane, but I’m certain that’s not all of it. Some of it is some combination of sociology and incentives causing people to write in a far more arcane and difficult-to-understand fashion. I don’t know that this has grand significance in and of itself – although it presumably slows the dissemination of some discoveries – but it’s another epiphenomenon that suggests strange things going on in these institutional and cultural dynamics.

Alexander Berger: I want to backtrack a little in the conversation, Patrick, to your “at least 25%” claim. Let’s grant by hypothesis that productivity, in terms of innovative output per scientist, has gone down by 20 times, which is a huge, huge amount. Is the claim that things could be 25% better than they are today if we fixed the relevant institutional factors – which would be a relatively small improvement? Or is the claim that 25% of that 20x could be undone, so things could be five times better than they are today?

Patrick Collison: I meant the former, but I wouldn’t repudiate the latter.

Alexander Berger: Got it. And Matt, I’m curious for your take on that. When you were saying you tried to split it up between knowledge getting harder to find – low hanging fruit being plucked vs. more institutional, sociological factors getting worse – what’s your story there?

Matt Clancy: A five times improvement through just institutional tweaks, I would be shocked.

Patrick Collison: I’d only be surprised.

Matt Clancy: On the social science side, the effect sizes are usually smaller than that. But I also think that stuff compounds. If you get a little bit better at something like science – which is the input to so much, and which we build on – it’s a cumulative thing. You get a little better over time, and you get a 5x return in a century, or however long it takes.

Alexander Berger: I think this is like a huge disagreement in my experience between economists and entrepreneurs. Entrepreneurs think in terms of orders of magnitude and they’re like big wins are 10 times or 100 times bigger than small wins. Then microeconomists are like, “Man, it’s really hard to eke out 10% gains.”

Patrick Collison: Yeah. But economists are concerned with crop production.

The U.S. did essentially no science of note in the 19th century – just none. If you’re thinking with typical econometric production intuitions, you think, “Well, okay, maybe we can go from nothing to slightly better than nothing in the 20th century.” In fact, we went from nothing to most of it. I think culture and institutions can really do a lot.

Matt Clancy: I think that was catch-up growth, to use another economics phrase. We bolted to the frontier.

Patrick Collison: Okay, but it wasn’t catch-up growth from, say, 1600 to 1700, when you saw similar gains from the adoption of the scientific method.

Alexander Berger: I was going to say the same thing as Matt. U.S. culture didn’t radically change; we built some slightly better scientific institutions, we invested some more in our universities, and suddenly we were at the world frontier. There could be a story about culture as a cross-national narrative that implies a permanence to it. I actually think this is a better example of how quickly culture can change, and of how wrong it is to treat it as fixed rather than as a feature of where you are in GDP or global trade networks or whatever else.

We spent a lot of time bemoaning the problems and how much worse things have gotten potentially. I’m curious, what are the best solutions? Patrick, what are you excited about?

Patrick Collison: Well, obviously we’re excited about Arc which celebrated its first birthday back a couple of weeks ago. The purpose of this podcast is not to advertise Arc, but maybe to just kinda briefly describe what it does.

It’s a biomedical research institute, and scientists do the work at the institution itself – they move their labs there. Arc provides flexible internal funding to the scientists, so they don’t need to apply for outside grants, for renewable eight-year terms. Second, Arc is internally building technology centers, or platforms, for things like functional genomics, microscopy, certain animal models, and computation – all ingredients that scientists might want to draw upon in pursuit of some research goal, but which universities tend not to have an obvious way to support today because of how grant and funding models work.

We definitely don’t imagine that Arc is some panacea or that everything should work the way Arc does. Our hope is that it can be a complement to the training systems and research systems as they exist today. If the status quo did not exist, Arc would probably itself have to look very different.

The 1956 Shannon piece goes on to describe how important a diversity of mechanisms and models is. Maybe the meta idea behind Arc is that there should not just be one system. In an ideal universe, there would be many Arcs, with many different methodological approaches, premises, and beliefs. That would be an intrinsic good, because certain kinds of discoveries are more and less well suited to any given model, and we would also be able to learn from the successes and failures as people try different things. That’s one thing I’m excited about.

Second, there was an extended period where, for whatever reason, people just weren’t that interested in some of these institutional questions about how we fund science. Maybe it was because the explosion of federal funding hadn’t been around long enough for some of its longitudinal, intertemporal consequences to be evident in the way they now are; I think some of it is actually just contingent. Now there’s a very vibrant set of people, like Matt, pursuing these questions full time, and the existence of this discussion actually makes me quite optimistic.

I mean, I don’t know how science funding is likely to look different in 10 years as compared to today. But the probability I ascribe to it being meaningfully different in some respects is a lot higher than it was five years ago. On some basic level, that’s a good thing. Obviously, the existence of ARPA-H is somewhat reflective of this shift in sentiment. We don’t know yet whether ARPA-H will work, but again I think its existence is a very encouraging fact.

Third, there are particular things happening in different areas of science that it’s hard not to be excited about. I think many of the things said and observed about biology and the prospects there are correct: new sequencing technology – single-cell sequencing, RNA sequencing – and the cross-product of machine learning with biology. It’s a TED Talk cliché to invoke that, but I think there are quite meaningful and interesting prospects there. So my third category would be particular frontier discoveries. If you wanted to tell a story a priori about how the next 20 years will look substantially better, in terms of real scientific discovery, than the last 20 did, I think you can make that case pretty credibly, and I hope it’s correct.

Alexander Berger: I see Arc as a cultural intervention. Is the target of that intervention the broader scientific ecosystem, where most of the impact of Arc flows through changing everyone else’s behavior? Or is it mostly like a “We’re going to build a really healthy, fruitful community for these scientists on this campus in Palo Alto where they’re going to really make a difference”?

Patrick Collison: We think of it as the second. The goal of Arc is to do actual work and to make actual discoveries that others judge to be of significance. I think if Arc succeeds at that then it probably will have some of the former effect. It is possible that the former effect could, through time, come to dominate. But we don’t think of it as a cultural intervention. And I think it could only possibly be an effective one if it is actually very good at what it does. All of what we think about is the second.

Alexander Berger: I hear you that it can only work as a cultural intervention if it succeeds at the second thing. But isn’t the magnitude of the returns from the cultural success potentially so big, relative to the direct effects, when you think about how many PIs there are at Arc?

Patrick Collison: There are four today, but it will be in the teens in the not overly distant future.

Alexander Berger: I don’t know how many life-science PIs there are in the country, but it has to be thousands. Tens of thousands? Hundreds of thousands?

So you’re in the ballpark of probably one in a thousand, maybe less. Intuitively, the magnitude of the impact on the culture seems like it has to be really big – again, if you succeed – relative to the direct impact of breakthroughs.

Patrick Collison: Maybe. I don’t know if that’s true. I’m not a scientist, obviously – Arc’s actual scientist co-founders are Silvana Konermann and Patrick Hsu, who would kill me for what I’m about to say. I’ll just acknowledge that and then, I guess, proceed to say it. If Arc were to cure some major disease, that could be an enormous deal. I don’t know where those breakthroughs will come from – whether from Arc or not – but whatever that place is, its effect will probably be primarily the effect of the breakthrough and the discovery, not whatever second-order effects it has.

I don’t want to downplay the possible magnitude of some of those successes. We are drawing from a super heavy-tailed distribution, and seven-sigma discoveries are in principle possible. I don’t want to sound remotely self-aggrandizing by suggesting that I expect Arc to do that – with any realistic humility, the base case has to be that it won’t. But in terms of what one aspires to, or grants as at least theoretically in the space of outcomes, I think the first-order effects truly can be the dominant ones.

Alexander Berger: Matt, are there other solutions to the problem of slowing scientific progress that you’re especially excited about?

Matt Clancy: Yeah. The big thing I’m excited about is an effort to tackle the question of why the productivity of science differs from other industries. You could probably pin that on a lot of things, but two that I have focused on are these. One, it’s hard to get feedback loops: stuff takes a long time, and hits matter, so it’s hard to learn. Maybe Arc, for example, is actually the best model, but it’s a hits-dominated thing and has a run of bad luck. Then funders pull out – that would never happen – and we never learn that it could’ve been the way to do it.

And two, it’s not like the private sector, where funders who succeed grow their pie, their methods spread, and capital gets reallocated toward them.

So people don’t have strong incentives to change their behavior. What the Institute for Progress and this movement are trying to do is fill in some of those feedback loops. You’re not going to learn what works just by casual observation. But if you team up with social scientists, pool together data from a lot of individual scientists, get big sample sizes, and do it really carefully, you can start to identify effects that add up over time. Then, if the Institute for Progress, the progress studies movement, and others create cultural or political pressure to reform these institutions and get them working with social scientists, hopefully science becomes a self-improving engine.

And we talked earlier about how cultural changes can actually matter a lot. If we could shift the culture from “the way we do things is the best way to do them” to “the way we do things is to try new things, learn from them rigorously, and then implement what works best,” that’s the culture I hope we can build.

Patrick Collison: Alexander, you’ve been exhibiting admirable moderatorial restraint, but you’re very well-informed and expert on these questions. What’s your commentary on what we’ve discussed?

Alexander Berger: I guess, I think maybe more than both of you, I put a pretty significant amount of weight on the Chad Jones endogenous growth model, where the growth in the population that’s able to do science is a pretty big driver. That makes two things stand out to me as pretty important channels. One is around immigration for scientists or potential scientists. 

Notably, I think the highest functioning part of the U.S. immigration system is the H-1B cap exemption for universities. We already have this machine around training people to contribute to the scientific workforce, but as Caleb and others from IFP will definitely tell you, we have a pretty painful tendency to send people home afterwards.

Given what we know from the literature about how much more productive people can be when they’re surrounded by other scientists of similar caliber working on similar things, I think the global economic gains from allowing people to cluster in the U.S. and move the global frontier forward are huge. So I see moving people to the most productive places as one channel. Part of what appeals to me is that we often talk about culture and how to change culture, yet I’m left feeling like I don’t know how to do that. That sounds really hard.

It’s hard to change laws too, but these are fundamentally achievable policy changes – even in the last couple of years, right? The UK changed its high-skill immigration laws to be radically better, in a way that doesn’t seem beyond the pale for the kind of change the U.S. could entertain.

The second idea I’m surprised neither of you emphasized is progress in artificial intelligence. If you have a scientist-driven, population-driven thesis for scientific progress overall, a question is: could we effectively get way more scientists by getting artificial intelligence to the point where it improves the productivity of individual scientists or replaces a lot of their labor? That could be a huge boon, relative to material or cultural interventions aimed at individual frontier scientists today.

Matt Clancy: I want to say, I very much agree – artificial intelligence has been in the back of my mind for some of this. Even at the beginning, when I was talking about those Chad Jones papers: he has another paper about what happens if you can automate some portion of scientific production. In his model, that is a way to make your scientific workforce behave as if it’s growing, provided the share of tasks it needs to do keeps shrinking. Instead of having 10 times as many scientists, if we can focus the ones we have on just one-tenth of the tasks – say, setting the vision for their labs, freed from a lot of the detailed work – we can de facto get the same effect.
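
A back-of-the-envelope version of that point (an illustration consistent with what Matt describes, not Jones’s actual model): if automation handles a growing share of research tasks, a fixed workforce behaves like a growing one.

```latex
% Let a_t be the share of research tasks that machines handle, and S the
% (fixed) number of human scientists spread over the remaining tasks:
\[
  S^{\mathrm{eff}}_t \;=\; \frac{S}{1 - a_t} .
\]
% As a_t rises from 0 to 0.9, effective research labor rises tenfold with no
% new scientists -- the "focus the ones we have on one tenth of the tasks" case.
% Sustained growth in a_t can substitute for sustained growth in S.
```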

Artificial intelligence could also potentially help deal with the burden of knowledge. We already have large language models that can summarize papers, and their ability to help you search through a huge corpus of text when you don’t exactly know how to articulate the search terms is pretty incredible. If the problems with hallucinations get solved and you can really trust the results, that could get much better. I could imagine it being a big deal.

Patrick Collison: I agree with everything you said. Some amount of AI/ML progress lies beyond a predictability event horizon – I don’t even know how to speculate there – but we should at least acknowledge that so much is contingent on whether those curves soon saturate or continue to compound. There’s an obvious story, along the lines of what you just said, Matt, where large language models, or their successors, or some agent become an adjunct scientist – or maybe we become the adjunct, and it’s charting the way forward. That’s one category, and it seems real.

The second, just to call it out, is slightly more traditional machine learning. It’s possibly the case that across many different important domains, the character of the open questions is about how we understand, or come to predict, behaviors in complex systems – with nonlinear emergent phenomena as the phenomena of note. Biology really has that character, and a lot of condensed matter physics does as well – quantum matter, maybe other parts of physics. Certain parts of chemistry do too.

Across those fields, you can maybe make the case that we’ve already derived the things that can be derived from formalisms, and that the next things require predictive models, which we haven’t had to date – you can probably make some kind of new-mathematics analogy. That second story could be true even if the first one isn’t. Obviously both can be true, and they’re probably correlated. But the first category is obvious, and I find the second also very compelling.

Alexander Berger: Could we do a round of over- and underrated?

Matt Clancy: Let’s try it. 

Alexander Berger: Okay. We’ll switch off. How about PhD training?

Matt Clancy: I think it’s probably overrated, since it’s seen as the only way to contribute at the scientific frontier. What I think is overrated is the exact formal mechanism – that you have to do it in an exact format within a department. The idea that you need to spend a lot of time learning knowledge I think is completely true, and I don’t mean you can just skip that part. But there may be other ways to do it. In the future, maybe the large language models we just talked about will tutor you through it.

Long story short, I think there’s nothing special about the exact institutional makeup of getting a PhD. The knowledge that you get as part of doing a PhD, however, I think will continue to be just as important as ever.

Alexander Berger: Government funding for academic research?

Patrick Collison: Well, on the first one, I’d say that the right mentorship is underrated, and the PhD itself, as Matt said, is overrated.

As for government funding: everyone is preoccupied with how many dollars we should give. I don’t have a strong view on that question. I think the question of how and where to give those dollars does not get nearly enough attention.

Alexander Berger: Peer-review?

Matt Clancy: Peer review is probably overrated by the general public. If you have the idea that it is this very strong quality filter, you’re probably mistaken.

And it’s maybe underrated by the peer-review skeptics who think it’s just totally garbage and makes everything worse.

I’ve written some stuff recently about how peer review scores are correlated with the best outcome measures we can think of. That’s not that surprising, because science is an institution where the value of knowledge ultimately comes down to whether experts in the field think your contribution is useful. If you anonymously poll a group of two or three of them and ask, “Do you think this is a good contribution?”, it’s not surprising that their views are correlated with how things eventually turn out.

But that said, I think it also has some issues where it can induce risk aversion. It’s extremely noisy. It’s a lot of luck of the draw and I’m not sure that the benefits of making everyone do it are worth that cost.

Patrick Collison: I’ll note that the scientists I know are, to me, surprisingly positive on it, given the set of breakthroughs that happened in its absence.

Alexander Berger: How about great man theories of discovery?

Patrick Collison: Probably overrated, relative to great scene theories of discovery. I want to better understand the Cori lab phenomenon, and its existence suggests to me that locating any phenomenon of interest in any single individual might be somewhat overrated.

Alexander Berger: Citations?

Matt Clancy: Again, this is one of those audience-specific things. In the circles I run in, people hate citations because they think they’ve taken over everything. But I think citations are probably underrated as a vote of confidence. Sure, sometimes you’re just citing because, “Hey, we have to cite everything relevant to satisfy our peer reviewers,” but there’s signal in there. If I’m going to build on your work, I’m going to cite your work. And that’s what we want science to be: a cumulative enterprise. Citations are going to be correlated with it working well.

If this cumulative enterprise is working well, we should be seeing citations. I wish there were a way to filter for the ones that really signal “I’m building on this person’s work.”

There are surveys people have done asking, “Is this citation one that was really influential or not?” Something like 20% of citations are really influential, so 80% is some kind of noise. But still: you have a million papers, you’re looking at the citations, and 20% of them are sending you a strong signal. I think that’s valuable.

Alexander Berger: Corporate labs?

Patrick Collison: It seems like it’s been a very good couple of years for corporate labs. 

Alexander Berger: How are they rated? This is always the question for over- vs. underrated.

Patrick Collison: They’ve had an anachronistic vibe for the past couple of decades, and those that persisted had these ignominious declines – Bell Labs ended up part of Lucent and petered out. I think the rating was not that high, but the last five years have changed that, and it’s not just the AI stuff. You can look at pharma as part of the same phenomenon, and over the last five years pharma has actually done pretty well.

Alexander Berger: And Matt, DARPA?

Matt Clancy: DARPA? Ooh, that’s tough. Possibly correctly rated. It’s hard to argue with their hit successes, though it’s tough to know the counterfactual – I bet we would eventually have had the internet without them. I don’t know if my critiques add up to a big-picture case that it’s overrated. My critiques are things like, “Well, they can classify all their failures, so we don’t hear about them.”

What’s maybe underrated about DARPA is the possibility that there’s nothing special about program managers and the whole field-strategist role. Maybe the secret sauce was just a mission that people really believed in – “we had to save the country from these existential threats” – which drew super intrinsically motivated, really talented people to come work there, and then we just trusted them with money. That may be the secret sauce, rather than a specific institutional or organizational setup.

Alexander Berger: It’s a bullish hypothesis.

Why don’t we end on this question: how special is science? So we’ve described this general decline in scientific output per person. What about other kinds of creative endeavors? Are arts, music, and film also declining in the same way? Or should we be looking for science-specific explanations?

Matt Clancy: So I think there are a lot of commonalities. We’re all trying to pull things from the unknown – the jobs of artists, entrepreneurs, and scientists have that in common. There are obviously differences in the role of government funding. But to the extent you think the burden of knowledge is important, you could see some commonalities.

I also think the torrent of content in the media landscape creates problems similar to the torrent of papers in science. In science, we follow citations because it’s too hard for everybody to read all the papers. Even if I read a random subset and Patrick reads a random subset, as there are more and more papers, the chance that we overlap gets smaller and smaller. We need a coordinating device, and we use citations.

I think in popular media we use coordinating devices like franchises and TV shows, which can build up word-of-mouth – a TV show is in some sense a franchise, right? Every week is another version of the same thing that came before. Or we anchor on superstars: everybody knows we’re all going to listen to Taylor Swift this week because she has a new album, rather than any of the thousand other people in the same genre releasing stuff.

Patrick Collison: I mean, that’s a great question. And I haven’t thought about it before, but I think we can probably take hope from looking at some of these other domains. 

Something totally unrelated to science that I find interesting is that essentially no nice neighborhoods have been built anywhere in the world since the Second World War. I’m sure exceptions exist – I’ve been asking for nominations on my website for several years – but I’ve gotten remarkably few. Building a neighborhood is a collective pursuit with some parallels to the practice of science. There’s presumably no upper limit on the number of good neighborhoods we could have, and who knows if there’s even an upper limit on how nice a neighborhood could be.

I think it’s a grim story with respect to our urban fabric that we’ve gotten terrible at this – we’ve lost the technology, so to speak. Nobody thinks that’s a story of low-hanging fruit, of exhausting the nice neighborhoods, or of something about the intrinsic structure of the urban landscape. Instead, I think it’s necessarily a story of zoning and culture and tastes and who knows what else.

But in some way the existence of that phenomenon is hopeful, because if we have examples of domains where the regression is demonstrably, necessarily cultural, it should elevate our priors that culture is the causal agent in some of these other complex, collaborative systems. I think it’s a good question, but on that score it makes me hopeful.

Alexander Berger: Thanks for joining us.

Caleb Watney: We recorded these sessions with several other workshop guests in the room listening in. After this conversation between Alexander, Matt and Patrick ended, Tyler Cowen, economics professor at George Mason University, jumped in to share some thoughts.

Tyler Cowen: Hello, this is Tyler Cowen. I’ve been sitting in the room listening to this discussion. It’s hard to just have to sit and listen.

I thought it was great, but I have a few comments and observations. My first is a general frustration that when people talk about the productivity of scientists, they don’t look closely enough at the wages of those scientists. It’s not a perfectly competitive market, but there are lots of bidders. All the hypotheses implying that scientists are simply more thwarted by obstacles seem inconsistent with the general data: in most areas, the real wages of scientists have been going up.

So I don’t think that’s the phenomenon – I think their marginal productivity is going up. Now, whether it’s marginal productivity in creating science, one can challenge: it could be that more and more we deploy them to create status, and the status may or may not be positive. But that’s one thing I would add to the discussion – look more closely at wages.

The second point: I think you’re all underrating the low-hanging fruit hypothesis.

Patrick Collison: Alexander’s not.

Tyler Cowen: No Alexander’s not.

Matt Clancy: The listeners can’t see it, but Alexander is shocked and waving his arms.

Alexander Berger: Mm-hmm.

Tyler Cowen: If you look at biomedical areas, there’ve been phenomenal advances lately. But we all know the sector has become far more bureaucratized. It can’t be that all of a sudden we started doing something right.

So maybe it’s just a new general purpose technology – some notion of computation writ large, expressed through the internet and through what was behind mRNA – that is now leading to many innovations. We can hope that over time it raises morale and the cultural profile of science, gets people more positively psyched, and that this becomes a positive feedback effect; I see anecdotal evidence for that. But it seems to me the most likely hypothesis is that in some – but by no means all – areas, we simply got this new general purpose technology. I’ll call it computation.

All of a sudden these major bursts – not explicable any other way. At some point, they’ll be over. But in the meantime, we’re now going to have this long run. And not Alexander, but I think the rest of you didn’t assign enough weight to that as a possibility. I invite responses.

Matt Clancy: All right, I’ve got two responses. On the first point about wages: one way to think about this is that the return on R&D is a separate economics question. Why invest in R&D if everything’s getting 20 times as hard? The classic answer is that the economy is getting so much bigger that marginal gains become enormously valuable, and that equates things.

Patrick Collison: I was going to say the same thing. That was an important point, and you can totally have a model like the one in the Bloom, Jones, Webb, and Van Reenen paper, where they make the case on the basis of what’s going on in the ‘A’ parameter. If the marginal productivity of a scientist were constant with respect to the size of the economy, you would expect to see comparable exponential returns in their real wages. But near-monotonically increasing real wages – you can totally get that in a world where realized research productivity is exponentially declining. But it seems –

Alexander Berger: That seems like why we should believe, with such a strong prior, in the pluckability-of-fruit dynamic, right? You just read it off the natural observation that the world shows 2% growth, rather than singularity growth or growth that already went to zero centuries ago. The problem getting harder and the world economy getting bigger have to be in a competing race, canceling each other out, in order to explain this weirdly balanced phenomenon we’ve observed so far.

Tyler Cowen: It seems to me we have a lot of scientists who work only locally, in firms – we may call them engineers, but they’re scientists of a sort – and their wages have also been going up. It seems unlikely to me that the larger world economy is exactly off-balancing the greater obstacles. So I think the private productivity of scientists hasn’t been rising at slower rates. What has changed is what Patrick called the scene. Maybe we have fewer scenes, and in the scenes the social and private returns diverge dramatically. How much did Picasso really get paid for cubism? He was in fact quite rich, but not compared to the total value of helping to create modern art.

So periodically, you get these scenes. The question is then – this is consistent with some of Patrick’s points – why do we have fewer scenes in some areas for extended periods of time? That’s a very different question than focusing on the obstacles facing scientists.

Matt Clancy: I want to make one other point about the wage returns to science. The decline of the corporate lab is another indicator that firms were not seeing much value from investing in science on their own. I think that’s consistent with a low-hanging-fruit argument, or with something going wrong with the returns to science.

On the question of low-hanging fruit, my answer is really boring: I think it’s very important, but I didn’t have much more to say about it. The share of the conversation devoted to it tracked the share of problems that are maybe addressable, rather than the share of what I think is actually the big picture behind the scenes.

Tyler Cowen: So maybe there’s this multiplicative model: you need a scene, you need cultural self-confidence, you need a new general purpose technology in your area – all three, maybe a bit more. If you try to measure the marginal product of any one of those, you’ll be baffled: it will often look like zero, and sometimes near infinity, or super large. But maybe the purpose of policy, broadly construed, is to bring together the factors needed for the multiplicative model to operate. Figure out, in a particular setting, which factor is scarce. If you’re setting up, say, art, you figure out which of those things you need.

In general, some of the comments seem to me to be underrating demonstration effects. Demonstrating that something can be done is, in my mind, more powerful than how I felt you all were describing it. And if Arc does, say, fix Alzheimer’s, apart from the direct benefits of that, I think the impact will just be phenomenal. Going back to Alexander’s cultural question, you see this in culture. The Beatles do something and everyone chases after that. What OpenAI did, you now see so many people chasing after things like that.

As a final comment, I would up the importance of these demonstration effects, and we should think about them when choosing what to do.

Matt Clancy: Thanks, guest star.

Caleb Watney: Thank you for joining us for this episode of the podcast series. Next episode, we’ll dive into the three core inputs to the scientific production function: the funding we need to run experiments, the minds who come up with the ideas in the first place, and the infrastructure to make it all happen.

Episode Three: “The Scientific Production Function”

Caleb Watney: Welcome, listeners, to this episode of the Metascience 101 podcast series. 

In this episode, Kelsey Piper, a writer at Vox, leads a conversation with Adam Marblestone and Professor Paul Niehaus. Adam is the CEO at Convergent Research, working to launch new science institutions using a model called “Focused Research Organizations.” Paul is a professor of economics at UC San Diego and a co-founder of GiveDirectly, a nonprofit focused on ending extreme poverty. Together, they explore what makes science tick, including the funding ecosystem, the labor force, the culture of scientific labs, and the fundamental search for important questions.

Kelsey Piper: Adam and Paul, you both work on science and the process of it: how scientists pick which questions they work on, who they work with, how they get funding, and how they get access to other resources. Paul, you work on this mostly as an economist in the social sciences, and Adam, a lot more in the life sciences.

What we’re really excited about here is comparing notes about the process of science: what’s holding it back, what it would look like to do a better job of deliberately producing valuable scientific research, and how that differs across fields. 

Paul Niehaus: We have been excited about this conversation. Adam and I both sense that the issues of problem selection – of deciding what to work on – are really big and important ones. I’m hoping we get into that.

Then, having chosen a problem, there are questions of how you execute on it, how that’s changing, how the requirements and skills needed to do that are changing, and how the funding models do or don’t support that. These questions interact with what you choose to work on in the first place and whether you feel prepared, equipped, and resourced to tackle a problem.

Kelsey Piper: Do you want to speak to how you see that playing out? How do scientists pick which questions they work on? What are the big driving factors?

Adam Marblestone: Sometimes people think about this as grants and peer review constraining what problems people can propose. I see it a bit more meta than that – a little bit more as layers of structure. Two observations: one, individual scientists might want to do things that they somehow don’t have a structure or mechanism or incentive to do. And two – this is the one that I’ve been more focused on – you can take a macro analysis of a field and say, “Well, what would be the ideal thing to drive progress in that field?”

There’s a question first of all whether scientists are able to work on that thing. Maybe that thing requires a different shape of team, or requires a different level of resources to actually go after the biggest leverage point in a field. Maybe they’re not even incentivized to spend the time to write down and figure out what that leverage point even is in the first place.

Paul Niehaus: You’re talking about having something like a product roadmap in a company. Having an analog of that for science and being able to map out a longer term vision for where the thing needs to head and whether people actually have the time, resources, or incentives to do that.

Kelsey Piper: When people don’t have that, when there’s not a larger roadmap, is that mostly a lack of a person whose job it is to build a large roadmap? Is it mostly a problem of short term grants that incentivize short term thinking? Is it about turnover? What’s going wrong?

Adam Marblestone: I’m really curious how this differs across different fields. Something that I saw in neuroscience, for example, is that there are several big bottlenecks in the field that are sort of beyond the scope of what an individual scientist can just go after by themselves. Scientists are often rewarded for technical excellence in certain areas, but those areas are themselves selected to be ones where an individual person can achieve technical excellence.

Maybe you need the equivalent of a project that’s more like building a space telescope or something like that. That’s not something an individual scientist can do. Then you might say, “Well, if they can’t do that for their next grant or their next project, are they even incentivized to think about the existence of that? Or whether that thing should exist in the first place?”

Kelsey Piper: The ideal of tenure as a system was partly that it would help with that. You get away from the pressure of keeping a job and can think about big questions. Having demonstrated that you can make intellectual contributions, you can make the ones that seem to you like the ones that really matter. Is tenure a system that functions to achieve that?

Paul Niehaus: There is an implicit theory of change. In general, the accumulation of more knowledge is going to be good, and it’s hard to know what kinds of knowledge are going to be useful. So it’s just good to let everybody be curious and pursue whatever they’re interested in. That’s the unspoken theory of change that a lot of people absorb when they come into grad school.

I think of it as optimal search. If you want to search a landscape and find the best alternative, there should be some degree of noise in that process. You could think of it as having some people that just do whatever they’re curious about. Because I might sit here and say, “That looks like the best direction,” but I could just be finding some local optimum. That could be the wrong thing and I want to search globally, so it is good to let some people be curious and explore. Sometimes you’ll have some of the biggest hits come through that.
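A toy illustration of the optimal-search point: on a rugged landscape, purely directed hill-climbing tends to stall at a local optimum, while adding some curiosity-driven exploration usually finds the global peak. Everything here is hypothetical – a one-dimensional “research landscape” with invented numbers:

```python
import math
import random

def landscape(x: float) -> float:
    """Hypothetical research payoff: a local peak near x=2, a taller global peak near x=8."""
    return math.exp(-(x - 2) ** 2) + 2.0 * math.exp(-((x - 8) ** 2) / 0.5)

def search(explore_prob: float, steps: int = 200) -> float:
    """Hill-climb from x=0, but with probability explore_prob jump somewhere random."""
    x = 0.0
    for _ in range(steps):
        if random.random() < explore_prob:
            candidate = random.uniform(0.0, 10.0)   # curiosity: jump anywhere
        else:
            candidate = x + random.gauss(0.0, 0.3)  # directed: small local step
        if landscape(candidate) > landscape(x):     # keep only improvements
            x = candidate
    return landscape(x)

random.seed(0)
for p in (0.0, 0.2):
    avg = sum(search(p) for _ in range(100)) / 100
    print(f"explore_prob={p}: average payoff found = {avg:.2f}")
# Purely greedy searchers (p=0.0) typically stall at the local peak (~1.0);
# searchers with some curiosity (p=0.2) usually reach the global peak (~2.0).
```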

But it is also good to have a big part that is more directed, where you have a pretty thoughtful theory of “I think this will be useful because it can create this type of value.” I don’t really see much of that type of thinking: no frameworks, no teaching or training in it. That’s really sorely missing in the social sciences.

For me as a development economist and as a founder of GiveDirectly – which does cash transfers to people living in extreme poverty – an example of a very motivating and focusing question is: how much money would it take for us to end extreme poverty? That’s actually a tractable question in that we’re close to having enough evidence to come up with pretty good numbers. 

In the past, people have tried to do these things, but they’re based on a lot of very tenuous assumptions about what the impacts and the returns of different things are going to be. But I’m talking about a relatively brute force approach to it. I’m saying, “let’s find everybody and figure out how much money they need to get to the poverty line and get them that much money.”

That’s the idea, but there is actually a bit more to it than that. I need some statistics for the targeting of this that don’t really exist yet. Clearly, I need to start thinking about the macroeconomic effects of redistribution on this scale. For example, what would happen to economies and prices?

What I find exciting is it populates a whole roadmap – a research agenda that we can split up. Different people with different technical skills could work on different parts of it. We all understand that what we’re working on feeds into this broader whole, which is this vision of being able to tell the world this is what it would cost.

By the way, I think it’s something that we could do. It would cost us a fraction of a percent of our income if we all donated it. How motivating would that be? 
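The brute-force costing Paul sketches is, at bottom, a “poverty gap” calculation: total up, person by person, the shortfall between income and the poverty line. A minimal sketch, with made-up incomes and the World Bank’s $2.15-a-day extreme-poverty line (2017 PPP):

```python
POVERTY_LINE = 2.15 * 365  # World Bank extreme-poverty line, dollars per year (2017 PPP)

def total_poverty_gap(annual_incomes: list[float]) -> float:
    """Aggregate annual shortfall: the cost of ending extreme poverty under
    perfect targeting, ignoring price effects and other macro responses."""
    return sum(max(0.0, POVERTY_LINE - y) for y in annual_incomes)

# Hypothetical five-person economy (annual incomes in PPP dollars)
incomes = [300.0, 550.0, 700.0, 2_000.0, 40_000.0]
print(f"Annual transfers needed: ${total_poverty_gap(incomes):,.0f}")
# The open research questions are the ones Paul flags: targeting statistics
# (who is below the line, and by how much) and macroeconomic effects at scale.
```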

I think it’s great to encourage people to think about exercises like that. Imagine that you want to solve this problem or you want to make this decision, even if it’s not something you’re doing today, what would you need to know to do it? Then, build a research agenda around that.

Adam Marblestone: Do you think that will spawn other questions that would actually lead to us being able to give those people that money? It seems like the obvious first step is that you have to know this number. This is kind of the beginning of that roadmap: “Let’s quantify: what’s the machine I need to build to end global poverty?”

Paul Niehaus: Yeah.

Adam Marblestone: What comes next? 

Paul Niehaus: Part of my theory is definitely that if I told you the number and it was low, you would say, “I’d be happy to do my bit.”

Adam Marblestone: Mm-hmm.

Paul Niehaus: If you’re telling me that if everybody gave 1% of their income, we could end extreme poverty, I will sign up to give 1%. Because then I’ll feel like I’ve done my share. Yes, I feel like that could be a powerful motivator. To get there, we have to have a number that we believe and that’s well backed by science. It’s fun to figure out what that science would need to be.

Adam Marblestone: Is there an obstacle to you going and starting this thing? Is it whether you can get an NSF grant?

Paul Niehaus: That’s a great question. I think it’s time. You’re right that with a little bit more time and with flexible funding, you could build a team around that. That’d be really exciting. 

Adam Marblestone: On the idea of tenure, my guess is that it works better in some areas than others. There are certain fields where the core of it is basically, “what does that professor have the freedom to think about and to teach their students about?” Then, the students are absorbing the intellectual tradition of the professor and that’s the essence of it.

Some factors make it not as simple as that, though. In biology, it’s pretty heavy in terms of needing a labor force. The students, postdocs, and trainees in biology are also effectively the labor force of doing experiments. If you’re the professor, you need to be able to get the next grant, which supports that next batch of students and postdocs. The students and postdocs need to be successful enough that they can get a grant of a certain size that would support, in turn, their own trainees one day. And on this first grant, you need to get preliminary data for the next grant.

There is also this need to not mess up your students’ careers – if that makes sense – by choosing something too crazy. This has a pretty strong regularizing force on what people can choose to work on. Students will potentially select against professors that are doing something that’s too far out there, even if that professor has tenure. 

Paul Niehaus: This feels to me like something that the social sciences and economics need to be somewhat worried about. There are all these things that have changed in the last couple of decades, which I see as super positive: the field has gone from being primarily a theoretical discipline to primarily an empirical one.

By the way, for people listening, if you took undergraduate economics, you might still think that it’s primarily a theoretical discipline, but in fact, what most people do now is get huge quantities of data from the real world and analyze it.

I think this is great. We’re more connected to reality than we were in the past. At the same time, it takes more money and it takes more people. We’re starting to look more like hard science disciplines where all the dynamics that you’re talking about come into play, but I’m not sure economics is thinking about that and about the impact that’s going to have.

Adam Marblestone: I don’t see this as inherently a bad thing. It’s okay if projects become more resource intensive or more team intensive. It makes sense as you deal with more and more complex systems, right? 

On the other extreme, you don’t necessarily want each individual neuroscientist having to learn how to – it’s not quite this extreme anymore – blow glass to make their own patch pipette to talk to a neuron, as in the olden lore. And you’d write your own code, you’d train your own mice, you’d do your own surgeries, and you’d make your own buffers, reagents, chemicals, and everything like that. There’s this sort of artisanal tradition.

It’s a good thing, potentially, if there are more teams and more division of labor. But it does mean that it’s more expensive. The model of how you achieve that labor is still stuck in this world where it’s more modeled on the theorist – where the primary goal is to transmit a way of thinking to your students – or modeled on apprenticeship, where students learn lots of skills, as opposed to what does it take to get that job done? What are the resources that I need? 

You have a lot of people that are working in this very labor-intensive, very capital-intensive system, where they’re nominally learning an intellectual discipline, but they’re also kind of participating in this economy.

Paul Niehaus: Yeah. On the funder side, I feel like there’s very little of that. We’re in this world now where research capital matters a lot, and what happens is largely a function of what can get funded. But at the same time, I don’t feel like there’s much return to being good at allocating that capital. It’s largely seen as a chore.

I get invited to serve on these committees where we decide who’s going to get a grant for a project. It’s something that you do out of a sense of professional obligation, but nobody is thinking like, “Wow, I could have such a huge impact and this could be like a big part of my legacy to be the person that picked a project that ended up being transformative.” 

The same way that if I were like a VC, I’d be like, “Yeah, there’s going to be one entrepreneur that I pick and bet on that’s going to make this firm, make this fund, make my reputation.” 

There isn’t anything like that, so I do it as quickly as I can and then get back to my own work. But maybe I should be incentivized to really invest in that and figure out how to get good at it.

Adam Marblestone: Yeah. I would go much further and say that there is a role for strategists, architects, and designers in terms of what we need in a given field.

I’m curious where this lives in economics and social sciences. But it’s definitely a problem that we’ve come across in the course of thinking about how to map neurons in the brain or something like that. 

Well, it turns out what I need is a microchip that has billions of pixels, switches billions of times a second, and is transparent. I need something that actually requires an engineering team. It needs a totally different structure than myself or my students or my postdocs or what a biology researcher would have. 

You may identify that that’s what’s needed, but then you forget that thought quickly because there’s no way you’re ever going to be able to control a big division of a chip company in order to make that thing.

So you go back and say, “Well, what’s the next actionable step I can take?” Ultimately, that starts to really shift things. You’re no longer operating on the basis of what’s the actual best thing to do. You’re talking about what’s the best thing to do within a very limited action space, assuming that all the other actors have certain incentives.

Paul Niehaus: I like that. We should have a ledger somewhere of ideas that died a quick and sudden death in Adam’s brain because he didn’t see them as viable. Maintaining a list of these things is what we’re missing.

Adam Marblestone: Or maybe it’s that they take too much design and coordination. People say writing grants is a sort of tax on people’s freedom, but I actually see writing grants as a time when multiple researchers are incentivized to coordinate. They can go in together on a funding opportunity which actually causes them to spend however many hours, brain cycles, and conversations co-designing some bigger, more targeted set of actions that are more coordinated with each other.

That’s only up to the level of writing a grant of a certain size, on a certain time scale, and with a certain probability of getting it. Instead of three labs in different parts of biology or physics or engineering coordinating to write this grant so we can get a million dollars, what if we’re actually trying to find the very best one across the entire community, and then that thing gets a billion dollars? What’s the right scale of coordination and planning?

Planning on these different horizons is seen as something the NIH or the NSF is doing, but then they delegate a lot of the decision making to peer review committees that are much more bottom up saying, “What did we get in? Which is the best one?” rather than what’s the ideal, optimal thing to do at a system level.

Paul Niehaus: One thing I’ve seen a lot of – which has really struck me – is that a lot of universities have this sense that it’s important to stimulate and encourage interdisciplinary work. You mentioned collaboration between multiple labs, but also working with engineers or people in other departments. The standard reason for why we want to encourage that is because we think that the social problems we want to speak to are getting more and more complicated, and that no one discipline has all the tools that you need to address that. 

You’ve given some examples that are consistent with that. But we’ve sort of realized – and we’ve talked about this at UCSD – that none of us really knows who the right people are to go to in computer science about a particular thing that might come to mind.

When we try to sort of artificially stimulate that by having joint hires or mixer events where everybody comes together, that just relies on serendipity and it really doesn’t seem to work very well. The hit rate is not very high. I’ve been interested in this idea that what we actually need is to articulate some problems that help define who should be in the room to help to solve them. 

Not “I’m going to hang out in computer science for a bit and see if I meet anybody interesting,” but more like, “Here’s a problem that I’m really motivated to solve. I know I need a computer scientist, and I have to figure out which one would be the right one. Then, we write a grant application together.” To me, it’s putting the cart before the horse to say we need interdisciplinarity to solve social problems. You start with a problem and figure out who to put together to solve it.

Adam Marblestone: I think there is value in random collisions. But that value lives in a very circumscribed space, where the best outcome is a couple of these people writing a grant together. What you really want is an industrial-grade level of coordination, planning, and systematization. That’s not to say that there isn’t a lot of serendipity and things bubbling up there as well. But it’s interesting that we both see this planning or coordination gap.

Paul Niehaus: When you say industrial grade, what do you mean by that? A lot of people get into the profession and academia because they really cherish the freedom to work on whatever they want to work on. They don’t want anybody to tell them what to do. 

As we’re discussing, there are actually a whole bunch of constraints that really limit and narrow what you can do. So that’s all still there, of course. But I think a lot of people are very resistant to anything that feels like somebody is telling you what to do your research on. 

At the same time – as you say – in order to get the right teams together to tackle these big complicated problems, it’s actually really critical that somebody is thinking about this. Who would be the right people? Maybe there’s a soft leadership of getting a bunch of people excited about a shared project or vision because they can see the social value that it could produce.

I don’t think many of my colleagues see that as part of their role, but that could be an exciting role for someone to play.

Adam Marblestone: Well, I think there’s a question of non-academic skills as they apply in research. “Who’s the best person to collaborate with in computer science?” – there are a lot of assumptions behind that, right?

There’s an assumption that the person is a professor who’s studying a research question in computer science, and they have the labor force that is their students. What if the best person to collaborate with in computer science is a 20 person software engineering team or something? I don’t know. 

I guess my interest in this is: what are the processes that lead to identifying the best, ideal actions that could be taken within the space of the next steps in research? Then, can we work backwards from that in some way? Who articulates that? Whose job is it to articulate that? 

And you may be wrong and a huge amount could be serendipitous. It’s not that there’s one dictator that describes that. But is there a process of figuring out what this research field should do that goes beyond local actors?

I mean, it’s interesting to me that you see the same thing. I’ve often thought of this as well. If you think about neuroscience or biology, my home field, the brain is just so complicated. We need so much engineering just to deal with it. It sounds like some of what you’re seeing in the social sciences has a similar character.

Paul Niehaus: I don’t know that it’s a function of the complexity so much. I think that the interfaces between the university and the outside world play this really critical role in sort of guiding us and giving us a sense of what is actually worth working on. That happens right now in a fairly haphazard way, at least in my discipline. 

There are individual people who are super motivated to engage with policymakers or with business leaders, or with nonprofit leaders. They build these relationships and learn about what kinds of questions people are having. They end up becoming the arbitrageurs who bring those things back into the field and communicate about them to other people. But it doesn’t happen in a very systematic way. Especially for a young person who doesn’t have those relationships yet, maybe hasn’t had a lot of training or background in the skills that would be needed to build those relationships, it’s tough. 

I see a lot of people start out and they quickly feel overwhelmed by just the volume of stuff that they have to learn in graduate school. “Oh my god, I just need to do something that has not been done before to prove that I’m competent and get a job and then get to tenure.” That totally resonates with me. And I get that. 

But I think it’s exciting to think about how to design universities and departments, where there’s more intentionality in these things – where there’s a systematic effort made to help connect young researchers to the people in the outside world that they need to be talking with to get a sense of what problems are likely to matter and make a difference. That could be part of my role as a mentor, an advisor, and an institution builder, and not just something that we leave to chance. 

For example, I had a student recently that really brought this home to me. He has a really beautiful job market paper on FDA regulation and medical device innovation, which I thought was a great project. I asked him, “Who are you talking to in the medical device space about this stuff?” because we’re in San Diego, which is a hub for medical devices. And he said, “Nobody.” It really stuck with me. He’s a very entrepreneurial student by any standard. It’s not a knock on him. Nobody sees that as part of our function to make sure that you’re engaged with the part of the outside world that’s relevant to you. That seems to me like such low hanging fruit.

Adam Marblestone: At some level, the research has goals, and it’s a question of how it’s doing relative to those goals. There’s this idea of totally bottom-up, totally creativity-driven research. But in some sense, even a project like that has some societal goal.

Part of what you’re saying is just inevitable, right? I mean, a graduate student needs to find a bite-sized project that they can hone their skills and prove their abilities, right? That’s just core to what grad school is about. 

Kelsey Piper: I feel like it keeps coming up that this is a system that no one would have designed for the purposes it’s now serving. Partly because what it does has changed over the last couple of decades, both in economics and in the life sciences. And another part of it is that no one was really designing it.

I’m curious, pie in the sky, if you were designing it, what would it look like? Not just making some of these changes, but if you were trying to design a good academic science process, what would you do?

Paul Niehaus: Lovely. I think we’re trying to figure that out. At least for the social sciences, I think that one thing you’d have is much more intentional investment in the boundaries between the university and the outside world.

Right now when people come into graduate school, they get really good, high-quality training on what we already know, and then are left to themselves to figure out what we don’t know that would be worth working on. Those two things would be at least roughly balanced if you were designing a program from scratch. You’d have people start thinking and talking about it from day one.

We give people two years of training on tools before we expect them to start doing stuff. I think what you’d do instead is, from day one, talk about what’s important and what we need to know. People would be constantly iterating and thinking about that, and the tools would be somewhat more on demand – more like, “Once I figure out that this is the problem I’m going to work on, then I know I need to go and learn how to do this or how to do that.” I think this would be much more flexible. In terms of pedagogy and the way you’d structure it, it would look a lot more like that.

There are people who think that we want to change the incentives in deep ways as a way of getting at this. Instead of tenuring people based on how many publications they have in top journals, let’s tenure people also as a function of how much real world impact they’ve had. Let’s look at some of these other metrics. There are some efforts underway in this direction and I think it’s interesting. There may be some scope there, but I have some doubts about it. I have pragmatic doubts that all that much is going to change.

My deeper worry is that this stuff is really hard to measure, and I think it can open the door to a lot of politicking and a lot of favoritism. One of the things that’s nice about our system – imperfect as it is – is that nobody disagrees about how many publications you have in the top journals, because that’s how many you have. It’s a little bit harder to bring in your friends and things like that.

My instinct is actually not to worry too much about that, but to focus on the real challenge of figuring out good problems to work on. It’s a really hard problem.

Adam Marblestone: A couple of interesting observations there. One is that the system was never really purposely designed. There were some principles in it, but a certain number of institutional and incentive structures have ended up getting scaled massively over time. When you come back and say, “Well, what if we design it differently?” that feels like top-down interference now. The thing that has scaled is something with a large role for peer review and for the individual. I mean, you get to choose what grant you submit. And other people who are your peers will be on a committee and they will review it. It won’t be some program officer or some person from industry or some philosopher who says, “No, you actually have to do this thing because this is better for society” or something like that.

Who else can judge what these biological scientists in some really niche field can do except other biological scientists in that really niche field? That makes sense that that has emerged. You can kind of understand why this thing has scaled. It’s kind of democratically very appealing. If someone else is interfering with you, you say, “No, no, no.” But if it’s your peers, “Okay, that’s all right. They can interfere.”

What I would design is not really one thing, but it’s just much greater diversity of different tracks, structures, and incentive pathways, within science very broadly. Certainly, there’s a role for the form of training that emphasizes technical excellence in certain areas and emphasizes finding your own niche that’s very different. Your PhD thesis, by definition, is something that’s different from somebody else’s PhD thesis and represents your own skills. 

There should be a track like the one we have been discussing – a field strategist track that’s more about road mapping or problem identification. There should be tracks that are more entrepreneurial, about how you grow and build a larger non-academic structure that’s meant to accelerate science or that’s based in science in some way.

I think some of that is emerging organically, and some of it less so. Y Combinator, deep tech startups, and the startup boom have had a huge influence on how students and postdocs see their careers. One option is that you go into academia; the other option is that you go and co-found or join a biotech startup. And that’s a very different mindset.

You do see that filtering back. When you are that grad student, you’re thinking about that pathway, and you potentially prioritize what you’re doing differently. But maybe there should be many, many more of those types of tracks or niches. Maybe there should be certain systems that are more optimized for very early stage fields and very early stage discoveries where peer review looks very different. Then, a different structure is put in place for more mature fields, where you’re filtering out the noise versus generating any signal in the first place.

A much greater diversity of structures would end up being designed. They would circumvent this problem of “Oh, there’s this dictator saying how science works or how individual scientists work.” It’s more that you have enough different ponds to swim in that you can choose.

Paul Niehaus: Could I pick up on the dictator thread, and also what you said earlier about peer review, thinking particularly about funding? We’ve been talking a lot about the ways you could design universities or journals or gatekeepers differently, but the funders are obviously an important center of power for all this.

One slightly controversial view that I’m coming to is that peer review is something that makes you feel safe that your opinion is never all that consequential. Nobody actually has to take responsibility for the decision. Another word for peer review in the rest of the world might be “decision making by committee.” 

Is there room for funding models where individual people take on more responsibility for the decisions and are identified with the success or the failure of those decisions? They’re free to do things like, “I’m going to make a bet on this person because I think the kind of things they’re doing are great.”

Adam Marblestone: I think this is a huge issue. 

Why is it so hard to design these? Why hasn’t the world just emerged with lots and lots of niches and program structures and incentives? I think part of it is that funders are also in their own kind of evolutionary struggle. If you’re a new funder and you come in and say, “I want to do something different,” well, who judges that? If you’re funding an area of science, there’s no notion of expertise other than the luminaries in that field. If you don’t have endorsement for your program, or for who you funded, from the luminaries in the field as it exists now, you as a funder will not have legitimacy.

You have to have something that has enough horsepower or strength to bite that bullet and say, “Look, we’re making a choice. We think this person has a vision, and we’re going to let them do this.” By definition, there will be peer reviewers that will say, “This is not as good as what you could have done with a hundred R01 grants or the more traditional structures.” What is it that allows you to survive that shock either as a funder or as an individual program officer?

The system has proliferated and it is judged by its own members. And there’s also no obvious alternative to that. Science is so intricate that you couldn’t really ask a product manager to judge what the scientists are doing… unless you could, right? DARPA kind of does that with program managers.

Paul Niehaus: This is a core issue for economics as well. I’ve really been struck by the lack of diversity in funding types. Most funding is at the project level, but we’re moving towards a production function that is much longer term and requires larger teams. You want to set things up so that I have an incentive to invest in the culture of my team and to invest in training younger people because they’re going to be with me for a long time. And that there’s room for them to grow within the team and take on more responsibility. All those things that you think of as part of building an organization.

But the funding models don’t support that. The funding models are like, “Well, this looks like a good project.” And so, we might spin up a team, do the project and then wind it back down after a year or so.

Adam Marblestone: What would be your ideal, if you wanted a way of doing this type of research? Would it look more like something where students don’t cycle out, or grants don’t cycle as often?

Paul Niehaus: Yeah, it’s what we’ve said. We’ve been able to raise some funding like this for some of the things that I’ve worked on. I think you want a diversity of different types and different models.

You want to have some that can be on an initiative basis with some sort of broad agreement about the scope of things that the research team is going to tackle and the kind of team you need to put together to do that. Then also some ability to be reactive to opportunities or ideas that come up. In my own personal work, for example, what that looks like is that we do a lot of work in India, typically working with the government on potential reforms to large scale social programs that impact people living in extreme poverty.

This is super policy relevant work. I’m very motivated to do it. It often depends on these windows of opportunity where you get a particular government that has decided they want to prioritize a particular issue, and it’s not too close to an election. They’re willing to try some things and that’s the right time for us to partner with them. We need to be able to react to that, which means we need to have the organization and the capital already in place. At that point, we can’t be going out and filling out an NSF application and waiting for three to six months to hear back from them.

We have been able to get funding support like that, but I think most people have not. It’s not an established model. Idiosyncratically, we found foundations that have been willing to back that approach.

Adam Marblestone: I’ve heard of the need for that in some other spaces too. Like if a volcano erupts, you need to go and study that volcano.

You need to be able to immediately get people and sensors out and get data collected. That means you can’t be applying for an NSF grant that will take another six months or a year to come through, and then hire and train a student. You have to actually deploy quickly. That’s an interesting niche example where the systems aren’t set up super well to do that. We have government agencies that operate that way, but do they have the exact right scientists that need to be involved?

Paul Niehaus: Yeah. We have had things like Fast Grants, with Tyler and Patrick experimenting with models where the money can get out the door faster. But if there’s still a long lead time from getting the money to putting your team together and building infrastructure and so forth, there’s a class of problems where the money needs to have gone out the door quite a long time ago for the research team to be able to execute on the opportunity when it comes up.

Adam Marblestone: Right. Then how do you sustain that? Is that sustained based on novelty or tenure? What is the driving incentive of that team or institute to exist?

I think it’s amazing that certain types of larger infrastructure exist. Let’s say in physics or astronomy, you have the Hubble Space Telescope. In principle, if some supernova goes off here, we could re-point the Hubble Space Telescope. There might be so many other areas where you need that.

Kelsey Piper: What funding options do you have if you’re trying to do something that’s outside the scope of a normal NSF grant – or outside the scope of the normal grant options in economics which I know less about? Is it individual philanthropists, individual people with a blog? What’s the space there?

Paul Niehaus: For economics, that’s right. There’s a set of very well established sources that everybody knows about. You can apply to the NSF. In development, there’s a fund, the Weiss Fund, which funds a lot of randomized controlled trials and is great. That’s an obvious go-to source. 

Then, I think if you want to do something that doesn’t really fit into the box for those kinds of funders, there’s this long tail of private philanthropy that a lot of students and young people are just not even thinking about. They really need to be told, “Look, you’re going to need to be more entrepreneurial.” The decision making process is going to involve more in-person interaction with people. It may not be standardized like just filling out a form. It’s going to be different. It’s going to be like raising money for a thing. They’re out there and I think helping make those connections is something that we focus on a lot now with the students in our program. I think it is super important.

Adam Marblestone: There’s a pretty wide spectrum of different shapes of gaps. For the bread and butter – a small group of students and postdocs working on technically intensive, deep problems within a five-year grant scope, with preliminary data – biomedical science is doing really well with NIH R01 grants. On either side of that, there are pretty big gaps.

One gap is the ‘unknown next Einstein’ who has some totally new ideas that don’t really have preliminary data. They don’t really have experiments, they don’t really have pedigree, it’s maybe more synthetic in some way. How do you support that person?

On the one hand, that’s really hard because it doesn’t work super well with peer review. But on the other hand, sometimes those people are just like blogging, so it takes a relatively low cost to support that person and let them think. I think we could be much better at funding the gaps in new ideas or new theory. In some ways, we’re lucky that the world has progressed to the point where, as long as they have an internet connection and an apartment, they can do that work. 

The other end of that gap – and the one that I’ve been a little bit more obsessed with or concerned about recently – is where you need a larger, more professionalized engineering team, or you need industrial-grade work, or maybe you need this rapid-response capability.

You need something that’s not really the same speed and scale that you would associate with the academic traineeship or apprenticeship model. That’s hard, because even speccing out such a project might take several people a few years just to define what the engineering roadmap is. How much does it really cost? Who’s going to lead it? It’s more like creating a company. In the company space, there’s the equivalent of a seed round that gets you there and everyone is incentivized. What’s the equivalent of a seed round for the next Hubble Space Telescope? That doesn’t really exist.

Paul Niehaus: One funding model that I like: at UC San Diego, we have a center that sort of pairs this funding problem with the problem-selection question that we started with earlier. What they do – and I’m interested in seeing more experimentation like this – is once a year they bring in ten of the top fund managers, from pension funds and the like, and ask them, “What are your big questions?” They agree on a few of those, say that those are the top-priority questions, and then attach funding and issue a request for proposals, an RFP, linked to that. The theory there is that you’re providing funding to work on things that have been pre-screened and selected precisely because they matter to somebody who is going to make a decision.

Adam Marblestone: I think RFPs can go a long way because there are these self-reinforcing positive and negative feedbacks. 

If you imagine – well, there’s no such thing as a seed round toward the Hubble Space Telescope. On the other hand, if you were to give someone such an RFP and say, “What Hubble Space Telescope would you design?” then, as long as it’s not completely suicidal for their career to spend six months answering that question, you do get a proposal for the Hubble Space Telescope.

Now the funder can go back and say, “Okay, actually that’s what I want,” and offer you more money so you can spend more than six months and more than one student on it. You could actually bootstrap these things, because the knowledge production does have a lot of positive feedback. On the other hand, everyone is always sort of doing that at their own risk. What if there isn’t a next RFP that will take you to the next level? Then they say, “We did this crazy thing, but we’re never going to be able to do it again.”

Kelsey Piper: I feel like this is a vision for how funders could solve a lot of the problems you have been talking about, almost unilaterally via a broader scope of proposals, more kinds of proposals, and more options to fund things. Is that basically true?

Adam Marblestone: Pretty much, yeah. Each one has to have a pathway. You imagine the person, what’s the journey you want them to go on? 

You want the person who designs and is ultimately the entrepreneur who creates the next Hubble Space Telescope. Or maybe you want one person to design it and then find the entrepreneur who then creates that. Or you want something else like you want someone who creates a new field. 

At any given point in that process, they have to have something that allows them to take the time and effort to ask that question. If they need students or postdocs working with them, those people need to be able to do that. You need a series of steps that would ultimately lead them to the right place.

Everyone is always in competition. They’re always working really hard to do the next thing or get the next grant or have the next result. They don’t have time really to sit on their own and just design the Hubble Space Telescope. You need to help them get to that point. If you do that though, then there’s a lot of room for directed funding structures and programs. That’s just very underappreciated. It’s hard to build consensus on whether any one of those should be done or is the best thing to do.

Paul Niehaus: Yeah, in brief, I agree. I just feel like there’s enormous scope for people to experiment with funding research in different ways. For anybody who has the capital and wants to experiment, those experiments are super valuable because they teach us about the kinds of research output you get from them. That would be wonderful.

I think it would be cool to talk a bit about culture and sort of cultural subgroups.

Kelsey Piper: Like culture in science?

Paul Niehaus: Yeah. Like I feel there’s a subgroup of economists who think about the world the way I do and care about the same things. So when I’m with those people, it’s great. I feel like other people may care more about other stuff, but who cares about them. I think that’s really powerful. 

I’d be curious to hear what Adam thinks about that.

Adam Marblestone: Yeah, no, absolutely. That is one of the real strengths of academia writ large: the huge diversity of it. It not being all that top-down means that these research cultures emerge. That is why it is in many ways different than, “Hey, we’re going to go form a startup that solves economics.”

That’s not how it works, right? You need a person who thinks in a different way to train a generation of students. Those students think in a different way and they perturb and challenge each other.

You build these cultures and that’s a longer-term development. But the more subcultures you can support that way, the more paths there are for ideas to flourish and succeed, even if they’re otherwise different. Those people will become the reviewers who legitimize a body of research that might not be okay in some other culture.

Really core to everything is that there are these medium-sized subcultures of very, very deep training, apprenticeship, and shared value formation. That’s one of the huge strengths of academia, as opposed to the transactional nature of just going and doing something, hiring people and then firing them. 

That’s part of the key to it all. What’s the level of diversity and richness of that culture? What actually sets that? There are definitely some fields that have ended up tabooed for whatever reason and they don’t get to have a mutually supporting culture to nurture them. 

Paul Niehaus: Oh, what gets tabooed?

Adam Marblestone: Just to give you a little bit of an off-the-wall example: a sort of obviously great thing to do would be to freeze organs. Say I want to be able to freeze my kidney and then unfreeze it; then I have infinite transplants. That field – large-volume vitrification of organs – has been sort of very marginalized because it’s very close to the idea of cryonics. Mainstream cryobiologists said, “You know, don’t think about that. We can think about freezing sperm and eggs and doing basic science studies, but we shouldn’t think about freezing entire giant hunks of matter that are the size of your body.”

Partly as a result of that, you can’t really go to a biomedical engineering department most of the time and say, “I want to freeze an entire kidney or an entire brain and then unfreeze it.” It’s too close to cryonics.

Paul Niehaus: I would never have guessed that. Does that also mean there’s more of a role for individual courage in all this? I don’t know what your thoughts are on this, but I think a lot of what drives people in science is the quest for peer recognition – to feel like other people value what you’ve done and respect you and your contributions.

I think that’s something to be excited about because I think it is very malleable. Getting papers published in good journals is certainly one marker of that. But it’s very easy to create other communities. I definitely feel like I’m part of communities that value all the other things that I do, even if they’re invisible and unmeasurable. In some ways, those are the things that people respect about me the most. I think there’s a lot of scope for that.

At the same time, sometimes people are like, “Oh, my career incentives, blah, blah, blah.” I’m just like at some point just decide what your life is about and do that, you know what I mean?

Adam Marblestone: Yeah.

Paul Niehaus: Like stop crying about the incentives. If something’s important, just do it. 

Adam Marblestone: I don’t know where it comes from, but: “the way to get tenure is to not try to get tenure.” Try to ignore those forces, and if you’re maverick enough and you still survive, then you’ll actually do well. But if you just try to really follow the incentives, that actually ends up being pretty boring.

There is some of that dynamic, and I don’t know what allows it to exist. But what makes the system actually healthy is that the mavericks can still succeed. What is it that determines that?

Scientists are pretty smart sometimes, so maybe it’s that they actually see the value in something that’s new.

Paul Niehaus: I think there’s that version of it that’s like, “don’t worry, things work out in the end.” Even if right now everybody thinks you’re crazy, in the long run, being a maverick is a good career strategy. People will eventually recognize the importance of what you do. 

I think that can be true, but sometimes you may do a lot of good and other people don’t value it. And you just have to be willing to have the strength of mind to live with it.

Adam Marblestone: Yes, I think you need that. That’s part of what tenure does allow: a lot of people may not like what you’re doing anymore, but you can keep doing it. And it’s not so much that you’re doing it – it’s that you’re encouraging other people and you’re creating that culture.

I think this is a pretty subtle thing. What is this trade-off between self-censorship or the peer review element of things and the maverick, ignoring convention aspect? Maybe some of you have studied this, I don’t know.

Kelsey Piper: I would expect that the optimal career strategy for maximizing your chance of securing tenure or a prestigious role is not the same as the optimal career strategy for impact on the world, right?

Adam Marblestone: Right.

Kelsey Piper: You can maybe affect how stark those trade-offs are and you can also maybe affect culture, where you affect whether people are willing to make that trade-off. Like whether people are the kinds of people who will say, “Yeah, I am trading off some odds of tenure for some good accomplished, because guess what? There’s a lot of poverty.” But, there’s probably always going to be some tension.

Adam Marblestone: Yeah. It’s pretty complicated because people can realize that. The committee that’s supposed to judge you can realize, “This is not the kind of thing that people are going to like, so therefore we should hire this person at our university because it’s not going to be something that other people will buy into, but we understand.” It seems like it has a lot of complexity and feedback and this is exactly why you don’t want a top-down product manager to determine what happens. You want a scientist to balance these trade-offs.

Paul Niehaus: Yeah, that’s a good point. I want to add to that. On a positive note, I do think I’ve had that experience, personally. 

I’ve spent my time on things that have not maximized my academic output, but that other people in my profession have valued. I’ve had professional opportunities open up to me – that maybe could have gone to somebody with more publications – because people respect the way I’ve spent my time.

Adam Marblestone: From that perspective, the thing that’s a little bit scary, a little bit more dangerous in the system, is not necessarily what happens at the end – imagine you do all this work, and then the wise people on your tenure committee make the decision.

It’s that you never actually did the thing, because you had this peer pressure and you were afraid that they never would approve it. Maybe in the end, they always would have said, “Yeah, this totally makes sense. You did this different thing, this is what science is for, and we understand it.” But all of your fellow students would have said, “You should never do that. This is never going to work. They’re never going to pass you.”

Kelsey Piper: It does seem like a lot of censorship functions at the level of people not even thinking about doing that, or throwing the idea out there but not seriously committing to it – rather than at the level of “you seriously committed, you went and did it, and then you lost out career-wise for it.” But that’s still a very powerful force.

Adam Marblestone: It’s very powerful, but does that reflect a system that’s really broken at the level of its basic decision making? Or is that a system that’s messed up at the level of social transmission of what those decisions are?

Kelsey Piper: And if you go and say, “Oh, you have to do nothing but get published – those are the incentives,” then maybe you’re actually making the censorship worse, compared to what I think you were just saying.

Adam Marblestone: What we should say is: “Hey, it’s actually great, just do whatever you want, and you will always be successful.”

Paul Niehaus: To your point Kelsey, there was a survey recently done within economics about what economists think we should do more or less of.

And there’s fairly broad consensus. People would generally like to see more in terms of real world relevance and impact. And they’re open to the possibility you might have to give up some other things – some degree of rigor, for example, which is something we really prize. There’s not uniformity on that, but actually creating common knowledge around that is very powerful.

Adam Marblestone: It’s also different at different stages of fields. There is a point where rigor is really important. As fields scale, there are just more people and more opportunity in the field, so you’re going to have more things failing on technical grounds – you did your statistics wrong or something like that. As a field develops, you need to have standards and metrics, but at the beginning of a field, that can really hurt it.

Say I’m trying to create a totally new form of AI or something, and it doesn’t pass some metric in terms of a loss function. Well, who cares, right?

You need to be applying these standards of rigor at different phases. Part of the problem is you go, “I’m in a psychology department. Okay, which rigor standard should I be applying? Should I be applying the ‘statistical analysis of fMRI in extreme detail’ level of rigor? Or should I be applying the level of rigor you would apply to a totally new idea or theory?” These get kind of mixed together at the level of journals and theses.

Kelsey Piper: I think there’s something grounding there about trying to solve a problem. If you’re trying to develop a drug that works, then I think that sort of answers for you how much rigor to go for. You want enough rigor that you won’t waste your time if it doesn’t work and you’re not trying to convince people beyond that.

Adam Marblestone: Yeah, that’s an interesting thing. It’s maybe that some of these more industrial systems strike that balance better.

Kelsey Piper: I don’t know very much about the culture of industry, but I do feel like there’s something healthy about the thing you’re aiming for with rigor. Like getting the right answer with as little data as you need to get to it, but not less.

Adam Marblestone: Right. Sometimes when industry comes into a field, it can have a clarifying or healthy effect. That’s something that has changed positively, I think, over time. It used to be viewed as a universally corrupting influence if you have capitalism getting mixed into your science. But it can have a lot of positive effects, including the fact that an alternative to going on the tenure track is to join industry. In that case, during your PhD, you might actually be more crazy because you’re not worried about what the tenure committee thinks. You’re just worried about whether you have enough rigor to go to industry.

Kelsey Piper: You were saying earlier that the option of industry is maybe good even for the people who stay in academia, because they’re more experimental, they’re more ambitious, and they feel less like it’s all or nothing.

Adam Marblestone: Yeah, exactly. It’s very much what we were talking about. Don’t worry, you’ll always be okay.

Paul Niehaus: I’ve always felt that way. When I decided to get a PhD, I was deciding whether to get a PhD or go do something that was more like doing. The most influential conversation I had was with somebody who said something very simple: “it’s easier to go from research into doing than the other way around.” And I thought, that’s a good option-value argument. So I did it, and that has really paid off in my own career. The knowledge that doing is a viable option is very liberating.

Adam Marblestone: Yeah. We should also create more ‘doing into research’ paths as well.

Kelsey Piper: Yeah. I think it has got to be common for people who are trying to do things to run into fundamental theoretical questions that they would benefit from having an answer to for the work that they’re doing. And it’s very hard for them to go study those questions, partly because you need all of this experience to be a good scientist, and partly because there’s no mid-career path to go get a PhD to answer the question you’ve already spent half your life on. That’s a rare thing.

Adam Marblestone: I think there’s maybe a good selection in some way for people that are incredibly bored with anything that anybody already knows how to do.

You could make an incredibly great car company or something like that, but at least there’s somebody else who already knows how to do that. Nobody understands the brain, so I’m just going to focus on understanding the brain. On the other hand, you want those people who know how to build a car company to come back and help us do neuroscience.

Paul Niehaus: Yeah. Very specifically, if there are people listening who are in that situation, where you’re like, “I have this problem. I feel like I would need PhD-level research training to be able to answer it.” I want to talk to you.

In fact, what I want to do is build an economics profession that wants to talk to you. Because we need you in order to find good problems to work on, as much as you need us to solve the problems.

Kelsey Piper: Man, I have this interaction quite frequently. In tech, there are all these people who are trying to figure out things like AI and the progress of automation. They’ll be trying to answer these questions that feel to me like labor economics questions, but they don’t have a labor economics background. 

I’m not blaming them for trying to work on those problems without the background. And I’m not blaming labor economists for working on better defined problems that don’t rely on having access to secret models or whatever. But I’m nonetheless like, “Wow, I wish there was a way to use this knowledge that our society has to answer these questions that our society has a stake in answering.”

Paul Niehaus: There’s a gap.

Adam Marblestone: You talked a little bit about the ideal structure you would have. Maybe you’d have more continuity, or maybe you’d have more industrial push. What would be the ideal project to apply that to in the social sciences? If you didn’t have any funding constraints, and if your students were maximally empowered to do what they want, what’s most important?

Paul Niehaus: One way I’ve been thinking about this is that it’s good to be engaged, to build these relationships, to be listening to people outside the university when they tell us what problems they’re dealing with, and in some cases to be responding to that. But I also think that you do not want to be entirely customer-driven and end up building a lot of faster horses, to use the old metaphor.

It’s also great for students and for researchers to feel free to say, “What is a broad goal that I would like to see accomplished in the world? What would I need to know to do that?” Go through that exercise yourself and sort of work backwards and I think that would end up looking a bit like one of these road mapping exercises.

Kelsey Piper: I think another advantage of roadmaps like that is that a lot of people think of science as a bottomless pit into which a lot of resources go. And it’s unclear how that corresponds to when problems get solved.

As a science reporter, you run into a lot of people who are like, “Oh, I heard that cancer got cured like 20 times.” That’s a bad way to relate to the public, which is ultimately funding all of this.

If there is a roadmap that says, “We’re going to do these things, and we’re going to – by solving those problems – get these results,” then I think that does a lot for trust. I think it does a lot for buy-in. A lot of people are willing to spend a lot of money when they understand how that money produces results. There’s not a lot of clarity on that as a product of how the current system works.

Adam Marblestone: Yeah. The brutally honest roadmap also takes into account that you could take some pretty non-traditional actions to get the thing done. It’s not the worst-case roadmap where it’ll take forever to cure cancer or something. But you also don’t want to say, “Well, we’ve done it already.”

Kelsey Piper: We’ve made progress on cancer. But imagine an upfront roadmap had said, “For these particular childhood cancers, we can cut mortality by 90%.” We’ve basically done this for many childhood cancers. Then it’s clearer to people where our effort is going, what these brilliant researchers are doing, and how it’s changed the world. That’s just hard to see otherwise.

Adam Marblestone: Sometimes, in certain areas of science, we would struggle to do that all the way to the end goal. But you could say, “solve the bottleneck that’s holding back these cancer researchers.” Say the problem for the cancer researchers is that they can’t see the different cell types inside the live tumor, whatever it may be. You would do a roadmap for that and be very clear about it.

Kelsey Piper: Yeah, I think people can understand that there are lots of steps that might not seem directly on the road but are indirectly on the road. But when there’s no visibility, it’s quite hard to see where we’re headed.

Paul Niehaus: There’s this old heuristic that floats around in economics: you should be able to explain to your parents why what you’re doing is interesting. That’s not a terrible heuristic, but I think a better one might be, “You should be able to explain to a taxpayer why what you’re doing is important.”

Kelsey Piper: I think we should fund science more, but I think part of that is making a stronger case that by funding science more we will be getting more things that really matter to everybody in the world.

Paul Niehaus: The one caveat I’d add is what I said earlier. I do think it’s good to have some degree of noise in the process: people who are free to pursue any wild idea that they think is interesting. Directed search will tend to get us to local optima, but we’ll tend to miss out on things that are not within our field of view.

I think that’s harder to explain and to rationalize. Maybe I can explain it to people who are used to numerical optimization algorithms, but to the broader public, it’s harder. I guess that’s your job, Kelsey. You gotta figure it out.

Kelsey Piper: Well, step one is to convince the broader public of the numerical optimization algorithms.
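
[Editor’s note: Paul’s local-optima point maps onto a standard picture from numerical optimization. Below is a minimal, illustrative Python sketch; the one-dimensional “payoff landscape” is invented for the example. A purely directed hill climber settles on the nearest peak, while a search that reserves a little budget for undirected random starts can find a much taller peak outside its initial field of view.]

```python
import random

def landscape(x):
    # A bumpy "research payoff" curve (made up for illustration):
    # a small local peak near x = 1 and a much taller peak near x = 6.
    return 2.0 / (1 + (x - 1) ** 2) + 10.0 / (1 + (x - 6) ** 2)

def hill_climb(x, steps=200, step_size=0.05):
    # Directed search: accept only moves that improve the payoff,
    # so the search converges to whatever peak it starts nearest.
    for _ in range(steps):
        candidate = x + random.uniform(-step_size, step_size)
        if landscape(candidate) > landscape(x):
            x = candidate
    return x

random.seed(0)

# Purely directed search, starting near the small peak: it climbs
# to the local optimum around x = 1 and never crosses the valley.
directed = hill_climb(0.0)

# Undirected "wild idea" starts scattered across the whole range;
# keep whichever climb ends at the highest payoff.
explored = max((hill_climb(random.uniform(0, 10)) for _ in range(5)),
               key=landscape)

print(f"directed only:    payoff {landscape(directed):.2f} at x = {directed:.2f}")
print(f"with exploration: payoff {landscape(explored):.2f} at x = {explored:.2f}")
```

[Across most seeds, the directed climber stalls at the small peak while at least one random restart lands in the basin of the taller one. The point is only directional, not a claim about real research portfolios.]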

Paul Niehaus: I genuinely believe that it is good to have some people in the world who are free to pursue whatever they think is interesting. But there should be more emphasis on stuff that’s justifiable, rationalizable.

Kelsey Piper: One thing that stood out to me from earlier is that some people want to go do their pie-in-the-sky thing that has no particular social benefit. Probably, to some degree, we want to let people do that. But a lot of people – if they’re doing things that have low impact on the world – are not doing those things because they don’t care about impact on the world, but rather because they don’t actually see a route to having that high impact.

Paul Niehaus: Yes. There are many people like that. And it’d be so straightforward to help them find things that would have more impact.

Adam Marblestone: Sometimes those may not be straight shots. Sometimes they may be very indirect, and that’s part of your optimization algorithm: you’re going after this because it is the greatest uncertainty or the greatest state of confusion, and we want to resolve that state of confusion. But that state of confusion is actually so big that you can justify it to your grandma: there’s no way I’m going to be able to create aligned AI, or whatever, unless I can understand something about how the brain does it or how consciousness works.

I think the big scientific questions, in my mind, are not that hard to justify as relevant to applied outcomes, if you’re ambitious enough about it.

Caleb Watney: Thank you for joining us for this episode of the Metascience 101 podcast series. Next episode, we’ll wander into the details of the renaissance in new scientific funding models being explored, including ARPAs, FROs, Fast Grants, and more.

Episode Four: “ARPAs, FROs, and Fast Grants, oh my!”

Caleb Watney: Welcome back! This is the Metascience 101 podcast series. In this episode, we explore the broad range of scientific funding institutions, with a special focus on exploratory new models: ARPAs, FROs, and Fast Grants, oh my! Here, Tamara Winter is in conversation with Professor Tyler Cowen, Patrick Hsu, and Adam Marblestone, all of whom are knee-deep in innovative science funding ecosystems. 

Tamara Winter: There are dozens of us in the new scientific institutions community, and I feel very fortunate to have three of the people at the forefront here today. My name is Tamara Winter, and I run Stripe Press, the publishing imprint of Stripe.

Joining me is Adam Marblestone, CEO of Convergent Research. Adam is currently working to develop a strategic roadmap for future FROs. FROs, or Focused Research Organizations, tackle large-scale, tightly coordinated nonprofit projects.

We also have Patrick Hsu, co-founder of the Arc Institute and — okay, this is a mouthful — Assistant Professor of Bioengineering and Deb Faculty Fellow in the College of Engineering at the University of California, Berkeley. Arc gives scientists no-strings-attached multi-year funding so they don’t have to apply for external grants. It also invests in the rapid development of experimental and computational technological tools.

Finally, we have Tyler Cowen. He is the wearer of many hats. For the purposes of this conversation, he is the founder of Fast Grants, which was spun up remarkably quickly during the early days of COVID-19. Fast Grants provided $10,000 to $500,000 to scientists working on COVID-19 related projects, with decisions made in under 14 days, which is pretty remarkable.

To start this conversation, why do you all think these new scientific models are emerging now? It’s interesting because you’ve all been working on this for years, so maybe it doesn’t feel sudden to you, but to me, it feels like one of those “slowly, and then all at once” moments. Why do you think the idea of new institutions for science has caught on so quickly?

Tyler Cowen: I’d say there are three factors. First, in the realm of ideas, a number of individuals — Peter Thiel, myself, Robert Gordon — kept pointing out that something is broken with science and productivity. This idea eventually gained consensus. Second, private foundations became increasingly bureaucratic. People within these systems saw how difficult they were to deal with and grew frustrated.

Finally, COVID came along. It was a true emergency, and emergencies tend to mobilize America. You had some ideas in place, with people who had lived experience saying, “Hey, things really are screwed up.” Government funding agencies may not have gotten worse, but they weren’t doing very impressive things to get much better. This was a perfect storm, and then you have mimetic desire, contagion, and all these fascinating experiments.

Patrick Hsu: Another way to frame this is to ask: how do scientific breakthroughs happen, and why does it often seem like a relatively small cluster of labs is working on important problems at the same time? Often these are dense, overlapping, competitive periods of productivity.

I think all of the principles that Tyler just outlined apply here, along with the general pressure of institutional bureaucracy, or sclerosis, if you will, which created a pressure that eventually had to blow the top off.

Maybe my hot take is that innovating on institutions, or the structures by which we work, isn’t a new idea per se. Any scientist going through training can tell you that, while there is something incredibly powerful and enabling about this system, there are also many things about it that are fundamentally broken.

Science hasn’t always been this way. If you look at incredibly productive times in the history of science, they occurred under very different organizations, priors, ways of funding things and ways of working in the labs. Now, we’re seeing a group of people who have experience running and building organizations start to apply that ambition, not just in the commercial tech startup sense, but in scientific institutions themselves.

Adam Marblestone: Yeah. I think there is a big role for the recent history of startups and for the discourse between voices in science and voices that come from the startup or VC world, the Silicon Valley ecosystem, who are thinking more about organizations. Scientists are feeling more empowered, thinking “Hey, maybe I could start an organization.”

Tyler Cowen: In the sense of startups — why can’t we do it this way? So many startups rethink a process, product, or web service from scratch. The notion that you could apply the same thinking to scientific funding has proven to be contagious.

Patrick Hsu: One of the interesting things about the scientific enterprise in academia is that a lot of professors start labs, right? There is a narrowly stereotyped process where, after finishing your postdoc and training, you’re going to start an academic group. There’s a huge amount of know-how and precedent on how to do this. But then we do it within the broader context of a funding, tenure, and university system that we’ve always taken for granted.

What’s special about this moment is that people are asking, “Does the system have to be this way? What if we started organizations with a mindset of rethinking the larger system from first principles?”

Tamara Winter: I want to get into your specific organizations, or initiatives in Tyler’s case. Adam, you’ve been writing about expanding what you’ve called the “base space” of scientific exploration for years. A lot of that thinking was realized in Convergent Research. So, what is a Focused Research Organization? What does Convergent do? How does it differ from the traditional small group approach to doing science?

Adam Marblestone: A Focused Research Organization is, in many ways, an incredibly simple concept. It is a critical mass of scientists, engineers, and managers all working on a single, well-defined problem — usually to build a specific tool, system, or dataset that will, in some way, benefit science or technology. It’s something that couldn’t be built in another context, like a VC-backed company, and requires a critical mass of resources and time to do that exact thing.

That is such a simple concept. You might imagine, “Well, don’t we already have many such vehicles?” The surprising thing is that many other ways of doing science or contributing to science are tied either to individual academic career paths or other structures — maybe you are starting a for-profit company or you are working in an existing National Lab project.

But there really isn’t a general mechanism for identifying which things need focused teams to build — sprints, if you will. And then, how do we match those teams with all the things that they need? A startup needs leadership, it needs a technical roadmap, it needs funding. It needs to build out a handpicked team that’s inherently cross-disciplinary. You are hiring people who are not necessarily the people right around you.

There just isn’t a general mechanism for doing that. Because of this, scientists don’t go around thinking, “Hey, I’m going to propose having 20 engineers and project managers work on some big lift and build something.” Instead, they’re thinking about other scales and scopes of work: “What’s the next grant that I should write? What should my next few students be working on?”

They are not thinking about this particular structure. As a result, philanthropists and government agencies don’t know what problems researchers would tackle in that mode of a Focused Research Organization. So Convergent’s job is to be a Schelling point, bringing together all the necessary elements to coordinate those projects. That includes approaching the research community and saying, “What are the problems that you think are most important, that are the biggest bottlenecks in science, where you need a new tool, or system, or dataset that requires larger-than-usual coordination or a sprint to build?”

It means putting in place the venture incubation and creation aspects, as well as some legal and operational aspects to make sure that those projects can happen. And all of this just creates more confidence in this model being a viable path.

Tamara Winter: I mean, we’re having this conversation at Stripe. We want to abstract away the complexity of starting a business. You’re doing the exact same thing for the entire scientific enterprise.

Adam Marblestone: Yeah. It’s a bit like the Stripe Atlas idea or the Y Combinator idea.

Tamara Winter: Exactly.

Adam Marblestone: Why aren’t there more of these focused teams building particular things that are needed and not buildable otherwise?

Tamara Winter: Why aren’t there more?

Tyler Cowen: I like the idea that it may be temporary: you achieve the end and then the institution dissolves. But people aren’t used to that. Many institutions prolong themselves, carrying overhead, keeping friends and associates in jobs. There’s something pernicious about that, but it does, in some ways, make it easier to hire basic talent. If you’re going to do it differently, you have to be more innovative. You need a strong soft network to bring people in, and you need to pay them pretty well. I think that’s part of the problem: people aren’t used to the model of potential temporariness.

Adam Marblestone: This is where the startup inspiration goes a long way. Because we’re not creating permanent institutes where the questions are “Who’s the director who can run this for 100 years? What’s the biggest theme? What does this mean in 30 years?” It’s just this problem. 

It doesn’t mean that everybody disappears afterward. Often, these problems are highly catalytic. Maybe they will spin off companies, maybe they can donate themselves to a larger nonprofit, or they can create contract research organizations or longer-term nonprofits. There are lots of possibilities for what happens afterwards.

Part of the problem selection process is doing it in a way that there is a notion of an exit or a scale-up event that will happen at some point — because of the value that you are creating, or how you will plug into the ecosystem, generating demand and activity afterwards so it’s not just a bye-bye at the end. 

But part of that is understanding that this is the dynamic. They have to create this thing and pave their own way. The future is not completely predictable: who those partners are going to be, who those customers are going to be, where it’s going to land. 

That dynamism is something somewhat borrowed from how startups think. We’ve been happy to hire talented people, but they are people who are thinking differently about what they’re doing. 

Patrick Hsu: Arc, as we’ve built it today, is a convening center. We bring together scientists from across three partner universities (Stanford, UC Berkeley, and UCSF) in a single physical space to collaborate. The question is, if we assemble this caliber of thinkers and researchers, what science can we do when we simply unlock them to work on their very best ideas?

We’re including faculty, PhD students, postdocs, and professional research scientists beyond the training period. We’re taking elements from both academia and industry, but with the idea that long-term science requires long-term thinking and infrastructure for execution. Unlike many other fields in STEM, biology is very slow. It’s messy, it’s noisy. And also, fundamentally, it’s always a moving target.

We have a few pretty prosaic things that we’re trying to put in place at Arc. The first is: if we provide folks with long-term funding, can we remove the need for short-term optimization, like chasing rapid publications or short funding cycles? We believe in papers and scientific products, but are you truly working on the most important thing? We provide a structure both in terms of funding and, more broadly, across the various platforms that we are going to have at Arc.

As modern biomedical research becomes increasingly dependent on really complex experimental and computational tools, you need to have a place with the institutional know-how that holds this together. It’s amazing how simply having the trains run on time effectively doesn’t happen even at the very top biological labs. Simply putting that into place means building technology centers where larger teams of research scientists work in a cross-functional, modular way, like you’d see in biotech or pharma, for these more complex research workflows and processes. What kind of science can you do when you start to need to tackle bigger types of projects?

Tyler Cowen: So, let’s say it’s eight years. Let’s say you knew, almost all the time, in year two whether or not you were going to renew people six years later. Would that change what you’re doing? So, you have to hold onto the lemons for six years and pretend everything is fine. Does that create strain?

Patrick Hsu: There are a few different ways to think about why it needs to be renewable, right? Other places have baked-in “up-and-out” models where you come for a temporary period, maybe three, five, or seven years, and then the expectation is that you leave. What ends up happening is that folks end up hyper-optimizing for things that will allow them to show milestone-driven productivity, momentum, and trajectories.

We think that, while trying to push work out quickly isn’t necessarily a bad thing, if you fully optimize for that, it’s net bad for science. We have annual reviews and formal eight-year appointments, but we also have a range of formal and informal check-ins to make sure that folks are happy. 

Ultimately, we judge our initial success by two factors: First, are we able to hire some of the very best people? And, second, when these people come to Arc, do they feel they’re working on their best ideas? Are they happy with the type of problem they’re tackling? The folks who aren’t happy, who don’t like the model, tend to be incredibly talented individuals who can find jobs anywhere they want.

Tamara Winter: Tyler, it’s been a couple of years since you started Fast Grants. I feel like much ink has been spilled trying to understand where Fast Grants succeeded. Of course, Patrick, you also worked on Fast Grants. In what ways did the research that Fast Grants catalyzed differ from what would have happened in traditional institutions? 

At some point, I’d like to talk about counterfactuals. How do we actually know that these new scientific institutions are producing new kinds of research? But let’s start with Fast Grants.

Tyler Cowen: I think there are maybe three different kinds of grants we made. One was for research that likely would have happened anyway, but now could happen much more quickly. As you mentioned, we funded many people within two weeks — actually, a lot of people we could fund within two days. And when you’re in a pandemic, with so many Americans dying each day, accelerating progress, even by a small amount, is worth a lot.

There’s then another class of people that is harder for me to judge. They might have wanted to do the work anyway, but there was a discrete switch, and they needed to know within some certain timeframe if they could get the money. If we hadn’t stepped in, maybe they just wouldn’t have done it at all. That I find much harder to judge.

Then, there’s a smaller number of projects where we put up larger-than-average grants, and I strongly suspect those projects wouldn’t have happened if we hadn’t funded them. For example, the fluvoxamine and interferon trials were a tough, risky proposition. There was follow-up funding, there was some initial interest, but it wasn’t clear to me that without us that could’ve happened at all. And that’s probably turning out to be quite important.

Beyond these grants, there’s also a demonstration effect. We showed the world that science funding can be faster, and institutional responses can be faster. Government agencies and private foundations can take away lessons from this. There’s not a decline in quality, in my opinion. I think the quality actually goes up. When you are forced to make a decision right away, the notion that a piece of paper sits in someone’s inbox and gets passed around and it takes you three months, four months, nine months — it’s not that there’s some genius in the meantime pondering the whole thing and arriving at a smarter answer. You just need to prioritize getting the decision made now, and you’ll do just as well. And I think we showed that that is possible.

Tamara Winter: There is something you said that I find really interesting, and I want to hear from you two. How much of the success of something like Fast Grants can we attribute to allowing people to take advantage of, or be responsive to, changes in the outside world? 

There was a meaningful number of people who got Fast Grants who were working on something else and then suddenly had the permission and funding to work on the most pressing issue in the world. How much of the success of these kinds of models is about letting people be responsive to what’s happening in the world? Because, typically, if you get a grant from the NIH, it’s not always the case that you can switch the grant to work on something more pressing.

Tyler Cowen: No, we let people switch. But there were a lot of preconditions, including on the Mercatus side, where Fast Grants was housed. Mercatus had already been running Emergent Ventures, which was non-COVID related, but the philosophy there was to get people the money within a few days, less than a week. We had over a year of practice with that, and the finance team, the reporting team, my assistant — everyone knew exactly what to do. They were operating at A, A+ levels. To increase the size of the numbers and send the checks or wire transfers to different places wasn’t very hard.

We also had my board, who trusted me to run this based on previous experience, and that’s actually incredibly scarce. I think it’s a big, under-discussed problem — trust within nonprofits — so that a board will just say to the person doing the work, “Look, you just do it, we trust you.” I think that’s the hardest factor to replicate.

Patrick Hsu: On the scientific review side, which I was deeply involved in with Fast Grants — first, the infrastructure and the systems that Tyler and Mercatus developed were fundamental to the success of Fast Grants in making the awards. There’s a huge amount of plumbing that goes into place to wire the money as quickly as the universities can receive it. It was amazing that, in many cases, the money was just sitting with the universities because they didn’t know how to accept it yet.

Tyler Cowen: Or they would slow us down — the people receiving the money felt the need to slow you down.

Patrick Hsu: Yeah. It’s an interesting observation, for sure. But what we also showed was that, on the scientific review side, the process can be focused, efficient, with rapid handoffs. We got applications across an incredible diversity of immunological concepts, new types of vaccines, new clinical trial proposals, and new diagnostic concepts — non-human primate studies, for example. We had to find and corral top scientists with deep domain expertise across each of these diverse areas. We also built a software portal…

Tyler Cowen: And the Stripe talent did that. We had the best programmers in the world building a system within a few days that, from my point of view, worked perfectly. That’s something you can’t take for granted. So on the reviewing side, we had social media to get the word out, Stripe engineers building the software, Mercatus handling processes, and Patrick and I as leaders and fundraisers. Really, a lot of different pieces, each of which was essential.

Tamara Winter: It really is an incredible feat of coordination, especially given how it had to happen since you couldn’t be in the same room. One thing I loved about it, especially on the Stripe side, was how much praise and status were given to the folks working on it, some of whom were in Australia — so, you were doing this across time zones as well.

Adam, I want to take it back to Convergent for a second, because the grants you make, Patrick, are renewable eight-year grants. Fast Grants didn’t necessarily have a time bound. Is the most important thing, when you’re making a grant to a team, that it’s a team of a certain size? Is the thing you care most about the timeline — is it five to seven years?

Adam Marblestone: Mm-hmm.

Tamara Winter: What is the most important element when evaluating a new team, project, or potential area to explore?

Adam Marblestone: Honestly, it’s heavily about the question of the counterfactual: is this something these people could organically self-organize to do? Each person in a focused research organization could, in principle, go off and write their own grants and then they could collaborate. They could do things in a more organic way to head in the same general direction. 

Then there is the question: what’s the delta between what would happen if they did that versus what would happen in the FRO? That differs significantly across fields, too. In some fields, it’s about the level of technology: maybe neuroscientists need a new microchip, but neuroscientists aren’t the people who make microchips. So, to what degree do you need that industrialized push, with a different structure of labor, a different structure of the staff, and a different structure of the focus and coordination inside the group, relative to what the field has available through any number of mechanisms like the NIH or philanthropies?

A big part of it is the counterfactual. Another level of that counterfactual is understanding how important the thing we’re building is. Of course, we can’t ever know for certain in advance. It might be that we have an FRO developing a new method for proteomics, or measuring proteins in cells, and maybe there’ll be some other way of doing proteomics that’s completely better, that leapfrogs the FRO — maybe just one postdoc did that, without a team of 20 people. You can never be sure about that, but how big of an unlock do we think it will be, and how much need is there for it?

In our case, we do verify it through peer review: we get a lot of peer review from scientists on what it would unlock if we build this. It’s not necessarily about high-risk, totally unpredictable ideas. It’s much closer to the Hubble Space Telescope or the Human Genome Project: these things are doable, but heavy lifts. So part of the evaluation is: how significant is the unlock if we make that lift?

And the other one is the willingness and readiness of the team. It is an entrepreneurial founding team, effectively, that then goes and hires the rest of the people, and they have to be willing to do something non-traditional. They have to be willing to be completely focused on this for that period of time, and they have to have both the human skills and the scientific skills on that team.

Between those factors, we get to a relatively short list at any given time, although there are many more projects than we have funding for at the moment. 

Patrick Hsu: The technology centers at Arc are, in many ways, trying to tackle a similar set of challenges. We have a similar intuition for the FRO concept, which I’m a huge fan of — that you need larger teams, more diverse types of talent. You can’t rely on a single-channel type of person with core training only in molecular biology and genetics to tackle something that might require product integration, or something that’s multimodal across instrumentation, imaging, and molecular concepts. All of these different pieces require coordination and focus in a broader sense. A lot of what we do with our technology centers is bring together folks in an industrial-style research organization, embedded within the broader Arc umbrella, but highly focused on developing things like organoids, better cellular models, or better technologies for multiomic profiling of cells, or better approaches for genome and epigenome engineering at scale.

We have preselected, to some degree, five technology centers that, in many ways, work together in a coordinated fashion. It’s like that ‘90s cartoon Captain Planet, where you need earth, wind, water, and fire to get Captain Planet. These centers coordinate to run an end-to-end cycle for finding better targets for complex human diseases.

A lot of how we’re building them involves interdisciplinary talent. How do you actually operationalize this in a focused and efficient way to bring everyone together? There’s just a certain latent amount of time that it takes to build a lab in the first place, to get a critical mass of high-quality thinkers, and to get the physical logistics working properly. We think about all of that in a centrally efficient way.

Tamara Winter: It’s so interesting. One of the things I love, that Heidi Williams always talks about, is that the conversation about new ways to do science is so focused on new ways to fund science. But so much of what all three of you are talking about are these infrastructural or scaffolding challenges that really do meaningfully impede how quickly you can do science, or the kinds of research you’re able to do. This, to me, seems very interesting — I hear Heidi often talk about it, but it’s great to hear how this happens in practice.

I want to go back to the counterfactual question, because it seems like people who are focused on metascience don’t have rigorous ways of assessing counterfactuals (maybe that’s something the Institute for Progress or Open Philanthropy can work on). He’s not in the room with us right now, but Matt Clancy touches on this a bit at New Things Under the Sun. He will identify these natural experiments and say, “Okay. There is a field and it has these properties. And this field is like it and shares similar properties. What might they learn from each other?”

But it seems like if you are at Convergent, or Arc, or Fast Grants, or even Emergent Ventures, what you want is not to be able to look at entire fields, but at the individual FRO level or experiment level, and say, “This thing wouldn’t have happened without our intervention.” But we can’t really do that right now. Is that a problem, or is it just me?

Tyler Cowen: I don’t agonize over counterfactuals. I think it’s a bit like friends. You get a friend and, if you get some good friends, you get more good friends. Even if you’re funding something where you’re not decisive about that particular project, it will bring more good projects, better deal flow, and hopefully expand the popularity of your model in a positive way. You’re never going to figure out counterfactuals in many cases. You shouldn’t do obviously foolish things, like making a grant to Google so they can expand their work in artificial intelligence — that’s clearly silly because of the counterfactual.

But within the realm of the reasonable, it’s just so hard to find a truly high-quality thing, person, or institution to support. I say just do it.

Patrick Hsu: One of the fascinating things about the practice of science, for example, is that you can talk yourself out of any damn experiment. If you have a sufficiently challenging problem and sufficiently analytical people, there are always going to be equally compelling reasons why something will work as why it won’t. Maybe many more reasons why it won’t.

So you can end up in decision paralysis or opportunity cost paralysis, and end up never actually doing anything. There’s a huge advantage in simply trying things in an operationally effective way — just doing the experiment, starting the organization, raising the funding, and giving it a go. In general, the universe trends more toward entropy and a lack of focused effort.

Tyler Cowen: The best way to protect against funding projects that would have been funded anyway is to be weird yourself — be credibly weird and signal that you’re different. You can’t control it, but you will attract projects that are not just mainstream, like Aspen Institute material or “IBM would’ve done this.” Nothing against those institutions, but they are very mainstream.

Adam Marblestone: I think there’s something nice, in a few ways, about having these new models, and having them be weird in that sense. For one, Tom Kalil, our board chair at Convergent, points out that one place we see counterfactuality in the FRO process is that people wouldn’t have written these grants in the first place. It takes a long time to even spec out what you would do with $30 million or a 20-person engineering team. That’s not something you can just think about in the daily course of doing your thing.

Patrick Hsu: And no one is trained to think about making that size of proposal.

Adam Marblestone: They are not trained. So, the people designing your technology centers — that’s a very specialized and intricate long-term endeavor, an engineering-and-design endeavor. Not everyone can do that. But not only that. The way Tom describes it, most people don’t spend months of their lives spec-ing out a detailed plan for what they would do if they won the lottery. That would be a waste of time because there’s no way they’re ever going to win the lottery, right? They’re just wasting their time.

And in a similar way, if there’s no grant, or no mechanism, that is shaped like “Now you have a 20-person engineering team building a tool that’s cross-disciplinary and focused in this way,” people don’t spend the time to think about it. So one counterfactual is: you get weird ideas that people haven’t talked about before but may have been latent. The people who are going to come up with those ideas, almost by definition, are pretty frustrated early on. They’re the people that were thinking about what they would do, despite there not being any immediate incentive or way for them to get the money to do that.

If they’ve already got those ideas brewing, those people are pretty weird to begin with. We see some interesting selection effects, along with the fact that there just isn’t a mechanism shaped like this. So, we know there wasn’t a foundation that would have funded this before.

Patrick Hsu: There’s something really powerful about simply framing the opportunity. One of the things they talk about at ARPA-H — the ARPA-H director, Renee Wegrzyn, mentions that many people are good at coming up with million-dollar ideas, which is a standard five-year grant size. But very few people are good at coming up with $30 or $100 million ideas, as Adam has been saying multiple times.

A lot of what they’re doing in their search for program managers to administer tens to hundreds of millions of dollars is finding people who have the experience and taste and judgment to assess things at this scale, where you have very low n, very few reps, very little experience on how to frame and organize and judge what should fit in this space.

A lot of what this general conversation is doing is simply outlining a possibility, and then building in public so that people can see it’s possible, these things do get funded. Then, we can scientifically track and measure the outcomes — the things that worked, the things that didn’t work, and the wins and losses.

Adam Marblestone: Maybe over time, it will become less weird. I think it’s probably a trainable discipline to teach people to think as ARPA-like program managers for $30 or $100 million systematic engineering programs, division of labor, and these types of things. But it’s not something that many people are doing in the current system. So, these agencies are starved for this program manager phenotype that could have the vision and coordination behind a DARPA-like program. Similar for FROs. So, we do see a selection effect, where we get some pretty wild stuff.

Patrick Hsu: I just want to quickly touch on where we go in the longer term from here. When Convergent, Arc, or Mercatus spend a billion dollars, at the end of the day, this is a drop in the bucket compared to the NIH’s annual expenditure, right?

Tamara Winter: What is it? Did you say around 44 billion?

Patrick Hsu: Yeah, about $42 billion a year, increasing to maybe $50 billion in the congressional budgetary request. That’s a huge amount of money that we’re spending on basic health sciences on an annual basis. One of the things that has been so amazing to me with Fast Grants is the number of people who have said, “Fast Grants is really cool, let me just clone this model” — for longevity science, for climate change, and other areas.

It seems to be effective. People are able to do important things with the money they got at a very important and sensitive time. And they can just clone it, because we’ve outlined a protocol and a precedent that they can operationally implement on their own.

Tamara Winter: It’s interesting because we were talking earlier about how one of the underrated contributions of these new models is that people are building the infrastructure. And you can replicate that: even if any one project doesn’t succeed, you’re thinking in a totally different way, almost like a portfolio approach. And if the model proves itself enough times, then people just want to try things. I don’t see how that can be a bad thing.

You all are talking about the type of person who finds themself applying for a Fast Grant, coming to Arc, or leading an FRO. I wonder which models are most advantageous for people at different stages of their lives. If you’re an ambitious teenager, you’re probably not going to be running an FRO, but what if you’re a grad student, or someone midway through a career looking for a change? Do you have opinions on which models are most appropriate or advantageous for people at different stages?

Adam Marblestone: Well, I think it’s true that the FRO model leaves a bit of a gap for people in the early stages of their career or training. It’s less about that exploration and that discovery and more about building this thing in a really professionalized, systematic way. So that does leave out some of the early development of creativity, early development of deep knowledge and deep knowledge transfer, which is where academia shines in many ways.

But for FRO founders, roughly speaking, the ARPA program manager phenotype is something that we look for. It’s not necessarily the same as a startup founder who wants to scale something to billions of users, but there are some elements in common: the systematic analysis of a gap, how you coordinate people, how you divide labor, how you divide disciplines to build a complex project.

We have everything from straight-out-of-PhD to “this is one of the last projects they’ll do before they retire,” in terms of our FRO leaders. There’s a whole spectrum in between. Some people come from academia, some people have more industry experience. We have a whole spectrum and then we try to form a founding team that has a combination of scientific, operational expertise, and different types of personalities. The common denominator is this frustration with the status quo, a concreteness of what they want to do, and a willingness to build a team.

Tyler Cowen: In virtually all institutions, we should be taking more chances on quite young people, giving them more authority, in general. My background is quite different from the rest of you at this meeting. I spent a big chunk of my career studying the financing of the creative arts, economics of the arts. That’s always my mental touchstone. When I hear about Focused Research Organizations that expire when the project is over, I think of Hollywood movies. We’ve been doing that for a long time.

You can almost always find parallels in the arts, which makes you much more optimistic about what you can do. Rapid patronage was a big thing during the Renaissance, and it worked really well. I knew when we started Fast Grants, “Oh, we can do this” because of historical examples.

And when you think of young people running things — well, who ran the Beatles? There was George Martin and Brian Epstein, but the Beatles ran the Beatles. Paul McCartney had to figure out the recording studio. We don’t call that science, but that was an extremely difficult scientific project that had never been done before. And this guy, who hadn’t gone to college, at age 23 starts figuring it out and becomes a master. When you see those things happen in the arts — frequently, they happen — you become way more optimistic. “How many people can do this? How can we scale it? Can super young people contribute? Can this all work?” 

You are not saying it’s easy — most projects in the arts fail, too — but you think, “Yes, yes, yes, we can do this.” And you do it, or you try to do it.

Patrick Hsu: I think building an infrastructure where folks can shoot their shot is really critical. And I think a lot of what this conversation is about, is creating those opportunities for people, not simply operating within the system. It’s about where you focus your ambition. If you’re narrowly told, “Do your best science, but figure out how to do it within the system,” people hyper-optimize for that.

If you show that you can actually innovate on the system itself, that’s one of the most important things that Silicon Valley has pioneered. The seemingly impossible or irrational idea of founding a company and scaling it to billions of users — it’s not something most people normally imagine they can do. But showing that it is possible, meeting the people who have literally done this, creating an entire educational process — an entire alternative educational system — for how to found a company and how to scale one is an important cultural inspiration for what we’re doing here.

A lot of senior colleagues, professors, and university leaders ask me, “How did you come up with the idea for Arc?” One of the funny things, and it’s often hard to answer this way, is that I don’t think it’s a crazy idea. It’s maybe not even that novel, like Tyler’s saying.

Tyler Cowen: A lot of precedent in the history of the arts. Take eight years, 16 years, do your thing. Here’s some money.

Patrick Hsu: Just do it. It’s the Nike slogan.

Tamara Winter: Is OpenAI an FRO?

Adam Marblestone: Not exactly. I think there are elements of it that have certainly been inspirational to us. It is interesting that they started as a well-funded nonprofit that had a focus on a certain scale of infrastructure and a critical mass of team. But the feeling was that they would not get the same outcome if they were a product-oriented, traditionally VC-backed company.

Tyler Cowen: Why isn’t that just a yes, though? Yes, they’re an FRO.

Adam Marblestone: I would say the first few years had some FRO-like characteristics. But I also think that in some ways, it’s something a little bit different. They were exploring more divergent, different directions in the beginning. 

If you think also about DeepMind, it has done things internally, like the AlphaGo project to solve Go, or the AlphaFold project on protein folding. Those looked to me like the way that we’re doing FROs: a 10, 15, 20 person team, an extremely well-defined outcome, a finite specification of the problem, go after it. Whereas DeepMind as a whole is something that is both organic but also very well resourced. Maybe DeepMind is more like the Arc Institute. It has these shared engineering platforms and researchers with the freedom to self-organize. Sometimes they create FRO-like projects, and sometimes they don’t.

If you imagine OpenAI early on, it was doing a bunch of things — some stuff in robotics, some stuff in reinforcement learning. There were a few creative people trying to do this transformer language model thing, and it ended up being the thing that took off. OpenAI was a bit like the Arc Institute at the beginning.

It certainly has some characteristics — the mentality of it, the professional team, the bounded yet technologically intensive problem space, a non-academic but still basic science approach. A lot of that was the magic sauce in the first few years. Now it’s more like, “Okay, now we’re going to scale up these LLMs.”

Patrick Hsu: And maybe a key point is that OpenAI did not have, when they started, a clear end, which is a critical part of the FRO model, it sounds like.

Tyler Cowen: Wasn’t it to create AGI? And can’t the ability to evolve and be flexible be part of the FRO model? In that sense, I just want to say yes. They’re an FRO, and they’re great, and they did it.

Adam Marblestone: I can agree with that. If you want a take-home message for policy or a take-home message for institutions, the finite nature of the FRO is not necessarily the most important thing. It serves certain functions: it weeds out people who want to make a giant, permanent institute with more of an academic cultural character. It weeds out someone who doesn’t have any milestones or any clear goals that are concrete within it. So, it has a certain filtering function, but it’s a bit artificial.

In that sense — Sam Rodriques has been talking about this as well — if we’re talking about professionalized moonshot research environments, very technological, optimized around the goal, and less optimized around the historical structures of training and credit in academia, very well-funded, visionary projects — then OpenAI has all of that.

It started out as a nonprofit and now is a for-profit, but I think those things are not the essence of it.

Patrick Hsu: So FROs will grow into OpenAIs.

Adam Marblestone: Yes, a successful FRO could grow into something like OpenAI. I think with the right funding and the right people behind it, you could have FROs that have more flexibility, looking less like a single DARPA program and more like building AGI. There’s a continuum.

Tamara Winter: This is just a great reminder to reread the whitepaper that you and Sam Rodriques wrote — was it in 2020?

Adam Marblestone: Mm-hmm.

Tamara Winter: Speaking of startups, I think there’s one area where I would like to see new scientific institutions take inspiration from startups. 

In many ways, starting a startup is still risky. But if you fail, and you fail in good faith, it’s not true that your career is over or there’s nothing else you can do. Michael Nielsen talks about this, and I think he calls it the “shadows of the future” problem.

Let’s say I get a grant from you, Tyler, for two years to do something. I’m an academic, and I’m choosing to switch paths. It’s not true that I’m going to be making decisions in a vacuum — I’m going to be thinking about what happens afterward. And maybe that does end up constraining me in some important ways. So, it’s not as risk-free or as de-risked as you may hope it would be.

Say I finish my FRO, Adam, and, at some point, I hit one of these choke points in academia or science where you need to produce a result. If I don’t, what do I do? I’ve already defected from the regular system. Am I going to go to ARIA? Am I going to go to Arc? What do you do next?

Adam Marblestone: I think you just answered it.

Tamara Winter: You just go to Arc.

Adam Marblestone: But this is one of the reasons why it has been hard for this stuff to get going before. There is an ecosystem-level phenomenon — there is not a single institution that can solve this.

Now, with FROs, it is planned. It’s this engineering project, and you have a transition plan you’re working toward. You can spin off companies or spin off nonprofits. So, you can plan it to some extent. But some things do have risks. There’s execution risk, technical risk, to different degrees.

Certainly, with some of the things we’re talking about, where we’re giving someone eight years to work on a project, the most exciting ideas — the ones with the greatest potential — are often super unlikely to work. Some people will take on projects that are quite unlikely to succeed and won’t optimize for their career in the traditional sense. Then, where do they go?

Maybe they’ll start an FRO, or become an ARIA PM, or, after doing one FRO, create a technology center at Arc, or donate themselves to Arc. There are a lot of options, but only if the ecosystem is being stimulated. Then the question, in part, is, “How sustainable is it? How much can philanthropy do? How much can the government do?”

Tyler Cowen: I think one has to liberate academics and scientists from the notion that the background level of risk should be zero. Once you start living that way, you actually accumulate risk, to some extent — the risk of becoming irrelevant becomes extremely high. It’s a hard leap to make. People in the arts all know they face very high risks, and most of them fail. In many ways, it’s a much healthier background for experimentation.

Patrick Hsu: And Tyler, how much can we blame tenure for this?

Tyler Cowen: Well, I view tenure as an endogenous outgrowth of the process. In schools that have gotten rid of tenure, whether you think that’s a good or bad idea, faculty behavior in terms of risk-taking isn’t all that different. Most of them stick around and do what they were doing before. So I see tenure as a pernicious side effect of a broader malaise.

Adam Marblestone: Yeah, it’s interesting with FROs, right? It really depends. If you have an academic audience, we say, “Oh, it’s only five years.” But if you have someone working a software engineering job in Silicon Valley, it’s more like, “Well, I’ve never stayed anywhere for more than two years. I’m always looking for the next coolest opportunity down the street.” So there is this different philosophy. Part of that is going to differ in different fields; in some fields, the skills are more or less transferrable. 

Even in an FRO context, I think we do need to think about FROs as a certain kind of training environment. Maybe it’s a training environment for team science or systems engineering, as opposed to individual science. A PhD is training for individual science, but what is an FRO? An FRO is training for these other things. I think that’s important.

Patrick Hsu: Going back to the Silicon Valley inspiration, one of the really powerful cultural imprints is that if your first company fails, you are not a failure. VCs will back you for your next play under the right conditions, with the right idea and the right team. You don’t have that scarlet letter of “your previous project didn’t work out, and you burned through five to ten million dollars.”

But it’s actually a fundamentally optimistic take — that you’ve learned something about how to create the impossible, run a company, set a vision, hire people, develop a customer base. This idea — can we train people to do team science and have folks who know how to exist within that ecosystem?

People often frame this incorrectly as a basic science and industrial science divide. “In industry, we have team science, while in academia, it’s more about individual science.” But I think there are significant cultural elements we can really draw from industry.

I had dinner last night with a senior colleague who spent the first couple of decades of their career at Bell Labs. They left and went to a university to become a professor after Bell Labs shut down. For them, the idea of having a guaranteed job for five, eight, ten years is something that’s unheard of — no one has that expectation. Just like artists don’t have this expectation that they’ll be able to be funded or work on their best idea for infinite periods of time.

Adam Marblestone: There really is that training. People might say, “If you haven’t done research, you don’t know how to run your own research group.” That may be true, but similar things happen, let’s say, within FROs. We have academic scientists that come in and initially they’re like, “Wow. There’s too many meetings. Why am I coordinating so much with these other people on this team?”

But by the end, they really know how to coordinate effectively, plan something for a longer term basis or larger-scale basis. They’re doing all sorts of things that they weren’t doing in the academic setting. That’s going to serve them really well in all sorts of future dimensions.

Tamara Winter: What are some cool, interesting areas of science or technology that you think are currently underinvested in? Tyler, you just gave us some.

Patrick Hsu: What’s underinvested in right now? I think most of biology. I was at an AI in biology dinner the other night where we were talking about how the model performance, these days, of LLMs, of transformers generally is incredible, even with vanishingly small amounts of data. The important thing about the data is that it needs to sample enough about the behavior of the system. The thesis of the expert biologists in the crowd is that we just measure too little of biology.

The question is, what data are we missing and how do we get it? And there’s no clear consensus on it. It often revolves around measuring more of the central dogma at single-cell resolution — measuring more DNA, RNA, and proteins, and developing single-cell technologies. But there’s this broader idea that we think about in my research group: biology has always been a measurement discipline. We’ve really been focused on what we can look at, whether through a microscope or a sequencer.

But fundamentally, and this is something we appreciate a lot in microbiology, the single cell may not really be the right fundamental unit for biological function. We understand quorum sensing, and biofilms, and community behavior in the microbial context. But in the mammalian or human context, we talk a lot about measuring single cells because it’s easy and cheap and you get a lot of richness of information. But we don’t really have technologies that look at interoception, cell-cell communication, long range effects, things at the organ or tissue scale.

There’s a fundamental lack of technologies that allow us to peer into and measure what’s going on at the higher level of hierarchy. Maybe that’s the missing data per se, but that will require fundamentally new tools.

Adam Marblestone: The basic physics and basic measurement-technology gaps in biology run deep once you get to these three-dimensional, multi-scale, interacting systems — there’s such a big gap. That’s another reason why this is happening: you need state-of-the-art photonics, state-of-the-art biochemistry, and computation to do that.

With AI right now, it is very exciting. There’s the possibility that, in the end, the description of biology is much less a list of things or a static representation like “this protein is located here” or “this is the sequence of this organism’s genome.” It’s more like an embedding space, a machine-learned representation rather than something biologists understand. That embedding could become the new way to describe a cell or tissue. That is a possibility, but it will require this upscaled approach to data generation for sure.

Tyler Cowen: One way to approach it is to go to a typical university and see which departments are small but not totally irrelevant, and look for opportunities there. When I did my podcast with Richard Prum, one of the most prominent ornithologists, he was telling me that the last 10 to 15 years have been an incredible revolution for ornithology. We now have data on everything, where before we didn’t have data on anything. But there is a scarcity of people to do the work.

Even in areas like biomedicine, you could imagine an advance in something like metabolism coming through ornithology and not just direct biomedical research. It has happened so often in the history of science that lateral applications come from seemingly distant areas. Quantum mechanics is behind computers. Who would have thought that, right? There are so many opportunities, but talent is scarce, and money is scarce. But you can have a really big impact just by having a degree of daring in yourself — which is more scarce than IQ or even money.

Tamara Winter: On the subject of introducing new models like the FRO: what is the ideal interaction between governments and these new scientific institutions? It’s interesting to watch ARIA spring up, and SPRIN-D in Germany, and of course DARPA, ARPA-E, ARPA-I, ARPA-H, and IARPA — all the ARPAs.

Tyler Cowen: Try them. I would say resist nostalgia for the past. I get a little nervous when I see people looking back at early DARPA and thinking, “Oh, that worked great. So now we’re going to keep on cloning that.” It just doesn’t feel quite right to me. But we are seeing way more experimentation, and we need to let those models evolve as well. 

I’m quite optimistic. There’s such an intense, vibrant debate about science policy with actual institutions in play — from the private sector, foundations, corporations, and governments. It’s pretty phenomenal, in a small number of years.

Adam Marblestone: You don’t want an exact clone, but I think the ARPA model is insanely powerful. Very, very powerful, because whatever the institutions look like at a given moment, that ARPA program manager is going to go and play the piano between those different institutions and form that central coordinator role for projects that are too big a risk for any individual organization.

The ARPA model is very powerful, but exactly cloning DARPA — I don’t think you want to do exactly that either. Maybe you actually want to include more FRO-like things, more OpenAI-like things. Maybe the best thing a DARPA program manager should do is nucleate an Arc Institute. It’s unclear, but there should be an expanded playbook of ARPAs rather than restricting it down. But the ARPA model is super powerful, super general, and it makes sense that we have ARPA-I, ARPA-H, and so on.

Tyler Cowen: This may be my arts background coming out too much, but I see cultural self-confidence as an absolutely essential input, and it’s scarce. There’s no guarantee that it’s there. Many parts of a country may not have it, or maybe none will. But when it is there, that’s when truly wonderful things happen. With institutions, you can ultimately only do so much work, but you need that magic in the air, and you need to be ready for it. That’s a far more intangible thing, but it’s not impossible to steer or nudge it. You need to try that too.

Patrick Hsu: One of the really powerful effects of cloning the ARPA model is the idea that a moonshot ambition is baked into the mission of that agency. Having a governmental process for, operationally, creating more agencies with moonshot ambition as their literal reason for existence is really powerful. At the same time, I would like to see the government do more structural innovation beyond the agency level. There are lots of opportunities that could happen at a lower level of hierarchy.

But I agree with Adam’s point. For example, one thing FROs do is think systematically about the gap between what universities and corporations can do. What Fast Grants has done is to think about the gaps between large government-funded systems and individual philanthropists making individual grants. Arc thinks about this at the intersection of universities, or basic science and industry, or biology and the technology sector.

There are these huge holes, and one of the long-term win conditions for Arc will be that people try to create more of these: that, relative to the monolithic university or medical school research model, people come to see several-hundred-person research institutes as effective, cloneable models for doing breakthrough science that should exist in multiple places.

Tamara Winter: I’m interested in — it’s still early days for all your organizations. Fast Grants has wound down, but we’re still seeing what will come of all the research that’s been done. I’m curious — what areas are you most optimistic about, and what interesting results are you starting to see? Why should I, a laywoman, be excited?

Patrick Hsu: Internally, we thought it would be a shame to bring together scientists of this caliber and simply have them work in the same building on what their labs were going to do anyway before they came to Arc. So, we think a lot, institute-wide, about how we can build better collaborative models to do bigger team science. The two major focuses for now are Alzheimer’s disease and predictive biology.

One of the interesting things about biological research is that our ideas are often much bigger than what we’re actually able to implement in the lab. It tends to be subscale relative to the vision. A lot of the reasons for that are remarkably prosaic — it’s because you have two postdocs doing it, or because you’re only able to include an experimental component and can’t get top computational people for whatever reason.

For us, building the infrastructure so that you can tackle a problem as complex and diverse as Alzheimer’s, with the cutting-edge technologies in each core area — how do you make the perturbations? How do you make human organoids with all the different cell types of the brain? And how do you read out and computationally analyze what’s going on? That type of thing is the reason why senior labs can grow to 30, 40, or even 60 people — it’s essentially to own internal platforms. We’d like to centrally operationalize this.

On the virtual cell side of the house, one of the interesting things about AI is that neuroscientists have been making fun of computer scientists for decades about the concept of neural networks, having neural layers, and neurons in an ML model. But the funny thing now is that computer scientists seem to be having the last laugh. With enough scale of data and with the right kind of attention — if you can predict something from any arbitrary series of tokens and generally have very accurate predictions on what to think, say, or do next — that seems to be remarkably close to intelligence. 

Even if you don’t accept that this is intelligence, prediction of any set of tokens seems to pretty much mean you can do most things. For us, we simply need better model interpretability. We need to be able to make biological datasets at a scale and with an order that were generationally impossible, and are only possible now. This is a unique opportunity, and we’re building a team to tackle it in a best-in-class way — both in how we generate the data and in how we build the models to understand it.

Adam Marblestone: I totally agree with that. That’s super exciting, and I think it’s going to redefine all the different cell types or organisms. It’s all going to become part of this huge data structure.

With FROs, there’s probably two big things that we’re excited about. One is that we’ve now had the first teams running in labs for a bit over a year. We’re seeing some of these theoretical questions, “the shadow of the future”: Can you hire good people? Can relatively junior people manage teams? Can they work together? These things are going, on the whole, really well. The teams have a cohesion, and they seem really channeled and streamlined toward their goals.

That’s maybe the thing we’re most excited about. That is allowing us to create a better interface with the FROs, to essentially say: What does the life cycle of an FRO look like? What are the things you need to be doing after month six? After month 10? How do you get your lab space set up? How can we help with hiring? Some of the infrastructure is getting better.

But I think the thing I’m most optimistic about is that we’re seeing a bubbling up of ideas for this. It is unlocking this creativity of scientists. We’re getting proposals in areas… Again, we started in things like neuroscience. That was closer to stuff I had experience with. I was quite confident that neuroscience had multiple FRO ideas in it. I did not know that climate measurement and data, agriculture, math, and epidemiology would have FRO-shaped problems. It definitely seems like they do. So we’re excited about that.

Tyler Cowen: Two things I’m really excited about, on somewhat of a different plane from those last two answers. 14- to 19-year-olds: I don’t think we, as a society, have emotionally internalized how well-educated they are — the smartest ones, who are self-educated. I’m very excited about the Schmidt Futures idea to fund a lot of them. Emergent Ventures tries to do this as well.

If there’s one thing I would have everyone try to do more on, it’s targeting that age range — people who haven’t yet decided if they want to be scientists or not. Partly to get them to be scientists, or maybe entrepreneurs who contribute to science, whatever it will be — get them excited, get them into networks.

And the other is the nation of India, which I now visit frequently. It seems to me India will be, or already is, a major talent source in the same way that Central Europe was in 1900, 1910. You just have these historical periods, Italy in the Renaissance, France in the 19th century, where things blossom. The place isn’t always rich. There’s ambition, there’s aspiration, there’s competition, there’s enough English language there, internet connections are good enough.

We here in North America are barely beginning to figure out the importance of India in scientific progress and intellectual life, and I think we should all be a lot more clued into that. That will be a third or more of the world’s top talent. And that would be my number two pick.

Tamara Winter: One of the things I want to congratulate all of you on is asking more interesting sorts of questions — in your research, but also at your institutes and in the course of doing science. So, I’m curious about the next set of questions you’re asking yourselves and your institutions. Where are we going next?

Tyler Cowen: For Mercatus, one big question we face is: What can we do next in India, and what should we do next with India? The answers are highly uncertain. India is quite distant. Our goal isn’t to preach anything to India; our goal is to learn from India and have good working relationships with people there.

Beyond India, we’re always looking at how we can attract better, more creative, and more ambitious students to our own projects. At any point in time, we support about 70 graduate students and 10 undergraduates — roughly 80 people in all. It’s a lot of people. It’s the core of what we are and what we do. How can we make it even better?

We’ve been doing that for over 40 years, so we have a lot of experience. But experience is a trap, too. The world has changed so much over those 40-plus years, so we try to keep ourselves on our toes.

Patrick Hsu: For Arc, I would say we are a newcomer in a very long and illustrious history of American biomedical research institutes: starting with Rockefeller University in 1901, coming out of the wave of institute-building that created the Institut Pasteur and the Charité hospital in Berlin, through the Salk Institute and Scripps in San Diego in the ’60s and ’70s, the Whitehead Institute at MIT in the ’90s, and the Broad Institute in 2004. Each of these has been an unbelievably successful place that has done incredible breakthrough science, but each was also created in a time with very specific historical and medical circumstances.

For Arc, in 2022 and 2023, we see biology changing rapidly — it’s clearly accelerating even compared to 5 or 10 years ago. Consider the types of experiments we can do now — single assays to interrogate every single gene in the human genome, when just a few years ago you could get your PhD for knocking out a single gene in a mouse and studying what it does. We’re able to increase the scale of what we can look at and measure by multiple orders of magnitude. Biology is going to change dramatically in the decades ahead, moving beyond a list of parts to understanding the embeddings.

So, we think about what types of unique technical and cultural capabilities we need to bring together to tackle unique, specific challenges today. And then, more broadly, how can we try to clone these model concepts, working as part of this exciting community.

Adam Marblestone: A lot of what we’ve been doing has been pretty heads-down, in operational mode. I have an amazing operational co-founder, Anastasia, and a lot of what we’ve been trying to figure out are operational efficiencies of various kinds. For example, how do we boot up an FRO quickly? Exactly who presses which button in the payroll system? Who signs the offer letters? These operational details are how you get multiple organizations that have some economy of scale to them: relatively autonomous, but also relatively quick to set up, and amenable to people who want to focus mostly on science.

So, part of it is very operational. And then part of it is about finding the right balance between the internally driven nature of the FRO, with its very finite milestones and goals, and its external connectedness. How do we form effective scientific advisory boards for them? How do we involve industry experts who feed into how the FROs end up spinning things out of the project and generating impact from it? How much does the ecosystem around the FROs, and the connections between them, matter?

Long-term, there are scaling questions on both the demand side and the supply side. On the supply side, there’s the question of who all the ARPA-like program managers are. Should the people who don’t go to ARPA-H end up founding FROs? And if they don’t do that, should they go to ARIA or Arc Institute? What’s the talent pipeline on the input side?

On the other side: How do we get more predictability in the process? Right now, we are a matchmaking organization that takes good ideas and teams and helps refine them and match them with individual philanthropists or combinations of philanthropists. But what’s the way to do an FRO competition? The best 10 ideas in 2025, can we just do all of them? That’s a huge scaling and funding question.

Tamara Winter: If people who are listening to this want to help you or get involved in some way, what is it that you all need? I think, Adam, you just told us what you need. But what about you all?

Tyler Cowen: Just email me, Tyler Cowen. My email is online. I respond to all emails.

Tamara Winter: Very quickly, I might add.

Tyler Cowen: Whatever advice, ideas — anything — please just write.

Patrick Hsu: Research institutes live and die by the quality of talent that we are able to bring together and our ability to vision-set and coordinate that talent to do amazing science. So, anyone who is interested in this mission or this shared set of challenges, feel free to email me as well, my email is patrick @ arcinstitute.org. I may respond less quickly than Tyler, but I’ll do my best.

Adam Marblestone: Right on. Email me too.

Tamara Winter: Excellent. Thank you all so much. This has been really fun. Cheers.

Patrick Hsu: Thank you.

Tyler Cowen: Thank you all.

Caleb Watney: Thanks for joining us for this episode of the Metascience 101 podcast series. Up next, we’ll zoom in for a practical how-to on experimentation and evaluation in metascience.

Episode Five: “How and Why to Run an Experiment”

Caleb Watney: Welcome to this episode of the Metascience 101 podcast series. Professor Heidi Williams, Professor Paul Niehaus, Emily Oehlsen, and Jim Savage discuss “How and Why to Run an Experiment.” 

Emily is the Managing Director of the Global Health portfolio at Open Philanthropy. Jim Savage speaks from his experience as the Director of Data Science at Schmidt Futures, another science funder.

Together, we’ll zoom in for a practical “how-to” on experimentation and evaluation in metascience with a special eye for relevance for policymakers.

Heidi Williams: We wanted to do an episode talking about how to do research on how we do research. Research can mean a lot of different things to a lot of people: qualitative research and interviews, novel sources of data collection, trying to understand something that’s happened in the world and going back to evaluate what we learned from it, or prospectively designing a randomized experiment. 

I want to emphasize that by research, I don’t mean just narrowly research papers that are primarily intended to be published in prestigious journals. What we are talking about today is research in a very broad sense: a way of learning about what’s working that can inform making things better.

In particular, today we’ll talk about cases of research done within organizations to improve how they accomplish their goals. Sometimes this research results in traditional, published academic papers. But to be clear from the beginning: we’re not talking about research done only for the sake of publication, but rather research that’s trying to accomplish some other goal in the world.

Across the economy, we do this in a lot of different settings. When we have a new candidate drug compound and we want to know whether it saves patients’ lives, we make a very intentional series of tiered research investments.

You start with preclinical work, usually done in animal models, and learn something from a safety perspective. Then you move on to a small phase one trial in humans, and then a phase two trial, which is more expensive. If something looks promising in phase two, then you move on to phase three trials, where you’re looking at efficacy in a larger human population, which costs more still.

In generating that tiered set of evidence, the hope is that we’re going to take an idea and move it toward something that could have social impact at scale. I think that this framework of piloting an idea and moving it through a funnel to get a serious evidence base, where we’re comfortable making scaled organizational decisions, is one that we will come back to at various points. It has much wider potential than how it is currently used in the science space.

The other thing I wanted to preview upfront is: even very thoughtful people are often extremely skeptical about whether research on how we do research is even a feasible possibility. Oftentimes we think, “Well, what we want is to fund high-quality science, but people can’t even agree on what it means to measure high-quality science.” 

I think the right place to start off here is with a thoughtful example around measurements. I tend to be an optimist because I feel there are actually a lot of opportunities to make progress in a very concrete way. Rather than talking in the abstract, I wanted to have Jim start by talking about talent identification, which is one of the areas that people think of as very hard to measure. 

How do you find talented people? Jim, you led an innovative program, the RISE program. An exciting thing about your work was taking seriously that this is a hard thing to do, while still tackling it as a question you could invest research in, so as to learn how to do it better. Could you talk a little bit about that?

Jim Savage: Sure thing, Heidi. Let me just start with a little bit about RISE and why it was an important thing for us to spend some time trying to learn about. 

RISE is the cornerstone of a billion dollar commitment by our co-founders, Eric and Wendy Schmidt, to support talent for good. It is a global talent search for brilliant teens aged 15 to 17, whom we try to find, support, and challenge as they go and create public goods at scale. We make a huge amount of support available to them via needs-based scholarships, summer camps where they get to spend a bunch of time with all the people who are like them, other forms of support, and then career support as they go and hopefully do good for other people.

Now, we kicked this program off about three years ago. It’s a very large program. We have tens of thousands of applicants applying for this program, and only 100 winners every year. When we started it along with our partners at Rhodes Trust, our principals, Eric and Wendy, and our CEO Eric Braverman gave us this challenge. The challenge was that we needed to come up with a way of finding brilliant youngsters that was open to anyone in the world who could apply. We should have lots of different pathways to apply. 

Because most people will miss out, it should also be a program that benefits people in the very act of applying. So although most people miss out, they have still gained something from having applied, which is a bit different from many university or scholarship applications. We were given this challenge of finding a scalable way of measuring talent and identifying people who are brilliant and empathetic, and have high integrity, perseverance, and some calling. How do we find those people at scale in such a way that it benefits them? It’s a really hard problem.

We went and read the research. Our team interviewed dozens and dozens of scientists on this – people who’d done studies over many decades. We read all the papers. We worked with a really great team that spun out of Khan Academy, which did some interesting design work on how we might measure talent at scale. Then we had a product review with Eric Schmidt. Now, a product review has the vibe of – I’m sitting with a bunch of economists in here – an economics seminar. It’s where your principals challenge you and really test how much of this you understand.

Let’s just say, we didn’t last very long in this product review. After a few minutes, Eric stops us and he’s like, “You know, you’ve obviously done a lot of work, but this is a real investment. We need to understand whether we can identify talent at scale using this method. You haven’t shown me the experiments that you’ve run. You haven’t shown me whether what you’re proposing is a good way of identifying talent.”

He called us up afterwards and said, “Okay, I’m giving you air cover here to go and do the trials. Go and do the experiments so that you can show us whether this works.” We pulled together a team. Now, I’ve never run a human trial study before, so this was very new to all of us. We worked out a couple of different models for how we might test this: how do you go and measure integrity in teens at scale? 

And we came up with an interesting design. Imagine we could take a population of brilliant youngsters in which we already know there are outliers, based on very expensive-to-collect data. Then, we have that population of youngsters go through a mock application process. A good application process should identify the people you already knew to be outliers.

What we did is we recruited 16 classrooms from eight different countries around the world: United States, Hong Kong, South Africa, Zimbabwe, and a few others. We sat down with these gifted and talented classroom teachers who had spent at least a year with their students. We asked them to roughly score all their students against intelligence, empathy, integrity, and those sorts of things. The hunch here was that those traits might be observable by people after having extended exposure to each other in rich context.

Then, we sat down with many of the students in each of those classes and had them nominate the top three most empathetic peers in the class and the top three most persevering kids in the class. It turns out there’s a lot of agreement between people. People are talking about real constructs here. If you construct a Gini index where zero is everyone guessing at random and one is everyone naming the same top three, we’re talking like 0.4 to 0.6. It is a fair degree of agreement.
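
To make the agreement measure concrete: here is a minimal sketch of one way to compute such an index in Python. The exact construction RISE used isn’t specified in the conversation, so the version below (a standard Gini coefficient over nomination counts, with invented numbers) is an illustrative assumption, not their method.

```python
import numpy as np

def gini(counts):
    """Gini coefficient of a non-negative count vector: near 0 when
    nominations are spread evenly (everyone guessing at random),
    approaching 1 when everyone names the same few kids."""
    x = np.sort(np.asarray(counts, dtype=float))
    n = len(x)
    cum = np.cumsum(x)
    return (n + 1 - 2 * (cum / cum[-1]).sum()) / n

# Invented tallies: how many "top 3 most empathetic" nominations
# each of 20 classmates received. Concentrated counts -> high index.
nominations = [14, 12, 11, 3, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(round(gini(nominations), 2))
```

In practice you would also want to simulate the fully random baseline, since random nominations produce counts that are only approximately equal.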

This data is very costly to gather. There’s no way that we could get this at scale. With this data, we then had the question: are there interview questions, tasks, exams or other things that would identify the people in this group? 

Now, these kids were already in gifted and talented programs. They’re already pretty sharp to start with, and we know their classmates and teachers could identify outliers. Can we identify those people? So we tested dozens of interview questions. We had 23 wonderful volunteers, mostly PhD students at Oxford from all around the world, who volunteered as interviewers to have interview panels with these youngsters. They had 45-minute structured interviews where they tried to get the sense of whether the person demonstrated evidence of having high integrity or empathy. We gave people questions from the LSAT and other aptitude tests. We gave people divergent reasoning tests like, ‘How many uses for a T-shirt can you come up with?’.

We had the youngsters record selfie videos, and we recruited some volunteers from that same age group to watch these selfie videos and grade on whether they thought they exhibited high intelligence, high empathy, or integrity. After all of this, we learned a couple of really shocking things that changed how we built RISE. 

The first was that many of these questions that I have been using in interviews for years don’t work very well. At least they didn’t identify the outliers that we knew about. It was a big hit to me – I haven’t used any of my old interview questions since.

Second, we learned that there was very little relationship between the structured interview panels and the very costly data that we gathered from classmates and teachers. When we decomposed that error, it was an error that systematically favored the mock candidates from richer backgrounds and systematically penalized candidates from poorer backgrounds. That was really something.

We went back to all those interviewers and told them this, and they said, “That’s really interesting. Who were the people we were making mistakes with?” We found one candidate who was nominated by 80% of his class as being the smartest kid in the class, and yet when the interviewers interviewed him, they rated him as the second worst on that measure. Why is this happening? The three interviewers agreed – they had very high inter-rater reliability. We went to the interviewers and they said, “Well, he didn’t answer any of the interview questions. He just did very, very poorly in the interview and didn’t give us any evidence.” We learned a lot from that.

We still do use a small amount of interviews at the end of the application process for RISE, but it is not a hurdle. You can have a fairly weak interview, and as long as the rest of your application for RISE is really strong, you can still get through.

We now spend a lot of time preparing people to make sure they really are able to put their best foot forward when interviewing. We also only use questions that we know have been validated using this sort of mechanism. Now, the delightful thing about this was that we found certain questions to be very strong predictors of whether the classmates and teachers thought highly of candidates. When we rolled out the live application, the candidates who did well on those questions had much higher rates of completing the application, which involves working on a project for seven weeks. In live data, we saw it validated that we could get some data very, very cheaply that was predictive of real world behavior.
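
A hypothetical sketch of that validation step: check how well a cheap, scalable question score separates the known outliers (the expensive classmate-and-teacher signal) from everyone else. The AUC below is the probability that a randomly chosen outlier outscores a randomly chosen non-outlier; all names and numbers here are invented.

```python
import numpy as np

# One row per mock applicant. ground_truth = 1 if classmates and
# teachers flagged the student as an outlier (the costly signal);
# question_score is the cheap measure we want to validate.
ground_truth = np.array([1, 1, 1, 0, 0, 0, 0, 1, 0, 0])
question_score = np.array([0.9, 0.7, 0.8, 0.4, 0.3, 0.5, 0.2, 0.6, 0.35, 0.1])

def auc(labels, scores):
    """Probability a random outlier outscores a random non-outlier
    (0.5 = pure chance, 1.0 = perfect separation)."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    wins = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return wins + 0.5 * ties

print(round(auc(ground_truth, question_score), 2))  # 1.0 on this toy data
```

A question scoring near 0.5 carries no signal about the costly ground truth; the questions worth keeping are the ones scoring well above chance.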

Heidi Williams: One thing I love about this example is that oftentimes when people think about research, they’re very narrowly focused on impact evaluation, as opposed to validation of measures.

Paul, it’s interesting because it is very similar to how you talk about some of your data and measurement validation work. I know that came up a lot in your work on GiveDirectly and other things too. But I want to transition and ask if you could kind of talk about research that’s more traditional, like impact evaluation. How do you think about what the key steps are that you need to bring together for that to be a meaningful, high potential investment?

Paul Niehaus: Yeah. I always tell people that there are three hard things with impact evaluation. One is to be clear conceptually about what you’re trying to achieve. A second is to think about good metrics for that, which I think is what Jim has just shared a great example of. Those two are obviously very interrelated and are things that organizations typically need to do anyway for many other purposes. Sometimes running an experiment can be a good forcing function to get you to do that if you haven’t already.

Then the third key thing is counterfactual reasoning. The essential thing about impact is how the world is different as a result of the thing I did compared to the way it would have been if I had done something else. People will sometimes say in a sort of loose way, “Oh, we did this thing and you could really see the impact of it.” But if you take the definition seriously, that’s not true. There’s no sense in which you can ever literally see the impact of something you’ve done because you can never see how that alternative world where you did something else looks. 

The really exciting and challenging question in impact evaluation is: what are good ways to make inferences about that counterfactual which we cannot see? That’s what experiments are all about and why I think they’re very powerful.

With an experiment, we take a group of people – kids, perhaps, who want to enroll in Jim’s program – and assign a group of them to get in and a group of them not to get in. Then, we look at how their lives evolve after that. When we compare those outcomes, if they were assigned randomly to those two groups, we can be really confident that the kids who didn’t get it are giving us a pretty good counterfactual for what life would’ve looked like for the kids that did get it, if they had not. That’s the power of the method and the experiment.
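
As a minimal sketch of those mechanics (not anyone’s actual study), the code below randomizes a pool of 200 hypothetical applicants, simulates an outcome with a built-in effect of the program, and recovers it with a difference in means.

```python
import random
import statistics

random.seed(42)

# Hypothetical pool: the 200 kids we'd most like to fund.
applicants = [f"kid_{i}" for i in range(200)]
random.shuffle(applicants)
treatment, control = set(applicants[:100]), applicants[100:]

# Years later, we measure an outcome for everyone. It's simulated
# here with a true effect of +10 for funded kids, so the estimate
# should land near 10, give or take noise.
outcome = {k: random.gauss(50 + (10 if k in treatment else 0), 15)
           for k in applicants}

effect = (statistics.mean(outcome[k] for k in treatment)
          - statistics.mean(outcome[k] for k in control))
print(f"estimated effect: {effect:.1f}")
```

Because assignment was random, the control group’s average stands in for the counterfactual that, as Paul says, we can never literally see.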

Why is it so important? There are lots of other ways that we can go about trying to measure these things that seem appealing or intuitively right, but they can turn out to backfire or not work the way we expected. A very common way to look at how things are going is comparing people before versus after they get help. You’ll often see situations where people opt into getting help at times when they need it and things are going badly, and then afterwards things get better. We’re tempted to say, “Ah, things got better because the thing we did to help them is working.” Whereas in fact, some of that is just regression to the mean: when things are really bad, there’s nowhere to go but up, and things tend to get better after that. That’s been a common issue in a lot of program evaluation.

For example, when looking at ways to help people who are unemployed: people who are having a hard time finding a job opt into some sort of help finding a job, and then, lo and behold, they do find a job. But we don’t know how much of that is just because they would have found a job anyway.

That’s the power of experiments. There are also other ways of trying to draw these counterfactual inferences that can be useful – times when you can do something that’s very close to an experiment, even if it’s not exactly an experiment, within the parameters of the decision-making structures you already have in place.

A common thing that we do in economics is we might look at a system where there’s a cutoff. Maybe like Jim’s program, if they’re above some threshold, people get into the program. We can say, “Well, let’s look at the people that are just above that and compare those to the people that are just below that threshold.” They’re slightly different, but those differences are pretty slight. So we’d feel pretty confident saying we can attribute different outcomes for those groups largely to the impact of the program, as opposed to other factors.
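
A toy version of that cutoff comparison (a regression discontinuity design, in the jargon), on simulated data. A real analysis would fit a regression on each side of the threshold rather than comparing raw means, but the intuition is the same.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical admissions data: applicants scoring >= 70 get in,
# and admission adds a true effect of +8 to the later outcome.
score = rng.uniform(40, 100, 2000)
admitted = score >= 70
outcome = 0.3 * score + 8 * admitted + rng.normal(0, 5, 2000)

# Compare applicants just above vs. just below the cutoff; within
# this narrow band, the two groups are nearly identical except for
# admission itself.
band = 3
above = outcome[admitted & (score < 70 + band)]
below = outcome[~admitted & (score >= 70 - band)]
print(f"effect at the cutoff: {above.mean() - below.mean():.1f}")
```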

So experiments are not the only way of drawing these kinds of counterfactual inferences, but they are very powerful. They force us at least to think hard about that question of “how am I confident that I can see what the world would’ve been like, if I hadn’t done this thing that I’ve done?”

Heidi Williams: Yeah. I want to kind of transition to talk more about experiments directly. Before we do that, Emily, I would love for you to say a little bit about how Open Philanthropy thinks about using impact evaluation and counterfactual evidence in your decision-making – just to give a lay of the land before we get into kind of more specifics on experiments.

Emily Oehlsen: Yeah, absolutely. By way of quick background, Open Philanthropy is a philanthropic organization that gives away a couple hundred million dollars a year, and we aim to maximize our impact. We think about that in a pretty evidence-based and explicitly expected-value-maximizing way. There are two sides of our organization: one that focuses on potential catastrophes that we might encounter over the next couple decades, and the other focuses on ways that we can make the world better in the near term – often in much more concrete and legible ways. 

A key distinguishing feature of that second piece is that we are often trying to compare outcomes – not only within causes but also across them – to try to optimize our overall portfolio, which we take to mean equalizing marginal returns across all of the different areas that we could be working in.

Heidi Williams: To be concrete in thinking about this, how do you put health investments and education investments in a similar unit?

Emily Oehlsen: So some of the areas that we work in are scientific research and global health R&D. We do some work on the health impacts of air quality. We do some work on farm animal welfare, which makes the comparisons quite difficult because you have to think about the suffering of animals and of people. We do work on global aid advocacy, and a few other areas.

There are lots of things that we care about, but as a simplifying principle, we often try to think about the health impacts of the work that we do and the way that they affect people’s consumption. So far, we have thought as hard as we can about how to compare those two units and use that as a disciplining force to think about the marginal thing that we could do in each of these areas.
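
A toy illustration of that equalizing-marginal-returns idea, with invented, diminishing returns curves expressed in a common unit. Sending each marginal dollar to whichever cause currently helps most drives the marginal returns toward equality by the time the budget runs out.

```python
# Invented returns, in a common unit (say, health-adjusted life
# years per $1 million), that diminish as spending rises.
def marginal_return(cause, spent):
    base = {"water_quality": 120.0, "air_quality": 90.0}[cause]
    return base / (1.0 + spent / 10.0)

budget_millions = 100
spent = {"water_quality": 0, "air_quality": 0}

# Greedy allocation, $1M at a time, to the best marginal use.
for _ in range(budget_millions):
    best = max(spent, key=lambda c: marginal_return(c, spent[c]))
    spent[best] += 1

print(spent)                                             # allocation
print({c: round(marginal_return(c, s), 1)                # marginal
       for c, s in spent.items()})                       # returns ~equal
```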

I really liked Paul’s taxonomy. We try to think hard about what we care about and the metrics that we can use to measure those things in the world. There’s tons of complexity. Even just taking health impacts, we rely a lot on the IHME and the WHO to think about the life years lost to different health conditions. And there is tons and tons of complexity embedded into that. We are avid consumers of experimental evidence, as we try to evaluate different opportunities that we could pursue.

I’m particularly excited about the work that we do thinking about places where we can innovate how we use experiments in social sciences and in public health. 

One example from Open Phil today is that our science team is exploring the possibility of funding a controlled human infection model (CHIM) for hepatitis C. Hepatitis C is a particularly good candidate for a CHIM because there’s slow natural progression of the disease and a relatively low intensity of transmission – even among high risk groups – which make classic field efficacy trials extremely slow and difficult to conduct and make the possibility of a human challenge trial more exciting. I don’t know where that will go, but I think it’s interesting to push the frontiers of places where we can use experimentation.

Heidi Williams: Yeah. That’s a great example. Like Paul brought up, what is the experiment you would run with RISE? 

If you were going to fund 100 kids, let’s choose 200 kids that you would most like to fund, and you randomize funding for 100 of them and not for the other 100. Then, you want to track how their life is different. That sounds like something where you’re going to structure this 20 year study, where what you care about is their earnings when they’re older. So, these studies come across as feeling very infeasible. 

You mentioned an interesting example of how we can learn more, and more quickly – innovating on the research side of that.

Paul, I was curious if you could say a little bit about when people understandably say, “Isn’t that too expensive and going to take too long?” What are some of the ways that you bring to people when they want to use experiments for more real-time decision-making?

Paul Niehaus: Experiments really run the gamut from extremely fast to very long-term, from extremely cheap or free to very expensive. Concretely, at GiveDirectly, which is an NGO that I started, we’ve run around 20 studies. They’ve ranged from a study that took five years from initiation to results and cost hundreds of thousands of dollars, to one that took four weeks from the beginning to having useful data back and cost nothing to run. Just to give some sense of the range of possibilities.

What drives that? Randomization per se is not expensive. I mean, if we just want to randomize something, we can do that right now in a Google spreadsheet and it costs nothing at all. Picking things using a lottery is free. The thing that is typically expensive and possibly slow is the outcome measurement.

At GiveDirectly, for example, the expensive and slow trial that I mentioned was where we wanted to see the impact on local economies if we bring in a huge amount of money. To do that, we have to do this very extensive measurement of what’s going on with households, what’s going on with firms, what’s going on in markets with pricing, and what’s happening at the local government to get this comprehensive picture of how an economy reacts when there’s a big influx of money. That takes time and it takes a lot of resources to go measure all those outcomes and then analyze the data. That is to some extent intrinsic to the thing that we want to look at.

At the other end of the spectrum, the very quick and cheap study I mentioned looked at whether a little bit of liquidity before people decide how they want to structure their transfer changes their decisions. The beauty here is that this is an administrative outcome which we’re already collecting anyway. We have people that are already asking, “How would you like to structure your transfer? If I give you a thousand dollars, would you like it all at once? Or would you like it in 12 tranches?” If we want to see what happens when they have a little bit more cash in hand when they make that decision, we get the data back for free already, so it only takes a few weeks to do that and it is very cheap to do. It’s largely a question of the thing that you want to look at.

In terms of the longer term, sometimes we really do care about how the world will be different in 10 years. There’s a part of the answer here that – whether you’re doing an experiment or measuring impact in some other way – if you want to know what things will look like in 10 years, you just have to wait 10 years. That’s not a feature of experiments, that’s just a feature of life. 

But I would also say that there’s an interesting frontier in statistical analysis looking at surrogates, essentially things that we can observe now that we think are good predictors of what the world might look like in 10 years. They can at least give us some leading indicators of whether we’re seeing the kinds of changes that are indicative of the longer term as well. I think there are ways to be smart about that.

The last thing on the cost of experiments is that sometimes there is a risk of being penny wise and pound foolish when sizing them. The issue here is that you want to design an experiment that’s big enough to give you the degree of confidence in a statistical sense that you need to be able to make decisions. You want to be thoughtful about that. I have participated in things where later I think, “Actually, we should have done this with twice the sample to have more confidence in the result.” There’s a whole art and science around sizing those experiments.

That’s the one place where you want to be careful not to cut corners. Part of that is because we are in this bad habit as social scientists of saying, “If we can’t be 95% sure that something happened, then we’re going to treat it as if it didn’t.” There’s a pathology in how we interpret so-called “null results” that makes this even more problematic, and it makes me err on the side of a larger experiment to make sure the result doesn’t get misinterpreted in the way things often do.
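
To make the sizing arithmetic concrete, here is a back-of-the-envelope power calculation using the standard normal approximation. The key fact behind Paul’s warning: halving the smallest effect you want to detect quadruples the sample you need.

```python
import math
from statistics import NormalDist

def n_per_arm(effect_size, alpha=0.05, power=0.80):
    """Approximate sample size per arm to detect a difference in
    means of `effect_size` standard deviations (Cohen's d), for a
    two-sided test, via the standard normal approximation."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

# Halving the detectable effect quadruples the required sample:
for d in (0.5, 0.25, 0.125):
    print(d, n_per_arm(d))  # 63, 251, 1004 per arm
```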

Heidi Williams: Emily, a different concern that people often bring up with experiments other than feasibility is external validity. 

Say you do a study in one setting. GiveDirectly does an experiment in one country, like Kenya. What is the external validity consideration? What would we learn from that if GiveDirectly were going to expand its giving in India, for example?

At Open Phil, you seem to use experiments a lot internally. Open Phil has a really great practice of often publishing the reasoning behind their investments, so one can get a lot of insight into how they used research in making their decisions. It seems like you think a lot about that. I was wondering if you could give a few examples of where you’ve seen that work well.

Emily Oehlsen: Yeah, definitely. Two responses come to mind. So one, Heidi, you talked earlier about how in the biomedical sciences we have a clinical trial process with different stages that have different costs associated with them. We’re willing to invest as we get more and more information that a particular drug, diagnostic, or therapeutic is potentially effective and looks promising to get widely distributed. There’s an equivalent in the social sciences too. 

The main example that comes to mind for me is Development Innovation Ventures, called DIV for short. Within USAID, they’re a special division that makes smaller investments, often in riskier and earlier-stage projects where there’s the potential for high impact. They have a similarly staged process. I think it’s stages one, two, and three, where the dollar amounts scale up. There’s an initial pilot phase where you might run a small experiment to get some preliminary data on a particular intervention. As you become more and more confident that the intervention is effective, you can run larger and larger experiments to see how it scales before ultimately thinking about broader deployment. I think that’s a really effective model.

Another observation that comes to mind is that sometimes a single paper or a single experiment is not the right unit. One example here: there were a number of RCTs run around water quality, but they were all individually underpowered to look at mortality, because mortality is rare. Michael Kremer – who recently won the Nobel Prize – did a meta-analysis and found a big, statistically significant effect of these water quality interventions on mortality. That meta-analysis played a significant role in GiveWell’s decision to scale up Evidence Action’s Dispensers for Safe Water program. This is one example of how sometimes an individual experiment isn’t enough in and of itself to be decisive, but it can be coupled with other types of evidence that can then lead to a bigger decision.
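
A minimal sketch of the pooling idea behind such a meta-analysis: fixed-effect, inverse-variance weighting over invented study estimates (not Kremer’s actual numbers). Each study alone can’t reject zero, but the pooled estimate can.

```python
import math

# Hypothetical (effect estimate, standard error) pairs from four
# underpowered water-quality RCTs; none is individually significant
# (every |estimate| < 1.96 * SE).
studies = [(-0.20, 0.15), (-0.25, 0.18), (-0.15, 0.12), (-0.30, 0.20)]

# Fixed-effect pooling: weight each study by 1/SE^2, so more
# precise studies count for more.
weights = [1 / se**2 for _, se in studies]
pooled = sum(w * est for (est, _), w in zip(studies, weights)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))

# The pooled 95% interval excludes zero even though none of the
# individual intervals do.
print(f"pooled effect: {pooled:.3f} +/- {1.96 * pooled_se:.3f}")
```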

Paul Niehaus: Can I just add that I completely agree. By the way, the Michael Kremer story is a good example of the pathology I mentioned, where we interpret things that don’t reject a null hypothesis as not being informative. Michael basically showed that the studies are individually somewhat informative and collectively very informative. I think that’s a great example.

The other thing I wanted to say is that there is a very common misperception that external validity – I don’t even like the term, but whatever – is more of an issue for experimental methods than it is for non-experimental methods. Personally, I think it’s often the exact opposite. When you use non-experimental methods, the results are unrepresentative of the population you care about in ways that are very opaque and hard to understand. With experiments, by contrast, it’s pretty clear what population the results are representative of, and therefore where you should be careful in extrapolating or scaling up, as Emily mentioned. We don’t need to get into the details of the statistics, but if anything, I would say that consideration cuts the other way.

Emily Oehlsen: Thinking about the topic of external validity, I think it does raise – and this has been sort of woven into our conversation so far – two challenges that come up when we think about experimental evidence and how to use it. 

One, it is often the case that the importance of an experiment is not evident right at the moment of discovery. We do a lot of grantmaking at Open Phil that’s directed towards trying to improve health outcomes for people living in low-income countries. There, it’s fairly clear what we’re aiming for and what the significance of a potential result from any particular experiment would be. But we also do a lot of grantmaking that is more basic-science in nature. This relates to Paul’s first question of deciding what matters to you. A lot of times when we’re doing that work, it is quite difficult to articulate what a good result means or how it’s ultimately going to flow through to impact downstream. That is a challenge we always have to grapple with.

Another is how to think about effectively using experimentation. At Open Phil, we think about a lot of our grant making as hits based. This is the idea that you are willing to pursue low probability strategies because of the potential upside. Oftentimes with those opportunities, the work is riskier, there are fewer feedback loops, causal attribution is harder, and oftentimes the outcomes aren’t observable until like 30 years down the road and you can’t maintain a control group for that long. Some of our corporate advocacy work in farm animal welfare has this quality, as well as some other areas of grant making. 

I think this is a pretty common observation in the metascience world. In science, you might think that the distribution is pretty fat-tailed and we should be focused on some of these outlier opportunities. And so how to bring experimental evidence to bear productively on those questions is a challenge.

Separate from the dimension of learning things faster, there are types of experiments and experimental evidence that we’re particularly excited to fund because we think they’re under-provided in some way by the ecosystem. One example in this broader bucket is the replication of really promising work.

Paul and Heidi will know far more about this than I do. But I think there are some incentives within the academic world to under-provide replication, because it doesn’t have the same novelty or contribute to your career prospects in the same way. But oftentimes, when you’re evaluating a particular intervention and you see one piece of evidence that seems like an outlier compared to your prior or other things that you’ve seen, sometimes the most powerful thing you could then observe is a replication of that work in some capacity. And so that’s one thing we’re really excited about.

To your timeline point: as Jim was saying, it’s really hard to set up experiments, create that infrastructure, and act on it. And sometimes there are particular moments where it is really valuable to spin something up quickly; COVID is the one that comes to mind. So being able to create more flexibility in the system, so people can jump on opportunities as they arise, really matters.

Heidi Williams: Yeah, and that’s a bit of what we touched on at the beginning. When people think of experiments for science, they often frame it as: What’s the best way of getting the best science? And that’s where you get into these objections: “Well, how would you even measure that? And isn’t that 10 years out, with long and variable lags, and these incredible tail outcomes that have all the social value?” When I talk with people who do science funding as their job, I try to anchor them a little on concrete challenges they have that are not tied to these more existential questions.

So one example: the National Institutes of Health is very concerned that the average age at which you get your first NIH grant has been going up and up over time. They’re worried that their grant structure, for some reason, is not doing a good job of identifying young talent. So for them, if you can say, “Can we design a research approach, or an experiment, that investigates whether different grant mechanisms do a better job of identifying talented young scientists who might be getting missed by the default system?” That’s something where you observe the outcome right away.

Emily Oehlsen: Yeah.

Heidi Williams: You could say, “Well, maybe the young scientists aren’t currently as good as the older ones, but everybody agrees that we need a way of onboarding people into the system.” I feel like there are some framings that can get you out of this “how would we know good science when we see it” issue when it comes up.

But, Jim, I wanted to come back to talk a little bit about the organizational dynamics of how this can work in practice. Oftentimes, organizations understandably see experiments as pretty risky to conduct: What if we show that our program doesn’t work? What does that mean for people’s jobs, and what does that mean for me personally as the person who ran this?

And Ben Jones, who’s an economist at Northwestern, often makes a distinction between what he calls existential experiments and operational experiments. The existential experiment is, “Should my organization exist?” The operational experiment is, “We would like to find talented youth, and we have two ideas on how to do that; which one is better?” So I think there are some structural ways in which the research questions you pick can make this less threatening within organizations. But I’m curious if you could comment on how the internal organizational dynamics played out in your work with RISE.

Jim Savage: I personally have not experienced that sort of existential threat of whether you have to close down a program or something because it doesn’t have an impact. Which is not to say that I’ve never had any pushback against doing experimentation.

Heidi Williams: Yeah.

Jim Savage: Almost all the time, that pushback has been because doing things is really hard. Setting up a big program is an incredibly complex initiative, especially something on the scale of RISE, where you’re coordinating hundreds of volunteers, zillions of candidates, different types of software, paper applications coming in alongside chatbot applications, and all these sorts of things. And each time you add complexity to a program, it becomes exponentially more difficult to operate. So especially when you’re setting up an organization or an initiative, it can be really challenging to just add more complexity, and you should have a bias towards parsimony in what you do.

Heidi Williams: Yeah.

Jim Savage: Which is not to say that that’s not also a really good time to do an experiment, because you’re working out what to do. There is always going to be a tension. I just think that, especially for these more operational questions (or, as the evaluation people would say, formative evaluation questions), it’s not a fear that you’re going to have to shut the program down; it’s just really hard to do experiments. Now, you have seen some fields adopt experimentation that are not full of macroeconomists. So where are these fields? You go looking for them, and it’s like: if I’m on MailChimp and I want to send an email to a zillion people, I’ve got an option where I can just run a randomized controlled trial. They’re called A/B tests. I think that term is used because it’s non-threatening; randomization is kind of a scary word. But I can now have different copy and see which version has better click-through rates, or which subject lines get opened more often.

You know, if you log onto various news websites, you will see different headlines as they A/B test, or even use multi-armed bandit approaches, to work out which are the highest-performing variants of headlines to serve to you. And it’s not because they’ve got a whole bunch of macroeconomists who’ve been pushing people to adopt an experimental method in science.

It’s because the software makes it really easy to do these sorts of experiments. If we are to be able to do more experimentation internally, a lot of it comes down to, how can we reduce the cost of doing experimentation?
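
For flavor, here’s roughly what that built-in machinery does under the hood: a toy epsilon-greedy multi-armed bandit over three headlines with made-up click-through rates. Real platforms handle all of this for you, which is exactly Jim’s point about lowering the cost.

```python
import random

random.seed(1)

headlines = ["Headline A", "Headline B", "Headline C"]
true_ctr = {"Headline A": 0.030, "Headline B": 0.045, "Headline C": 0.025}
shows = {h: 0 for h in headlines}
clicks = {h: 0 for h in headlines}

for visitor in range(20_000):
    # Explore 10% of the time (and at the very start); otherwise
    # exploit the headline with the best observed click-through rate.
    if visitor < len(headlines) or random.random() < 0.10:
        h = random.choice(headlines)
    else:
        h = max(headlines, key=lambda x: clicks[x] / max(shows[x], 1))
    shows[h] += 1
    clicks[h] += random.random() < true_ctr[h]  # simulated click

for h in headlines:
    print(h, shows[h], round(clicks[h] / max(shows[h], 1), 4))
```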

Heidi Williams: Mm-hmm.

Paul Niehaus: I think that’s got to be partly because Jim Savage has selected into really good, high-performing learning organizations. There are definitely examples out there of high-profile, important efforts where people have resisted encouragement to test. Millennium Villages, for example: we would all love to know what its impact was, and they refused to do it. There are also pretty well-known examples of RCTs that got done and got buried because people didn’t like the results. Some places are great about this, and some places really do resist.

Jim Savage: When I talk to people, especially public servants, I don’t really hear that. People admit there is a lot of complexity. They would love to do it, and I think we should be making it easier to do experiments.

Emily Oehlsen: Do you think there’s an organizational feature here? If you’re an organization that does one thing, and the experimental result shows that that thing is not as effective as you thought it was, would that feel more existential to you than if you’re in an organization where you have many different programs going on at once? Where you can take more of a disinterested perspective? Do you think that’s a factor?

Jim Savage: There’s definitely a fear of “evaluation.” I think evaluation has this very threatening tone: “Oh, we’re looking at the sum of impacts of your program or your organization.” That does have some existential threat, but I don’t think we’re really talking about that. 

I’m not talking about that. I’m talking about the idea that you can get program managers to run experiments internally, if it’s easy enough. And I think that most people are willing to do that.

Heidi Williams: Yeah. There’s obviously a continuum. Measuring teacher value-added was something that I think felt very threatening to individuals. “I’m getting ranked and scored,” right? I do want to come back to Emily’s point, because there’s an interesting example with No Lean Season that both Emily and Paul could probably offer perspectives on: an experiment that ended up shaping the organization in important ways.

Emily Oehlsen: I observed this from afar, but it’s an example that I’ve always found pretty inspiring. Evidence Action was founded in 2013 to scale evidence-based, cost-effective programs. They had their core programs around deworming and scaling free, reliable access to safe water. But then they also had this program called No Lean Season, where I think the original experimental evidence was from Bangladesh. It involved giving people both information and small loans so that they could migrate to other parts of the country when seasonal work was scarce where they lived. The original RCT evidence showed that this was a pretty promising intervention for poverty alleviation, and so Evidence Action started to scale it up. Then they ran two more RCTs as it scaled that showed it was less effective than they had expected, and they ended up shutting down the program.

I found that decision quite impressive, to be able to take a step back and say, “Okay, this is not as promising as we thought it was. There are other ways we could deploy this money that we think would help more people and help them more deeply. And so as an organization, we’re going to pivot.” That was a really impressive example.

Paul Niehaus: Especially when you say there are other ways to deploy the money, a lot of that money isn’t in your pocket. Will funders actually respect this choice and listen to us when we say we think you should fund this other thing instead, or will they just walk away entirely? I think there was courage in it. But then also as we’ve talked about, the fact that they had other things that people could move to makes it less of an existential evaluation and more of an operational one, right?

Emily Oehlsen: Yeah.

Jim Savage: One question with these sorts of evaluations is measurability of outcomes. Many of the most impressive investments might be in things that result in some cultural shift, or some change in the zeitgeist, or some demonstration effects that have a lot of people change how they go and do their work. I cheekily use the example of the Koch philanthropies. I think they have pursued many investments that would be almost impossible to study in an evaluation framework that we would be happy with. But nobody accuses the Koch philanthropies of having had no impact; if anything, people accuse them of the opposite. If you are pursuing some types of things, people might be legitimately afraid of being evaluated when the evaluator will never be able to observe the rich outcomes that they’re actually trying to affect.

Heidi Williams: Yeah. I think of GiveDirectly as kind of a nice example of this. GiveDirectly rolled out a lot of experiments that were targeting these more narrow questions. But if I were going to introspect on GiveDirectly’s impact, it seems like it was mostly shifting the narrative around cash transfers.

Paul Niehaus: Yeah.

Heidi Williams: Could you say a little bit about that for people that might not be familiar with the broader context?

Paul Niehaus: I totally agree with what Jim said for this reason, that on the one hand, GiveDirectly had this very evidence-centric strategy, and so even the very first transfers that we delivered were part of a program evaluation which went on to get very well published. And I was totally wrong about that, by the way, thinking, “This is going to be a boring paper. Nobody wants to read yet another paper on cash transfers.” They did. 

It was really powerful, and we said to people, “We’re an evidence-based organization, and we’re going to begin…” And so we’ve gone on to do lots and lots of these. But the most valuable thing that we’ve really contributed to the world is that the narrative around cash transfers has changed dramatically, and we’ve played some role in that.

When we started, the very first people we’d go to for funding would say, “This is crazy. This is nuts.” You know, the first time we had a New York Times story, the headline was something like, “Is it nuts to give money to the poor with no strings attached?” We’ve now come to a place where most people don’t think it’s nuts at all.

They think it’s obviously something we should be doing to some degree, and the debates are all just about how much, when and when not, and things like that. I think that’s exactly right, that the rigor of the experimental method helped contribute to the credibility of it, and to drive this change in narrative and in perception. That change in narrative and perception was the hardest part, but also the most important part of the impact of it all.

Jim Savage: One of the most helpful things that both GiveDirectly and Open Phil have done within the broader funding community is give the rest of us a really good benchmark. When we are talking about an intervention directed at shifting the zeitgeist or culture or some set of incentives or demonstration effects, you’ve got this shadow price in your head: “Oh yeah, it costs three and a half to five thousand dollars to save a human life. You can buy this many utils with cash transfers.” Those are really important things to have in your mind when you’re spending philanthropic capital or money for science, because there is this opportunity cost out there.

Heidi Williams: Do you think that happens mostly within a cause area that is already a focus for the funder? Because as Emily was saying, at Open Philanthropy, they’re partly using this to prioritize across areas. A lot of philanthropies come in and they already know the area that they want to be in.

Do you have an example in mind that you could give around that? Is it really prioritizing across interventions for that cause, finding where there is the highest impact?

Jim Savage: No, I think it’s simply that you need to have in your head the knowledge that you can do a lot with this money. It forces you to be creative and thoughtful and think through what you’re trying to do more carefully. That might be a very unmeasurable impact of both Open Phil and GiveDirectly in the long run.

Heidi Williams: One thing about this No Lean Season example: there was one very high-impact study, but it was also just a very intuitively comfortable idea for people, that there was this spatial mismatch in work opportunities because of the seasonality of labor in these countries. I think that really resonated with people as a very plausible case. The scaled experimentation did a real service by showing that the thing we find intuitive, and that one study suggested, actually might not be right at scale.

Paul Niehaus: There’s this old trope in development: give a man a fish and you feed him for today; teach a man to fish and you feed him for a lifetime. To me the closest analog, in terms of things we actually do to try to help people living in poverty, is teaching a man to fish: active labor market interventions, where we try to train people, make them more employable, and help them get jobs. That’s an area where the evidence has generally been really, really negative. We’ve tried those things a lot. I don’t think we’re very good at teaching people how to fish.

So that’s a great example of something that at a very loose abstract level seems intuitive: “Of course I want to feed somebody for a lifetime, not for one day, like that seems obvious, right?” But then when you actually get into the data, it turns out we’re just not that great at teaching people how to fish. We should think about whether we can get better at that or other things we could be doing. That’s always been a good example.

Heidi Williams: Emily, you talked about the case for more organizational or philanthropic investment in experimentation as a methodology. Paul brought up one example, which is this idea around surrogates. To spell it out for people who aren’t very familiar with the details of experiments: oftentimes in drug development, the default is that we need to know whether a drug improves survival.

There are some very specific cases where the regulator, say the Food and Drug Administration, will be willing to accept some substitute outcome that we can observe much sooner than improvements in mortality, where we know that a change in that surrogate reliably predicts that mortality will change later. Those surrogate endpoints enable much shorter trials and a faster opportunity to learn about drug effectiveness than if we always needed to wait 20 years.

That structure has started to be interesting in the social sciences. A group of economists were interested in whether there was some equivalent of that for school test scores and wages. How do we expand the idea of surrogates beyond this very medical context to a broader set of frameworks? I do think surrogates themselves have really high potential, but there’s a more general interest in how we invest in the statistical methodology of learning things more quickly. You mentioned one example of a novel way of doing clinical trials that you guys were looking into. Another example would be human challenge trials. Is there one particular one that you want to talk about more?
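
As a rough sketch of the surrogate logic Heidi describes, here is a toy Python example with entirely synthetic numbers: a historical relationship between test scores and adult wages stands in for the surrogate’s validation, and a short-run effect on scores is translated into a predicted long-run effect on wages. Real surrogate-index methods combine many short-run outcomes and worry carefully about bias; this shows only the basic accounting:

```python
import random

# Toy sketch of the surrogate idea, with fully synthetic data. We never
# wait for long-run wages in the new experiment; instead we (1) learn
# the score-to-wage relationship from a historical cohort, then
# (2) translate a treatment effect on test scores into a predicted
# effect on wages.
random.seed(0)

# (1) Historical cohort where both test scores and adult wages are seen.
hist_scores = [random.gauss(100, 15) for _ in range(2_000)]
hist_wages = [20_000 + 300 * s + random.gauss(0, 5_000) for s in hist_scores]

n = len(hist_scores)
mean_s = sum(hist_scores) / n
mean_w = sum(hist_wages) / n
slope = (sum((s - mean_s) * (w - mean_w)
             for s, w in zip(hist_scores, hist_wages))
         / sum((s - mean_s) ** 2 for s in hist_scores))

# (2) New experiment: we only observe short-run test scores.
effect_on_scores = 4.0  # treatment raised scores by 4 points (invented)
predicted_wage_effect = slope * effect_on_scores
print(f"predicted long-run wage effect: ${predicted_wage_effect:,.0f}/yr")
```

The payoff is speed: the experiment only has to run long enough to measure test scores, while the slow, expensive link from scores to wages is estimated once from data that already exists.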

Jim Savage: An approach from the marketing and online experimentation world that I find kind of compelling is the multi-armed bandit. One of the things we get with surrogates is very rapid feedback on whether something is working. We ultimately care about long-run impacts, but we can learn in the short run. Now, I don’t really care about whether we learn what works so much as I care about whether we are doing the thing that works best, which might be a different priority than yours.

Paul Niehaus: You’re not asking how good is the best thing, but which is the best thing.

Jim Savage: Yeah, exactly. Multi-armed bandits: imagine you’ve got a row of poker or slot machines, and you know that one of them has better odds than the others. How do you go and discover that?

The best strategy is to start putting a quarter in all of them, pulling the handles, and keep doing that. Each time one pays out, you update your posterior about which machine has the better likelihood of paying out, based on that observation. It’s a finite sample, so you don’t just sit down at that poker machine and put everything into it. You still explore, but you start to put more of your money in the machines that seem to be paying out more.

We can do something similar with organizations in deciding which programs to scale up once we’ve got better surrogates. By seeding many programs and slowly doubling down on those that seem to be gaining traction against near-term surrogates, we are hopefully achieving the same objective of doing the right thing, even if we never learn how good that right thing is relative to some counterfactual.
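
For readers who want the mechanics of Jim’s slot-machine story, here is a minimal sketch of one standard bandit algorithm, Thompson sampling, in Python. The three machines and their payout rates are invented for illustration:

```python
import random

# Thompson-sampling sketch of the slot-machine story. Three "machines"
# (programs) with unknown payout rates; the true rates are made up.
true_rates = [0.02, 0.05, 0.04]

# Beta(1, 1) priors: one (successes, failures) pair per machine.
successes = [0, 0, 0]
failures = [0, 0, 0]

for pull in range(10_000):
    # Sample a plausible payout rate from each machine's posterior
    # and play the machine that looks best under that draw.
    draws = [random.betavariate(1 + s, 1 + f)
             for s, f in zip(successes, failures)]
    arm = draws.index(max(draws))

    # Observe a payout and update that machine's posterior.
    if random.random() < true_rates[arm]:
        successes[arm] += 1
    else:
        failures[arm] += 1

pulls = [s + f for s, f in zip(successes, failures)]
print("pulls per machine:", pulls)  # most pulls drift to machine 1
```

The loop naturally shifts pulls toward the machine that pays out most while still occasionally exploring the others, which is exactly the explore-and-double-down behavior Jim describes for seeding and scaling programs.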

Emily Oehlsen: To add one example: a recent, promising example of this was the RECOVERY trial in the UK during COVID. It was a multi-arm adaptive platform trial, where they were able to investigate many treatments at once. I think it was quite successful.

Heidi Williams: One thing that’s often struck me about GiveDirectly is that there’s a lot of self-reinforcing goodwill that happens when an organization isn’t doing experimentation and learning in isolation, but is growing up alongside other institutions that provide support. Institutions that say, “We value the work that we’re doing and we’re also learning from it,” or, “We see the social value that you’re creating in taking a more evidence-based approach, and we’re going to support you through funding or other meaningful ways.”

But Paul, I’m curious if you could say a little bit about how that played out for GiveDirectly.

Paul Niehaus: We’re doing these podcasts now because there is this interesting moment where there’s a nascent ecosystem-building effort underway to support a science-based approach to doing science, which is super exciting.

That parallels what happened for us at GiveDirectly. We started GiveDirectly and decided we wanted to take this very evidence-based approach to what we do at the same time that a lot of other people were creating parallel efforts to do philanthropy and global development in a more evidence-based way. So GiveWell and Open Philanthropy were getting set up, and Founders Pledge, Google.org, and Jacquelline Fuller were taking this approach.

Organizations like J-PAL and IPA were building out research infrastructure to make it easier for people to do experimental trials in the countries where we were working. There were people trying to take a more evidence-based approach to thinking about where to work, like 80,000 Hours. That helped us attract talent to what we’re doing, because people recognized these approaches were evidence-based.

So the fact that all of those things were happening at the same time was super important for us and created an environment where we could say, “We have this idea that does sound crazy because you’ve always been told, ‘Don’t just give money to people living in poverty. That’s not going to help,’ but look, there are all these other people that are taking this evidence-based approach to the way they think about where to give. They’re supporting it and validating it.” That was enormously important for us. Also, on a morale level, it’s good to feel like you’re not alone in that.

I also want to highlight, because we’re also thinking about science and federal support for science, that there were important things happening in governments that were a part of that.

For example, one of the really important, early, influential evaluations of cash transfers was done in Mexico with the Progresa program. That was done because there was a set-aside in the Mexican government’s budget for program evaluation. It ended up being a very influential evaluation that changed a lot of people’s thinking. That was critical. We grew up as part of this ecosystem that was trying to move attention away from places where the founder had great oratorical ability and towards things where there’s good evidence to back them up.

Jim Savage: And Heidi, you’ve been really a part of this. I talk about the Heidi vortex: you’re great at finding all these people in different organizations who are doing this and bringing them together, so thanks.

Heidi Williams: With government employees especially, I feel like oftentimes the organization doesn’t have a public-facing founder. The employee is on staff as part of a huge organization, but they themselves have gone out on a limb to do something that was not the norm. They really want to figure out whether the program they were running was working.

We just had a conference in March where we brought a lot of those government employees together. Daniel Handel came from USAID and was the key person at USAID who really made cash transfer experiments happen. Paul was involved with that and Google.org and others were supporting it.

Another great example is Peter-Anthony Pappas, who was out at the Patent Office. He wasn’t convinced by someone with an economics PhD that he should do an experiment. He was tasked with designing a program that was meant to accomplish a goal, and he thought, “Well, how would I know whether it was working?” He ended up designing a randomized experiment without even knowing what a randomized experiment looked like.

You can find these people who are bringing research into the process of trying to improve their organization’s effectiveness, not, again, because some PhD told them that they should, but with a real intentionality of wanting the work they are doing to be more effective.

The more that we can showcase examples of that, the more it brings a very different meaning to the value of doing research on research. This in turn makes it easier for organizations to justify the additional bandwidth, like Jim was saying, required to start up a program. 

People are doing a ton of work, and this is an additional thing that you’re asking them to do. You might be able to bring in talent to help them. But at the end of the day, people have bandwidth capacities and there’s only so much they can do. The more we highlight good examples of where this has provided value to organizations, changed their decision-making, and really helped them accomplish their goals, the more momentum this work will have.

Jim Savage: I should say for listeners, if you know anyone who is running experiments in large organizations on how to do science funding or funding better, you should have that person send Heidi an email.

Heidi Williams: We’re trying to do a lot of matchmaking for organizations that have very particular constraints on what they can and can’t do, or that need more people. We’ll do what we can to try to get you matched with somebody who can help you on that. That’s a natural point to wrap up, so I’ll just say thanks.

Emily Oehlsen: Thanks, Heidi.

Paul Niehaus: That was great.

Jim Savage: Thank you.

Caleb Watney: Thank you for tuning in to this episode of the Metascience 101 podcast series. Next episode we’ll talk about whether scientific progress has downsides, and if so, how we can accelerate science safely.

Episode Six: “Safety and Science”

Caleb Watney: Welcome. This is the Metascience 101 podcast series. In this episode, Dylan Matthews, a writer for Vox, sits down with economics professor Tyler Cowen, as well as with Matt Clancy and Jacob Trefethen, both of whom work on the science portfolio at Open Philanthropy. They will discuss whether there are tensions between accelerating science and safety. 

Together, they dig through case studies where society has faced this tradeoff between progress in science and safety before, from automobiles to nuclear weapons, and the strategies that we can use to accelerate science safely. 

Dylan Matthews: We’re here talking about science and safety, and the general view at the Institute for Progress and I think the general view of most economists I talk to is incredibly pro-science. You can find some models where the growth of science is the most important thing for the overall wealth and wellbeing of the world. 

Matt, maybe you could walk us through that since it gives us a good baseline. Then we can talk about points where that model might break down and the real dangers and risks that appear.

Matt Clancy: Sure. Economists assume that material prosperity is ultimately driven across long stretches of history and across countries by technology, and technology has its roots in innovation and R&D, which has a lot of its roots in fundamental science. There are long time lags. There could be decades between when discoveries are made and when they get spun out into inventions, but all the gains in income and health are ultimately attributed back to some form of technology, even if you call it social technology. 

Even things that are fuzzier, for instance, your fulfillment or your meaning in life, my own view is that those are really correlated with material prosperity. The two are not synonymous, but it’s a good way to enable human flourishing more broadly.

Dylan Matthews: Got it. Over the history we’re thinking about — and this is really something that starts with the scientific revolution, the Industrial Revolution and some of the changes that began in Holland and England in the 17th, 18th centuries — that was not a period of growth where everyone wins in every situation. There were serious costs, but there’s a broad view we’re taking as a starting point for this conversation where those are acceptable costs, or at least weighed against significant benefits. 

Tyler, how have you conceptualized that balance? It’s not a Pareto improvement, not everyone’s better off – how do you think about risks? For example, ordinary risks, environmental degradation, some public health challenges that come with economic growth to date.

Tyler Cowen: I see the longer-run risks of economic growth as primarily centered around warfare. There is lots of literature on the Industrial Revolution. People were displaced. Some parts of the country did worse. Those costs are a bit overstated.

But the more productive power you have, you can quite easily – and almost always do – have more destructive power. The next time there’s a major war, which could be many decades later, more people will be killed, there’ll be higher risks, more political disorder. That’s the other end of the balance sheet. Now, you always hope that the next time we go through this we’ll do a better job. We all hope that, but I don’t know.

Dylan Matthews: The counterargument to that worry would be that the growth in technology and science is complemented by a set of what Deirdre McCloskey would call the bourgeois virtues. That this technological growth was enabled by growth in liberalism, mutual toleration, and things that you would expect to reduce warfare risk. I take it you’re a little skeptical or at least unconvinced on that.

Tyler Cowen: Well, we had two world wars, and I really don’t blame liberalism for those. I would blame the Nazis, Stalin, and other evil forces.

Dylan Matthews: Hot take. 

Tyler Cowen: But the point remains that more productive powers end up in the service of various parties. Now we’ve made what you could call the nuclear gambit: we’re going to make sure leaders suffer from big wars. We’ve had nuclear weapons and American hegemony. That’s worked out quite well so far. But of course, there’s the risk that if something bad did go wrong, it could be unimaginably bad in a way that even the earlier world wars were not.

Dylan Matthews: Let’s think about some concrete ways the world could go unimaginably bad. 

Jacob, you fund a lot of science. You move $100 million a year, roughly, in scientific funding. What are the ways your scientific funding can go wrong? What are the ways you think the kinds of work you fund could make things go boom?

Jacob Trefethen: I think that everything we fund could go wrong. We fund syphilis vaccine development, and if something goes wrong with a particular vaccine candidate, that could harm someone in a phase I trial. The issue that we often think about is trying to have some sense of when the harms could be very large and stand out. The nuclear gambit that Tyler mentioned is an interesting example, where the harm is so large, we haven’t observed it. We don’t have a base rate to go off, whereas we have quite a few base rates in phase I trials to go off of. That can make it tricky.

The orientation that we often take to our science funding is that historically most biomedical science funding – maybe science funding as a whole – has been very beneficial for people on net. That’s a baseline we should deviate from in particular cases. You then have to tell particular stories about particular cases with really bad potential harms. For us, that often comes up as bioweapons as potential uses of biological technologies or potential applications of transformative AI that could be very new and hard to pick up in the data so far.

Dylan Matthews: Got it. So why is now a moment where these kinds of worries are emerging? We’ve had a germ theory of disease for some time. We’ve had vaccines since the 18th century. What is it about the current environment that makes, maybe let’s start with biorisk, particularly fraught at the current moment?

Jacob Trefethen: Take the worst bioweapons that you could design: there are only so many people in the world who’d be able to design them or put them together. Potentially some state bioweapons programs could do that, and maybe some grad students could do that if they had the right training.

What’s changing now is the breadth of potentially harmful technologies that are available. At Open Philanthropy, we think about the intersection of AI with bioweapons, because all of the wonderful progress in language models and other parts of the AI ecosystem will make certain actions easier to take for a broader range of people.

Dylan Matthews: Got it. Matt?

Matt Clancy: One thing that has worked well for us as a species for a long time is that frontier science is pretty hard to do. And it’s getting harder. You need bigger teams of more specialists, which means deploying frontier science for nefarious ends requires organizing a group of people to engage in a kind of conspiracy, which is hard to pull off.

People do pull it off — military research does happen. Traditionally, something that’s helped us out is that these things get developed, but then it takes a long time before they get developed into a technology that a normal person, without advanced training or a specialized team, can use. By the time it gets there, we understand the risks, and maybe we’ve even developed new and better technologies for tracking and monitoring stuff like that. Wastewater monitoring of diseases is one random example.

Tyler Cowen: But the puzzle is why we don’t have more terror attacks than we do, right? You could imagine people dumping basic poisons into the reservoir or showing up at suburban shopping malls with submachine guns, but it really doesn’t happen much. I’m not sure what the binding constraint is, but since I don’t think it’s science, that’s one factor that makes me more optimistic than many other people in this area.

Dylan Matthews: I’m curious what people’s theories are, since I often think of things that seem like they would have a lot of potential for terrorist attacks. I don’t Google them because after Edward Snowden, that doesn’t seem safe. 

I live in DC, and I keep seeing large groups of very powerful people. I ask myself, “Why does everyone feel so safe? Why, given the current state of things, do we not see much more of this?” Tyler, you said you didn’t know what the binding constraint was. Jacob, do you have a theory about what the binding constraint is?

Jacob Trefethen: I don’t think I have a theory that explains it.

Tyler Cowen: Management would be mine. For instance, it’d be weird if the greatest risk of GPT models was that they helped terrorists have better management, just giving them basic management tips like those you would get out of a very cheap best-selling management book. That’s my best guess.

Dylan Matthews: It seems like we’re getting technologies that are radically distributed in ways that have pretty serious misuse risks. As Jacob was describing, we might be at a stage where a talented 15-year-old can design a more-dangerous-than-nature virus and release it. We might be entering a stage with large language models where you might not need that much knowledge yourself. You can just ask the large language model to design something for you, or you can ask it the best way to do a terrorist attack against a given entity. You can ask it how to bring down an electrical grid.

I’m curious how all of you think about radically democratized or distributed risks like those. How is tackling those risks different from some of the other risks that governments are used to tackling from science?

Tyler Cowen: I think of it in at least two ways. The first is — at the risk of sounding like too much of an economist — that the best predictor we have is mostly market prices. Market prices are not forecasting some super-increased risk. You look at the VIX and right now it’s low. If it went up, it might be because of banking crises, not because of the end of the world.

The second is just the nature of robustness of arguments. There is a whole set of arguments, very well tested by history, that the United States Constitution has held up really quite well, much better than people ever would have expected, even with the Civil War.

When I hear the very abstract arguments for doom, I don’t think we should dismiss them, but I would like our thoughts to race to this point: actually trying to fix those risks by staying within the bounds of the U.S. Constitution is, in fact, the very best thing we can do, and we ought to pledge that. 

That’s what I find missing in the rationalist treatment of the topic, with talk of abridging First Amendment rights or protections against search and seizure. We need to keep the Constitution in mind to stay grounded throughout this whole discussion.

Matt Clancy: One additional indicator that I look at is whether the size of teams doing frontier research is shrinking over time or continuing to grow. We haven’t seen it shrink yet, but we also haven’t had these large language models trained on science yet. That’s something that I feel will be a leading indicator of whether it’s getting easier for small groups to do new, powerful science.

Jacob Trefethen: We’re not yet at a point where small groups can do all sorts of leading science. If you are part of a frontier group now, you should treat that with some ethic of responsibility, and you should figure out what projects you want to work on that you think will not lead to a world where it’s possible for a 15-year-old to do something really damaging. 

That applies to funders too. It’s something we think about a lot. We do a lot of red teaming of different things we fund before we fund them. There’s a lot of work you can do upfront. There are capital-intensive projects that are going to create the future, so you don’t have to do all of them. You can do some more than others.

Matt Clancy: There is precedent for how we regulate dangerous technologies, for example, who has access to high-grade military weapons and so on. In World War II, the U.S. Patent Office had this compulsory secrecy program that blocked your ability to get patents on things that were perceived to put national security at risk. We have liability insurance, and that also affects what people choose to work on and how they choose to create inventions in more responsible or less responsible ways. We do have a lot of tools, and I agree with Tyler that we should resort to them before we resort to some kind of crazy authoritarian plan.

Dylan Matthews: Got it. Let’s talk about AI specifically. AI seems an unusual thing in that it’s a general-purpose technology. There was a nice paper called “GPTs are GPTs” making the point that this is not a specialized thing. It’s not nuclear weapons. It’s not something where you need large industrial capacity to deploy it. But you do still need large industrial capacity, it seems, to build it. What does that imply for the ability to build safe systems out of that?

Tyler Cowen: Again, I view it pretty generally. The world is in for major changes, no matter what your estimate of the ratio of positive to negative. We have a lot of rigid institutions, a lot of inertia and interest groups cemented in. The combination of major changes, with systems not ready for major changes, that haven’t seen comparable major changes for a long time, maybe not since the end of World War II, is going to cause significant transition problems – no matter what kind of safety measures we take. 

Say it doesn’t kill us all, say there’s no terrible pathogen, then the biggest impact will be on our understanding of ourselves and how that in the longer run percolates through all our institutions. I think it’s going to be both hairy and messy. We don’t really have a choice at this point, having opted for decentralized societies, but it’s going to be wild.

Dylan Matthews: Do we have a precedent for that? The 20th century saw a lot of pretty radical revisions in how people think of themselves, how they think of themselves in relation to God, and how they think of themselves in relation to their nation, and to international causes and ideologies. Yet, those changes did not seem to make the world radically less safe in aggregate.

Tyler Cowen: Well, the first half of the 20th century was a wild time, right?

Dylan Matthews: Right.

Tyler Cowen: Since then the changes have been modest. However, the printing press, which was a much slower example, changed everything. You could argue the discovery of the New World, for example, and maybe electricity. There are parallels. You could say that in all of those cases, the benefits were much higher than the costs. But the costs, just in gross absolute terms, have been pretty high.

Jacob Trefethen: I agree that the world is in for major changes, and we aren’t going to be able to predict a lot of that. One thing I get frustrated with is the sense of inevitablism that can follow from that observation: the idea that it’s not worth thinking along the way and picking different parts of the tech tree. I’m not attributing that to you, because you may want to pick different parts of the tech tree. But people will always be going after benefits: health benefits, making their own life better. There are many ways to achieve those benefits, and you don’t have to explore every fork. Not everything is inevitable.

Within AI, I think the part of the tech tree we’re on now is a lot better than it could’ve been with some of the large language models. There’s human feedback involved in the way those models are trained, which is part of why they’re performing well right now. You could imagine worse ways they could have been built than they were. I’m sure people are thinking carefully about how to build even better models going forward.

In vaccinology, it comes up for us a lot. For instance, we want to achieve a benefit of a TB vaccine that works in adults. TB still kills 1.5 million people every year, and there’s no vaccine known to work in adults. Well, should we make a transmissible vaccine, a vaccine that can be passed from person to person? Then you don’t have to vaccinate everyone, and it just happens naturally. We don’t think so. That’s the kind of risk that we would assess as part of the decision about what platform to invest in to achieve a benefit that everyone can agree is a great benefit.

Dylan Matthews: Is there a difference in how you think about this at Open Philanthropy by virtue of being a nonprofit, a quasi-foundation entity, since there are some risks that might come up because there are unpriced externalities that emerge with new technologies? 

There are costs to putting lead in gasoline, and they don’t accrue to the people putting lead in gasoline. You as a nonprofit can specialize in finding those and fixing those because the system won’t fix them naturally. Is that the kind of consideration that comes up for you in terms of trying to specialize?

Tyler Cowen: Do you think you’ll be a decisive actor in that kind of tuberculosis vaccine never happening? I don’t know anything about it, but that strikes me as unlikely. Now, if you just want to say, “Well, it’s our institution, we don’t want to be a part of it,” I’m all for that. But I would doubt if you’re going to be decisive.

Matt Clancy: One lesson from technology comes from competing models: whoever gets the head start often becomes the platform that further development builds on. If we can get a TB vaccine – which I also don’t know anything about, Jacob is the expert – that doesn’t use this transmissible modality, that becomes the benchmark that alternatives get tested against. It makes it harder for other people to do clinical trials on other untested versions, because you can just get the approved vaccine.

This dynamic starts to lock in. Another more boring example of dangerous and safe technology is fossil fuels, which emit carbon dioxide, and renewable energy. Everybody’s hoping that we get to the point where renewable energy is so efficient that no one even thinks about using fossil fuels. Why would you use the worst version, the one that smells bad and isn’t as cheap as the solar panels?

That’s one of the powers of technology. If you can pick winners, which is very hard to do, then you can potentially change the course of subsequent development.

Jacob Trefethen: That’s right. Regarding the TB example, a company’s trying to make mRNA TB vaccines. And all the investment that went into the mRNA platform maybe will now pay off. That’d be wonderful. Personally, I am glad that all that effort went into that platform rather than a platform that had potentially more risks. There are new platforms being discussed right now. Should you enable self-amplifying RNA, where you can put in a smaller dose and maybe get less of a negative response, or is that too risky because you can’t control the amount of RNA that gets produced? That’s a question that should be asked now, rather than after billions of dollars of investment when you’re rolling something out.

When it comes to science, the sense of inevitablism is particularly inappropriate and often gets shipped in. Maybe I’m reading the tea leaves too much, but it seems shipped in from venture capital investing or investing as a whole, where there’s more competition for deals. There’s a sense that I have to get into this deal because it’s going to happen anyway. So I don’t have to hold myself particularly morally responsible. I can just think of the counterfactual as more inevitable.

Tyler Cowen: But maybe the inevitablism is correct. Say the printing press has been invented. You’re Gutenberg. Someone comes in, has a meeting with you. “What are the first 10 books we publish? These are going to be really important because everyone will want to read them, and they’ll be circulated.” I’m not at all against people having that discussion. Stripe Press has it all the time. 

But at the end of the day, did it matter that much which 10 books they published first? It seems there was something inevitable about the printing press that would produce a super wide variety of material. No one’s decision was very decisive in that at all.

Dylan Matthews: That seems like an odd example in that not long after the printing press, we had the Reformation. The fact is the first thing printed was the Bible, and then you had access to the Bible and religious knowledge that was somewhat less mediated by religious authorities.

Tyler Cowen: But the Bible would’ve been printed anyway is my point. Someone might have said, “Oh, we can’t print the Bible, there’ll be a reformation. There’ll be religious wars.” You just say, “Well, look, that’s inevitable.”

Dylan Matthews: We’ll print the Koran instead. 

Tyler Cowen: Don’t dismiss inevitablism so quickly. The name of it makes it sound false, just inevitable, like agency denied. But a lot of things are just super likely once the technology’s been invented. Electrocutions by mistake, for example. Of course we want to minimize the number, but once you have electricity, there are going to be some.

Matt Clancy: I wrote a piece called “Are Technologies Inevitable?” I had a fuzzy centrist view, of course, which is that the big ones are in some sense inevitable. We were probably always going to figure out electricity. There are certain things based on how nature works that you’re probably going to discover and then exploit. Then details are very highly contingent. 

This TB example is something where it could be a really contingent result. In one universe, it could be very different. Would we eventually discover vaccines? Probably in all universes that have science.

Dylan Matthews: Is there a particularly vivid detail that could’ve gone one way or another that’s motivating to you? The world could’ve been this way, but it was this other way, and it didn’t have to be?

Matt Clancy: When there’s a global crisis, the technologies that are at hand – the mRNA vaccine or something – get pulled to the frontline and deployed. Then we develop massive expertise built around them, and that sets a new paradigm. There were alternative platforms out there, such as the AstraZeneca vaccine.

If there had not been the COVID-19 pandemic at that time, maybe those platforms would’ve all evolved in parallel at different rates. Maybe mRNA would not have been the inevitable winner. Maybe there’s something that’s a few years back, and if it had had its time in 10 years, it would have been ready for prime time, and it would’ve been even better.

It’s hard to judge the counterfactual because we can’t see the technologies that weren’t invented, but these crises show a really clear example of something that was at hand and ready to go, then got supercharged and locked in.

Dylan Matthews: How good are our feedback loops for safety? We have a number of examples of technologies like automobiles: you build the cars, you build highways, and they take off. Ralph Nader points out that they’re dangerous in various ways. You correct them. We get the best of both worlds, which is cars with all their benefits, plus safety.

That seems to be the way a lot of technologies work. Where are some problems you guys foresee for that? Are there places where the feedback loop isn’t tight enough? Where it’s too imprecise?

Tyler Cowen: 40,000 Americans – is that the number? – die every year in cars or because of cars. I’m all for what we did, but it’s not that good, right? It’s clearly a huge positive, but I don’t think we can say that we’ve solved the risk problem with automobiles.

Dylan Matthews: Of course. We have not solved it by any means. 

Matt Clancy: I’m of two minds about this. When you talk about AI alignment or something, I’ve always believed that there’s probably not a lot of marginal productivity in thinking about this before the technology exists, when we don’t even know what form it’s going to take. Before we knew about neural nets and large language models and deep learning, we didn’t know that this would be the paradigm, so it’s hard for me to think that work would’ve been super productive. As with automobiles, you have to iteratively experiment and correct mistakes as you go, because you can’t anticipate what will happen in advance.

But the big danger is these existential risks. You don’t have the luxury of trying out an existential risk. You have to get it right, and it’s really hard to get it right. That makes it a thorny problem.

Jacob Trefethen: The way it works in different parts of the economy and in different countries can be fairly different. The part of the economy I’m very familiar with is R&D for medical devices, drugs, and diagnostics. In some of those cases, we will fund grants for safety work, before it’s legal to sell a product, where the safety work is very likely to reveal nothing particularly scientifically novel. We funded animal toxicology for the drug oxfendazole for deworming. That drug has been used in many different animals for veterinary purposes for decades, so it’s probably not toxic. But the FDA wants assurance there.

At the same time, parts of the economy, including science, are potentially being throttled too much. There are certain properties of particular types of science that you can identify ahead of time, as heuristics for where you might want to proceed with a bit more care. For instance, if something is spreading or self-replicating, or if something evades the immune system, you can say ahead of time that you might want to slow down.

Tyler Cowen: But keep in mind, when it comes to AI, what care often means is taking good care that America is first and not nastier countries. If we’re first and we have a certain amount of hegemony in the area, we can then enforce a better international agreement, as is the case with nuclear weapons. So taking care can mean hurrying, right? This has been the case in the past with many different weapons systems, and we’ve taken pretty good care to hurry. The world has stayed pretty peaceful. The U.S. as hegemon has worked relatively well. I worry that the word care is slanting the debate towards some kind of pause when it actually implies the opposite.

Dylan Matthews: A lot of this depends on the empirics, right? I speak to some people on artificial intelligence who think that China is just unbelievably far behind, and open source models are just completely nonviable. In that world a pause doesn’t seem particularly costly.

Tyler Cowen: But you have to stay ahead of China forever, right, unless you think they’re going to get nice and democratic soon. It’s all over history. The Soviets get to the hydrogen bomb first, which shocked us. We had no idea. There’s so much espionage. China has a lot of resources. The fact that they put out a press release, “Oh, we’re not going to have consumer LLMs.” 

I saw so many AI people, even EA people, rationalists, jump on that. They just point to it. People who knew nothing about China would say, “Ah, the Chinese can’t do anything, so we’ve got to pause, we’ve got to shut down.” This total vacuum of knowledge and analysis. Sam Altman has criticized this as well. It stunned me how quickly people drew conclusions from that. Maybe it just means China will do military AI and not a consumer product.

Dylan Matthews: Does that imply similar things for bio? Does that imply there should be a speed-up of certain gene editing technologies on the theory that someone else will? This arms race dynamic seems like it proves a lot, and maybe more than you intended to.

Tyler Cowen: Well, you want America to be first in science in just about every area. We haven’t quite achieved that, but we’ve come pretty close. We have a lot of experience with that. The basic risk in a lot of global settings is just warfare, right? That’s historically the risk that keeps on recurring, and that’s what we need to be most focused on.

Jacob Trefethen: Your point about care can, I agree, go multiple ways. I think it could once again loop back around. Does it make you want the U.S. government to require more in terms of info security from leading labs?

Tyler Cowen: Absolutely.

Jacob Trefethen: Let’s say that slowed down progress in the U.S., would you be in favor?

Tyler Cowen: I’ve even told my own government that, absolutely.

Emily Oehlsen: Tyler, can you imagine a scenario in which we had nuclear weapons, as we’ve had them over the last half century, and we had the geopolitical threat that they posed, but in addition to that, there was another threat in which they might, of their own accord, self-implode? One might self-implode, and it would set off a chain reaction in which all of them exploded. 

I think that’s the way that a lot of people conceptualize AI, that there’s not just the geopolitical threat, but there’s also an internal threat to the system itself. When you were discussing AI, it seemed you were mostly focusing on the geopolitical arena, but I’m curious how you think about safety when it has those multiple dimensions?

Tyler Cowen: I don’t think your example is so different from the status quo. A nuclear accident could happen. It could lead to a lot of other nuclear bombs going off. You’d like to limit the number of countries that have anything really dangerous. I’m not sure AI is that dangerous, but if need be, limit it. But when you look at how things get limited, I think you want a very small number of leader nations, ideally one. It’s because America is militarily strong that we’ve enforced some degree of nuclear nonproliferation. Keep in mind, it’s not just the race against China. Our allies want us to develop some form of AI, and if we do not, they will.

You’re Singapore, you’re Israel: you may or may not think America protecting you is enough. But if America doesn’t do it, I strongly expect you’ll have a lot more nations trying to do it, because they trust us more than they trust their enemies. That example all the more militates in favor of America moving first and trying to establish some kind of decisive lead.

England is trying to do it. I’m fine with that. It’s not that I fear they’re going to conquer us, but should America hold back so the English can set the world’s safety standards? That doesn’t seem like a huge win to me.

Jacob Trefethen: Does anything feel perverse about that reasoning style to you? How big do you think the risk is that makes it worth it to be first?

Tyler Cowen: It’s path-dependent. This has been a lot of human history: you’re always rooting for the better nations to stay ahead of the less beneficent nations. There’s no guarantee you win. We’ve just been on that track for a long time, and you can’t just step off the rails and stop playing. I’m hopeful we’ll do it, but I very much see it as a big challenge, even without the very most dangerous scenarios for AI. Just the risk of flat-out normal conflict is always a bit higher than we realize.

Dylan Matthews: How much would your view of this change if you changed your estimates of how beneficial U.S. hegemony has been historically? For instance, if your confidence that it has meaningfully reduced the incidence of conflict went from 80% to 50%?

Tyler Cowen: Oh, of course, it could flip. If we were the bad guys, or if we were just so incompetent at being the good guys that we made everything worse, then you would turn it back over to the Brits. Or Singapore: you go first. We’re America. We have nuclear weapons. We’re going to stop everyone but Singapore. You could try that. It’s not what I think is most plausible, but sure, as a potential scenario, yes.

Dylan Matthews: Let’s talk a little bit about information security, because this sometimes gets shunted aside as the boring stepchild of some of these first-order debates on safety. But in both bio and AI, locking down relevant data and model parameters seems really important.

Matt or Jacob, how do you guys at Open Phil think about this and how do you make sure people prioritize this?

Matt Clancy: When you’re talking about biorisk and biological catastrophes, there’s a deep tradeoff in how much you disclose about what you’re worried about versus keeping it internal. It’s just this frustrating tradeoff.

It’s hard to solve problems and identify solutions if you don’t talk openly about what you’re afraid of, but there’s also a very real risk that you’re advertising things that other people might not have thought about as things to do. If you’re worried that there are not necessarily great solutions out there, then the net benefit of being open can quickly fall to zero. It’s tricky and I don’t know. On the biosecurity side, it’s a very thorny problem again.

Dylan Matthews: We’ve had CRISPR for about 15 years now, in various forms, and it’s obviously gotten better. It’s surprising to me that we haven’t had, with the possible exception of the infants in China that were genetically edited, any major scandals or catastrophes come out of it. We’ve had this immensely powerful biotechnology, and maybe this is a famous-last-words thing — Norman Angell writing a book in 1909 about how Europe wasn’t going to have a major war ever again — but it is kind of striking to me that we haven’t had big close calls yet. Do you guys have a theory of why that is?

Matt Clancy: I don’t know that it’s specific to CRISPR, but in general, you still have these same dynamics: it’s hard to use, and it’s not necessarily easy to genetically modify things. Scientists operating in labs have one set of incentives, but private firms that are looking to do this have to think about the reputational effects of how they use this thing.

I remember I went to a seminar once about genetically modified crops and how CRISPR was going to be integrated. The companies had essentially learned that if they’re too cavalier with how they’re going to use this technology, it has huge consumer blowback. They had thought very much about things. “We’re not going to use the technology to engineer tobacco because we just don’t want to be associated with anything bad.” They were going to have all these local partnerships with local seed breeders. 

Again, it just shows that these large corporations are operating in the open, and they have to think about how their decision on how to use this technology will be perceived by the wider world. Those are the people that I think are currently able to use CRISPR, so maybe that’s an explanation. But again, I’m not an expert on CRISPR.

Dylan Matthews: This is the safety discussion in a series of podcasts where we’ve been largely taking not a skeptical view of safety, but discussing the “safety is abused” perspective.

There’s a ratchet where you regulate things to care about safety, and you get to a point where you can’t build nuclear power plants anymore. People worry about safety to an extent that even perfectly safe things, like vaccines, don’t seem acceptable to them, or things like golden rice don’t seem acceptable to them. 

How do you form a coherent attitude about this that’s neither blasé about risks of new technologies nor knee-jerk defensive in a way that impedes societal progress?

Jacob Trefethen: For us, it often starts tricky and then ends up getting easy, where we want to figure out which direction we should be pushing on a given problem. We end up on different sides of different problems. Once we want to push for development of something, we often just try to push as hard and as quickly as we can. That’s from the seat of a funder. Funders can’t actually do much operationally; we’re just one part of the ecosystem there.

But there are so many obvious harms occurring in the world that could be prevented through better medical technology, through better seatbelts, all sorts of things, that once you can get comfortable and have done your due diligence, often you should go full steam ahead.

Matt Clancy: But we’re also in the fortunate position of having that secretive biosecurity team that we can run things by. If you have to judge these things on a case-by-case basis, if you can’t say there’s some general abstract principle, then you kind of need this domain-specific knowledge. It works in our org because I guess we’re this high-trust organization.

Jacob Trefethen: We definitely have the benefit of being able to have regular meetings and poll the biosecurity experts before we get involved in a new area. 

We also have other parts of our process that we designed to avoid giving grantees a bad experience. We have a two-stage process for most of our grants: initially, a program officer will write up whether they’re interested in investigating a grant further, and we’ll check in about that and try to catch any potential safety worries there, so that you don’t go through a whole process with a grantee who then, at the end of the day, doesn’t get money because of a safety concern.

Tyler Cowen: One lesson is that if we can avoid polarizing scientific issues, you then have access to the right nudges that can make the world much, much safer at low cost: getting more people vaccinated, making Europe less fearful of GMOs. There are many examples. China has its own problem with vaccines. They didn’t want mRNA, for whatever reasons. Older Chinese people don’t trust Western medicine, don’t trust vaccines, and this contributed to their zero-COVID policy lasting so long. That was a massive cost, and still a lot of them are not vaccinated and presumably dying or getting very sick from COVID.

Dylan Matthews: What is the best regulated area of science and technology right now? People love to complain about the FDA, love to complain about the Nuclear Regulatory Commission. There are things that seem completely unregulated right now, large language models. Has anyone found the sweet spot?

Tyler Cowen: Every area’s different, but, say, food safety seems to work fairly well. I don’t think we should regulate other things the way we regulate food, because with food safety you just want uniformity and predictability, so you’re not stifling innovation that much. A restaurant doesn’t need a new dish approved by the local authorities before putting it on the menu. But if you go into a restaurant in the U.S., you can be reasonably sure you won’t just get sick and die.

Jacob Trefethen: That’s a good example. Plus one.

Dylan Matthews: Plus one to that. Do you have any favorites, Matt?

Matt Clancy: I’m just running through the list in my mind and saying: “Well, no, not really.” “No, not really.” “That’s not great.” “That’s not great.” “Too excessive, or not enough.” Food regulation is a good one. And as a metapoint, it’s probably true that the ones I’m not noticing are the ones that are working the best: the ones that people are not writing articles about, saying why we should reform this thing for the better.

Tyler Cowen: I assume this building is super safe. I’m not saying it’s because of regulation, but the private decisions are embedded within some broader structure that’s led to a lot of safety.

Matt Clancy: Even there, we’ve got our senior fellow on construction at IFP, Brian Potter, who’s writing all about how TFP [total factor productivity] in construction is not growing as fast as it could, possibly because there’s too much regulation. It’s hard for me to come up with a good example.

Caleb Watney: Fire seems to be a risk that we’ve basically eliminated via technology, with sprinkler systems.

Tyler Cowen: And fires are way down for whatever reasons, so someone has been making good decisions.

Dylan Matthews: Occupational safety, maybe. I’m not saying I agree with every decision OSHA ever made or that they haven’t fallen down on some parts of the job, but injuries at work in the United States seem way down from where they used to be.

Tyler Cowen: But it’s worth noting that the rate of decline did not accelerate with the creation of OSHA.

Dylan Matthews: I’m not making a causal claim about OSHA, but we seem to be in a pretty good place.

Heidi Williams: How about lead exposure policies in the U.S.?

Dylan Matthews: Lead exposure might be under-regulated at the moment. Our regulatory agencies don’t do well with legacy setups, and so they’re not well prepared to do the funding and work of replacing old lead water mains, or soil remediation, things like that. But it’s hard to get leaded paint in stores now, that’s for sure.

Jacob Trefethen: Depends what country you’re in.

Dylan Matthews: Yes, it does depend what country you’re in.

Matt Clancy: I’ve got one more idea, one that operates behind the scenes. I’ve always thought BARDA [the Biomedical Advanced Research and Development Authority] does an okay job at stuff that is not necessarily very public.

They’re stockpiling medical supplies in the event of nuclear attacks or diseases, and putting up these big milestone payments for the development of new antibiotics.

Jacob Trefethen: That’s a great example, because we’ve been talking mostly about safety in the context of ways science can go wrong, but science is a contributor to the safety of society in lots of obvious senses. As a government, you could target more resources toward that.

I think BARDA’s a great example. I’ve got the JYNNEOS vaccine coursing through my veins, and that’s thanks to BARDA funding JYNNEOS for smallpox before the monkeypox outbreak happened. It’s also thanks to the FDA approving the JYNNEOS vaccine before the outbreak happened.

Matt Clancy: That’s also related to the earlier question about how much to disclose. Every once in a while I might be worried about something, but maybe BARDA is working on it right now. I just don’t know because they don’t want to let people know that they’re on the ball on that.

Tyler Cowen: A key point here is that it’s much harder to regulate very new things well. You see this with crypto. There are some people who hate crypto; to them, it’s just a fraud. If they’re right, crypto can just go away, but they could easily be wrong. Maybe crypto is how the AIs will trade with each other. Over time, you want modular regulation of crypto: whatever particular thing crypto is used for, regulate it that way. If we use it for remittances, regulate it as you regulate remittances. Probably that would work fine. But while it’s still evolving into even what the core uses are, it’s very hard to see regulation working well. You just want a minimum of protections against gross abuses, see what happens, and then regulate things in particular areas.

Caleb Watney: We’ve been talking somewhat about path dependence in technology, and about how one scientific breakthrough can increase risk while another can decrease some other, previous risk. People talk about the concept of differential technology development, where you try to be strategic: anticipate safety-increasing technologies and accelerate them so that you get them before other kinds of technologies. That, of course, depends in some ways on your ability to predict or anticipate which attributes of a technology or scientific area make it more or less safe.

Do you think that is reasonable, and should the United States be trying to do more strategic differential technology development?

Matt Clancy: We do it extensively in some domains. The Department of Energy’s ARPA-E is differential tech development. Or, to use the language of the economics of innovation, it’s trying to influence the direction of technological change. We’re basically trying to jumpstart the green revolution, renewable energy, and so forth. Plans for carbon taxes are also a de facto attempt to steer innovation away from certain kinds of technology.

There’s a spectrum. On the technology side, it’s easier to predict the answer to: how dangerous or how beneficial is this technology? What are the unanticipated consequences? In innovation, that’s always a big challenge, but it’s a smaller challenge for technology than for science.

When you’re talking about fundamental science, it’s not that you have to be totally agnostic. Funding Egyptology is probably not dangerous unless we get a mummy’s curse. But funding gain-of-function research is obviously much more controversial. There, it’s a lot harder to know what you’re going to get. Those are my big-picture thoughts on that.

Tyler Cowen: I’m glad we’re spending more on asteroid protection now.

Dylan Matthews: What would make you change your mind on that?

Tyler Cowen: If we learned there weren’t any asteroids out there, or that they would come much more rarely than we now think.

Matt Clancy: The thing about asteroid protection is that a monitoring system is good if we can see them from far away. But it is one of these things where, if you develop the technology to move an asteroid cheaply, then you can move an asteroid toward the planet as well as away from it. On the whole, I’d rather have it than not have it.

Dylan Matthews: Are there other areas where the potential for science to increase safety is underrated? Asteroid detection seems like one place. Mega-volcano detection might be another. Presumably, there are also areas where it’s not merely natural disasters that you can protect against through differential development.

Tyler Cowen: By far, getting right the procedures for launching nuclear weapons, which are not entirely open and common knowledge. What exactly “right” means, you can debate, but we don’t seem to put a lot of effort into that. Those are fairly old systems. Again, maybe you can’t have a public debate, but still, I would want to make sure we’re really doing the best we can there.

Matt Clancy: The other area where there’s been a lot of thought on this is biosecurity. Far-UVC light: if you could develop that technology and have it embedded throughout the economy, it could make certain kinds of diseases a lot less prevalent, and make it a lot harder to attack a lot of people with those diseases.

Much better, more comfortable, more fashionable PPE could be good for protecting us against future pandemics. Wastewater monitoring and novel pathogen detection. Those are the ideas that I hear out there. Any others?

Jacob Trefethen: Those are all great. Also, just attempting to make vaccines against the next pandemic viruses would be great. There’s lots of energy behind that, but not enough. Good work is being done, but we’re not there yet on a lot of the obvious society-protecting technologies.

Dylan Matthews: Do we want to do a round of overrated, underrated? Gain-of-function research?

Tyler Cowen: Everyone dumps on it. I’m skeptical too, but maybe there’s some chance it’s underrated and actually useful. I just want to make clear that I don’t know. But dumping on it has become a cliché, and I would like to see a lot more serious treatment of it.

Dylan Matthews: I met a biosecurity expert who, as though she had a shameful secret, said, “I don’t think it’s totally pointless.”

Jacob Trefethen: Some of it is demanded by regulatory agencies, depending on what it means. You’ll be asked to put things through resistance tests, and that is, in a sense, selecting for enhanced ability to evade a drug or something. We shouldn’t be doing things that increase the transmissibility or the pathogenicity, the harmfulness, of a pathogen. I’m so mainstream in that way.

Dylan Matthews: Phase I trials for drugs.

Jacob Trefethen: They’re good.

Tyler Cowen: But the whole system of clinical trials needs to be made much cheaper, have a lot more trials, be much better funded, and have far fewer obstacles. That seems to me one of the very worst parts of our current system, and it makes everything much less safe.

Jacob Trefethen: I agree with you generally. I think that I might disagree on some specific cases, but what about Phase 1s in particular?

Tyler Cowen: I don’t have a particular view, but everyone I talk to says there’s so many different obstacles. Exactly which ones you should loosen up, I don’t pretend to know, but it seems something’s not working.

Jacob Trefethen: Right.

Dylan Matthews: Industry capture.

Jacob Trefethen: Of regulators?

Dylan Matthews: Maybe overrated or underrated as an explanation of why the world is the way it is. I assume most people would say they’re against industry capture.

Jacob Trefethen: Got it. Just checking. I think probably overrated in some circles, underrated in others. I think on net, maybe underrated as an explanation.

Tyler Cowen: Normatively, I don’t think industry capture is necessarily so bad. It depends on the alternative. A lot of times it gets things done: you build up cities, you have a lot of construction. The government where I live, Fairfax County, at times has been quite captured by real estate developers. I’m all for that. Bring it on. It’s one good recipe for YIMBY.

Dylan Matthews: The CDC.

Matt Clancy: I mean, they’re not highly rated at the moment.

Jacob Trefethen: I think scientific talent at the CDC, underrated. Outcomes, probably appropriately rated as not so hot in recent years.

Dylan Matthews: Nuclear waste.

Jacob Trefethen: Dial it up.

Tyler Cowen: I’ve been reading all these pieces lately, saying it’s not such a big problem. I don’t feel I can judge. But given the alternatives, I want more nuclear power. If we have to deal with waste, I say, let’s do it.

Dylan Matthews: Geothermal.

Jacob Trefethen: Probably underrated.

Matt Clancy: Seems underrated.

Tyler Cowen: Same.

Dylan Matthews: Global zero for nukes.

Tyler Cowen: Just impossible.

Matt Clancy: Is it a serious plan for many people?

Tyler Cowen: Who goes first?

Dylan Matthews: Barack Obama seemed to believe in it a little bit. He seemed important for a while.

Tyler Cowen: But what did he do? I don’t blame him. I think it’s impossible, but you can cut back on the number; it doesn’t really matter. You might save some money.

Dylan Matthews: Yeah. Zoonosis.

Matt Clancy: I will say that when I worked for the Department of Agriculture, we looked at this a lot for antibiotic resistance in farm animals. Farms use antibiotics, and it was always feared that this would be the vector through which we would get very bad, antimicrobial-resistant [illnesses] coming to humans. From what I could tell, it was very hard to make that case in practice. In theory, it’s compelling and the story makes sense, but it was really hard to ever conclusively trace back an example. So, it’s probably still correctly rated.

Jacob Trefethen: I would say underrated by the broader public. You could just make vaccines and antivirals against some of the obvious potential candidates, but obviously, we haven’t done that in some cases.

Tyler Cowen: All I know is I hear a lot of claims I don’t trust.

Dylan Matthews: AI model evals, either voluntary or mandatory.

Jacob Trefethen: Do listeners know what that is? I guess not rated.

Dylan Matthews: Not rated. The idea would be that to release something like GPT-4 or Claude or another large language model, you would have to go through either a non-government agency, like the Alignment Research Center, or a government agency that tests to make sure that it doesn’t do a set of dangerous things.

Jacob Trefethen: For models above a certain size, it’s something that’s got to happen at some stage. There’s another one of these episodes about the political legitimacy of science. If you have industries or scientists taking what the public perceives as large risks on behalf of other people, that’s not going to last. So, probably underrated.

Tyler Cowen: We don’t yet have the capacity to do it, but as you know, when Apple puts out a new iPhone, they have to clear it with the FCC. I mean that’s been fine. There’s a version of it that can work, but right now, who exactly does it? How is it enforced? What are the standards? Is Lina Khan in charge? Is Elizabeth Warren in charge? I just don’t get how it’s going to improve outcomes. It’ll become a political football and polarize the issue, so I say we’re not ready to do it yet.

Matt Clancy: Self-regulation, along with nonprofits focused on this, is probably a good place to start, rather than involving government agencies. I agree with Jacob that eventually you probably want to codify this somehow, but you have to start somewhere, and this seems a reasonable place to start.

Dylan Matthews: Luddites, either current or historical.

Tyler Cowen: They were smart. They didn’t see how good progress would be. They didn’t know fossil fuels would come into the picture. They’re a bit underrated, maybe. They weren’t just these fools.

Matt Clancy: I do have some sympathy for them, I’ll admit. They were responding to real problems.

Jacob Trefethen: I do think it’s wise to consult what makes your life go well or not. There are a lot of things that don’t feel connected to technology directly. It’s falling in love, having friends, it’s all of that. 

In the grand scheme of things, that is probably a connection that we as a community need to keep making, if we want the changes in metascience and the science world broadly to continue to matter to people. It gives me a little bit of generosity toward the Luddites too.

Dylan Matthews: That seems like a beautiful place to end. All you need is love. 

Caleb Watney: Thanks for listening to this episode of the Metascience 101 podcast series. Since we recorded this episode, Matt Clancy has published a long and thoughtful paper sketching out a framework to help think about these trade-offs called “The Returns to Science in the Presence of Technological Risk” — I highly recommend reading it if you thought this conversation was interesting. For our next episode, we will consider the role that political legitimacy plays in our scientific enterprise.

Episode Seven: “Science and Political Legitimacy”

Caleb Watney: Welcome back to the Metascience 101 podcast series! I’m Caleb Watney and in this episode, Dylan Matthews leads a conversation with Alexander Berger, Tyler Cowen and myself on the relationship between effective, robust scientific institutions and our notions of political legitimacy. How does science change when we are spending dollars that are accountable to the public? 

Dylan Matthews: I’m Dylan Matthews. I’m a reporter at Vox. I like to write about philanthropy, progress, and things of interest to the IFP world. I have three great guests for you: Caleb Watney, who is a co-founder of the Institute for Progress; Tyler Cowen, professor at George Mason University and author of the Marginal Revolution blog; and Alexander Berger, who is CEO of Open Philanthropy, a leading funder in this space.

We’re going to talk about science and politics today, and how to build scientific institutions that have some form of political legitimacy. As people who think that science is relatively important to progress, this is a fairly central question. 

Caleb, why don’t we start with you? How would you characterize America’s current framework for politically supporting science? We can start there, and then we can get into some of the strengths and limitations.

Caleb Watney: I think you could conceptualize this in a couple of ways. The first is in terms of pure funding outlays. The majority of our basic research is funded directly by the federal government. The budgets of the National Science Foundation and the National Institutes of Health together come to around $60-70 billion per year, which is non-trivial. They fund a lot of basic research, and some more applied research, especially on the NIH side.

There are also a number of quite lucrative tax incentives that we provide. For example, the R&D tax credit is a huge incentive trying to recognize the fact that when private firms invest in research, oftentimes, there are positive externalities that they don’t totally capture. So financially, the public sector is a huge driver of science.

Scientists as a class are oftentimes government employees. When a little kid thinks about who a scientist is, they think of NASA. Or they think about people working directly in university physics labs. There’s a quite tight link in the public imagination between science and the public sector.

Dylan Matthews: Got it. One aspect that sometimes doesn’t get fleshed out as much is the connection to the university system. 

We have a large public university system. The majority of our research universities are publicly funded. How much of that is coming out of state and local versus this national level that you’re describing?

Caleb Watney: Right. In terms of research funding, most of it is driven by the federal level. Again, NSF and NIH are the biggest funders of university research. It’s true that money is fungible and sometimes state and local budgets, especially for state schools, will provide a lot of funding for universities. But in terms of the pure research budgets, a majority of that comes from the federal government.

Dylan Matthews: Got it. If we’re thinking about things that influence decision making for these kinds of institutions — Tyler, maybe I can bring you in here — these are institutions that are overseen by Congress and are answerable to the public. You sometimes get freakouts about the NSF funding research where you put shrimp on treadmills, that kind of thing. What do you view as the main risks of putting so much of our resources in this kind of institution?

Tyler Cowen: I would stress just how decentralized science funding is in the United States. The public universities are run at the state level. We have tax incentives for donations where you have to give to a nonprofit, but there’s otherwise very little control over what counts as a viable nonprofit. 

One specific issue that I think has become quite large is how much we run our universities through an overhead system. On federal grants and many other kinds of grants, an overhead is charged. The overhead rates are very high, well above the actual marginal overhead costs.

You might think that’s a crazy system, and in some ways it is crazy. It means there’s intense pressure on professors to bring in contracts, regardless of the quality of the work. That’s clearly a major negative. Everyone complains about this.

But the hidden upside is that when universities fund themselves through overhead, there’s a kind of indirect free speech privilege, because they can spend the overhead how they want. Now, I actually think they are violating the implicit social contract by spending the overhead poorly. But for a long while, this was why our system worked well. You had very indirect federal appropriations: some parts went to science, other parts went to education. It was done on a free speech basis.

But like many good systems, it doesn’t last forever. It gets abused. If we try to clean up the mess — which now in my view clearly is a mess — well, I’m afraid we’ll get a system where Congress or someone else is trying to dictate all the time how the funds actually should be allocated. 

That’s a question I’ve thought through a good amount: how, or whether, we should fix the overhead system. I feel we’ve somehow painted ourselves into a corner where there is no good political way out in any direction. But I think you’ll find, case by case, that the specifics are really going to matter.

Dylan Matthews: Let’s get into some of the specifics. Do you have an example of the overhead system breaking down that is motivating for you here?

Tyler Cowen: Well, universities are spending more and more of their surplus on staff and facilities — on ends that, even if you think they’re defensible in some deep sense like “Oh, we need this building,” are really about the university. It’s about what leads to long-run donations, but it’s seen as a violation of public trust.

The money is neither being spent on possibly useful research nor on educating students. The backlash against universities is huge, most of all in Florida, Texas, and North Carolina. It seems to me that where we are isn’t stable. How we fund science through universities is, in some ways, collapsing in bad ways. The complaints are often justified, but odds are that we’ll end up with something worse.

Dylan Matthews: I don’t want to focus too much on the state aspects of this. Obviously, this is a heavily state-sponsored enterprise, but pharmaceutical and chemical companies employ huge numbers of scientists. 3M has plenty of scientists working on various polymers and things. 

What does the division look like there? What is the kind of symbiosis between these types of scientists? I guess that’s a question for the field, but maybe Caleb wants to take a stab at it?

Caleb Watney: Sure. The conceptual understanding, oftentimes, is this spectrum from basic scientific research all the way to very applied technology. The classical understanding is that you put in federal resources at the early stage of the pipeline, the really basic research that may not pay off for another 10, 20, 30, 40 years. It’s hard for private sector companies to have an incentive to invest in that kind of research, so there’s a strong case for federal investment.

Then after the basic scientific advancements are made, it moves down the pipeline, and eventually you get to a point where pharmaceutical companies, chemical engineering firms, or whoever can see the light at the end of the tunnel. They can see the way to potentially commercialize whatever technology, and that’s the moment they jump in.

Oftentimes, this spectrum between basic and applied science misses the fact that working on applied science can generate insights or questions that then lead to basic scientific results. Look back at the old industrial research labs: Bell Labs, Xerox PARC, etc. They were oftentimes working on quite applied problems, but in the process of working on those problems, they generated insights and solved basic scientific questions as well.

Dylan Matthews: Alexander, you have a sort of unusual perspective here as someone who funds scientists and attempts to improve science policy. You have a heavy incentive to pay people to try to understand this better. I’m curious — what are the main lessons you’ve gotten in terms of why the funding system works the way it does and what its limitations have been?

Alexander Berger: To speak to one micro example of limitations: a project we did a few years ago, which our science team led, looked at the winners of an NIH review process called the Transformative Research Award, or the TR01.

R01s are the standard NIH grant, usually around $1 million, for most biomedical research. The TR01 was meant to fund more experimental, higher-upside, higher-risk science. Our science team ran a process where they invited a bunch of people who had applied and been rejected by the NIH to reapply to us, so we could get a sense of who else was in the field and what kind of science the NIH wasn’t necessarily able to support at the current level, and just get a really diverse, cross-cutting sense of what kind of research was being put out there as transformative.

One of the things that they were most surprised by was — I’m making “air quotes” but you can’t see — how “normal” most of the science was. In spite of the fact that the NIH had tried to set up this process to enable transformative, basic, risky research, it still had all of this process around it. The applications were really long. They were still asking for preliminary results. So, it still ended up looking a lot like you already needed to have done a lot of the research in order to get the funding to do the research. That kind of risk aversion in the scientific funding process is something that we’ve seen a lot of. And it often makes scientists a little bit pessimistic about the prospects of reform, because they see at these large-scale research bodies — who fund lots of good research, for sure — that it’s hard to really enable them to take risks and try new things.

Dylan Matthews: So let’s do a bit of Chesterton’s fence reasoning here. For listeners: the British writer G.K. Chesterton noted that if you see a fence out in a field you haven’t been to before, you should probably think about why the fence is there before you tear it down. If the fence here is these bureaucratic restrictions that require onerous applications for funding, and that seem to create the problems I was describing, why did that come about? What problem was it solving prior to the reforms that brought it about?

Alexander Berger: I think that really goes to the heart of this discussion around the political economy of policy and science. Like the thing you were saying about the research on shrimp treadmills: the fact that science has always felt vulnerable, especially when it’s curiosity- and scientist-driven, has created a lot of bureaucratic processes to try to show that, “No, we’re being careful, rigorous, and responsible. We’re not just throwing money after flights of fancy.”

All of that is in order to defend these large-scale public appropriations that support relatively basic research that might fail and might not pay off, projects that could sound kind of weird to someone just hanging out and wondering why tax dollars are being spent this way. So, I see that as the core driver of the bureaucratization of the process: the need to minimize risk and maximize explicability in an enterprise that is itself very curiosity-driven and hard to plan.

Tyler Cowen: I think there’s a general problem in science funding, and also in arts funding, and it’s the following. There are a lot of underproduced public goods out there. Basic science is one of them. At the margin, you can always do something with government. If it’s small enough, it can be well controlled and have a positive impact. But as it gets larger, Congress or someone else wants to have a say. Then effectiveness is greatly diminished. Over time, bureaucratization sets in, labor costs rise, maybe the states and different senators want their share of the thing, whatever else.

So you have this scarce resource: the ability to do things without attracting too much attention. You have to think very carefully about how you allocate that. A lot of good science policy is knowing when you can do more in an area without attracting too much attention. That’s always going to change over time. It won’t be a fixed formula. We could set up 27 different ARPA-like entities, but in fact the total amount of money would be so high that Congress would really start interfering with them all, so we’ve got to pull back from that, even though the abstract arguments for doing it might be quite strong. It’s a kind of art: figuring out the balance of what you can get away with and keeping enough autonomy so that it still works well.

Caleb Watney: This gets at one of the real meaty, thorny issues at the heart of science funding, especially when you’re considering the political support for it. In many ways, the strongest theoretical case for public funding of science is for basic science, but that’s also the part that is the least politically defensible. It’s the part where you are most likely to find really weird, strange things. Yes, sometimes you are funding underwater treadmills with shrimp running on them, but sometimes you do that and discover something really interesting about underwater mechanics that ends up changing how submarine design works.

Dylan Matthews: Or you invent the transistor or something.

Caleb Watney: Yes, exactly. One way to do this is to be cautious about political limitations and how much can fly under the radar, as Tyler was getting at.

Part of it is also thinking about science as a portfolio. Oftentimes, public servants who work in science agencies get dragged before Congress and get asked, “What are your successes? What are you working on?” It’s actually quite hard for them to point to successes, and part of that is because basic science is hard to predict and its payoffs are hard to know until way down the line.

But also, we don’t actually have a lot of great, inherent justifications for why science is designed the way it is. A lot of it is path dependence. We designed a series of scientific institutions, especially after World War II, and the design of those has just persisted, without a lot of experimentation. 

This is one of the things that we’ve been working on: are there ways that you could build experimentation into the way that science agencies operate? That way, you could actually get a baseline of, “Hey, we tried these two different procedures, these two different ways of allocating funds across a portfolio. And we found that this one generated X percent more citations,” or “This one produced 10% more novel research proposals as judged by the new technical keywords that were combined in an application.”

Alexander Berger: Isn’t there a parallel in terms of IFP’s work on policy change, to what Tyler was saying about wanting scientific research funding to sort of stay below the radar sometimes? Like people talk about the secret Congress idea. Sometimes when you’re doing science policy, you actually don’t necessarily want to be in the headlines, you don’t necessarily want the President announcing it from the White House steps. You might want it to be something where it’s operating behind the scenes as a second-tier issue.

Tyler Cowen: Universities for a long time enabled that, but now they too are in the line of fire. It seems to me a lot of our institutions have become too legible in a way that’s not sustainable. I admit that’s maybe a controversial idea for you, Alexander. But I worry about losing the attitude of, “Oh, you know, I saw Spock on Star Trek, the professor on Gilligan’s Island, the scientists are working on this. It will be fine.” There is something useful to having a world like that.

Caleb Watney: A book I think a lot about is Revolt of the Public by Martin Gurri, which deals with a lot of these themes: what happens when information becomes way more legible than it used to be? His primary thesis is that the internet made the behavior of public institutions and of our elites much more legible, trackable, and findable than it used to be. Even if our institutions or our elites are failing at roughly the same rates as they did 50 years ago, it’s so much easier to find those failures and make them legible.

One example outside of science that I think a lot about is the National Football League. There’s a lot of complaining about the quality of refereeing. A lot of people are convinced that referees are so much more incompetent than they used to be. You’ll see people on Twitter pulling out clips: “Look at this referee making the obviously wrong decision in these 10 games, with the same team again and again.” I think that’s totally wrong. I think referees are probably just as good, if not better, than they were 40 years ago. But it’s so much easier to draw out the failures in highly legible ways. This is a trend that’s absolutely happening with science as well.

Alexander Berger: And it’s actually that Monday-morning quarterbacking across society has just gotten way more pervasive, because we have better documentation. Everything is more legible.

Tyler Cowen: It may be great in some areas like food safety, where you just want a very low rate of error. But when you’re playing a game where there’s one hit in every 10,000 attempts, it may be quite counterproductive to have too much legibility to the public, because some of the failures will look quite absurd: think of the Golden Fleece Award or Solyndra. We need some new ethos that recreates some of the illegibility but still keeps accountability, some new lens that maybe no one has figured out yet.

Alexander Berger: I mean, DARPA is an amazing success story on this front: they’re still so high-status in spite of the fact that so many of their projects fail catastrophically. I think they have successfully sold the ethos of the brilliant program manager out there taking risks at the frontier. And I think the tie-in to defense makes it-

Tyler Cowen: It’s the military, I think, that sustains them, not that the public understands their model.

Dylan Matthews: At the same time, we have a bunch of ARPAs now. We have an ARPA-H. We have an ARPA-E. How do we account for that? Is it military hero worship and that you want to copy the successful military institutions?

Caleb Watney: I think the military aspect of it certainly provides a vein of legitimacy for ARPAs, but part of it is that a lot of the bets that ARPA managers make are not public. They can fund a portfolio of 40 things and even if only one of them works out that can be the thing that you trumpet. The 39 failures are not nearly as legible in the ARPA model as they are under the traditional NSF or NIH model.

What’s interesting is that, across a lot of our scientific institutions, we’re seeing almost cultural evolutionary responses to this. How do you justify to the public why you’re spending money on things that might fail? The ARPA model was one version of this. Peer review in the traditional scientific system is another example of this. 

As an NSF program officer, being able to tell the public, “Hey, it wasn’t me who made this bet on this underwater shrimp treadmill,” — to keep coming back to that example — “We asked a panel of experts, a panel of capital ‘S’ scientists, and they said that this was a good idea.” That provides at least a vein of defensibility that science has relied on for a long time. But that defense mechanism is weakening, especially as capital “S” science becomes more polarized than it used to be.

Tyler Cowen: It seems we’re in a weird world where, at the very micro level of the individual researcher, the emphasis is on the defensibility of your research way more than ever before. A paper has to be longer, robustness checks everywhere, all these appendices. But at some higher macro level, maybe due to polarization, defensibility is much weaker.

Say you are in a state legislature or you are in Congress. Well, maybe what matters is your party and what your district looks like and how well you did. The accountability lines are weaker. It’s a weird mix: defensibility is way stronger at the micro level, but quite a bit weaker at the macro level. That’s a problem science has to deal with, and it makes us risk averse and then poor allocators at the highest tiers.

Dylan Matthews: Our friend Emily Oehlsen had a helpful contribution to the conversation here, about the idea of weak-link versus strong-link problems.

If you’re trying to regulate the safety of apples being sold, you care a lot about the worst apple, making sure none of them have poison in them. But maybe for science, you want to maximize the quality of the strongest link: get the best paper, say a special relativity paper, rather than making sure that absolutely no papers about panpsychism or something make it into the mix.

That does seem somewhat helpful here, but I don’t know how we get around the problem Tyler diagnosed: all this research is legible, and the weakest links will be pulled out and highlighted in legislatures and Congress. Absent some IARPA-style extreme secrecy, it’s hard for me to imagine how you get around that dynamic.

Alexander Berger: How much do you think polarization is the root cause of the problem? It’s striking to look back at statistics on the partisan affiliation of scientists from 50 years ago. They were way less left-leaning than today, maybe even right-leaning at some points. I wonder if that helped contribute to the relatively bipartisan credibility of science and scientific research institutions, in a way that has declined as scientists as a population have become consistently more left-leaning. But I’m curious what you think, Tyler.

Tyler Cowen: It’s part of the chain of the problem, but I doubt it’s the primary driver, because it seems to be endogenous, and it’s relatively recent.

I think the primary driver of a lot of our problems is that there are not any good scientific funding institutions that stay really good forever. It’s just a fact of life about a lot of things, in the private sector as well. That’s what’s driving this. 

When people look for very abstract principles about what worked, I get quite suspicious. I think I have less nostalgia for past successes than a lot of science policy people in our circles, and I keep coming back to this time inconsistency point. Maybe scientists turned against the Republican Party. Basically, they stopped agreeing with it, it was in their interest to do so, and the gains from conformity, as in many areas, have become higher. All that together makes it part of the chain, but not the first step.

Caleb Watney: I think that this aspect of new versus old institutions can definitely be an explanatory factor in the declining effectiveness of science. 

Even if you were to make our scientific institutions much more effective, I don’t know how much more political support that would necessarily generate. On the margin, it would help. But again, if our model here is that people are pulling out the failures, publicizing them, and making them legible, then even more successful scientific institutions will still have failures that can be brought into the spotlight.

Alexander Berger: I think they would have more embarrassing failures, right?

Caleb Watney: Yeah, if success is connected to taking on more high-risk, high-reward bets. The flip side of this is maybe that we need to do a better job of marketing and telling positive stories about the successes of science. Here is a way in which having more effective scientific institutions might imply better communication of science, its upside, and its successes.

Alexander Berger: I’m always skeptical of that kind of approach, because I feel like it implies too much responsiveness to public opinion. I think science polls okay. People like it. It’s a little bit like mom and apple pie. The bigger issue is that the polarization of the research workforce has meant that the bipartisan support that science and scientific funding benefited from, to a remarkable extent and over a long period of time, is decaying. So it does seem like you need a partisan analysis of this problem, as opposed to merely a secular-change-type story.

Tyler Cowen: Part of the problem might be that it’s no one’s priority, except for, say, the people around this table and some of those we know. That makes it especially vulnerable. The scientists themselves are not effective defenders.

Alexander Berger: But that doesn’t seem true. I mean, think about the CHIPS and Science Act; the NIH budget goes up, not down. Trump tried to cut it, and Congress stopped him. The extent to which these institutions are politically durable is underrated.

Caleb Watney: I think that’s true. The NIH is exceptionally popular in Congress. The broad view is often, “Oh, you can’t actually try to change the NIH without getting the NIH’s buy-in first.” Unless you really want to go to bat over it as your number one issue as a senator, it’s exceptionally hard to change the NIH without the NIH’s buy-in.

But I think this is also modeling the fact that the NIH has already built in a bunch of defense mechanisms, and is now politically popular. It’s possible to make the case that the NIH has been too responsive to these concerns, erring toward conservatism. That they’ve built in too many defense mechanisms. They now have sustainable support in Congress, but they’re also way less effective than they could be.

Dylan Matthews: What do you make of the fact that the NIH is still politically supportable in Congress despite their most prominent employee, Anthony Fauci, having been on cable news for the last three years as a prominent hate object? That they’re still popular and getting more money in spite of that is interesting to me, and I don’t feel like I have a good model for it.

Caleb Watney: I mean this is pretty recent, and so we’ll see in some sense how this changes long-term support for the NIH. There’s a new NIH director who’s been appointed, and they’ll have to be congressionally confirmed. I think the expectation is that there will be a long fight about gain of function research and other things that the NIH has funded as part of that.

One reason why the NIH in particular has had such support is that the areas of science that they focus on feel quite explainable to the average American. They’re working on curing diseases, curing cancer, curing Alzheimer’s, and those are diseases that affect millions of Americans around the country. I think it’s quite popular to say, “We want to cure cancer, so you should fund the NIH.”

Alexander Berger: Right. It’s quite noticeable that the NIH is bigger than the NSF by a large margin. Biomedical research gets more funding than everything else combined. That’s not actually true because of the defense R&D spending, but-

Dylan Matthews: This is maybe an area where none of us have looked into it enough to say, but Howard Hughes Medical Institute is one of the biggest foundations in the United States. They fund a lot of biomedical research directly. They’re obviously not as prolific of a funder as the NIH. 

When you compare their application processes to the NIH’s, what does that tell you? If it’s way easier, that tells you there is something about government and politics that makes us really dysfunctional. But if they’re not that different, then that seems like a bit of a puzzle to me.

Caleb Watney: The economist Pierre Azoulay has a great paper where he got access to both some of the HHMI data and some of the NIH data, compared researchers who were right on the margin of being accepted as an HHMI principal investigator versus going through the traditional NIH process, and looked at how their selection into one mechanism versus the other changed the kind of research that they did. As background, most NIH grants work through a project-based approach: you submit a very specific grant application to a panel of peer reviewers, it gets scored, and then if you get accepted, you get funding for that specific project. HHMI, by contrast, operates much more on a person-based funding model: they select the particular scientist, give them a length of time, and say, “Whatever you think is important within your broad area of expertise, we’re going to give you funding to go and do it.”

Pierre’s paper shows that principal investigators who got the HHMI fellowship ended up doing more impactful work, both as judged by how likely it was to disrupt other research in that field, and by citations, papers, and awards later on. So it seems the allocation mechanism really did meaningfully change the kind of research they were doing.

Alexander Berger: HHMI was also associated with a second change, which gave people more time between renewals. You don’t need to apply for a specific project, and you also have unconditional funding for a longer period of time. That explains why researchers were willing to take more risks. They had both more hits — I think almost twice as many papers in the top of the citation distribution — but also more failures, more papers that ended up almost uncited because they might not have panned out or might not have been of interest to other scientists.

Tyler Cowen: I think of the two structures as quite parasitic on each other, a bit like the major music labels and the indies. You can say, “Oh, the one works this way, the other works the other way.” But neither could exist without the other.

The NIH props up the whole super-costly, bureaucratic, at times innovation-clogging infrastructure. But the innovators need that. And in turn, the NIH needs more innovative groups on the fringe to push or nudge them in other directions over the longer run. So I think of it as one integrated system.

Dylan Matthews: Like the classic HHMI-Matador Records comparison that you hear many times.

Tyler Cowen: Exactly.

Dylan Matthews: What are some of the services that NIH does provide those innovators? We’ve been pretty down on some of the processes for these groups. So what do you see as the basic infrastructure that they’re supporting that we would miss when it’s gone?

Tyler Cowen: Security for the profession as a whole, which is immense. There’s a place you can go. The fixed costs are very high and there’s really no one who wants to pick those up. If you go to a venture capitalist with something that’s 10-year R&D, much less 20- or 30-year, it’s very hard to get anywhere with that, much less with high sums of money. So if you design an institution to pick up a lot of fixed costs, it’s going to be very hard for that institution not to be super bureaucratic. 

Now, I would much rather see it be less bureaucratic. But there’s even a way in which Fast Grants, which I helped direct with Patrick Collison and Patrick Hsu, is itself parasitic on the NIH. You’re funding at the margin, you’re speeding up at the margin, but you don’t have to pay any of the basic tabs, and that’s why you can move quickly. So I think we need to do a better job, at the margin, of adding pieces that fill in for what the NIH will never be good at. And I’m not that optimistic about reforming the NIH.

Caleb Watney: You use this word “parasitic.” I would say maybe “complementary.”

Tyler Cowen: No, I know. That’s podcast talk.

Caleb Watney: Right, right, right.

Dylan Matthews: Yes, yeah.

Tyler Cowen: It’s all parasites.

Dylan Matthews: As you remember from high school biology, there’s mutualism… 

Caleb Watney: But I think it’s true that sometimes we get caught up in thinking, “What is the best way to fund science?” in an abstract sense. That misses the fact that we should probably take a portfolio approach, where we’re trying to fund different kinds of science in different ways. I do think the role that the NIH plays is being this funder of last resort. They’re pumping so much money into the system that even if your thing doesn’t directly get funded, in some downstream sense you’re probably going to benefit from them.

Actually, one interesting example here is Katalin Karikó, the Hungarian-born scientist whose work was pioneering in developing mRNA vaccines. She was quite public after COVID in a New York Times profile about the fact that she was applying for NIH funding back in the early ’90s to advance her work on mRNA vaccines, and she was getting consistently turned down. At one level that is, in some sense, a massive failure of the NIH. 

But on the other hand, in a downstream way she was able to continue to stay in the United States: she was able to get funding from somebody else who was funded by the NIH, she stuck around for a while, and eventually she actually got funding from DARPA.

You can see that as an example of the system failing, and we could have possibly had mRNA vaccines 10 years earlier if we had made a different set of funding decisions. But also, the base support layer that NIH played meant she didn’t have to leave science altogether, and that seems like a plus.

Dylan Matthews: Yeah. That tees up a conversation on immigration, which is something that I did want to ask about, and that I know you work on a lot, Caleb. 

Science seems like an area where there are huge gains to agglomeration, to having smart people in a scene together. Most of the smart people are not going to be citizens of one particular country. There seem to be major gains to easing international migration on this, but there are major political challenges to that.

Caleb, what has your experience been trying to convince Congress to let more scientists stay in the United States? What’s been easier or harder about that than you expected?

Caleb Watney: Right. So when we launched IFP, we decided high-skilled, STEM scientific immigration was going to be one of our major focus areas, partially because the gains here seem so large. 

If your basic model is that talent is distributed roughly equally around the globe, then the fact that the United States has only 4% of the world’s population means that the vast majority of cutting-edge, could-be genius scientists are going to be born elsewhere. If you really want to take agglomeration benefits seriously — if you think adding a bunch of smart scientists all in one cluster is really going to boost productivity — that implies you have to have ways to allow them to come and stay here. The United States already has the massive benefit of the world’s premier university system. We end up training a huge number of global scientists. Scientists-in-training come to our universities, and then, for bizarre, prosaic reasons, we end up forcing them out in one way or another.

We do think that there are gains to be made in trying to improve the immigration system. It’s hard, for a variety of reasons. One is just that immigration as an issue has been bundled. It’s quite hard in this all-or-nothing sense to really push forward just high-skilled immigration, because it always gets tied back up into the border and DACA. Even though Indian PhD students in chemistry are not actually coming to the United States via the southern border, it’s been so polarized as an issue, it’s hard to separate.

I think there’s maybe hope that we’re starting to see some unbundling of it in the CHIPS and Science bill that Alexander mentioned earlier. There was actually, in the House version of the bill, a green card cap exemption for STEM PhDs and master’s students that passed, but didn’t ultimately end up in the final conference version of the bill. But I saw that as a positive sign that the political system is starting to be able to unbundle these issues.

Alexander Berger: What do you make of the political power of universities on these things? I would have thought that universities would really have cared about that provision. At least on the funding issues, it seems like they do show up and have some power to wield.

Caleb Watney: This is one of the great puzzles I find in the political world. If you talk to university associations or university presidents, they’ll definitely acknowledge that international students are a huge community that they care about. 

But I have not found that they put their money where their mouth is in terms of political force. Some part of this may be a collective action problem: they benefit very directly from increasing funding in some specific NSF appropriations fund that they know their school competes particularly well in. In that sense, they can directly make that connection, whereas the case for high-skilled immigration is much harder to make that directly.

There’s also a perception that because there’s this bundling, by stepping into the issue, they may be adding political polarization to themselves. If you’re the University of Iowa, you may not want to be making the case for full-fledged immigration reform. And if your model is that it has to be all-or-nothing, then I think that poses political issues.

Tyler Cowen: I would gladly triple the level of immigration and prioritize scientists. But I wonder if a key issue moving forward won’t be cooperating with bio labs or science labs in allied countries or even in non-allied countries. They’ll be more and more capable. I don’t think we’re going to send a lot of money overseas, but access to artificial intelligence or to intellectual property: that may be a way we can get certain things done with less legibility, just like there are some trials run in poorer countries. 

There’s a lot of labor there, and maybe we’re not going to let it all come here. So just how we establish working relationships across borders, maybe it’s a kind of frontier area where we can do something better. That would give us this new model, get us a bit away from nostalgia. Even with a much more liberal immigration policy, India is, what, almost 1.4 billion people? Only so many of them are going to come here, and we can do something there.

Dylan Matthews: I guess, but my question about that would be are we so sure our partner countries have any more functional immigration politics than we do? If the question is about partnering with, like, France, I trust the American political discourse on immigration a lot more than I trust France’s.

Tyler Cowen: They don’t have to let in immigrants, but they just have people you can work with and different rules of the game, and you have different people trying different approaches. We can expect maybe more progress from a number of other foreign countries than we’ve seen lately.

Caleb Watney: It’s interesting. I think this partially gets at how much you think in-person agglomeration effects really matter. With this new era of remote work and whatnot, it might be possible to have a lot more international scientific collaborations. But it seems like there’s still really massive gains just from in-person, physical interaction, and that relies on being geographically located in the same place.

Tyler Cowen: Sure. But, say, that doesn’t happen the way we all would want, what do you do at the margin-

Alexander Berger: Especially in biology, right, where people learning to pipette the right way or having the right exact lab technique just ends up being weirdly important.

Caleb Watney: You could say, in some sense, across a lot of areas of cutting-edge science and technology, tacit knowledge is just increasing in importance. 

Semiconductor manufacturing seems to be the kind of thing that you really just have to work directly on the factory line with somebody else that’s been working in semiconductor manufacturing for the last 10 years to learn the knowledge that they have. There’s a weird way in which especially for the very cutting-edge frontier of science and technology, in-person interactions are becoming even more important. 

Drawing back a little bit, I do think it’s interesting that other industrialized countries with whom we are allied are making different decisions about their immigration system. I don’t know per se if I would trust, say, France’s immigration system. But the UK, Canada, Australia, New Zealand, Germany to some extent, are much more aggressively targeting international scientists and trying to bring them into their borders. The UK especially has this interesting global talent visa, an uncapped category for cutting-edge scientists. 

China is also trying to be very aggressive about recruiting back talent. They have the Thousand Talents program. They also have the less-reported Thousand Foreign Talents program, where they’re explicitly trying to bring international scientists within their borders. I think China has similar issues here because they have much lower rates of immigration and assimilation in general.

But, in some sense, the big barrier for all these countries that are not the U.S. is that people would prefer to move to the United States. If you ask them for their preferences of where they would like to move, it’s still the United States as number one. Canada’s been eating its way up there, but I almost think that’s just like USA-lite and they are willing to go there as a secondary location.

Alexander Berger: Hey, Toronto is pretty nice. Just to make a really obvious point that I think we all know, but might not be totally obvious to listeners: this kind of stuff can often end up sounding like, “There’s a war for talent, and we want to win the zero-sum fight.” That can be part of the story, or part of why this policy appeals to some people. But I think it’s really important to note that there are actually really big global gains from letting scientific talent concentrate on the frontier.

There are these papers, particularly by a researcher named Agarwal, looking at International Math Olympiad winners from around the world. These are kids who, at more or less the end of high school, had performed similarly on objective international tests of math talent. But their outcomes diverged depending on whether they ended up in the U.S., in another rich country, or in a lower-income country: those who had moved to the U.S. were significantly more productive as post-PhD math researchers, more likely to publish, and more likely to do a PhD in the first place.

There’s always worries about whether you have adequately controlled everything, but this is a situation where you had quite strong early measures of talent that ended up suggesting that even moving from the UK to the U.S. can be a pretty big gain in terms of your eventual output.

Caleb Watney: I think they were about twice as productive in the U.S. I mean, they were still much more productive moving to the UK than staying in their home country. But yeah, they were about twice as productive if they moved to the U.S. than to the UK, which is a wild fact about the world. A lot of people’s perception is that the UK has a pretty good scientific ecosystem. They’ve got Oxford and Cambridge and lots of cutting-edge scientists who are working there. And yet it still seems to be the case that the United States’s research environment is that much more productive. 

Tyler Cowen: Longer-run, is there any argument for having a greater number of centers and giving up some gains today? You might end up more innovative. Like, do we really wish that in the year 1890 everyone had moved to Britain or to Germany? Right? Some came to the U.S. It actually paid off.

Caleb Watney: Yeah, I think you’re not going to practically get everyone, partly because people have countervailing things that they care about, like being close to family, but also because there can be specialization in research culture.

There’s a really interesting paper that looks at the multiple competing clusters that could have been the home of automobile manufacturing. A bunch of cities in the Midwest had large manufacturing and industrial capacity that were the home for early prototyping around automobiles, and Detroit was like a relative unknown. It was much smaller. What the paper identifies as one of the things that made Detroit the ultimate winner was that it was a physically smaller city, so it’s just easier to run your prototypes back and forth across different facilities.

There can sometimes be a way in which being smaller allows you to specialize culturally in an area. If we think a lot of the power of these innovation clusters actually comes from the softer cultural side, that means you have to have a large chunk of people in those networks going to the local bars and talking about automobile manufacturing, or in San Francisco talking about software, or in Massachusetts talking about biotech. Actually, there’s even been a small cluster around virtual reality that launched around Disney World, because there are already so many use cases there.

So I don’t think it’s inevitable that we end up getting a bunch of clustering in one giant mega-city, partially because innovation clusters do have this cultural dynamic there, and you actually need sufficient saturation of one particular area. A bunch of specialists in petroleum manufacturing or fracking are going to be different culturally than experts in artificial intelligence.

Dylan Matthews: To pivot this to politics a little bit, do we have any experience in setting up new clusters like that? I think there’s been some discussion in the U.S. about trying to relocate things to post-industrial cities, people getting priced out of major innovation hubs on the coasts. Do we know how to do that? Do we know how to do place-based policy like that, and is it at all desirable?

Caleb Watney: This is a big focus of ongoing legislation. The CHIPS and Science bill, which we’ve referred to a couple of times, made a major bet on reviving regional innovation. Across the National Science Foundation, the Department of Commerce, and the Department of Energy, there are big programs that aim to revive regional innovation within particular areas, and we are making big bets on that.

I am cautiously pessimistic about our ability to actually do that, especially from the top down: saying that we want Cleveland to become the next biotech hub and then spending lots and lots of money to make that happen just hasn’t worked out historically. There’s a whole Wikipedia page of failed Silicon Valley knockoffs that all have “silicon” in their name, like Silicon Slopes and Silicon Heartland and whatever. There’ve been a lot of attempts to recapture the magic of Silicon Valley.

Where I’m actually a little bit more bullish is talent. A lot of these efforts have been financing-focused first, and I think financing can help, but I would be much more bullish if talent came first. When I think about a regional innovation cluster that has succeeded more recently, it’s Pittsburgh, which was going through a bit of an industrial depression. Then, especially around Carnegie Mellon University, they made a really strong, targeted bet on robotics and AI, and that worked partially because they had a world-class university already there. They brought in a bunch of international students, and there’s cool literature showing that when international students come to a university and then start a company of their own, about 40 to 50% of the time it starts in the county where their university was. You can get these really strong clustering effects around universities. A talent-focused effort at regional innovation, with financing as the sprinkle on top, may still not work, but I’d be more bullish about it.

Tyler Cowen: Even in that case, it’s worked for science, but Pittsburgh still has lost population.

Alexander Berger: Yeah. I feel like this is the classic thing where industrial policy to revive dying regions is just a really, really hard problem. And it’s an example of the way the policy process ends up prioritizing politics over innovation per se, right? We’re sitting here recording this in South San Francisco, and we can kind of see across the bay to Berkeley. Berkeley urban economist Enrico Moretti has a really nice paper showing that even within U.S. metro areas, there are really big agglomeration effects in patenting. Moving from the fifth-biggest cluster in your area of research to the first, not even from the bottom of the stack, leads to notable gains in output for people who are working on biology or on new microelectronics. That’s kind of the opposite of what the regional-centers drive is going to push you towards.

Dylan Matthews: Yeah. If we’re pointing to Carnegie Mellon as a success case, one of our major regional policies has been land-grant universities and setting up new universities. We’ve had a remarkable slowdown in creation of new universities since the ‘60s. At the same time, the most recent attempts — we probably can’t see from here to Merced, but I don’t think UC Merced is setting the city of Merced on fire. What are the costs and benefits of that? Do we need more universities? Do we need to rethink what they’re doing before we start adding more of them?

Tyler Cowen: I think I’m a little more drawn to a longer-term perspective than the rest of you. If I think of the late 18th century, no one thinks Germany will be the prevailing science power — and Germany becomes that within a century. How they did it is maybe not clear, but it wasn’t by reviving anything. If you go back much earlier, to the Renaissance, no one thought England had any potential as a science power. There wasn’t even a notion of such a thing. Yet that’s where the scientific revolution comes. There seems to be some time horizon of something like a century where you just can’t at all see what’s coming.

Even though I want to triple immigration, I think that makes me a little more tolerant of the status quo than the two of you. So maybe next time, it’s India, which, when I was a kid, was a country just completely written off. But 60 years from now, it will be doing a lot of great stuff. Like, I don’t sit around wishing, “Oh, if we had only hired the best Toyota people in 1965, automobiles would be better.” In fact, it seems better that we let them stay with Toyota and didn’t bring them to Detroit. So, I don’t know. I think we should think about clusters a little more long-term and just be tolerant of things coming out of nowhere.

Caleb Watney: I mean, to potentially push back. The last time I would say we saw a major sea change in scientific leadership on the global scale was from Austria and Germany in the late 1800s, early 1900s. Then eventually, over the course of the 20th century, it shifted mostly to the United States. To my mind, that was primarily a story about massive immigration, across three specific waves of emigration and immigration. The United States ended up capturing a lot of that specific talent. 

The first was in the early stages of World War II. There was a mass wave of Jewish refugees being forced out of both Germany and Austria, and that included Albert Einstein and a bunch of the early pioneers of the Manhattan Project. Then after World War II, there was Operation Paperclip on the U.S. side and Operation Osoaviakhim on the Soviet side. Both were basically trying to recruit, or in some sense forcibly kidnap, as many German scientists as possible back to their countries, because they realized this talent was so important.

Dylan Matthews: You got the Jews back, then you got the Nazis back.

Caleb Watney: Yes, yes. Both in turn. And the third wave, you could say, was post-Cold War, around the late 1980s, as the Soviet Union was on the brink of collapse. You have the Soviet Scientists Immigration Act of 1992: we created a specific, time-delineated visa to be able to suck up as much Soviet mathematical talent as possible.

Across those three waves, you saw a sea change in U.S. innovation, U.S. science. I do think sometimes clusters can arrive out of nowhere. But like, the last major sea change we saw was literally people moving from one place to another, and then their scientific leadership followed.

Alexander Berger: This is a totally different topic, but earlier in the conversation, I said science is really popular. It’s like mom and apple pie. But when I think about comparisons to the post-World War II era of immigration and the space race and the Cold War, science was coded as optimistic. You had the growth of engineering and you have Sputnik, you have space. I think it’s a little bit harder these days to imagine an optimistic utopian future, in spite of the fact that I think science, per se, and biomedical research especially are relatively popular and uncontroversial. I think it’s a little bit harder to just imagine a much better future. I wonder if that undermines some forms of this public case for science, relative to a more optimistic, mid-century style.

Tyler Cowen: Yeah, it has to be more than popular, is the way I would put it. And maybe it’s missing that extra something.

Caleb Watney: One proxy for this that I think is interesting, and I hear people sometimes talk about, is how optimistic does a country’s science fiction feel? 

In the 1960s, around the time when America was optimistic about science, our science fiction was quite optimistic. A lot of people today feel like it’s quite dour, quite pessimistic, always dealing with dystopian, world-ending scenarios. Chinese science fiction is sometimes pointed to as quite positive; The Three-Body Problem, even though it deals in some sense with apocalyptic things, takes a much more positive approach in which humans have the agency to change the world around them with their technology.

But I think that tends to be more of a lagging indicator of scientific progress, rather than a leading indicator. I think when people have seen change in their own lives happening at a much faster, more rapid rate, it’s easier to imagine on a fictional scale what that would look like if trends continued over the course of my lifetime. Although I’m sure that there’s some way the two feed into each other.

Tyler Cowen: My purely anecdotal sense is that teenagers doing computational biology are super excited. There’s an old guard they war against. They think they’re going to change the whole world and cure everything. And that might all be overstated, but I feel some of that has come back recently, I hope.

Dylan Matthews: I don’t know if this is on topic as a political thing, but I was trying to think of why none of my friends and I wanted to go into science in college. And it was mostly that it seemed utterly miserable, that you worked as a vassal in the empire of some professor, doing minor tasks at the direction of some grad students. You had no freedom. You had no ability to formulate your own hypotheses and learn from them. That’s a caricature, but I wonder what a policy goal of making science fun would look like.

Caleb Watney: I think part of it would be really trying to attack how long it takes to reach the frontier. The NIH tracks the average age at which researchers become first-time PIs, and it’s consistently going up and up over time. Part of this is connected to the growing burden of knowledge discussions that we had in an earlier episode.

But part of it is also that it’s very hard as a young person to have agency in science. That is a key thing that drives people away from it. A lot of young people want to work on things where they feel like, within a relatively short amount of time, they can have an impact.

Alexander Berger: It is especially true in biomedical research, where the standard lifecycle is an increasing number of postdocs, and the age at which people get their first R01 has been going up, and might be above 40 now. The career choice that you’re making at this point just seems pretty unattractive relative to a lot of other options people may have.

Tyler Cowen: Who’s the number one science role model right now?

Dylan Matthews: I would have said, until recently, Elon Musk.

Alexander Berger: He’s an entrepreneur.

Tyler Cowen: He’s not a scientist in that sense.

Dylan Matthews: Of course.

Tyler Cowen: Maybe Stephen Hawking for a while, but that’s over, and he was in a way famous for something other than science. Katalin Karikó has not seized that mantle. That may be a personal choice on her part.

Dylan Matthews: Jennifer Doudna, perhaps.

Tyler Cowen: No one’s heard of her out there.

Dylan Matthews: Out there, we say, gesturing to San Francisco.

Tyler Cowen: Out there, running through the window. Yes. Maybe in this town, but–

Dylan Matthews: Yeah. I mean, if you view computer science as a science, there might be. But even there, I don’t know.

Tyler Cowen: That is fraught now.

Dylan Matthews: Yeah, that’s fraught now. But Larry Page and Sergey Brin met as PhD students. I suppose they’re not heroes, though.

Tyler Cowen: But that would help, if we had — whatever we all might think of them — people who are digested easily by the public and viewed as almost purely positive.

Dylan Matthews: Yeah. We need two, three Bill Nye, the Science Guys.

Caleb Watney: I sometimes think about, where do these scientific cultural heroes choose to go? You can read a lot of biographies of the early 20th century, and you see folks like Vannevar Bush who go from science to the government. I think there’s like a less clear connection there today. 

If Vannevar Bush were alive today, it’s unclear that he would go to the NIH or the NSF. “Can you, as a young person, have agency in a federal agency?” is also a pretty open question. That also connects to the earlier point Tyler made, that new scientific institutions might be one way around this.

Alexander Berger: Yeah. I feel like that’s a broader cultural sclerosis, right? If you look at the age of the mean member of the Senate over time, our institutions have gotten older, the people who run them have gotten older. Overall, it feels harder to imagine regeneration for large swaths of existing U.S. institutions of all kinds. 

I mean, universities actually have been a super interesting case. Around the time when the U.S. started taking the lead in science, very early in the 20th century, was the last time we saw major new universities being founded: the University of Chicago and Stanford in the 1890s. You don’t really see a random billionaire starting new universities in the same way anymore.

Dylan Matthews: We’ve talked a lot about deficiencies in the U.S., and how globally we want to distribute science. Are there other countries with science policies that seem politically viable that you’re envious of?

Caleb Watney: There’s not another country that really stands out as like, “Oh, man, I wish I could just adopt all of their policies.” I think there are particular countries that have particular policies that I think are interesting.

I’ve been impressed by New Zealand’s willingness to try new things. For example, they are one of the only countries that has tried using a lottery system, at least a partial lottery, for distributing scientific grants. That’s interesting, and it attacks the very question of how good scientific institutions are at selecting meritorious grants from a pool of applicants. It remains to be seen how that will work out.

Actually, one thing I’m disappointed about is that they didn’t really do it in a randomized way, so you could have had a control group and seen how the lottery did compared to some other kind of system. But I appreciate that they were willing to take that risk in the first place.
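For illustration, here is a minimal sketch, in Python, of the kind of randomized rollout Caleb is describing; the application IDs, arm names, and outcome measures are hypothetical, not anything New Zealand actually ran.

import random

def assign_arms(applications, seed=2024):
    """Randomly split grant applications between a lottery arm and a panel arm."""
    rng = random.Random(seed)  # fixed seed keeps the assignment auditable
    apps = list(applications)
    rng.shuffle(apps)
    half = len(apps) // 2
    return {"lottery": apps[:half], "panel": apps[half:]}

# Hypothetical usage: application IDs stand in for real proposals.
arms = assign_arms([f"app-{i:03d}" for i in range(200)])

# Fund ten at random from the lottery arm; the panel arm would be scored
# by reviewers in the usual way.
lottery_winners = random.Random(7).sample(arms["lottery"], k=10)

# Years later, comparing outcomes (publications, citations, survey ratings)
# between lottery-funded and panel-funded projects estimates how much value
# panel selection adds over chance.

The design choice that matters is the random split itself: without it, as Caleb notes, there is no counterfactual group against which to benchmark the lottery.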

Alexander Berger: The fact that it’s hard to point to cross-country examples of especially good science policy or science funding is part of my reason for pessimism about cultural or institutional reforms leading to profoundly better outcomes. I have this running debate with Patrick, who we did another episode with, where I think he sees a lot more optimism for those kinds of reforms. 

The lack of vastly more successful science funding bodies in other countries to point to suggests that either the funding bodies just aren’t that important, or the Pareto frontier is closer than we think.

Caleb Watney: To push back on that, I think you can actually argue that the U.S. has the scarce resources that would be required for any country to push out the scientific frontier. So in some sense, the fact that the U.S. is stagnating only says something about how bad our institutions have been.

Tyler Cowen: Finland and Singapore have done education very well. In the realms of scientific innovation, they don’t seem to have that much to show for it. 

Weirdness is maybe the input that is scarce. The United States is pretty well run and we’re weird. We’re sitting very close to America’s weirdest, most tolerant, most open, most chaotic major city, which is San Francisco. We’re here in the legal entity of South San Francisco, but it’s no accident that we’re near the weird place with Haight-Ashbury and Jefferson Airplane. And I think that’s what Singapore and Finland can’t pull off.

Dylan Matthews: Can we do a round of over/underrated? Patents.

Caleb Watney: I’m going to say appropriately rated, but insufficiently differentiated. Basically, I think patents work really well for some sectors and really poorly for others. I would love to have patents or intellectual property rights much more differentiated by industry, but that would pose all sorts of issues with international IP agreements and whatever.

Alexander Berger: In some sense, I think they’re underrated. I feel like nobody walks around on the street being like, “Man, patents are so great.” But in some deep sense, like, it-

Dylan Matthews: Maybe in that one court in Texas.

Alexander Berger: Yeah, exactly. But in some sense, patents are what enable large-scale pharmaceutical investment in developing new drugs. That seems like the classic case where it’s really valuable to be able to do that. That’s pretty cool.

Tyler Cowen: High capital costs are underrated in those cases.

Dylan Matthews: Yeah. Prize awards.

Tyler Cowen: Overrated. People need opportunity, they need talent. 

Some dangled, big patch of money at the end of it all — I don’t know. That kind of pecuniary incentive is at the same time too large and too small. You’re not going to get to be a billionaire. I think amongst people like us who use the phrase, they’re overrated.

Alexander Berger: How does that interact with Emergent Ventures?

Tyler Cowen: They’re not prizes. They’re grants.

Alexander Berger: Isn’t part of the appeal that you’re creating a validating mechanism and the community?

Tyler Cowen: Well, the community is important, but that’s a kind of input. And the validating mechanism also, it’s a way of networking. If they are prizes, I get more worried. If they are ways of investing in networks and giving people a start and a nudge, then I’m happier.

Caleb Watney: I would say, prizes themselves are overrated, but there’s a broader category of alternative ways to finance innovation, dramatically underrated.

Tyler Cowen: Agree with that.

Dylan Matthews: Yeah. Advance market commitments.

Caleb Watney: Underrated, definitely. Although I will say, there’s a small bubble of people with whom they are overrated. They work very well within a particular set of conditions and circumstances, and I have some concern that we might start looking around and applying them as a square peg in a round hole. But they are still dramatically underutilized in the policy world.

Dylan Matthews: The Bayh-Dole Act. For listeners who aren’t familiar, it enables collaborations between industry and publicly funded universities and allows patenting of certain publicly funded innovations.

Tyler Cowen: It could always be worse, right?

Caleb Watney: It seems fine.

Dylan Matthews: Yeah. Seems like a fine answer. Price controls.

Caleb Watney: Overrated.

Tyler Cowen: Overrated, but you’re asking someone where you know the answer in advance.

Alexander Berger: By who and in what context?

Dylan Matthews: For innovation-specific products. So I think prescription drugs are the classic case, but maybe medical price controls more broadly.

Alexander Berger: In general, I think this is an example of where advance market commitments might not be exactly the right idea, but doing more to reward breakthrough progress in a way whose cost doesn’t end up being passed on to consumers has a lot to be said for it. I think it’s a good thing that the U.S. subsidizes so much of the innovation for the world — and I’m pretty happy to do it. But the 20th year of a patent, discounted by some corporate decision maker at an IRR hurdle rate of like 12% per year, is a very, very expensive way to induce marginal innovation. So finding ways to make R&D spending cheaper for companies, rather than relying on that marginal year of financial incentive, seems pretty attractive.
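To put rough numbers on that point, using the illustrative 12% hurdle rate Alexander mentions, the present value of a dollar of patent profit earned 20 years out is

$$\frac{1}{(1.12)^{20}} \approx \frac{1}{9.65} \approx 0.10,$$

so the firm weighs that final year of exclusivity at roughly ten cents on the dollar, even though consumers in year 20 pay the full monopoly price. That asymmetry is what makes the last years of patent life such an expensive way to buy marginal innovation.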

Dylan Matthews: Funding lotteries.

Tyler Cowen: I’m all for more innovation. I’ll try anything, as Caleb said, but I wouldn’t bet heavily on funding lotteries, per se.

Caleb Watney: I would almost compare lotteries to giving cash directly in the international development context, where just their presence provides a baseline against which you can compare everything else. We know that there are lots of ways of spending international aid that are more effective than giving cash directly, but the fact that we have a strongly established baseline is very helpful for the larger community. I think there are lots of ways of directing scientific grants that I’m sure would be dramatically more effective than a lottery. But I’m slightly concerned that right now we don’t have the baseline to test against.

Alexander Berger: And I feel like the analogy is even better than that. It might be the case that the mean dollar of aid is better than unconditional cash transfers and that the median is much worse.

I feel that way about funding mechanisms compared to lotteries. I think most funding mechanisms that actually exist or are widely used might be worse than lotteries, even though it might be the case that it’s very easy to do better than lotteries.

Caleb Watney: We recorded these sessions with several other workshop guests in the room listening in. After the initial conversation, Emily Oehlsen and Jim Savage joined in with some additional thoughts.

Jim Savage: Gun to your head, what share of GDP would you put into public R&D?

Caleb Watney: I would say it almost doesn’t matter what the socially efficient rate is, because the political constraints are almost always going to be binding before the economically efficient rate. Even if we could effectively sink 15% of GDP into R&D, which might end up being optimal, I don’t think you would ever politically be able to hit that rate. In some sense, I would say politically, we can always just go harder.

Tyler Cowen: I think about economics, obviously the field I know best. I would spend less on it. I would spend more money on creating open data sets, and give way less, or maybe zero, to researchers. And whatever’s left over, send to the biomedical sciences.

It’s so case-by-case specific. The idea that we’re just going to take the status quo and shovel in a lot more money, I really don’t like. I would press the no button on that. But I can think of a lot of areas, methods and programs that I would give more money to if they would reform.

Emily Oehlsen: Maybe this is too much of a can of worms, but the political legitimacy question of the moment seems to be how we should think about scientific progress happening inside private artificial intelligence labs. What do you think?

Caleb Watney: Seems hard. 

Tyler Cowen: I’m all for it, though I wouldn’t use the word accelerationist. I think our best chance at having stable international agreements that limit AI in some ways will come about if there’s American hegemony. It reminds me a bit of nuclear weapons.

I don’t think we have any choice but to proceed at a pretty high clip, understanding that safety measures only tend to get developed in a hurry when there’s actual, real problems facing you. So I’m fine with saying, “This is great. Let’s do more.” I don’t think the dangers are zero, but I’m very much on record as staking out that position.

It just seems to be obvious that we’re going to do that anyway, so we want to be part of it in a better way. There’s no way to really fight all those incentives and stop it, so let’s jump on board and improve it.

Alexander Berger: I think there’s a really interesting question around the international balance of power that does seem much more salient to me on this issue than most areas. 

Like, when I think about progress on cancer biology, I don’t really have any sense of worry about getting beat, but I think there is a sense in which the analogy to weapon systems seems more salient for AI systems. I expect there to be much more invasive monitoring of labs, much more government engagement over time, much greater sense of national champions, than I think we typically see with non-profit research universities.

Jim Savage: There’s this great development paper from India where they allocate micro-grants to a community of people, who then have to allocate those grants to whomever they perceive as most effective in their community. They have a clever mechanism to allocate that money in an incentive-compatible way. What’s to stop, or what would be wrong with, say, the NIH making block grants to schools within a university, where they all have this rich context on each other’s research, and then having them divvy it up according to where they think it’s best spent?

Caleb Watney: I think there’s a way in which you could interpret the increasing centralization of especially biomedical labs as actually one way of doing this, basically. You’re just having larger blocks of scientists together apply for something and then they’re in some sense distributing funding across the lab. It might be that we’re in some sense already moving toward that world. You could also think about other ways people allocate the respect of their peers, in the form of: “Who votes yes on the peer review panel? Who endorses it in public letters? What’s the general sense of this whole area of science?” That is in some sense a reflection of what small, local departments think.

More generally though, I would make a pitch for scientific surveys as a pretty underrated thing in terms of both defining scientific progress and deciding which areas of science to fund more. I think there’s a lot of concern that the current ways in which we measure scientific progress, things like patents or citations or papers, are pretty poor proxies. People think that good science is a know-it-when-you-see-it kind of phenomenon. But that is measurable, through large-scale surveys.

So I would love to see almost a scientific census, or something that really tries to measure what scientists do across the board, and what they think both about individual people’s works but then also broad categories of work. I’d be particularly interested to see, maybe outside of your subfield, what other discipline ends up providing you and your research the most benefit. It would be an interesting way of trying to assess where scientific positive externalities are coming from.

Alexander Berger: I like the idea of allowing scientists to allocate funding themselves, in a little bit more of a market, to projects that they like. But I worry about primarily using the university bureaucracies to do so. If you look at the UK system, the Research Excellence Framework has some features of this, and I only absorb it through the rantings of unhappy UK professors on Twitter. My sense is that it ends up being a very painful bureaucratic process, rather than capturing the upsides that a market-type system built on local information would ideally deliver.

Tyler Cowen: I would second those remarks, and if we were going to spend more on one thing, if I get my one-item wishlist, I want to spend more on 13 to 17-year-olds. That’s when you can really influence people. I’m not sure you need to give them large amounts of money. You give them something with a science tag connected to it, help them do something at the margin. That’s the one thing I would do.

Alexander Berger: You see this compelling evidence from some of the Chetty papers and others, showing that early exposure to innovators seems to matter a lot. That sort of role model effect — the geographic effects in terms of how people are patenting and what they’re working on — I think that makes a lot of sense.

Caleb Watney: Thanks for listening to the Metascience 101 podcast! Next time we’ll discuss whether the invention of new ideas is overrated when compared to the bottlenecks for diffusing them out to the rest of society.

Episode Eight: “Invention vs. Diffusion”

Caleb Watney: Welcome back. This is the Metascience 101 podcast series. Today, Derek Thompson sits down with Eli Dourado to investigate the bottlenecks standing in the way of the invention versus the diffusion of ideas. Derek and Eli discuss whether new ideas are getting harder to find, how to get these ideas to scale, and how a crisis can spur effective implementation. Since we recorded this episode, Eli Dourado has started a new position as Chief Economist at the Abundance Institute.

Derek Thompson: Hi everyone. I’m Derek Thompson. I’m a staff writer at The Atlantic and the host of the Plain English podcast with the Ringer Podcast Network. I’m also working on a book about the future of progress in America and why America can’t build stuff with the New York Times writer, Ezra Klein.

What’s the real reason for the great stagnation? Why has it become so hard for America to build what we invent? I’m very honored to have the perfect guest to answer that question. Today’s guest is Eli Dourado. Hello.

Eli Dourado: Hey Derek. Great to see you.

Derek Thompson: Good to see you as well. Why don’t you give your own brief bio? 

Eli Dourado: Sure. I’m Eli Dourado. I am a senior research fellow at the Center for Growth and Opportunity at Utah State University. I’m an economist by training, but I work on trying to get economic growth actually going. This takes me into a lot of spaces, especially physical world technology and how to get that going.

Before this, I was the first policy hire at a supersonic airplane company. So I’ve done it in the private sector as well.

Derek Thompson: There is this very famous idea in the study of science, technology, and progress in America, which is that ideas are getting harder to find. This dates back to a famous paper, co-authored by Nicholas Bloom at Stanford, that showed that it’s just harder in recent years to have scientific breakthroughs in areas like pharmaceuticals.

You and I have gone back and forth quite a bit about why ideas are getting harder to find. One popular explanation is sometimes called the knowledge burden.

For example, the field of genetics was broken open by a monk, Gregor Mendel, who was basically looking at peas in his backyard. Through that study he put together the idea of dominant and recessive genes. Today, if you want to make a breakthrough in genetics, you can’t just grow some peas in your backyard and invent the field. You have to get hundreds of people together to do these big GWAS studies — genome-wide association studies — to figure out some tiny detail of some polygenic disease like schizophrenia. In fields like genetics, it’s getting harder to push forward.

The knowledge burden says that the smarter we get, the harder it is to push forward in the field. You come at this from the opposite end. You think that there are cultural reasons why we are making it harder for ourselves to come up with breakthroughs in all of these scientific fields.

Lay out your theory of why ideas are getting harder to find.

Eli Dourado: Nick Bloom is a good economist. I don’t want to trash his paper too hard, but I think the root of what’s going on is that economists are looking in the wrong place to explain the great stagnation. There is too high a degree of abstraction and maybe an unnuanced understanding of one of the core basic models of economic growth, the Solow model.

The Solow model says that economic output is a function of labor, capital, and a third term called “A,” which stands for total factor productivity. Obviously, not an abbreviation. If you were taught the Solow model in college by a professor who’s not trying to be terribly nuanced, you’ll come away with the idea that “A” really stands for ideas. It represents all the recipes that we have for combining labor and capital into output. 

If you observe that we’ve had a slowdown in economic growth over the last half century – which is true – then you know it is not caused by labor and capital shortages, because we can measure those.

Then the obvious conclusion is that we have a deficiency in the growth rate of “A,” so it must be the case that we’re facing an ideas slowdown. But we can measure spending on R&D, and we know that’s not going down. If anything, it has gone up.

Therefore, it must take more R&D spending to get more ideas, and that must be what’s causing the growth slowdown. This is the lamppost under which economists are searching for the keys. It’s the initial place that you would look.
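In symbols, the residual logic Eli is describing is the standard growth-accounting decomposition. This is a textbook sketch, with a Cobb-Douglas production function and capital share $\alpha$ assumed for concreteness:

$$Y = A \cdot K^{\alpha} L^{1-\alpha} \quad\Longrightarrow\quad g_A = g_Y - \alpha\, g_K - (1-\alpha)\, g_L$$

Measure output growth $g_Y$, capital growth $g_K$, and labor growth $g_L$, and whatever growth is left unexplained gets attributed to “A.” That residual is the thing economists then interpret as an “ideas slowdown.”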

But I think that the real explanation for what’s going on here is that “A” is not just pure ideas that we know — as in, idea recipes that we know for combining labor and capital. Instead, it is all the ways that we do, in fact, combine labor and capital. But there are other reasons that we might not combine them – we might not use certain recipes even if we know them.

Some of those reasons include cultural opposition, legal barriers, and so on. I think that’s the right place to look. We should be thinking about all the different ways that we have trouble instantiating ideas in the real world.

Derek Thompson: This is really the meat of what we want to talk about today. The word I use for what you’re describing is implementation. For a long time, I thought that progress really meant invention. All of my favorite books about the history of scientific and technological progress celebrate moments of invention. They celebrate Edison, they celebrate the Wright brothers, they celebrate Edward Jenner and the smallpox vaccine story in 1796. 

But in an article for The Atlantic at the end of 2022, I started to think hard about the question of progress, and I looked at it through the story of the smallpox vaccine. I thought about that magical, golden day when Edward Jenner stuck a lancet into a young boy and inoculated him against smallpox, maybe the first smallpox vaccination in the history of the world.

At that moment, in a world of one billion people, only one had been inoculated. Is that really progress? Is it really progress when 99.9999% of the world has not benefited from the invention of what is essentially a prototype? No, it’s not. I began to think that maybe the story of progress that matters isn’t the story of invention, which is of course important, but the story of implementation.

How do you take an idea from one to one billion? This thesis I called “arguing against the eureka theory of progress.” Yes, invention is of course important. Going from zero to one in an idea matters, but implementation — from the one to one billion — is the journey of that idea to the rest of the world.

That might be the more important story of progress. Back to you, Eli, as we’re circling the same idea here. Why do you think the U.S., which is still rather good at invention, has gotten worse at implementation?

Eli Dourado: We don’t do transformational building in the world anymore. Let’s look at what Robert Gordon identified as the five great inventions: electricity, the internal combustion engine, communications technologies, indoor plumbing including urban sanitation, and chemistry including pharmaceuticals and materials.

A lot of those are inherently physical; they involve transforming the world. Fundamentally, the problem is that we’ve become unwilling to bear the short term costs that this entails. We call America the land of the free and think that people can do whatever they want. That’s true, as long as you’re willing to abide by a few simple constraints: nobody can be inconvenienced, nobody can get hurt, and no jobs can be lost. Within those parameters, you can do whatever you want, which turns out to be not very much. 

What if we invented the automobile today? This is one of Robert Gordon’s five great inventions that he says drove economic growth in the middle of the 20th century. Today, we would go to regulators and the public. We’d say, we’ve got this great new thing. It will provide trillions of dollars of economic value, but it’s also going to generate a fair bit of pollution. It’s going to kill 40,000 people per year in the U.S. We’re also going to have to take a bunch of land by eminent domain to build highways. It’s going to put horse and buggy makers out of work.

Thinking realistically, people would say, “Get out of here,” and would not let these things happen today. You’d face many more obstacles than they did a hundred years ago in the implementation of the automobile idea. Not in the invention of the automobile, of course, but in rolling out all the infrastructure that we would need for it. Ultimately, ideas are getting harder to use, and that’s the binding constraint.

Derek Thompson: One hypothetical you could pose is to imagine that the automobile were a prescription medicine. Would we accept a prescription medicine that had all sorts of benefits but also, using the very real example of the automobile in America, killed 36,000 people a year? That’s a lot of deaths, and reasonable people can say the status quo of cars in America is unacceptable. We hear this not only from the Ralph Naders of the world, but also from Silicon Valley, when they say that one really good reason to accept self-driving technology is that people are freaking terrible drivers. They kill tens of thousands of people a year. That’s why we should be more accepting of AVs.

We could probably do hours and hours on alternate histories of the car in America vis-à-vis Europe, for instance, but let’s get into more detail on this turn in implementation. You would agree this turn dates to the 1960s and 1970s, when the U.S. had a raft of laws and legal decisions that made it harder to build stuff in America. Those laws, regulations, and surge of localism were a response to very real problems. 

From the 1950s and 1960s, we did build highways over minority neighborhoods that just had no ability to have input. We did poison the air and the water of America. We did build without any kind of 21st century ethic about environmental and minority considerations. You’re nodding as I’m saying this. I don’t want our listeners thinking that we’re about to have a debate about whether or not the spoliation of the Earth is fine. 

How do you think about balancing the need to build in the 21st century with the fact that the last time we built very fast we created all of this havoc?

Eli Dourado: Environmental regulation that actually protects the environment in a narrowly tailored but effective way is an unalloyed good. One of the ways that we enjoy our greater wealth today is that we have better environmental quality. Research on air pollution shows over and over again how damaging it is. 

The way that we have actually addressed environmental regulation, however, is through a lot of procedural laws that require community engagement and create a lot of veto points for anyone to use.

A lot of times that isn’t underprivileged people speaking up and advocating for themselves. Often it’s extremely privileged people who can hire a lawyer, who can use this veto power to block projects that personally inconvenience them.

It’s completely valid to say that we want safety, that we want good environmental outcomes, and biodiversity, and that we want to spend some of our wealth on these things in a way that creates social justice.

I’m a hundred percent on board with that. We’re going to have a much better chance of getting all those things if we are wealthier, because when we are wealthier, we can afford to spend more on those considerations. When societies are at subsistence level, they spend zero on most of those considerations.

As they get wealthier, they start to spend more on them. As we get even wealthier, we will spend more and more on them. However, the laws that actually passed were highly procedural. The one I’ve spent the most time with is a law called NEPA, the National Environmental Policy Act. The original statute was actually written in a pretty inoffensive way.

It says that if we’re going to take major federal actions that have a significant effect on the environment, then we’re going to at least state what those effects are. We’re going to write them down so that anybody can see what the effects are. Like a look-before-you-leap kind of good governance law.

However, NEPA was implemented through executive orders, regulations, and court decisions such that it became highly procedural. Now you basically have to do a substantive environmental review, even if the action you’re taking doesn’t have a significant environmental impact. That’s actually where most of the harm of it comes from. 

Then this process also now requires public input, which wasn’t in the original text of the law. And that opens the door to lawsuits after the fact. The agency decision to approve or not approve a project, to move forward or not move forward, gets put under a microscope in a way that gives anybody a pretext to sue and try to block a project, over and over again in some cases. There’ve been all kinds of projects, including ones that are good for the environment, that have been effectively blocked by lawsuits that seek to weaponize NEPA.

That’s a major part of the turn.

Derek Thompson: I buy the broad outlines of that story. After the 1950s and 1960s, when we did build highways and did allow polluting factories to truly wreak havoc across the country, Congress and the courts gave the people a microphone. A microphone that they could use to have their voices heard, to block the kind of projects that were demolishing neighborhoods and turning rivers green with the runoff from textile mills in New Hampshire.

Today very often what’s happening is that higher income homeowners, who are against local energy and housing projects, are using the microphone to block projects that would, in fact, help the country in the bigger picture. These are projects that would help the country decarbonize and thereby help poor people who more often tend to be victims of environmental pollution.

It would help to build local housing projects that would relieve housing inflation, which would be good for the middle class. But the people who’ve grabbed this microphone often use it in a way that is orthogonal, or even antithetical, to what the most ethical and progressive reformers of the 1960s might have imagined.

Let’s talk a bit about how regulations in science, including in pharmaceuticals, might be blocking the translation of new ideas to new products. In my essay in The Atlantic, I talk about the legacy of Operation Warp Speed, which as I see it is an absolutely fantastically ironic policy program. I say ironic because it’s one of the most successful government programs of the last few decades, and yet it has also been politically orphaned. Democrats don’t seem to want to talk about it because it gives Trump a lot of credit. Republicans don’t want to talk about it because it created the vaccine that half of their non-seniors did not take and think is a Bill Gates conspiracy product. 

But it was extraordinarily successful at breaking land-speed records for the development and distribution of vaccines. One way that Operation Warp Speed went from invention to implementation wasn’t just by spending more money. It was also by creating a glide path of “whole of government” urgency: approving the vaccine, accelerating the clinical trials, and then making it as easy as possible to build and map the supply chains that would get that vaccine into hundreds of millions of arms in a matter of months.

Let’s pause here before we think about some implications of Operation Warp Speed. Why don’t you, Eli, dilate a little bit on what you think the most important accomplishments and deserved legacy of Operation Warp Speed are?

Eli Dourado: What I love most about it is that mRNA technology was completely untested in humans before. We took something off the shelf that we thought worked because it had been used in veterinary vaccines, and we understood the theory behind it. But it had never been done in humans. If this were business as usual, we would’ve been very slow to adopt it. mRNA vaccines would’ve gotten extra scrutiny.

We took something that we were fairly sure was going to work but hadn’t been done before, and we did it. There are so many things like that in the world, where something just hasn’t been done before but we have good reasons to think it will work. Yet people and companies are too risk averse, and you have to pay billions of dollars in clinical trials to try something novel.

With the vaccine, however, we just went for it. I don’t even want to say it was that big a risk because we kind of knew it would work, but we did something that we ordinarily wouldn’t have done, which is base a vaccine on what some people would call experimental technology.

Derek Thompson: I wonder whether a regrettable feature of the success of Operation Warp Speed is that it’s further evidence that America needs catastrophes to fast forward progress.

So you could say Operation Warp Speed was a wonderful idea, but we never would’ve gotten that pace of progress without a global pandemic. You could say the same for all sorts of technologies: the U.S. advanced airplane technology after World War I, and in World War II we had the Manhattan Project for nuclear bombs.

But you know, on a less controversial scale, we have radar and penicillin manufacturing. The Internet and GPS were obviously developed during the Cold War. Clearly, the Apollo Project never would’ve landed a man on the Moon if Sputnik didn’t exist. Crises are focusing mechanisms. I wonder whether one meta question of this podcast about metascience is the degree to which advocates need to make a stronger case that there are crises that require a new, brave approach to the way that we do science and technology in America.

Eli Dourado: I agree. Statistics clearly show that the biggest period of productivity growth was World War II. That was the only time we truly had an all-of-society mobilization to just get stuff done. Crises jolt our complacency. During a crisis you put your complacency aside and you’re willing to do unusual, unnatural things to get things done. It works the other way around too: if you’re complacent for too long, the odds of a crisis hitting you go up.

It’s like the Don’t Look Up phenomenon. If we ignore problems and are not proactive about them, then that’s when they become catastrophic. In terms of pandemics versus other diseases, we approved things rapidly during the pandemic because it was an emergency.

But I think about all the people who have terminal illnesses and other serious illnesses. It’s an emergency for them, too. We should be pulling out all the stops a lot more often to get treatments to those people and, more broadly, to try to get more problems solved.

Derek Thompson: A crisis is a focusing mechanism, but it is up to us to decide what counts as a crisis. As I wrote in the piece in The Atlantic, we could announce an Operation Warp Speed for heart disease tomorrow, on the very solid grounds that it is the leading cause of death in America. The leading cause of death in America does seem like a national crisis.

We could announce a full emergency review of federal and local permitting rules for clean energy construction under, again, the very firm rationale that climate change is also a crisis. We could do the same for national zoning laws by announcing that there’s a housing crisis, since we spent the 2010s building the fewest houses per capita of any decade on record.

Sometimes defining a crisis is a collective, subjective act, but sometimes it’s a political determination, and you need political bravery to make that determination.

In writing this piece, one of the most interesting conversations I had with anyone about Operation Warp Speed was with Heidi Williams. We talked about what an Operation Warp Speed for cancer research would look like. She told me on the one hand it would involve spending more money on cancer research but also experimenting with the way that we do research on cancer medication. That’s been a recurring theme of this podcast series.

One way that we could reform trials, Heidi told me, is to change the way the FDA uses what are called short-term proxies for deciding whether or not a cancer medication is going to prevent cancer. She alerted me to this absolutely fascinating piece of information: between 1971, when the War on Cancer was announced, and 2015, only six drugs were approved to prevent any cancer. That is way fewer than the number of drugs that were approved to treat recurrent or metastatic cancer.

One of the reasons why is that it’s really hard to do research on whether a drug is going to prevent a cancer decades out. By the time you have evidence that your anti-liver-cancer medication is keeping a 30-year-old from turning 70 and getting liver cancer, well, that’s 40 years later. By that time, maybe the patent has run out.

Heidi said that with some diseases, say heart disease treatments and beta blockers, we look at patients’ cholesterol levels in the short term rather than wait for the full mortality results of the heart disease treatments. We could similarly establish short-term proxies for approving drugs that prevent cancers, if we did the research to figure out what those short-term proxies are.

But it seems like we could save tens of thousands of lives, or extend hundreds of thousands of lives by decades, if we figured out some way for the FDA to approve cancer prevention therapies without waiting 50 years to see whether a therapy actually prevents cancer. That’s just one idea to accelerate the development of life-saving medication without spending a hundred billion dollars of extra money on research.

That was a long windup, but to throw it back to you, Eli: do you have other ideas for ways that we could create this glide path from the lab to the pharmacy, the same way that we did for the COVID-19 vaccine or for other necessary medications?

Eli Dourado: Science is like other industries. We’ve talked about all the dysfunction that we have in clean energy deployment where we have to get a lot of buy-in from a lot of people. Science is kind of the same way.

There are institutional review boards approving or not approving, or asking a lot of questions about experiments and so on, especially on humans. That is another form of community engagement that is creating a veto point.

We need to figure out why clinical trials have gotten so expensive. Some data that I’ve seen says they’ve gone up 50x in cost per subject, and I don’t know if I have an answer there. Some of the increases in costs are pretty organic and reasonable. We’re going after rarer diseases now, so recruiting is harder, et cetera, but I don’t think that accounts for the full amount. Getting the clinical trials cost down is almost the whole ballgame.

We’re saving a life for every $3,000 to $4,000 that we spend on drugs. It’s very, very high ROI, in general, in pharmaceuticals. It’s ironic because the part of the medical system that people complain about is paying for prescription drugs. But there is a high ROI because you don’t have humans in the loop.

You can imagine that you have an ailment, and you can either treat it through surgery or a doctor gives you a pill. For the surgery, you have to pay for all the equipment for the hospital, the time of the surgeons, and the time of the nurses and the anesthesiologists. The pill is so much better because you get humans out of the loop. That needs to be the goal. 

With regard to surrogate markers like you’re talking about, the FDA has done some of that for cancer already. But they had a bad experience with a surrogate endpoint for Alzheimer’s. They approved an Alzheimer’s drug a couple of years ago based on some markers, but the theory of what causes Alzheimer’s was a bad one. They got burned by that experience, so I’m worried they’re going to want to take a step back from surrogate endpoints and start requiring more.

More generally, on what you were saying about the problem of things taking so long that they go off-patent: we need to rethink market exclusivity for medications. Right now you have an exclusive period based on your patent filing. But what if market exclusivity were based on who bears the cost of the clinical trials, instead of who has the patent? If a chemical is off-patent and you prove that it’s safe and effective for a certain purpose in a certain population, you should maybe get market exclusivity for that.

Or maybe we should just unlink the two. Frankly, maybe we should get rid of patents entirely, because drugs are the only place where they seem to have value. Figuring out why clinical trials are so expensive is number one. Number two is delinking market exclusivity from the patent. You need something to reward the company for bearing the cost, but it might not have to be the patent exclusivity period.

Derek Thompson: Let’s say the White House calls you tomorrow and says, “Eli, we think that the most important bottleneck to coming up with a truly brilliant generation of medications to extend the lives of Americans is the out-of-control cost growth of clinical trials. We want you to help us solve this. We want a Manhattan Project for reducing the cost of clinical trials.”

Where might you start? Where might you start to unlock that bottleneck just a little bit? Or start your investigation into what are the most important components of this cost inflation crisis?

Eli Dourado: Some of this probably has something to do with medical records, in terms of patient recruitment. Everybody’s been calling for compatible electronic medical records for a long time. We still don’t have them.

That would be part of it. Not getting so hung up on privacy all the time in medicine would also be valuable. That might make it easier to recruit patients. After that, then you actually need to understand at a much more tactical level why the trials are so expensive.

What happens is that you usually use a consultant to run your clinical trial. Those consultants are very buddy-buddy with the FDA. They have a long history in the pharma industry. If you’re a super scrappy biotech startup and you do a clinical trial, it will still proceed at the pace of the legacy industry.

You can’t do it according to your own culture. You’re doing it according to the lowest-common-denominator culture. Figuring out how to fix the lack of urgency in the way those trials are run is the right thing to focus on. A lot of very tactical breakthroughs are needed. 

The other thing to think about is: do we need to prove effectiveness in drugs, or is it enough to prove safety? Since 1962, I believe, the FDA has required both safety and effectiveness to be proven in clinical trials, whereas before that it was just safety. Yet once a drug is approved as effective for one condition, doctors can prescribe it off-label for any other condition that they want. We give doctors complete freedom to decide what conditions drugs are effective for. 

That system of off-label prescribing is extremely valuable. We use it all the time, and doctors and patients would rightly be up in arms if we took it away. This raises the point that the initial effectiveness requirement doesn’t seem valuable; it just adds another layer of clinical trials. It’s another obstacle.

I would want to look at cutting that part out, at least initially, to see if that increases the rate of drug throughput.

Derek Thompson: The only thought I had while you were talking was that we’ve begun to have international comparisons of infrastructure costs. So for example, you can look up online the cost per mile of building a subway in New York City, Los Angeles, Madrid, Moscow, wherever else.

It’d be interesting to have that kind of international cost comparison for the clinical trials that are being done within those countries.

It’s possible that different countries have different standards, and some might have gotten to a more Goldilocks position than the U.S. in balancing a certain amount of privacy, the patient’s health, and some care for effectiveness beyond zero.

I would like a little bit of care for effectiveness, even if it’s a bit less strict than we currently have. Some kind of international comparison might be a useful data point in this investigation.

Eli Dourado: Unfortunately, the FDA right now is very selective about where the clinical trials are done and what the rules are for the clinical trials. They’ve rejected some international trial data, just because they don’t trust it. I agree we could be creating a little bit more international competition.

Without reducing the quality of the trials, some jurisdictional competition in how recruitment could be done or other factors would be pretty valuable. But right now the U.S. is basically the major market for the world, because Europeans have price controls on drugs.

No drug manufacturer is going to recover their costs on the European market. They’re going to recover their costs if they can get to the American market. So often nobody really cares about drugs unless they’re approved in the U.S.

Derek Thompson: Eli Dourado, thank you very, very much.

Eli Dourado: Great talking to you, Derek, as always. 

Caleb Watney: Thanks for joining us for this penultimate episode of the Metascience 101 podcast series. For our final episode in this series, we’ll talk about different career paths and how you can get involved in metascience research.

Episode Nine: “How to Get Involved”

Caleb Watney: Welcome. This is the final episode of our Metascience 101 podcast series where we’ll turn to how you can get involved in metascience research. Professor Heidi Williams discusses career paths with innovation economist Matt Clancy and Professor Paul Niehaus. This episode touches on academic, non-profit, and private sector paths to research, the importance of your surroundings, and how you can find good, use-inspired questions.

Heidi Williams: Great. Our goal with this discussion is to give some advice to students and young people who might have listened to some of this series and are excited to get involved but are not exactly sure what that might look like, from a practical perspective. 

On paper, the three of us here look similar in the sense that we all pursued PhDs in economics. I would guess that we each saw some value in the toolkit that the field of economics provides, to help us make progress on problems that we care about. Paul’s and my career trajectories, on paper, look even more similar since we both finished our PhDs and went straight into academic jobs. In practice, however, each of the three of us actually took different paths that shaped how we thought about making progress on problems that we care about. 

We all value interacting with people with a wide variety of skill sets. And we wanted to bring our perspectives to this discussion on these issues for young people.

Let’s start off with careers in government. Matt, after you finished your economics PhD, your first job was at the U.S. Department of Agriculture, USDA. Tell us about opportunities to improve science from a public service perspective.

Matt Clancy: Sure. I worked for the Department of Agriculture, in the Economic Research Service (ERS) there. I was a research economist, and my government agency was unusual in that it was more like an academic department than many research departments in other agencies. 

In other agencies, often you’re focusing on solving a problem for your agency’s stakeholders. For instance, if you work for the Environmental Protection Agency (EPA), you might literally be doing cost-benefit analysis-type work. We were still trying to publish in academic journals, and, in that sense, that made us more similar to you in your careers.

But there are differences from academia. In academia, you’re aiming to publish research that you think is interesting and that you hope your peers will find interesting from a pure knowledge standpoint. Our end goal was to help policymakers craft policy, and we tried to anticipate their information needs, because research takes years to play out. There was an entrepreneurial element where we needed to forecast out three to five years: “What are going to be the important issues in agriculture? We have to start researching and gathering data on those things now, so that we’ll be able to inform policy down the road.”

In making policy decisions, sometimes we can’t identify something very well, the data is not very good, or there’s not a nice clean experiment. Yet a decision still has to be made. Some number has to be used to guide it, or else it’s just based on intuition. That mindset gave me a different framework about the value of my research from the one I had when I started my PhD. I started asking, “What’s the end point of doing this?” At USDA, we knew that somebody needed to make a decision, and we wanted to inform that decision with better information.

Heidi Williams: There are a lot of different agencies that people don’t think about as intersecting with science policy, but they actually have very important inputs into a lot of the topics that were covered in this series of episodes. 

To give one example, the Congressional Budget Office needs to tabulate the budget implications of basically every piece of legislation that comes through. They are asking questions like: What’s the research and development investment budget, and how do we think about those implications? Or: What are the productivity implications of changes in high skilled immigration policy? 

Those are questions that economists themselves research. When you’re at a government agency, you might be tackling a very similar question, but with a specific consumer in mind. You’re thinking, “We expect that a given set of people in Congress is going to have these types of questions, and we’re going to need to pull together research and synthesize the best available answer to that question.”

That’s much more the motivation than simply the need to come up with curiosity-driven research questions. In agency work, you know what the research questions are, and you have a very direct connection to the consumer of your work. Is that one way that you would describe it?

Matt Clancy: At the Department of Agriculture, and at Census and at the U.S. Patent and Trademark Office, the box of acceptable research questions is definitely smaller than when you’re at a university. 

I moved to a university after this, and there you can do whatever you want. You don’t have anybody looking over your shoulder, checking in quarterly on what research projects you’re working on. 

But within this still large box of research that we thought would be relevant to policymakers, we had a lot of scope to do research that interested us. At the end of the day, the American taxpayer is paying you, and they are trying to get something for it. That’s the ethos in these agencies rather than just seeking knowledge for knowledge’s sake. You do have some autonomy, however. 

Here’s a concrete example: on my first day at the Department of Agriculture, they told me, “We need to know about the implications of restrictions on antibiotic use in agriculture. We think using them so much may cause antimicrobial resistance, and it could be a problem. There are going to be new restrictions on antibiotics.”

That’s going to have knock-on effects on agriculture, because farmers don’t use antibiotics just for fun but for a reason: they actually help the animals grow more quickly. If we’re not going to let farmers use antibiotics for that purpose anymore, can we incentivize drug companies to develop other drugs that will have the same effect without the antimicrobial resistance? That kicked off a two-year project to understand the whole sector and the incentives, and to be able to give advice about what we should do.

I was given an objective for what we needed to do. It still interested me as a new problem that I could sink my teeth into. The other half of the job evolved from hearing, “Matt, you just sort of have to figure out what to do.” At that time I thought, “There’s all this patent data that we’re not using to study innovation in agriculture; let’s build a data set and start exploring questions with it.”

Paul Niehaus: This space is dynamic, and new opportunities like this are opening up. Some are in places like USDA ERS that are long-established, where it’s easy to understand what a job there looks like. Others are in places like USAID, which is most related to what I do: they’ve had a chief economist for a long time, but not really a chief economist’s office and team. Now Dean Karlan is trying to build a culture of evidence-based decision making there, and that may open up new opportunities as well. Those are some of the most exciting new opportunities for work like this.

Heidi Williams: I agree. Students thinking about careers in government can choose an agency whose mission they find really inspiring, as in, “I’m really inspired by Sasha Gallant, and I want to go work at Development Innovation Ventures at USAID.” Those agencies do have entry-level jobs that can become an on-ramp to further work. 

You can also match through fellowship programs that target people at certain career stages and on-ramp them into more exposure to government. One that’s very natural for PhDs is the American Association for the Advancement of Science (AAAS) Fellowships, which give you a direct placement in government, and there’s often support for you to see more than one office. For people just out of undergrad, the Horizon Fellowship is another program that is very good about helping you find a placement, even if you’re not currently in government. These fellowship programs can provide a natural way to on-ramp people into government and to find a good placement for their particular skill set.

The second category I want to talk about is what you can think of as academia-adjacent research jobs. There is economic research that is done outside of academia, and that is often, in the way that Matt was describing, more closely related to real-world problems. Think tanks are one natural place. Some private philanthropies like Open Philanthropy are doing research in a very directed way. 

I would also put journalism tackling social problems in this category; I think of it as very closely adjacent to research. Whether it’s something like the Vox Future Perfect Fellowships or public writing that’s not necessarily attached to a given outlet, both involve engaging in research on questions that you think are really important. 

I’m curious if you could each share an example of someone you’ve seen in that position. What are the pluses and minuses of that career track for people, as a means of exposure to other career opportunities?

Paul Niehaus: The first thing that comes to mind is the global development space in the NGO world. There are certainly positions at the World Bank, which is a well-established track, and at some of the other big multilateral development banks. Many of the bigger NGOs, especially the ones that are more evidence-focused, have a research function internally. The IRC has a great research team. At GiveDirectly, we have a research team. There are people there with PhDs who are doing great economics research that is very focused on the needs and questions of the NGO they work at.

I typically see people go there a little bit later in their careers, after having done some academic work and reached a decision that they would like to shift the balance. They might think, “I want to be doing things that are going to have an immediate tangible impact, and where I’m confident that the questions I’m looking at are important questions, because they’re coming to me from the rest of the team in the organization.” That’s a great route.

Matt Clancy: Where I work now, Open Philanthropy, there are a number of different people engaged in basically pure research positions. I’m actually a research fellow, although a portion of my duties is grantmaking. Some people do pure research, though again, it’s not purely curiosity-driven. 

There’s an instrumental objective, like, say, “We’re thinking of maybe launching a new program. There’s an academic study that shows the program was really effective. Can we dig into that study, replicate it, and make sure it’s effective?” Other things are more open-ended, like learning about potential areas to fund: researching whether there are tractable ways to make progress on them, whether the problems are important, and whether there is a valuable marginal dollar to spend or the space is already saturated.

What you mentioned earlier about writing in public is an interesting, new path. The internet is a prominent way to network that we didn’t really have twenty years ago. It used to be that to network with people and find opportunities, you had to move to DC and meet the government policymakers at happy hours or different functions.

You can advertise what you’re interested in on the internet as well. Writing a high-quality blog credibly signals, “This is what I’m interested in, and you can see my quality.” This may all break down with ChatGPT in the future. But that’s how I changed my career trajectory. I was in academia, working on the New Things Under The Sun project. That caught the attention of Caleb and Alec in the think tank world, and that’s how I began my collaboration with them.

Brian Potter was a construction engineer who was writing super high-quality analyses of construction and asking why productivity in construction was not going up like it was in other industries. Now he’s joined the Institute for Progress. I can think of other examples too. So if you’re not in the job you want to be in, you’re not in government, and you don’t work for a think tank, one possible way to get attention is through the internet. 

Paul Niehaus: I have seen that work also in the opposite direction. There was a remarkable civil servant in India, who had blogged about the latest research papers that were coming out. We all wondered, “Who is this gem of a human being?” Then we started talking to him to figure out what research we should be doing, because we really valued his opinion. He’s gone on to work at Global Innovation Fund, funding research, among other things. 

Matt Clancy: It can be a new kind of credential too, because for the right open-minded person, you can point to a voluminous documentation of your interest and expertise in the topic. 

Heidi Williams: I really encourage students to spend time in government at some point, whether right out of undergrad, or while they’re doing graduate work, because you can often get a much better sense of the relevant constraints and objectives of the institutions that you study, from spending time physically working in them.

But I know, from people that similarly spent even a short time in private sector firms, that they learn a lot too. “Wow, the way that I conceptualize how firms make decisions, what they see as the regulatory constraints, or how they think about their path for getting ideas out to have an impact on the world is very different from how I thought.” That then brings them back to research a different set of questions. 

Do you think too few people see time in the private sector as something that they should do? Do people assume the private sector is a place that you go to stay, rather than rotate in and out of? 

Paul Niehaus: That sort of rotation, once you’ve committed to an academic path, can be a little tricky, because if you go really full-time, what do you do? Do you ditch your co-authorship relationships and tell the editors that you’re not going to do referee reports? It’s hard to unwind the web of commitments and obligations that you make on any one path and really commit to another one. 

But yes, 110%, there’s incredible value in spending some time in and getting exposure to the private sector. My own experiences starting two companies and having to build things from the ground up led to all kinds of painful lived experiences and lessons learned.

The example that I give to my students, which I really love, is from Paul Oyer, your colleague at the business school at Stanford. Paul has this great job market paper, which shows that sales spike at the end of the fiscal year because salespeople want to make their quota. 

This paper came about because he was sitting in grad school, and they were looking at some data on seasonality in sales, and there was a spike, and everybody said, “Well, that’s weird.” Then they just moved on. Paul said, “Well, that makes sense, because it’s the salespeople making their quota,” because he had worked in sales right before going to grad school. Everybody said, “No, no, no, that wouldn’t make any sense.” He said, “I’m pretty sure that’s what it is.” 

So he wrote a great job market paper and got a great job out of it. It’s about knowing how to interpret the things you’re looking at, knowing what sorts of things to look for, and not dismissing out of hand things that seem not to make sense from one mindset, because you’ve actually been out there in the world.

Heidi Williams: All three of us decided to go pursue a PhD in economics. If you were going to advise students on who should think about that path, what are the things that people often miss in thinking about this option as a path for having social impact with their work?

Paul Niehaus: The single biggest thing that I think people don’t understand is how flexible having a PhD is. Having a PhD and an academic job is such a platform, and people do such different things with it. Heidi, you’re one of those people; Matt, you were too. I’ve done a whole diverse mix of things, including some research, but also starting a multinational NGO, a couple of companies, and lots of other things.

There are always trade-offs. But the first thing I want everybody to know is that a fundamental feature of the job is that you get to decide what to do with your time. If you want to get tenure, if you want to publish a lot of papers, that adds constraints. You have to think about how to do that, and what people are going to be responsive to. But that just gives you enormous freedom, right?

It also gives you a degree of security. When I’m doing entrepreneurial stuff, while I have this academic job, there’s some risk here, but I know I can afford to take risks. If I want to express an unpopular opinion to a policymaker, I feel the freedom to do that, because I know that it’s not going to cost me my job.

I think there is so much value to the platform aspect of it. But the key thing is that you need to envision it that way; not everybody is going to teach you to think of it that way.

Matt Clancy: I could imagine somebody who thinks, “I love research. I think I want to dig into these problems, but I don’t want the academic life where I have to move all the time, and I have to do an extensive predoc, and then I have to jump through all these hoops, and then I’m racing to get tenure.”

That’s one path, but as Paul said, it’s not the path you necessarily have to take. 

I went to Iowa State University, and I’m doing fine in my life. Most of the people in my cohort are also doing fine, teaching at small liberal arts colleges or working in government. We didn’t have to run through all the postdoc stuff. If the predoc and that tenuous life are not what you want, the key questions to ask are: do you actually still want to do a PhD? Do you want to learn all these skills? Do you want to spend years digging into a problem and trying to get to the bottom of it? 

Paul Niehaus: Another thing that was really useful to me when I was deciding whether to do a PhD was a conversation where I was trying to decide whether to get into more of the “thinking” or the “doing” side of global development work. Somebody said to me, “It’s a lot easier to get from the ‘thinking’ into the ‘doing’ than the other way around.” There’s a lot of option value to that path. 

That really bore out in my life, because I ended up getting into a bunch of “doing” opportunities based on things I was seeing in the research. I realized, “Oh, the research says this is a good idea, and no one’s doing it. So I guess I’m going to do that.” I think it’s still broadly true that there are more options by getting the PhD first.

Heidi Williams: To echo an idea that came up earlier: people often look at the average path of somebody who takes this route and decide that’s not what they would want, and I think that is not the right way to think about this. The average person doing a PhD in economics may be really stressed out about a unidimensional measure of success, with one career track in mind that would equal happiness. But a big feature of getting a PhD is that you get to choose what path you want.

If you see economics as a toolkit that would let you make an impact on the social problems that you want to study, I completely agree with Paul that the world is your oyster. You can choose the problem that you work on, you can choose the institution through which you work on that problem, and you can bring a really rigorous set of tools that might not otherwise be applied to that. I agree that it’s a very flexible platform. 

Matt Clancy: Although I was saying that at Iowa State University I didn’t do a predoc and take all that time, Heidi, you had a really good experience with your predoc. I’m not saying that you should avoid them.

Heidi Williams: Oftentimes the structures that a profession has get formalized as requirements, and then people do them because they’re requirements. It would be better to think, “What can I do as an investment that would give me more information about whether this is a career path I want, and also give me more certainty about what area I want to work in, if I do get a PhD?”

Straight out of undergrad, I was really lucky. I got a job with Michael Kremer, who’s an amazing economist. I was working on a problem motivated by a very policy-relevant question: how do we develop vaccines that are needed in low-income countries, where it’s not profitable for private firms to develop them? And how do we bring the tools of economic theory, writing contract theory papers on the right contract, to actually incentivize private firms to do research on these socially important problems? 

There is this term that gets thrown around sometimes in the sciences called Pasteur’s quadrant. It’s use-inspired research. We know what the problem is that we need to solve, but you actually need to do the basic theory research in order to come up with the right solution. 

My predoc was an incredibly rewarding experience. It made me think, “Oh, I absolutely want to go get a PhD.” It really honed my view of the area of research I wanted to work in.

But somehow the lesson comes out this way: “Oh, someone had a job like that, and then they got into graduate school. I need a job like that to go to grad school,” and then it becomes this box to check. When you’re looking for these experiences, one important thing to think about is, “What am I getting out of this for my own development as a person, rather than thinking of it as a credentialing mechanism?”

It’s also really important to think about the impact that you can have by advising and teaching students. When you are an academic, you do your own research on problems that you think are important. But through your advising and teaching, you can also guide students towards working on those questions and support their work on those questions.

I don’t know if either of you would like to share an example of that. For me, one of the main reasons why I have found it rewarding to stay in academia is providing this important source of value.

Paul Niehaus: The challenge of being individually productive is always interesting. You still find new problems to work on. But the challenge of creating a community around you — to have people who are collectively productive and creative and find good problems to work on — is so much more motivating.

Leadership in the academic sector looks different from leadership in the private sector, where you might get promoted through the ranks and at some point be doing strategy and bigger-picture work. There isn’t an obvious analog to that within the academy, but the kinds of mentorship and soft leadership that you can exercise by creating paths for younger researchers are exciting and rewarding as well.

Heidi Williams: Matt, Paul, and Tyler, who’s here with us, each provide templates of mentorship: ways you can carve out to support people in academic research that I think are really great.

Paul Niehaus: Maybe this is segueing into things to think about if you do decide to do a PhD. But one thing that I do find very different in my academic versus non-academic experiences is that the non-academic experiences are intrinsically team efforts. You join a team, you’re doing something together, everybody’s all in. For many people, if it’s a good team, and if the purpose you’re working towards is something you care about, then that can be an incredibly fulfilling experience.

In the academy, that doesn’t happen on its own. You have to be very intentional about finding the right people and putting those teams together and deciding what level of commitment you’re ready to make to each other. 

For people who came and ultimately left, the key factor was just not having found that team, and instead experiencing a very solitary exercise. They were sitting alone in a room with a whiteboard or a laptop, and that was not what they were looking for professionally. 

So if you choose this route, have this awareness that you’re going to have to be much more intentional to have that experience of doing something important together.

Heidi Williams: If you do get a PhD, not because you want to be famous and publish papers in prestigious journals, but because you see this as a toolkit for making progress on problems that you care about — one thing that students may struggle with is that that’s not the average reason why your peers are there. It may not be easy to find an advisor who empathizes with that being the reason that you’re there. 

Paul, you’re one of the people that I think of as most thoughtful on this. How do you structure support for retaining your center of focus on what’s most important to you, as opposed to what’s most important to the institution and people around you?

Paul Niehaus: Within economics, I do think there has been a big shift in recent years. There was a time when a lot of people would feel very uncomfortable talking to advisors about any sort of “non-traditional” career path, say about a non-academic job that they might be interested in. There’s fairly broad acceptance that that is not good, and that departments should create a culture where you can talk about anything that you want to do and be supported. In many places, I think that is also increasingly the reality. We’re not all the way there, but I feel optimistic about that.

It’s important to be intentional about creating and finding a community of people who are like-minded and supportive. Sometimes I feel like there’s this invisible divide between people who are there mainly because they are curious and they like to satisfy their curiosity, and people who are there because they believe that if they’re thoughtful, they might be able to have a big impact on the world through what they do.

They’re all wonderful people, and I don’t dislike curious people, but the second group is my tribe. Finding those people and spending time with them is super fun and life-giving, and it also helps me when I have to make decisions about what I am going to prioritize. I know that within that community, certain things are respected and valued, even if they don’t necessarily maximize the number of lines on your CV. 

With my co-authors, we are very explicit and open with each other that what we’re hoping to do is to improve anti-poverty policy in India, and that we’re all comfortable with the fact that that may mean we don’t publish as many papers, and that’s okay.

Heidi Williams: I want to talk about the fact that for many people, the institution where they spend time can have quite a substantive impact on what they value. Where you work can impact the way you think about what parts of your work are socially valuable, even in subtle ways. 

If you get a PhD in economics, you can end up teaching in a business school, a public policy school, an economics department, or a public health school. There are lots of different academic jobs that you could have. People often think of it as, “Well, I’m going to take the best job that I get, in whichever environment is most attractive to me.” But in my experience, those institutions can offer very different incentives for what kinds of things you work on.

Many economists who study innovation teach at business schools, and they end up teaching courses for MBAs. Many of the problems that they end up getting exposed to are problems relevant to private sector firms that are doing innovation. 

There’s also some alternative state of the world where all of the public policy schools recognize that innovation policy is a really important area, everyone with my background is teaching masters of public policy students, and those schools are asking, “What do we need to teach to train the next generation of policymakers who are going to really affect science and innovation policy?” 

For some reason, the split went the other way: most people like me teach at business schools, and in my view, that probably had a really large impact on what kinds of questions people study. 

Matt, you can comment on some broader institutional differences across research in different environments. But even within academia, this is an issue that can really matter.

Matt Clancy: For much of my career, I don’t think I appreciated how important your social environment is. When I applied to college, I got into the University of Chicago and Iowa State University. I went to Iowa State University because I thought, “Well, it’s cheaper, and it’s all physics.” I was going to major in physics, and that was the end of my thinking about it. I didn’t think about who my peers would be.

That probably would have made a difference, because in my subsequent experience, who my peers were did influence me quite a lot, such as working at USDA doing use-focused research, which pulled me away from what I had thought of as the most valuable research when I was doing my PhD. When I came back to work at Iowa State University, somewhat by accident, I was given two office choices: one in the Department of Economics and the other in the Agricultural Entrepreneurship Initiative Center. I took the office at the entrepreneurship center because those were the people I thought it would be good for me to share a building with.

I think the subsequent years were really different. I was surrounded by entrepreneurs and people who weren’t interested in talking about what my research was, but who were trying to encourage students to start businesses and talking about those kinds of things. That affected what I viewed as a useful contribution I could make as an academic, and I started New Things Under The Sun, a living literature review project, to try to make the academic literature accessible not only to other academics, but also to policymakers and to entrepreneurs trying to start businesses.

The entrepreneurs often think that academic papers are too disconnected and irrelevant to their needs, because they’re just lots of equations and 60 pages long. But I thought that there was a lot of value in the literature and started that project. I probably wouldn’t have started it if I had been just across the street in the other department and been talking about my research projects all the time.

Then I went to work for the Institute for Progress. Again, that was policy-focused and policy-relevant. Now, at Open Philanthropy, once again what counts as the most important research is viewed differently than in academia. 

I don’t know for sure how easy it is to select into the right environment until you’ve tried it; maybe there’s nothing better you can do than just sample. But be aware that the values of your current peers are not the only way things can be. Do you guys have any thoughts on that?

Paul Niehaus: Over time, one thing I’ve learned to look for is which people to hang out with. Insights come along, and sometimes an insight is a great policy idea, sometimes it’s a better business idea, and sometimes it’s a better research idea. So I just love being with people who are flexible about that and happy to consider any of those possibilities, as opposed to people who are always looking for just one of those things. That flexibility in your intellectual peers is worth looking for.

Heidi Williams: It’s also very rewarding to look for institutions that support that broad approach too. If you come up with a problem and think, “This would be socially valuable to do,” some institutions may say, “Well, we can’t really support that, or that’s not the work that we do.” But sometimes you match with an institution that says, “We agree that’s high social value. Just find a way to make that happen.” 

Or you meet people who have that mindset. Paul, you’re a great example. You’ve been in academia, you’ve started a non-profit, you’ve started a for-profit. You think very flexibly about how to get a socially valuable idea out there. You don’t think, “There’s one specific tool that I have, and if it doesn’t apply here, that’s just an idea that’s lost to history.” 

The three of us have talked about institutions that are a little more narrowly focused. For instance, maybe for-profit spinouts could happen, but if an idea isn’t profitable, it’s discarded and never brought up again. Or an organization like Open Philanthropy is doing very focused research on one question, and along the way they come upon other questions they would like to know the answer to, but those aren’t the current priority they’re working on.

It would be great to find better ways of connecting these use-inspired questions that arise along the way. An individual institution might not have the time or interest to pursue them, but can we more publicly surface them as questions that would be useful for people in academia or in more flexible settings to pursue? I’m curious if you have any examples that you would flag as productive case studies.

Paul Niehaus: I think it’s about mindset and connective tissue. Organizational specialization is a good thing, and it’s important and productive. But I see this issue when I talk to my colleagues about the policy impact of their work. 

One of my colleagues said to me, for example, “In Europe, there’s this very well-established process where committees consume the latest research and it feeds into EU policymaking, but in the U.S., I just don’t know what I’m doing.” Then, I call Heidi and say, “We need to get this guy connected to some people at Brookings, because we need connective tissue between the university and the sort of places in DC that are doing the hard work of translating this to make it legible to policymakers.” 

To me, that was an example of one person working seamlessly with one set of institutions, and in another case, none of that connective tissue had been built and was clearly needed. 

Matt Clancy: You said connective tissue. I’ve studied a lot of the economics of innovation. In the hard sciences, they have a direct connection to industry, because industry is building new technologies out of academic discoveries that are made, whether it’s mRNA vaccines or rockets. In the social sciences, we haven’t often had that. We haven’t had organizations reading the latest social science research to figure out how to set up new products.

That’s feedback we have been missing, and it’s really healthy for the field to hear how your ideas play out in the real world: how theories work or don’t work, replication, and validation of what we’re doing. If something doesn’t work, that becomes generative, pushing us to figure out why. Even from a purely self-interested academic perspective, it’s really useful to create more of this circulation among different groups.

How do we find this connective tissue? You can take a sabbatical at a government agency, or you can sit on some of these joint advisory committees. But the other thing you can do is try to find people like you, Heidi: people who are in academia and engaging with the wider world.

Paul Niehaus: This happens, but in too idiosyncratic a way. I make an effort to do this. I was talking to Heidi’s colleague Al Roth, a Nobel laureate for market design. He has a weekly tea with his students, and occasionally an entrepreneur will write to him saying, “Oh, I have this market design question related to a market I’m trying to build in my startup.” Al will say, “Come to tea and hang out with us.” Then one of the students may end up picking up that problem to work on. These things happen, but for now they’re driven by individual people who make an effort to bridge the gap. The hope in the longer term is that we’ll see this become more institutionalized. 

Matt Clancy: There’s another virtue to online public writing: it’s accessible to people outside your usual audience. Within academia, we’ve got a great system for communicating academic ideas to each other through the seminar circuit, conferences, and journals. But what if you want to reach people outside that bubble? 

I hear from people all the time who read New Things Under The Sun who are practitioners or policymakers. We just meet for virtual coffee or Zoom to talk about some problem they’re facing. They want to know if the academic literature has anything of value to say. I imagine that if more people had the time and space to communicate about their field in a way that is discoverable by people not in the field, then that’d be another way to build these connections.

Heidi Williams: As we wrap up, could you give a sales pitch to people who are motivated to use research to make progress on an important social problem? What’s the case for them to go down that road, as opposed to doing something that is more direct service and less of this longer term path?

Matt Clancy: I’ll take a shot. The best case is that you can think of knowledge creation and research as a lever that can have very long run impacts. If we can discover a marginally better way to do something, such as how we fund science or how we run peer review, then this evidence-based knowledge can spill out over the whole world.

One of the main values of knowledge is that it can be applied by everyone and it’s not trapped in any specific context. It’s non-rival. You can be exceptional at your role in direct service, but it’s hard to extend your reach far. Research can go really far if it’s done well and if it targets problems that matter. That’s my pitch.

Paul Niehaus: I love it. I don’t want to speak for science broadly — what do I know about many of the sciences? But Matt’s point is that if there are things that are well-remunerated by the world as it is currently structured, there are going to be plenty of people working on those problems. The problems that are going to be neglected, and therefore the ones where you’re going to be able to have an outsized impact, are the ones that create public goods: knowledge where you can’t capture all of the return and profit yourself. That’s where the huge returns are going to be.

I think of this as a useful heuristic for myself: I try to find things that are not going to benefit me privately, precisely because that means they’re likely undervalued and high-impact. As for economics in particular, I went into it because, to address the pressing issues of our time, I needed to be able to think about human behavior quantitatively. I think that was a great heuristic and still is.

Matt Clancy: Compared to other social sciences, economics has a disproportionate policy impact. There are statistics about how often economists testify before Congress, and it’s about twice the other social sciences all added together. So if you are going to pick a field where your goal is to have policy impact, economics is empirically really strong. 

What about you, Heidi? What are your opinions about why people should do an economics PhD, if they want to have a positive impact?

Heidi Williams: It’s related to things that both of you touched on. 

When you think about, “What do I want the scale of impact for my work to be?” I think it’s really hard to think of that only in a direct sense. I’m somebody who really values my teaching and my direct advising. But at the end of the day, I want some substantial part of my work to be feeding into making systemic change. I want to be doing that in a way that’s not just based on my own ideology and theory about what we should do as a society, but rather based on research that informs and gives me confidence that we can do something better than what we’re doing right now.

Research plays such a unique role in honest advocacy for progress in a very directed way, and it’s a very rewarding life path. Academia is a place where you can have direct service of teaching, advising, and having individual relationships with students, and at the same time, you’re able to scale your impact through research that informs broader, more systematic change. I find this really rewarding.

Paul Niehaus: Just as a data point — although we’re pretty clear-eyed about the constraints and limitations you sometimes face in academia, I would also say that we’re having a blast.

Heidi Williams: Yes. On the right day, you will get me to tell you about how horrible academia is. But at the end of the day, I feel very happy with my job.

Matt Clancy: I would say you can get a PhD and do research and actually not be in academia. It is also possible. 

Heidi Williams: You can be very happy.

Matt Clancy: Yes. That’s right.

Heidi Williams: So good. I think that’s a good note to wrap up on.

Matt Clancy: Thank you.

Caleb Watney: The Metascience 101 podcast series has come to a close, but our colleague Tim Hwang will continue releasing fascinating interviews about metascience on this podcast feed. So stay tuned! 

You can find more information about the Macroscience newsletter at macroscience.org. You can learn more about the Institute for Progress and our metascience work at ifp.org, and if you have any questions about this series you can find our contact info there.

A special thanks to our colleagues Matt Esche, Santi Ruiz, and Tim Hwang for their help in producing this series. Thanks to all of our amazing experts who joined us for the workshop. Thanks to Stripe for hosting. Thanks to Prom Creative for editing. Thanks to you, the listener, for joining us for this Metascience 101 series.