Last month, the National Security Commission on Emerging Biotechnology published a short white paper on leveraging biological data. Data is a critical national resource, they argue: it drives innovation in biotechnology, and it needs serious investment in standards and systems to make it work.
They’re singing my song overall, but one sentence really jumped out at me. As they introduce tools for data generation, curated databases, and metadata standards, they argue that these tools are national infrastructure like any other: “Like infrastructure for water, electricity, and roads across the country, the United States needs a persistent biological data infrastructure to support biotechnology advancements.”
People don’t usually talk about roads and wires when they talk about data, but they do talk about water—not the pipes and sewers, but what flows inside of them. We hear often about the deluge, river, cloud, lake, or flood of data, all language that implies straightforward, if aggressive, liquid movement to fill new spaces.
Infrastructure, on the other hand, is usually invisible and always unappreciated, noticed much more when it is not working than when it works as designed. Infrastructure is easy to take for granted but really hard to build and maintain, and for that reason it’s an excellent metaphor for thinking about what science needs to make useful data in an age of AI.
In her fascinating book Data-Centric Biology, philosopher Sabina Leonelli takes this infrastructural work seriously as an object of study, focusing on the hard work of making data useful in different contexts. Instead of these water metaphors that imply movement through simple sloshing, Leonelli pushes for thinking of data journeys, which require strong infrastructure in turn:
Thinking about data journeys is important because journeys are hardly ever unproblematic. Journeys require long-term planning, reliable infrastructures, and adequate vehicles and demand energy and work, as well as a considerable amount of financial resources. They may be short or long, fast or slow. They can happen in a variety of ways and for a variety of reasons. Often they require frequent changes of vehicles and terrains, which in turn force travelers to change their ways and appearance to adapt to different landscapes and climates. Furthermore, journeys can be interrupted, disrupted, and modified as they unfold. Travelers may encounter obstacles, delays, dead ends, and unexpected shortcuts, which in turn shift the timescales, directions, and destinations of travel.
Of all the ways that people talk about biological data and discovery these days, the concept of the journey does a much better job of describing what it actually feels like to do science. Doing a new experiment is like trying to get someplace for the first time with just a paper map or, worse, just verbal directions from an ornery local. When you’ve been there a few times, the path becomes second nature, until you need to tell someone what you saw there and how to get there too.
It’s these journeys and how we share them with each other that make up the practice of science, and each interaction opens another surface that can add “science friction.” Paul Edwards and colleagues studied this friction at the interface between disciplines in climate research, and the work the researchers needed to do to communicate context through metadata. They write:
Every movement of data across an interface comes at some cost in time, energy, and human attention. Every interface between groups and organizations, as well as between machines, represents a point of resistance where data can be garbled, misinterpreted, or lost. In social systems, data friction consumes energy and produces turbulence and heat—that is, conflicts, disagreements, and inexact, unruly processes.
As AI dominates the conversation, the critical need for more and better data is coming to the forefront, but there is also a risk of continuing the trend of disembodiment, of hiding the work it takes to make and move the data that AI depends on. Building and maintaining metascientific infrastructure isn’t as glamorous as reaching any particular destination, but like other critical infrastructure, it shouldn’t be taken for granted.