The Bitter Lesson of Synthetic Biology


The theory behind the original synthetic biology manifestos was that we could build biological systems up from their component genetic parts and recombine them in a modular fashion. We largely could not - or at least not in a way that let us predict what the resulting systems would actually do. The biggest and most bitter lesson from the last 20 years of synthetic biology is that full context creates far more useful knowledge than individually controlled or simulated cases.

Biological systems are all interconnected - every change can affect everything else - so the idea of modularity just doesn't hold up in practice. We can't predict what biological systems will do from first principles, and this is alarmingly unsatisfactory for many biologists.

In fields like software engineering, the entire stack is designed around abstractions and interfaces, each small enough for a human to understand, layered on top of one another. The same is true of most fields of human endeavor, from mechanical engineering to VLSI to civil engineering. But biology is different: no clean abstraction separates components, and everything interacts with everything else.

There is one field where "everything interacting with everything" is actually embraced: AI. With LLM systems, full context seems to lead to more intelligent decision making. AI systems can be remarkably simple (<5000 LOC to implement an LLM), yet leverage a massive amount of data and computation to get good results. Biological systems, likewise, operate over remarkably simple ingredients: bacteria can be grown with a basic chemical slurry and an incubator - yet with billions of dollars and decades of research we still don't fully understand them. It could be that full context, with all those seemingly irrelevant details, is required for simulating sufficiently complex systems like intelligence or biology.
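To make the "remarkably simple" claim concrete: the core operation of an LLM, scaled dot-product self-attention, fits in a handful of lines. The sketch below is a generic, minimal illustration (not any particular model's implementation), with toy random weights standing in for trained parameters:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the max before exponentiating
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # X: (seq_len, d_model); project inputs to queries, keys, values
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # Every token attends to every other token: "everything interacts"
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    return softmax(scores) @ V

rng = np.random.default_rng(0)
d = 8                                   # toy model dimension
X = rng.standard_normal((4, d))         # 4 token embeddings
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```

The intelligence comes not from this code but from the scale of data and compute poured through it - which is exactly the parallel to biology's simple ingredients.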

In this way, AI systems naturally mirror biological systems. We roughly understand how they work from first principles (linear algebra or molecules, respectively), but once you put them together we really don’t know how the emergent properties of intelligence or life arise. This black box is frustrating! But we might be able to leverage this property: we may be able to effectively simulate biological systems - not through first principle understanding of the components, but through massive data collection with the full context of our modifications.

In some places, we are already doing this: we have gotten remarkably good at modeling protein folding by using AI systems that simply learn from all of protein space. But we do not yet have the data to learn how cells work in a similar way. The inputs and outputs are less obvious than for proteins, and cells are arguably much more complex. The route to understanding cells may not be traditional modeling or individual experiments testing components, but rather massive data collection with all the brilliant and bullshit details of real life.