When Ginkgo Bioworks needed to design a novel enzyme capable of breaking down a specific class of industrial pollutants, the traditional approach would have involved years of iterative experimental work: proposing candidate protein sequences, synthesizing them, testing their activity, and using the results to inform the next round of design. That process is expensive, slow, and fundamentally limited by the throughput of wet-lab experimentation.

Working with OpenAI's research team, Ginkgo completed an equivalent design process in approximately six weeks. The collaboration produced a protein design tool that combines OpenAI's large language model capabilities with Ginkgo's proprietary biological data — the result of years of high-throughput protein synthesis and characterization experiments — to predict which protein sequences are likely to have desired functional properties before any physical synthesis occurs.

The tool is not a general-purpose protein design system. It is specifically optimized for the class of enzymes that Ginkgo works with most frequently, trained on data that reflects the company's particular experimental focus. But within that domain, its performance is remarkable: in blind validation tests, proteins selected by the AI system showed the desired activity at approximately four times the rate of proteins selected by experienced human researchers using conventional computational tools.

"We are not replacing the biology. We are compressing the iteration cycle. Instead of running 500 experiments to find 5 that work, we are running 50 experiments to find 5 that work. That is a 10x improvement in efficiency, and it compounds across every project."

— Jason Kelly, CEO, Ginkgo Bioworks
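
The arithmetic behind Kelly's comparison can be checked in a few lines. The sketch below uses only the figures from the quote, which are illustrative rather than measured results (the blind validation tests described earlier put the improvement over human selection at roughly four times). The function names and the project count are hypothetical, introduced here purely for illustration.

```python
def hit_rate(hits: int, experiments: int) -> float:
    """Fraction of experiments that produce a working protein."""
    if experiments <= 0:
        raise ValueError("experiments must be positive")
    return hits / experiments


# Figures from the quote: 5 working proteins out of 500 vs. out of 50 experiments.
BASELINE_EXPERIMENTS = 500
ASSISTED_EXPERIMENTS = 50
HITS = 5

baseline = hit_rate(HITS, BASELINE_EXPERIMENTS)  # 1 in 100
assisted = hit_rate(HITS, ASSISTED_EXPERIMENTS)  # 1 in 10


def experiments_saved(n_projects: int) -> int:
    """Experiments avoided across n_projects, assuming (hypothetically)
    that every project sees the same 500 -> 50 reduction."""
    return (BASELINE_EXPERIMENTS - ASSISTED_EXPERIMENTS) * n_projects


print(f"baseline hit rate: {baseline:.0%}")
print(f"assisted hit rate: {assisted:.0%}")
print(f"improvement: {assisted / baseline:.0f}x")
print(f"experiments saved over 3 projects: {experiments_saved(3)}")
```

Under these assumptions the hit rate rises from 1% to 10%, and the saving grows linearly with the number of projects that achieve the same reduction.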

The collaboration is one of several high-profile examples of AI being applied to accelerate scientific discovery in biology. DeepMind's AlphaFold has transformed structural biology by predicting protein structures with near-experimental accuracy. Recursion Pharmaceuticals has used AI to identify drug candidates that would have been missed by conventional screening approaches. Insilico Medicine has advanced AI-designed drug molecules into clinical trials. The Ginkgo-OpenAI work adds to this growing body of evidence that AI can meaningfully accelerate the pace of biological research.

What distinguishes the Ginkgo collaboration is its focus on the design-build-test-learn cycle that underlies most experimental biology. Rather than simply predicting the properties of known proteins — a task at which AI systems now excel — the tool is designed to generate novel protein sequences that are predicted to have desired properties. This is a harder problem, and the results are correspondingly more uncertain, but the potential impact is also greater.

The collaboration also illustrates the importance of domain-specific data. OpenAI's general language model capabilities provided the architectural foundation, but the tool's performance depends critically on Ginkgo's proprietary experimental data. This creates a competitive dynamic that will shape the AI-in-science landscape for years to come: the companies and research institutions that have accumulated the most high-quality experimental data will be best positioned to build the most capable AI tools in their domains.

For the broader scientific community, the Ginkgo-OpenAI collaboration raises important questions about access and equity. If the most powerful AI tools for biological research are proprietary systems built on proprietary data, the benefits of AI-accelerated science may accrue primarily to well-funded commercial entities rather than to academic researchers or scientists in lower-income countries. Several academic groups are working on open-source alternatives, but they face a fundamental disadvantage in the quantity and quality of training data available to them.

"The risk is that we end up with a two-tier scientific ecosystem — one where well-funded companies can use AI to compress years of research into weeks, and another where academic labs are still doing science the old way because they cannot afford the tools."

— Professor of Computational Biology, UC Berkeley

Ginkgo and OpenAI have indicated that they plan to publish the methodology underlying the collaboration, though the specific training data and model weights will remain proprietary. Whether this represents an adequate contribution to the scientific commons is a question that the research community will be debating for some time.