As a graduate student in the lab of protein design expert David Baker, PhD, at the University of Washington, David Younger, PhD, noticed a major bottleneck to designing massive numbers of proteins in silico: how does anyone test so many proteins in the lab?

“You would have someone in the lab designing tens or hundreds of thousands of proteins; they might only be able to test 10 of them, or maybe 100 if the project is further along,” Younger told GEN Edge. “Even then, you’re only measuring binding against one thing. You’re not measuring specificity or cross-reactivity and not determining the epitope of exactly where that protein is binding.”

Younger began to ponder how to use synthetic biology and machine learning to engineer a cellular system to generate and analyze large multiplex data sets around how proteins bind to each other.

“With any kind of biological technology, you go in hoping for a paper, and anything else is kind of icing on the cake,” said Younger. “In this case, we were happily surprised, three or four years later, that this platform is really working. This has implications for helping computational protein designers test designs with higher throughput, and there are some real, near-term pharma applications.”  

In 2017, Younger spun out his research into a company called A-Alpha Bio, together with Randolph Lopez, PhD, to use their platform, called AlphaSeq, to develop an in-house therapeutic pipeline and form partnerships with pharmaceutical companies to inform the discovery and development of novel therapeutics. But Younger’s interests aren’t just limited to pharma—he’s also been interested in mitigating potential future biothreats. 

In 2022, A-Alpha Bio received funding from the Department of Defense’s (DOD) Joint Program Executive Office for Chemical, Biological, Radiological, and Nuclear Defense’s (JPEO-CBRND’s) Generative Unconstrained Intelligent Drug Engineering (GUIDE) program to preemptively generate data and train computational models that enable rapid medical countermeasures against potential future biothreats.  

The DOD’s initial investment up through 2023 had been $3.4 million, and now funding has expanded as A-Alpha Bio has been awarded another $14.5 million to accelerate antibody discovery and optimization for likely biothreats in partnership with the Lawrence Livermore National Laboratory (LLNL), a federally funded research and development center. With this extra help, A-Alpha Bio will be able to make large datasets of antibody-antigen binding and train and test predictive computer models for unknown pathogen families of concern. 

“We’ve measured 10 million interactions in this collaboration and have focused so far on three pathogenic families,” said Younger. “This extension of funding will allow us to really broaden both our potential impact on pandemic preparedness and our interest in training more generalizable models. Having an interaction space that covers more pathogen diversity is exactly what we’re interested in doing.”  

What’s love got to do with it? 

At the heart of Younger’s synthetic biology approach to measuring very large numbers of protein interactions with quantitative accuracy are two simple concepts: yeast surface display and mating.

Essentially, Younger built two distinct yeast surface display libraries: one with thousands of unique antibodies and the other with variants and homologs of antigens. The two libraries, each of a different yeast mating type (i.e., a and alpha), are swirled together in a liquid culture where the yeast collide. Cells stick together and then fuse if the proteins displayed on their surfaces interact strongly enough.

Once the culture has grown, the fused yeast can be isolated and sequenced. Counting how often each antibody-antigen pair appears among the fused cells shows which proteins interact and how strong each pairwise interaction is. According to Younger, gathering enough data for machine learning to optimize crucial binding properties frequently requires several cycles of library generation and screening.
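The counting step described above can be illustrated with a minimal Python sketch. It is not A-Alpha Bio's actual analysis pipeline: the function name `pair_frequencies`, the identifiers such as `ab1` and `ag1`, and the read counts are all hypothetical, and a real analysis would presumably also correct for how abundant each library member was before mating.

```python
from collections import Counter


def pair_frequencies(diploid_reads):
    """Return each (antibody, antigen) pair's share of all sequenced fused cells.

    diploid_reads: iterable of (antibody_id, antigen_id) tuples, one per
    sequenced fused cell. A larger share is read here as a stronger interaction.
    """
    counts = Counter(diploid_reads)
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


# Hypothetical data: ten fused cells recovered from the mating culture.
reads = [("ab1", "ag1")] * 6 + [("ab1", "ag2")] * 1 + [("ab2", "ag1")] * 3
freqs = pair_frequencies(reads)

# List pairs from most to least frequently fused.
for pair, share in sorted(freqs.items(), key=lambda kv: -kv[1]):
    print(pair, round(share, 2))
```

In this toy example, the pair that fuses most often (`ab1` with `ag1`) would be ranked as the strongest binder, while pairs that never appear among the fused cells are simply absent from the result.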

The yeast libraries can be custom-generated for different applications, such as cancer, which is the therapeutic focus for A-Alpha Bio. For an oncology program, the antigen library can consist of a huge catalog of variants of a particular protein of interest. Because the data are so quantitative and enable multivariable optimization of critical binding properties, AlphaSeq is also attractive for screening potential synthetic biothreats.

“We want to take every possible viral family that is a possible future pandemic, either naturally derived or potentially weaponized, and we want to essentially generate data as if there were a pandemic in that family,” said Younger. “We want to understand how antibodies bind to diverse antigens within that antigenic family. We also want to train predictive machine learning models so that we can more easily respond if a new variant within that viral family crops up. It’s hard to generate many antibodies against many antigens with any other platform.” 

Data, the digital diamonds 

One aspect of the partnership with the DOD that Younger is particularly excited about is having full usage rights for the expansive database they are building. It is approaching one billion protein interaction measurements, which Younger says is several orders of magnitude larger than anything that exists, such as the Biological General Repository for Interaction Datasets (BioGRID) database, which has several million entries. 

“The most exciting companies combine exciting proprietary sources of data that give new insights with sophisticated computation,” said Younger. “Our work with groups like LLNL, DOD, and JPEO-CBRND allows us to continue to invest in this massive data generation campaign, which we can use to leverage for all other applications that we’re interested in as a company.”

Even though the biosecurity work doesn’t overlap with the focus of A-Alpha Bio’s therapeutic pipeline, Younger said that these data can still be used to train their own machine learning models and be applied to their drug development campaigns in areas like oncology that are still in their early discovery phases. 

“We’re looking at an experimental tool that allows us to generate these really comprehensive, highly quantitative, vast data sets that are just impossible to generate using any conventional technologies and then layering machine learning on top of that actually to generate the practical insights and have that impact that we desire to have,” said Younger.

“It’s a really exciting time for the field, and we’re excited to be a part of it.” 
