Predicting which proteins bind to each other, known as protein-protein interactions, has long been a challenge for computational biology, largely because of the vast diversity and complexity of protein structures. Now, a team of scientists has developed DiffPALM (Differentiable Pairing using Alignment-based Language Models), an AI-based approach that can significantly advance the prediction of interacting protein sequences.
The study is published in PNAS in the paper “Pairing interacting protein sequences using masked language modeling.”
DiffPALM leverages protein language models, a machine learning approach borrowed from natural language processing, to predict with high accuracy which members of two protein families interact with each other. This leads to a significant improvement over other methods, which often require large, diverse datasets and struggle with the complexity of eukaryotic protein complexes.
Another advantage of DiffPALM is its versatility: it can work even with smaller sequence datasets and can thus address rare proteins that have few homologs. It relies on protein language models trained on multiple sequence alignments (MSAs), such as the MSA Transformer and AlphaFold’s EvoFormer module, and these models allow it to predict complex interactions between proteins with a high degree of accuracy. Moreover, DiffPALM shows promise for predicting the structure of protein complexes, the intricate assemblies that form when multiple proteins bind together and that are essential for many of the cell’s processes.
In the study, the team compared DiffPALM with traditional coevolution-based pairing methods, which exploit the way the sequences of closely interacting proteins evolve together over time. Coevolution is an important signal in molecular and cell biology, and one that protein language models trained on MSAs capture well. DiffPALM is shown to outperform traditional methods.
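To make the idea of a coevolution signal concrete, here is a minimal Python sketch. It is not DiffPALM and not the method used in the study: it assumes a simple mutual-information score between one alignment column from each protein family, and the sequences and function names are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two aligned columns of residues.

    Row i of col_a and row i of col_b are assumed to come from a pair of
    sequences believed to interact (hypothetical toy setup).
    """
    if len(col_a) != len(col_b):
        raise ValueError("columns must contain the same number of sequences")
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        p_ab = n_ab / n
        # p_ab / (p_a * p_b), written with raw counts
        mi += p_ab * math.log2(n_ab * n / (count_a[a] * count_b[b]))
    return mi


def column(sequences, index):
    """Extract one alignment column from a list of equal-length sequences."""
    return [seq[index] for seq in sequences]


# Toy paired alignments (made-up sequences): row i of family_a is
# paired with row i of family_b.
family_a = ["AKL", "AEL", "AKL", "AEL"]
family_b = ["DGS", "KGS", "DGT", "KGT"]

# Column 1 of family A changes in step with column 0 of family B.
coupled = mutual_information(column(family_a, 1), column(family_b, 0))
# Column 1 of family A varies independently of column 2 of family B.
uncoupled = mutual_information(column(family_a, 1), column(family_b, 2))

print(f"coupled columns:   {coupled:.2f} bits")
print(f"uncoupled columns: {uncoupled:.2f} bits")
```

In this toy case, the columns that vary together score higher than the columns that vary independently. Coevolution-based pairing methods search for the pairing of sequences that makes such signals strongest, which is one reason they tend to need large and diverse alignments.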
The most immediate application of DiffPALM is in basic protein biology, but its potential extends further: it could become a powerful tool in medical research and drug development. For instance, accurately predicting protein interactions can help researchers understand disease mechanisms and develop targeted therapies.
The researchers have made DiffPALM freely available, in the hope that the scientific community will adopt it widely, furthering advances in computational biology and enabling researchers to explore the complexities of protein interactions.