AlphaFold2 is a machine learning model that leverages neural networks to accurately predict the 3D structures of proteins using only their amino acid sequences as input. AlphaFold2 is not a large language model (LLM) in the traditional sense, like GPT or BERT, which are designed to understand and generate human language.
AlphaFold2 (Jumper et al., 2021) was developed by Google DeepMind in 2020. It was trained using data from the Protein Data Bank (PDB), an archive of information about the three-dimensional structures of proteins, nucleic acids, and complex assemblies. Because the PDB archives experimentally determined structures, AlphaFold2 predictions are shared through separate resources such as the AlphaFold Protein Structure Database, where they complement the experimental record for future research.
AlphaFold2 marks a breakthrough in structural biology, significantly outperforming earlier prediction methods. Determining a protein structure through experimental techniques like X-ray crystallography or cryogenic electron microscopy (cryo-EM) can take months or years, whereas highly accurate protein structure prediction with AlphaFold2 takes just hours.
AlphaFold2 was used to create the AlphaFold Protein Structure Database, a collaboration between Google DeepMind and the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI). The database provides high-quality predictions of protein structures at a massive scale. It was developed as a result of AlphaFold’s success in the Critical Assessment of Protein Structure Prediction (CASP) competitions, specifically CASP13 and CASP14, where AlphaFold showcased its ability to model structures with atomic-level accuracy (Tunyasuvunakool et al., 2021).
AlphaFold first gained widespread attention in the CASP13 competition in 2018, when the original model placed first in the free modeling category for modeling novel protein folds. Its successor, AlphaFold2, then dominated the CASP14 competition in 2020, achieving a median global distance test (GDT) score of over 90. GDT measures how closely a predicted protein structure matches the experimental one on a scale of 0 to 100, and a score above 90 indicates near-atomic accuracy.
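As a rough illustration of how a GDT-style score is computed, the sketch below averages the percentage of residues whose Cα atoms fall within 1, 2, 4, and 8 Å of their experimental positions (the cutoffs used by the GDT_TS variant). It assumes the two structures are already superimposed and matched residue by residue. The official CASP calculation instead searches over many superpositions for each cutoff, so this simplified version gives a lower-bound approximation and is not the assessors' implementation. The function name `gdt_ts` and the toy coordinates are hypothetical.

```python
import math


def gdt_ts(predicted, reference, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS for two already-superimposed C-alpha traces.

    predicted, reference: equal-length sequences of (x, y, z) tuples in angstroms.
    Returns a score between 0 and 100.
    """
    if len(predicted) != len(reference) or len(reference) == 0:
        raise ValueError("coordinate lists must be non-empty and of equal length")

    # Per-residue deviation between predicted and experimental positions.
    distances = [math.dist(p, r) for p, r in zip(predicted, reference)]

    # Fraction of residues within each distance cutoff.
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in cutoffs
    ]

    # GDT_TS is the mean of those fractions, expressed as a percentage.
    return 100.0 * sum(fractions) / len(cutoffs)


# Toy example: four residues displaced by 0.5, 1.5, 3.0, and 10.0 angstroms.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 10.0, 0.0)]
score = gdt_ts(predicted, reference)
```

In this toy case one residue is within 1 Å, two within 2 Å, and three within both 4 Å and 8 Å, so the score is the mean of 25%, 50%, 75%, and 75%.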
In addition, AlphaFold2 reports its own confidence in each prediction using the predicted local distance difference test (pLDDT) score, a per-residue estimate that ranges from 0 to 100. A pLDDT score above 90 indicates high confidence in the predicted structural conformation, often comparable to experimental data. Regions with pLDDT scores between 70 and 90 are considered moderately accurate, scores between 50 and 70 indicate low confidence, and regions below 50 are very low confidence and likely to represent disordered or flexible regions. In practice, AlphaFold2 tends to produce highly reliable predictions for the core regions of proteins (often with scores above 90) but struggles with more disordered or flexible parts, which typically score lower. Overall, AlphaFold2 has dramatically improved high-confidence protein structure prediction, particularly in regions with clear, well-ordered tertiary structures.
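The pLDDT thresholds described above can be applied programmatically to summarize how much of a predicted structure is reliable. The following is a minimal sketch, not part of any AlphaFold tooling: the function names `plddt_band` and `summarize_confidence`, the band labels (which follow the labelling used by the AlphaFold Protein Structure Database), the handling of scores that fall exactly on a threshold, and the example scores are all assumptions made for illustration.

```python
from collections import Counter

# Band labels in descending order of confidence (labels are an assumption).
BAND_ORDER = ("very high", "confident", "low", "very low")


def plddt_band(score):
    """Map a per-residue pLDDT score (0-100) to a confidence band."""
    if not 0.0 <= score <= 100.0:
        raise ValueError(f"pLDDT must be between 0 and 100, got {score}")
    if score > 90.0:
        return "very high"  # often comparable to experimental data
    if score >= 70.0:
        return "confident"  # moderately accurate
    if score >= 50.0:
        return "low"
    return "very low"       # likely disordered or flexible


def summarize_confidence(scores):
    """Return the fraction of residues that fall in each confidence band."""
    if not scores:
        raise ValueError("scores must not be empty")
    counts = Counter(plddt_band(s) for s in scores)
    return {band: counts[band] / len(scores) for band in BAND_ORDER}


# Hypothetical per-residue scores for an eight-residue fragment.
example_scores = [95.2, 91.0, 88.4, 72.0, 65.5, 41.3, 30.0, 97.8]
summary = summarize_confidence(example_scores)
```

A summary like this is a quick way to check whether a region of interest, such as a binding site, falls in a well-predicted part of the model before relying on it.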
How AlphaFold2 Works
Applications of AlphaFold2 include:
AlphaFold, AlphaFold2, and AlphaFold3 represent significant advances in protein structure prediction. Each model builds on the successes of its predecessors to tackle increasingly complex biological challenges.
Each iteration of AlphaFold has progressively expanded the range of biological problems it can address, moving from single protein structures in AlphaFold to complex molecular interactions in AlphaFold3, opening new avenues for research in fields like drug discovery and systems biology.
AlphaFold-Multimer is an extension of the AlphaFold2 model designed to predict the structures of protein complexes comprising multiple interacting protein chains. While AlphaFold2 focuses on predicting the structure of individual proteins, AlphaFold-Multimer addresses the need to understand protein-protein interactions, which are crucial for many biological processes, such as enzyme activity, immune responses, and signaling pathways.
AlphaFold models have dramatically transformed drug discovery by accelerating the understanding of protein structure variants, which is critical for developing new drugs.
Improvements in protein structure prediction have lasting impacts on several drug discovery use cases, including:
The benefits of AlphaFold models on drug discovery are far-reaching. In the short term, this includes:
In the future, this could include new applications such as enabling personalized medicine. Structural predictions of mutant proteins in individuals can inform the development of personalized therapies, particularly in some cancers where a specific mutation can correlate with disease progression.
AlphaFold2's code and models were made open source in 2021, allowing anyone to access, modify, and deploy the model. Unlike its predecessor, AlphaFold3 is not fully open source.
Google DeepMind and Isomorphic Labs released AlphaFold3 as a managed service through platforms like the AlphaFold Server, making its key features free to the scientific community for noncommercial academic research. This allows researchers to generate molecular structure predictions without installing the model themselves, providing broad access but without full control over the software's underlying code.
While researchers can use AlphaFold3’s capabilities for free through designated platforms, they cannot directly access or modify the underlying source code as they could with AlphaFold2. However, this still provides significant value for academic and research purposes, especially in drug discovery and biological research.
Partnerships with Isomorphic Labs or other licensing options might be required for commercial use.