What Is AlphaFold2?

AlphaFold2 is a machine learning model that uses deep neural networks to predict the 3D structures of proteins from their amino acid sequences. Although it relies on attention mechanisms similar to those in large language models (LLMs) such as GPT or BERT, AlphaFold2 is not an LLM: it is trained on protein sequences and structures, not on human language.

How Was AlphaFold2 Created?

AlphaFold2 (Jumper et al., 2021) was developed by DeepMind (now Google DeepMind) and first presented at the CASP14 assessment in 2020. It was trained on structures from the Protein Data Bank (PDB), an archive of experimentally determined three-dimensional structures of proteins, nucleic acids, and complex assemblies. The PDB accepts only experimentally determined structures, so AlphaFold2 predictions are not deposited there. Instead, they are distributed through resources such as the AlphaFold Protein Structure Database.

AlphaFold2 marks a breakthrough in structural biology, significantly outperforming earlier prediction methods. Determining a protein structure experimentally, with techniques such as X-ray crystallography or cryogenic electron microscopy (cryo-EM), can take months or years. An AlphaFold2 prediction typically takes hours or less on suitable hardware.

Google DeepMind and the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI) used AlphaFold2 to build the AlphaFold Protein Structure Database, which provides freely available predicted structures at a massive scale. The database followed AlphaFold's success in the Critical Assessment of Protein Structure Prediction (CASP) competitions. The original AlphaFold placed first in CASP13, and AlphaFold2 demonstrated atomic-level accuracy in CASP14 (Tunyasuvunakool et al., 2021).

How Accurate Is AlphaFold2?

The original AlphaFold gained widespread attention in the CASP13 competition in 2018, placing first in the free modeling category, which covers novel protein folds. AlphaFold2 then dominated CASP14 in 2020, achieving a median global distance test (GDT_TS) score above 90. GDT_TS measures how closely a predicted structure matches the experimental one on a 0 to 100 scale. A score around 90 or higher is generally considered competitive with experimentally determined structures.
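To make the GDT idea concrete, here is a minimal Python sketch of GDT_TS. The score averages the percentage of residues whose C-alpha atoms lie within 1, 2, 4, and 8 angstroms of their experimental positions. The sketch assumes the two structures are already superimposed and that per-residue deviations have already been computed. The official CASP evaluation uses the LGA program, which searches for the best superposition at each cutoff. The function name and input values are illustrative.

```python
from typing import Sequence

# Distance cutoffs (angstroms) averaged by GDT_TS.
GDT_TS_CUTOFFS = (1.0, 2.0, 4.0, 8.0)


def gdt_ts(ca_deviations: Sequence[float]) -> float:
    """Simplified GDT_TS score (0-100).

    Assumes the model and the experimental structure are already superimposed
    and that ca_deviations[i] is the distance, in angstroms, between the
    predicted and experimental C-alpha atoms of residue i. The official LGA
    program instead searches for a separate best superposition per cutoff.
    """
    if not ca_deviations:
        raise ValueError("need at least one residue")
    n = len(ca_deviations)
    # Fraction of residues within each cutoff, averaged over the four cutoffs.
    fractions = [sum(d <= c for d in ca_deviations) / n for c in GDT_TS_CUTOFFS]
    return 100.0 * sum(fractions) / len(GDT_TS_CUTOFFS)


# Hypothetical per-residue deviations for a five-residue model.
print(round(gdt_ts([0.5, 0.9, 1.5, 3.0, 9.0]), 2))
```

A residue that is far off (9.0 angstroms here) contributes nothing at any cutoff, while one that is 3.0 angstroms off still counts at the 4 and 8 angstrom thresholds. This is why GDT_TS rewards models that are mostly right even when some loops are misplaced.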

AlphaFold2 also reports its own confidence for each residue as a predicted local distance difference test (pLDDT) score, ranging from 0 to 100. The scores fall into four bands:

- Above 90 indicates very high confidence, often comparable to experimental data.
- 70 to 90 indicates a generally reliable backbone.
- 50 to 70 indicates low confidence.
- Below 50 indicates very low confidence, which often corresponds to intrinsically disordered or flexible regions.

In practice, AlphaFold2 tends to produce highly reliable predictions for the well-ordered core of a protein (often scoring above 90), while disordered or flexible parts score lower. Overall, AlphaFold2 has dramatically improved protein structure prediction, particularly for regions with clear, well-ordered tertiary structure.
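AlphaFold2 writes pLDDT into the B-factor column of its PDB output, so the scores can be read with a few lines of code. The sketch below is an illustration, not part of AlphaFold2 itself. It assumes an AlphaFold2-style PDB file, reads one value per residue from the C-alpha atom, and maps it to the bands described above. The band boundaries follow the AlphaFold Protein Structure Database, and treating exactly 90, 70, and 50 as the higher band is an arbitrary choice. The helper atom_line and the three-residue demo model are hypothetical.

```python
# pLDDT bands as used by the AlphaFold Protein Structure Database.
def confidence_band(plddt: float) -> str:
    """Map a pLDDT value (0-100) to a confidence band."""
    if plddt >= 90:
        return "very high"
    if plddt >= 70:
        return "confident"
    if plddt >= 50:
        return "low"
    return "very low"


def plddt_per_residue(pdb_text: str) -> dict[tuple[str, int], float]:
    """Read per-residue pLDDT from the B-factor column of C-alpha ATOM records.

    Assumes an AlphaFold2-style PDB file, where pLDDT is stored in the
    B-factor field (columns 61-66 of each ATOM record).
    """
    scores: dict[tuple[str, int], float] = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain = line[21]             # column 22: chain identifier
            resnum = int(line[22:26])    # columns 23-26: residue number
            scores[(chain, resnum)] = float(line[60:66])
    return scores


def atom_line(serial: int, name: str, resname: str, chain: str,
              resnum: int, bfactor: float) -> str:
    """Hypothetical helper: a fixed-column ATOM record with zeroed coordinates."""
    return (f"ATOM  {serial:>5} {name:<4} {resname:>3} {chain}{resnum:>4}    "
            f"{0.0:>8.3f}{0.0:>8.3f}{0.0:>8.3f}{1.0:>6.2f}{bfactor:>6.2f}")


# Hypothetical three-residue model: a well-ordered core and a floppy tail.
demo = "\n".join([
    atom_line(1, "CA", "MET", "A", 1, 95.1),
    atom_line(2, "CA", "LYS", "A", 2, 78.4),
    atom_line(3, "CA", "GLY", "A", 3, 41.0),
])
for (chain, resnum), score in plddt_per_residue(demo).items():
    print(chain, resnum, score, confidence_band(score))
```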

How Does AlphaFold2 Predict Protein Structures?

How AlphaFold2 Works

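Amino acid sequence: The input is the protein's amino acid sequence, the ordered chain of residues whose side chains give each position its chemical properties.
Multiple sequence alignment (MSA): AlphaFold2 searches sequence databases for homologs, which are related sequences from other organisms that share a common evolutionary origin, and aligns them to the target. Patterns of conservation and correlated mutation across the alignment hint at which residues are close together in the folded protein. AlphaFold2 can also use known structures of related proteins as templates.
Structure prediction: A stack of attention-based blocks called the Evoformer jointly refines a representation of the MSA and a representation of every residue pair, capturing both short- and long-range interactions. The structure module then converts these representations into 3D atomic coordinates.
Refinement process: Through a step called recycling, the network's outputs are fed back as inputs for several additional passes, improving the predicted structure with each pass.
Output: The result is a 3D model of the protein. For many well-ordered proteins, it approaches experimental accuracy and is produced much faster.
Confidence scores: AlphaFold2 reports per-residue pLDDT scores and a predicted aligned error (PAE) matrix. Together they indicate how reliable different parts of the predicted structure, and their relative positions, are likely to be.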
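The MSA step relies on the idea that residues in contact tend to mutate together. AlphaFold2 learns such signals directly from the raw alignment inside the Evoformer rather than computing an explicit statistic. The Python sketch below therefore only illustrates the underlying intuition. It uses mutual information between alignment columns, a classical coevolution measure from earlier contact-prediction work. Practical tools add corrections that are omitted here, such as sequence weighting, gap handling, and average product correction. The toy alignment is made up.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(msa: list[str], i: int, j: int) -> float:
    """Mutual information (in nats) between alignment columns i and j."""
    n = len(msa)
    counts_i = Counter(seq[i] for seq in msa)
    counts_j = Counter(seq[j] for seq in msa)
    counts_ij = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), c in counts_ij.items():
        p_ab = c / n
        p_a = counts_i[a] / n
        p_b = counts_j[b] / n
        mi += p_ab * math.log(p_ab / (p_a * p_b))
    return mi


def ranked_column_pairs(msa: list[str]) -> list[tuple[int, int, float]]:
    """All column pairs, sorted from most to least co-varying."""
    length = len(msa[0])
    pairs = [(i, j, mutual_information(msa, i, j))
             for i, j in combinations(range(length), 2)]
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Toy alignment: columns 1 and 3 change together (R pairs with E, D with L),
# column 0 varies independently, and column 2 is fully conserved.
toy_msa = ["ARKE", "ADKL", "GRKE", "GDKL"]
print(ranked_column_pairs(toy_msa)[0])
```

In the toy alignment, only the co-varying pair (columns 1 and 3) has nonzero mutual information. A conserved column carries no coevolution signal even though it may be structurally important.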

What Are Some Applications of AlphaFold2?

Applications of AlphaFold2 include:

Understanding protein function: The AlphaFold2 algorithm helps scientists study protein structures related to diseases like cancer, providing insights into their function and role in disease progression.
Identifying drug targets: Pharmaceutical companies use AlphaFold2 to model disease-related proteins. The resulting structures serve as starting points for drug design and help reveal potentially druggable binding sites.
Enzyme engineering: Biotechnologists use AlphaFold2 to design and optimize enzymes for applications like biofuel production and waste degradation.
Tracing protein evolution: AlphaFold2 allows evolutionary biologists to compare protein structures across species, helping trace their evolution and adaptation.
Accessible learning tools: The AlphaFold Protein Structure Database provides free access to AlphaFold2 predictions, making it easier for students and researchers in underfunded settings to study protein biology.
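Precomputed predictions can be retrieved programmatically. The Python sketch below assumes the AlphaFold Protein Structure Database exposes a REST endpoint at https://alphafold.ebi.ac.uk/api/prediction/<UniProt accession>. It also assumes the endpoint returns a JSON list of entries with a pdbUrl field. Verify the endpoint and field names against the database's current API documentation before relying on them. Only the standard library is used.

```python
import json
import urllib.request

# Assumed endpoint of the public AlphaFold DB REST API; check the API
# documentation on the AlphaFold DB website before relying on it.
AFDB_API = "https://alphafold.ebi.ac.uk/api/prediction/"


def prediction_url(uniprot_accession: str) -> str:
    """URL of the AlphaFold DB metadata record for a UniProt accession."""
    return AFDB_API + uniprot_accession.strip().upper()


def fetch_pdb_urls(uniprot_accession: str, timeout: float = 30.0) -> list[str]:
    """Download the metadata record and return the model file URLs it lists.

    Assumes the response is a JSON list of entries, each with a "pdbUrl" field.
    """
    with urllib.request.urlopen(prediction_url(uniprot_accession),
                                timeout=timeout) as response:
        entries = json.load(response)
    return [entry["pdbUrl"] for entry in entries]


# Example (requires network access): P69905 is human hemoglobin subunit alpha.
# print(fetch_pdb_urls("P69905"))
```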

What Is the Difference Between AlphaFold, AlphaFold2, and AlphaFold3?

AlphaFold, AlphaFold2, and AlphaFold3 represent significant advances in protein structure prediction. Each model builds on the successes of its predecessors to tackle increasingly complex biological challenges.

AlphaFold (2018): The first version of AlphaFold predicted 3D protein structures from amino acid sequences by learning inter-residue distance distributions from evolutionary information in multiple sequence alignments. While groundbreaking, its accuracy was limited, especially for proteins with few related sequences available.
AlphaFold2 (2020): AlphaFold2 was a major breakthrough, winning the CASP14 competition and achieving near-experimental accuracy on many targets. It introduced an end-to-end deep learning architecture built on attention-based modules, related to transformer networks, together with an iterative refinement process. Its accuracy still declines for proteins with very few homologous sequences. Even so, it predicts many monomeric protein structures with high confidence, bringing the protein structure prediction problem close to being "solved" for many cases.
AlphaFold3 (2024): The latest version, AlphaFold3, predicts the joint structure of complexes that can include proteins, DNA, RNA, small-molecule ligands, and ions. This covers both protein-protein interactions and interactions with other biomolecules. It is particularly useful for understanding biological systems where proteins function as part of multi-molecular networks rather than in isolation. Architecturally, it replaces AlphaFold2's structure module with a diffusion-based module that generates atomic coordinates for every molecule in the complex. This makes it more applicable to real-world challenges such as drug discovery and understanding cellular processes.

Each iteration of AlphaFold has progressively expanded the range of biological problems it can address, moving from single protein structures in AlphaFold to complex molecular interactions in AlphaFold3, opening new avenues for research in fields like drug discovery and systems biology.

What Is AlphaFold-Multimer?

AlphaFold-Multimer is an extension of AlphaFold2 trained to predict the structures of protein complexes made up of multiple interacting chains. The original AlphaFold2 model predicts the structure of a single chain, while AlphaFold-Multimer models how several chains fit together. This matters for understanding protein-protein interactions, which are crucial in processes such as enzyme activity, immune responses, and signaling pathways.
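AlphaFold-Multimer produces several candidate models and ranks them by a weighted confidence that combines interface pTM (ipTM) and whole-complex pTM as 0.8 * ipTM + 0.2 * pTM. The Python sketch below reproduces only that ranking rule. The model names and score values are hypothetical; in practice, ipTM and pTM come from the model's output files.

```python
def ranking_confidence(iptm: float, ptm: float) -> float:
    """Weighted confidence used to rank AlphaFold-Multimer predictions.

    ipTM scores the predicted placement of chains relative to each other;
    pTM scores the predicted complex as a whole. Both range from 0 to 1.
    """
    for label, value in (("ipTM", iptm), ("pTM", ptm)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{label} must be in [0, 1], got {value}")
    return 0.8 * iptm + 0.2 * ptm


def rank_models(scores: dict[str, tuple[float, float]]) -> list[str]:
    """Order model names from most to least confident; values are (ipTM, pTM)."""
    return sorted(scores, key=lambda name: ranking_confidence(*scores[name]),
                  reverse=True)


# Hypothetical scores for two predictions of the same heterodimer.
print(rank_models({"model_a": (0.30, 0.90), "model_b": (0.70, 0.50)}))
```

Weighting ipTM heavily means a model with a convincing interface outranks one whose individual chains look good but whose chains are placed relative to each other with low confidence.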

How Have AlphaFold Models Impacted Drug Discovery?

AlphaFold models have transformed early-stage drug discovery by providing reliable structural models for proteins, including many with no experimentally determined structure. Such models are critical for developing new drugs.

Improvements in protein structure prediction have lasting impacts on several drug discovery use cases, including:

Accelerated target identification and validation: By providing highly accurate predictions of protein structures, AlphaFold has expedited the identification of new drug targets, particularly those that were previously intractable due to the lack of structural data.
Enhanced structure-based drug design: Access to detailed protein structures lets researchers perform virtual screening and rational drug design for many more targets. This can help identify hit compounds with better binding affinity and specificity.
Understanding disease mechanisms: AlphaFold's predictions can help elucidate the structural basis of diseases caused by protein misfolding or mutations. This understanding is crucial for developing therapies that target these aberrant proteins.
Facilitating protein-protein interaction studies: AlphaFold-Multimer enables the modeling of protein complexes, providing insights into protein-protein interactions essential for developing inhibitors or modulators in signaling pathways.

The benefits of AlphaFold models on drug discovery are far-reaching. In the short term, this includes:

Reducing drug development costs and time: By streamlining the early stages of drug discovery, AlphaFold can reduce reliance on time-consuming and costly experimental structure determination by X-ray crystallography or cryo-EM, although experimental validation remains important.
Advancing neglected disease research: For diseases that receive less research funding, AlphaFold provides an invaluable resource. It supplies structural data that would otherwise be slow or costly to obtain, aiding the development of new treatments.
Supporting academic and small biotech research: AlphaFold2's open-source code and freely accessible predictions democratize access to structural data. This enables researchers in academia and smaller biotech firms to contribute to drug discovery efforts.

In the future, this could include new applications such as enabling personalized medicine. Structural predictions of mutant proteins in individuals can inform the development of personalized therapies, particularly in some cancers where a specific mutation can correlate with disease progression.
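Predicting a patient-specific variant starts with building the mutant sequence. The Python sketch below applies a single substitution written in short one-letter notation, for example E7V: glutamate at position 7 replaced by valine, numbering from 1. It checks that the stated wild-type residue matches the sequence before substituting. The notation parser and example fragment are illustrative assumptions. AlphaFold2 was not trained to predict the effects of point mutations and has been reported to return nearly identical structures for wild-type and mutant sequences, so such models need careful interpretation.

```python
import re

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Short notation such as "E7V": wild-type residue, 1-based position, new residue.
MUTATION_PATTERN = re.compile(rf"^([{AMINO_ACIDS}])(\d+)([{AMINO_ACIDS}])$")


def apply_point_mutation(sequence: str, mutation: str) -> str:
    """Return a copy of sequence carrying a single substitution like 'E7V'."""
    match = MUTATION_PATTERN.match(mutation.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized mutation: {mutation!r}")
    wild_type, position, new_residue = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    if sequence[position - 1] != wild_type:
        # Guards against numbering mismatches between the variant and the sequence.
        raise ValueError(
            f"expected {wild_type} at position {position}, found {sequence[position - 1]}")
    return sequence[:position - 1] + new_residue + sequence[position:]


# Hypothetical fragment; the mutant sequence would then be folded like any other input.
print(apply_point_mutation("MVHLTPEEK", "E7V"))
```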

Is AlphaFold2 Open Source?

AlphaFold2's source code and trained model parameters were publicly released in 2021, allowing anyone to access, modify, and deploy the model. Its successor, AlphaFold3, is not fully open source.

Google DeepMind and Isomorphic Labs released AlphaFold3 as a managed service through platforms like the AlphaFold Server, making its key features free to the scientific community for noncommercial academic research. This allows researchers to generate molecular structure predictions without installing the model themselves, providing broad access but without full control over the software's underlying code.

Researchers can use AlphaFold3 for free through the AlphaFold Server. In November 2024, Google DeepMind also released the AlphaFold3 inference code, with model weights available on request, for noncommercial academic use. Unlike AlphaFold2's terms, these terms do not permit commercial use. Even so, this access provides significant value for academic research, especially in drug discovery and biological research.

Partnerships with Isomorphic Labs or other licensing options might be required for commercial use.

What Are Some Challenges to Using AlphaFold2?

High computational requirements: AlphaFold2 demands significant computational resources, including GPUs and large memory capacity, which can be challenging for institutions or researchers without access to high-performance computing (HPC) clusters.
Model setup and installation: Setting up AlphaFold2 locally can be technically challenging, requiring familiarity with tools like Docker, Python, and GPU frameworks (such as CUDA®). In addition, installation requires the correct configurations and dependencies to avoid compatibility issues.
Security and compliance: In industries such as pharma, sending sensitive proprietary sequences to third-party cloud services (e.g., AWS or Google Cloud) can raise security and compliance concerns. Organizations often address this by deploying the model on servers they own and control.
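For a local installation, the AlphaFold2 repository provides a Docker launcher script, docker/run_docker.py. The Python sketch below only assembles the input FASTA text and the launcher's argument list; it does not run anything. Flag names follow the AlphaFold2 README at the time of writing and may differ between releases. All paths are hypothetical placeholders, and the data directory must contain the downloaded genetic databases.

```python
from pathlib import Path


def fasta_text(records: dict[str, str], width: int = 60) -> str:
    """Format {name: sequence} as FASTA; several records describe a multimer."""
    lines = []
    for name, sequence in records.items():
        lines.append(f">{name}")
        lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n"


def run_docker_command(fasta_path: Path, data_dir: Path, output_dir: Path,
                       max_template_date: str = "2022-01-01",
                       model_preset: str = "monomer") -> list[str]:
    """Argument list for the docker/run_docker.py launcher in the AlphaFold repo.

    Flag names follow the AlphaFold2 README at the time of writing; check the
    README of the version you install, since flags can change between releases.
    """
    return [
        "python3", "docker/run_docker.py",
        f"--fasta_paths={fasta_path}",
        f"--max_template_date={max_template_date}",
        f"--model_preset={model_preset}",
        f"--data_dir={data_dir}",
        f"--output_dir={output_dir}",
    ]


# Hypothetical paths. To launch, run from a clone of the AlphaFold repository:
#   Path("query.fasta").write_text(fasta_text({"query": "MVHLTPEEK"}))
#   subprocess.run(run_docker_command(...), check=True)
print(" ".join(run_docker_command(Path("query.fasta"), Path("/data/af2_dbs"),
                                  Path("/tmp/af2_out"))))
```

Building the argument list as a Python list rather than a shell string avoids quoting problems with paths and makes the command easy to log for compliance records.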
