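AlphaFold is an artificial intelligence method for predicting protein structures that has been highly successful in recent tests. The alphafold command can retrieve models from the AlphaFold Database and run new AlphaFold predictions. The method is described in: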
Highly accurate protein structure prediction with AlphaFold. Jumper J, Evans R, Pritzel A, et al. Nature. 2021 Aug;596(7873):583-589.
...and for multimer prediction:
Protein complex prediction with AlphaFold-Multimer. Evans R, O'Neill M, Pritzel A, et al. bioRxiv 2021. doi: https://doi.org/10.1101/2021.10.04.463034.
The AlphaFold Database contains modeled structures for protein sequences in UniProt:
...and is described in:
AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Varadi M, Anyango S, Deshpande M, et al. Nucleic Acids Res. 2021 Nov 17:gkab1061. doi: 10.1093/nar/gkab1061.
The predicted structures vary in confidence levels and should be interpreted with caution. The database contains structures for single chains, not complexes; assembling the individual structures into a complex may give unphysical results where parts of the chains intersect or interact poorly with one another.
The alphafold command is also implemented as the AlphaFold tool. See also: Blast Protein, Modeller Comparative, AlphaFold vs. experiment, AlphaFold for cryoEM, AlphaFold for complexes, and ChimeraX videos:
Usage: alphafold fetch uniprot-id [ alignTo chain-spec [ trim true | false ]] [ colorConfidence true | false ] [ ignoreCache true | false ]
Usage: alphafold match sequence [ search true | false ] [ trim true | false ] [ colorConfidence true | false ] [ ignoreCache true | false ]
Usage: alphafold search sequence [ matrix similarity-matrix ] [ cutoff evalue ] [ maxSeqs M ]
alphafold fetch p29474
– OR – (equivalent)
open p29474 from alphafold
alphafold match #1
Alternatively, the sequence can be given as any of the following:
alphafold match #3/B,D trim false
For a specified structure chain, a model is obtained for its exact UniProt entry if available; otherwise, the single top hit from a BLAT search of the AlphaFold Database is used. For each model with a corresponding structure chain from the alphafold match command or the alignTo option of alphafold fetch:
The matrix option indicates which amino acid similarity matrix to use for scoring the hits (uppercase or lowercase can be used): BLOSUM45, BLOSUM50, BLOSUM62 (default), BLOSUM80, BLOSUM90, PAM30, PAM70, PAM250, or IDENTITY. The cutoff evalue is the maximum (least significant) expectation value needed to qualify as a hit (default 1e-3). Results can also be limited with the maxSeqs option (default 100), the maximum number of unique sequences to return. More hits than this number may be obtained, because multiple structures or other sequence-database entries may have the same sequence.
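The interaction between cutoff and maxSeqs can be illustrated with a small Python sketch. The Hit record and filter_hits function are hypothetical and not part of ChimeraX, and ranking hits by E-value before applying the limit is an assumption. The point is that the limit counts unique sequences, so every hit sharing a retained sequence is still returned.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    # Hypothetical hit record: database entry name, hit sequence, E-value.
    entry: str
    sequence: str
    evalue: float


def filter_hits(hits, cutoff=1e-3, max_seqs=100):
    """Keep hits with E-value <= cutoff, limited to max_seqs unique sequences.

    Hits are ranked by E-value (an assumption). All hits that share one of
    the retained sequences are returned, so the result can contain more
    than max_seqs hits.
    """
    passing = sorted((h for h in hits if h.evalue <= cutoff),
                     key=lambda h: h.evalue)
    kept = set()
    for h in passing:
        if h.sequence not in kept:
            if len(kept) == max_seqs:
                break  # limit reached; no new unique sequences are admitted
            kept.add(h.sequence)
    return [h for h in passing if h.sequence in kept]
```

For example, with max_seqs=1, two entries that share the best-scoring sequence are both returned, giving two hits for one unique sequence.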
alignTo chain-spec
Superimpose the predicted structure from alphafold fetch onto a single chain in an already-open structure, and make its chain ID the same as that chain's. See also the trim option.
colorConfidence true | false
Whether to color the predicted structures by the pLDDT confidence measure, which is stored in the B-factor field (default true):
– pLDDT above 90: high accuracy expected
– pLDDT 70 to 90: backbone expected to be modeled well
– pLDDT 50 to 70: low confidence, caution
– pLDDT below 50: should not be interpreted, may be disordered
...in other words, using the command color bfactor palette alphafold
The Color Key graphical interface or the key command can be used to draw a corresponding color key, for example:
key red:low orange: yellow: cornflowerblue: blue:high [other-key-options]
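Because the pLDDT values are stored in the B-factor field, they can also be read outside ChimeraX. The following is a minimal Python sketch, assuming a PDB-format model file; plddt_band and ca_plddt are hypothetical helper names. It reads the B-factor of each CA atom from the fixed PDB columns and maps it to the confidence bands listed above, placing values that sit exactly on a boundary in the lower band (an assumption).

```python
def plddt_band(plddt):
    """Map a pLDDT value (0-100) to the confidence descriptions above.

    Values exactly at 90, 70, or 50 fall into the lower band here; that
    boundary handling is an assumption.
    """
    if plddt > 90:
        return "high accuracy expected"
    if plddt > 70:
        return "backbone expected to be modeled well"
    if plddt > 50:
        return "low confidence, caution"
    return "should not be interpreted, may be disordered"


def ca_plddt(pdb_lines):
    """Return {(chain, residue number): pLDDT} from CA atoms in PDB text.

    Uses fixed PDB columns: atom name (13-16), chain ID (22),
    residue number (23-26), and B-factor (61-66), where pLDDT is stored.
    """
    values = {}
    for line in pdb_lines:
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain = line[21]
            resnum = int(line[22:26])
            values[(chain, resnum)] = float(line[60:66])
    return values
```

Usage: `{key: plddt_band(v) for key, v in ca_plddt(open("model.pdb")).items()}` labels each residue, where model.pdb is a placeholder filename.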
ignoreCache true | false
The fetched models are stored locally in ~/Downloads/ChimeraX/AlphaFold/, where ~ indicates the user's home directory. If a requested file is not found in this local cache, or if ignoreCache is set to true, the file will be fetched and cached.
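The caching rule above can be sketched in Python as follows. This is an illustration of the described behavior, not ChimeraX's own code: cached_model_path is a hypothetical name, and fetch stands in for whatever actually downloads the file. The default directory mirrors the location given above.

```python
from pathlib import Path


def cached_model_path(filename, fetch, ignore_cache=False, cache_dir=None):
    """Return a local path for filename, fetching it only when needed.

    fetch(filename) is a hypothetical callable that downloads the file and
    returns its contents as bytes.
    """
    if cache_dir is None:
        # Default location described above (~ = the user's home directory).
        cache_dir = Path.home() / "Downloads" / "ChimeraX" / "AlphaFold"
    path = Path(cache_dir) / filename
    if ignore_cache or not path.exists():
        # Cache miss, or the cache is being bypassed: fetch and store a copy.
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(fetch(filename))
    return path
```

A second call for the same file reuses the cached copy; passing ignore_cache=True forces a fresh fetch that overwrites it.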
search true | false
When fetching models with alphafold match, whether to search the database for the most similar sequence (default true) if the UniProt accession number for a chain is not provided in the experimental structure's input file, or is provided but not found in the AlphaFold Database. The search uses a BLAT web service hosted by the UCSF RBVI. The closest sequence match for which a model is available will be retrieved, as long as the sequence identity is at least 25%. With search false, only the experimental structure's input file will be used as a potential source of UniProt accession numbers. When present, these are given in DBREF records in PDB format and in the struct_ref and struct_ref_seq tables in mmCIF.
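The decision described above can be summarized as a short Python sketch. The function and its callable arguments are hypothetical: database_has stands in for a lookup in the AlphaFold Database, and blat_hits stands in for the BLAT web service, assumed here to return matches ordered from closest to least close.

```python
def choose_uniprot_model(file_accession, database_has, blat_hits,
                         search=True, min_identity=25.0):
    """Pick which AlphaFold Database entry to fetch for one chain.

    file_accession: UniProt accession from DBREF/struct_ref, or None.
    database_has:   callable(accession) -> bool, True if a model exists.
    blat_hits:      callable() -> list of (accession, percent_identity),
                    closest match first (stands in for the BLAT service).
    Returns an accession, or None if no acceptable model is found.
    """
    # 1. Prefer the accession given in the experimental structure's file.
    if file_accession and database_has(file_accession):
        return file_accession
    # 2. With search false, the input file is the only source.
    if not search:
        return None
    # 3. Otherwise take the closest match that has a model, if identity >= 25%.
    for accession, identity in blat_hits():
        if database_has(accession):
            return accession if identity >= min_identity else None
    return None
```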
prokaryote true | false
This option only applies to predicting complexes. When AlphaFold-Multimer is predicting a complex and finds multiple similar sequences from the same organism, it needs to decide which sequences are orthologous to which chain of the target. Since interacting genes in prokaryotes tend to be colocated in operons, specifying prokaryote true uses the smallest genomic distance (approximated by the lexicographic difference between UniProt identifiers) to identify interacting pairs of sequences within a species. With prokaryote false (default), sequences from the same organism are instead paired in order of their similarity rank to the target sequences.
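The two pairing strategies can be contrasted with a rough Python sketch for a two-chain target, where each input list holds UniProt accessions from one species, ranked by similarity to that chain. All names are hypothetical. Reading accessions as base-36 integers to approximate "lexicographic difference", and greedily pairing the closest accessions first, are assumptions made for illustration; AlphaFold-Multimer's actual procedure may differ.

```python
def accession_number(acc):
    """Read a UniProt accession as a base-36 integer.

    Assumption: for accessions of equal length, this ordering matches
    lexicographic order, so differences approximate lexicographic distance.
    """
    return int(acc, 36)


def pair_by_rank(hits_a, hits_b):
    """prokaryote false: pair the i-th ranked hit for A with the i-th for B."""
    return list(zip(hits_a, hits_b))


def pair_by_distance(hits_a, hits_b):
    """prokaryote true (sketch): greedily pair the closest accessions first."""
    candidates = sorted(
        (abs(accession_number(a) - accession_number(b)), a, b)
        for a in hits_a for b in hits_b
    )
    used_a, used_b, pairs = set(), set(), []
    for _, a, b in candidates:
        # Each sequence may appear in at most one pair.
        if a not in used_a and b not in used_b:
            pairs.append((a, b))
            used_a.add(a)
            used_b.add(b)
    return pairs
```

With the same inputs, pair_by_rank follows similarity order, while pair_by_distance can pair a low-ranked hit with a high-ranked one when their identifiers are adjacent.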
trim true | false
Whether to trim a predicted protein structure to the same residue range as the corresponding experimental structure given with the alphafold match command or the alignTo option of alphafold fetch. If true (default):
- Predictions whose UniProt identifier was determined by alphafold match from the experimental structure's input file will be trimmed to the same residue ranges as used in the experiment. These ranges are given in DBREF records in PDB format and in the struct_ref and struct_ref_seq tables in mmCIF.
- Predictions retrieved with alphafold fetch or found by alphafold match searching for similar sequences in the AlphaFold Database will be trimmed to start and end with the first and last aligned positions in the sequence alignment calculated by matchmaker as part of the superposition step.
With trim false, the full-length models for the UniProt sequences are retained; these may be longer than the experimental chains.
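Both trimming rules can be sketched in Python, assuming the predicted model is numbered by UniProt sequence position. The helper names are hypothetical, and parse_dbref splits on whitespace as a simplification that ignores insertion codes; a robust reader would use the fixed PDB columns or the mmCIF struct_ref_seq table instead.

```python
def parse_dbref(line):
    """Return (chain, uniprot_accession, db_start, db_end) from a DBREF record.

    Simplification: fields are split on whitespace, assuming no insertion
    codes. Returns None for non-UniProt references.
    """
    # DBREF idCode chainID seqBegin seqEnd database accession dbIdCode dbBegin dbEnd
    fields = line.split()
    if len(fields) < 10 or fields[0] != "DBREF" or fields[5] != "UNP":
        return None
    return fields[2], fields[6], int(fields[8]), int(fields[9])


def trim_to_range(residue_numbers, start, end):
    """Keep predicted residues whose UniProt numbering lies in [start, end]."""
    return [r for r in residue_numbers if start <= r <= end]


def trim_to_alignment(residue_numbers, aligned_model_positions):
    """Keep residues from the first through the last aligned model position."""
    first, last = min(aligned_model_positions), max(aligned_model_positions)
    return trim_to_range(residue_numbers, first, last)
```

The first rule corresponds to trim_to_range with the UniProt range from DBREF; the second corresponds to trim_to_alignment with the model positions that matchmaker aligned.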
For monomer prediction:
Usage: alphafold predict sequence
The protein sequence to predict can be given as any of the following:
For multimer (protein complex) prediction:
Usage: alphafold predict sequence1,sequence2[,sequence3...,sequenceN] [ prokaryote true | false ]
The sequences of two or more protein chains can be specified either collectively as a chain-spec (for atomic-structure chains already open in ChimeraX), or individually within a comma-separated list using any combination of specifier types #2-4 listed above. The list should not contain any spaces. If the same protein chain occurs multiple times in the complex, its sequence should be repeated that number of times. For example, to predict a homodimer, the same sequence (or its specifier) would need to be given twice. Prediction may only be feasible for smaller complexes.
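The list rules above (comma-separated, no spaces, repeated entries for repeated chains) can be captured in a small Python helper that builds the command text. predict_command is a hypothetical helper, not a ChimeraX function; the command syntax it emits follows the usage line for multimer prediction.

```python
def predict_command(sequences, copies=None, prokaryote=None):
    """Build an 'alphafold predict' command for a complex.

    sequences:  list of sequence specifiers (e.g. plain amino-acid strings).
    copies:     optional copy number per specifier; each specifier is
                repeated that many times (2 = homodimer).
    prokaryote: None to omit the option, else True or False.
    """
    copies = copies or [1] * len(sequences)
    if len(copies) != len(sequences):
        raise ValueError("need one copy number per sequence specifier")
    items = []
    for spec, n in zip(sequences, copies):
        # The list must not contain any spaces.
        if any(ch.isspace() for ch in spec):
            raise ValueError(f"specifier contains whitespace: {spec!r}")
        items.extend([spec] * n)
    cmd = "alphafold predict " + ",".join(items)
    if prokaryote is not None:
        cmd += " prokaryote " + ("true" if prokaryote else "false")
    return cmd
```

For example, `predict_command(["MKTAYIAK"], copies=[2])` produces a homodimer command in which the same sequence appears twice.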
AlphaFold calculations are run using Google Colab. A warning will appear stating that the Colab notebook is from GitHub (not authored by Google), with a button to run it anyway. Users need a Google account and must sign into it via a browser. Once that is done, the sign-in may be remembered depending on the browser settings; it is not kept in the ChimeraX preferences.
A single prediction run generally takes on the order of an hour or more. The process includes installing various software packages on a virtual machine, searching sequence databases, generating a multiple sequence alignment, predicting atomic coordinates, and energy-minimizing the best structure. Predicting a multimer (complex) structure may take longer than predicting the structure of a monomer with the same total number of residues. The free version of Colab limits jobs to 12 hours and may terminate them sooner at Google's discretion (see the Colab FAQ). Those who want to run longer and/or more frequent calculations may wish to sign up for one of the paid Colab plans.
The model will be opened automatically and colored by confidence value. The model for a sequence that was specified by structure chain will be superimposed on that chain and assigned structure-comparison attributes for further analysis.