
Command: alphafold

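AlphaFold is an artificial intelligence method for predicting protein structures that has been highly successful in recent tests. The alphafold command finds and retrieves existing models from the AlphaFold Database and runs new AlphaFold predictions, as described in the sections below.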

Users should cite:

Highly accurate protein structure prediction with AlphaFold. Jumper J, Evans R, Pritzel A, et al. Nature. 2021 Aug;596(7873):583-589.

...and for multimer prediction:

Protein complex prediction with AlphaFold-Multimer. Evans R, O'Neill M, Pritzel A, et al. bioRxiv 2021. doi: https://doi.org/10.1101/2021.10.04.463034.

The AlphaFold Database contains modeled structures for protein sequences in UniProt and is described in:

AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Varadi M, Anyango S, Deshpande M, et al. Nucleic Acids Res. 2021 Nov 17:gkab1061. doi: 10.1093/nar/gkab1061.

The predicted structures vary in confidence levels and should be interpreted with caution. The database contains structures for single chains, not complexes; assembling the individual structures into a complex may give unphysical results where parts of the chains intersect or interact poorly with one another.

The alphafold command is also implemented as the AlphaFold tool. See also: Blast Protein, Modeller Comparative, AlphaFold vs. experiment, AlphaFold for cryoEM, AlphaFold for complexes, and ChimeraX videos:

Getting Models from the AlphaFold Database

Usage: alphafold fetch  uniprot-id  [ alignTo  chain-spec [ trim  true | false ]] [ colorConfidence  true | false ] [ ignoreCache  true | false ]
Usage: alphafold match  sequence  [ search  true | false ] [ trim  true | false ] [ colorConfidence  true | false ] [ ignoreCache  true | false ]
Usage: alphafold search  sequence  [ matrix  similarity-matrix ] [ cutoff  evalue ] [ maxSeqs  M ]
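
As an illustration of the fetch usage line above, the following Python sketch assembles alphafold fetch command strings. The helper fetch_command and its parameter names are hypothetical and not part of ChimeraX; only the option keywords (alignTo, trim, colorConfidence, ignoreCache) come from this page. It assumes that trim is meaningful for alphafold fetch only together with alignTo, as described under the trim option.

```python
from typing import Optional


def _flag(value: bool) -> str:
    """Render a Python bool the way the usage lines write it."""
    return "true" if value else "false"


def fetch_command(
    uniprot_id: str,
    align_to: Optional[str] = None,
    trim: Optional[bool] = None,
    color_confidence: Optional[bool] = None,
    ignore_cache: Optional[bool] = None,
) -> str:
    """Build an 'alphafold fetch' command string (hypothetical helper)."""
    # Assumption: trim is only valid alongside alignTo for fetch.
    if trim is not None and align_to is None:
        raise ValueError("trim only applies when alignTo is given")
    parts = ["alphafold", "fetch", uniprot_id]
    if align_to is not None:
        parts += ["alignTo", align_to]
        if trim is not None:
            parts += ["trim", _flag(trim)]
    if color_confidence is not None:
        parts += ["colorConfidence", _flag(color_confidence)]
    if ignore_cache is not None:
        parts += ["ignoreCache", _flag(ignore_cache)]
    return " ".join(parts)


# Placeholder values: P29474 stands in for any UniProt ID,
# and #1/A for a chain in an already-open model.
print(fetch_command("P29474"))
print(fetch_command("P29474", align_to="#1/A", trim=False))
```

Options left as None are omitted from the string, so the command falls back to the defaults documented below. Inside ChimeraX, a string built this way could presumably be executed from Python with run(session, command) from chimerax.core.commands.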

Options

alignTo  chain-spec
Superimpose the predicted structure from alphafold fetch onto a single chain in an already-open structure, and make its chain ID the same as that chain's. See also the trim option.
colorConfidence  true | false
Whether to color the predicted structures by the pLDDT confidence measure in the B-factor field (default true):

...in other words, using

color bfactor palette alphafold

The Color Key graphical interface or a command can be used to draw a corresponding color key, for example:

key red:low orange: yellow: cornflowerblue: blue:high  [other-key-options]
ignoreCache  true | false
The fetched models are stored locally in ~/Downloads/ChimeraX/AlphaFold/, where ~ indicates a user's home directory. If a file specified for opening is not found in this local cache or ignoreCache is set to true, the file will be fetched and cached.
search  true | false
When fetching models with alphafold match, whether to search the database for the most similar sequence if the UniProt accession number for a chain is not provided in the experimental structure's input file, or is provided but not found in the AlphaFold Database (true, default). The search uses a BLAT web service hosted by the UCSF RBVI. The closest sequence match for which a model is available will be retrieved, as long as the sequence identity is at least 25%. With search false, only the experimental structure's input file will be used as a potential source of UniProt accession numbers. When present, these are given in DBREF records in PDB format and in struct_ref and struct_ref_seq tables in mmCIF.
prokaryote  true | false
This option only applies to predicting complexes. When AlphaFold-Multimer is predicting a complex and finds multiple similar sequences from the same organism, it needs to decide which sequences are orthologous to which chain of the target. Since interacting genes tend to be colocalized into operons in prokaryotes, specifying prokaryote true uses smallest genomic distance (approximated by the lexicographic difference between UniProt identifiers) to identify interacting pairs of sequences within species. With prokaryote false (default), sequences from the same organism are instead paired in order of their rank similarity to the target sequences.
trim  true | false
Whether to trim a predicted protein structure to the same residue range as the corresponding experimental structure given with the alphafold match command or the alignTo option of alphafold fetch. If true (default):

Using trim false indicates retaining the full-length models for the UniProt sequences, which could be longer.
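
The fallback behavior described for the search option can be sketched in Python as follows. This is a minimal illustration, not the ChimeraX implementation: the function choose_model_source, the Hit type, and the in-memory set standing in for the AlphaFold Database are all hypothetical. Only the order of the checks and the 25% identity threshold come from the option description, and "closest" is assumed here to mean highest percent identity.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set

MIN_IDENTITY_PERCENT = 25.0  # threshold stated in the search option text


@dataclass
class Hit:
    """One sequence-search hit (hypothetical structure)."""
    uniprot_id: str
    identity_percent: float


def choose_model_source(
    file_accession: Optional[str],
    database_ids: Set[str],
    hits: Iterable[Hit],
    search: bool = True,
) -> Optional[str]:
    """Return the UniProt ID whose model would be fetched, or None."""
    # 1. Prefer the accession from the structure's input file, if a model exists.
    if file_accession is not None and file_accession in database_ids:
        return file_accession
    # 2. With search false, the input file is the only source of accessions.
    if not search:
        return None
    # 3. Otherwise take the closest hit that has a model and enough identity.
    #    Assumption: "closest" means highest percent identity.
    candidates = [
        h for h in hits
        if h.uniprot_id in database_ids
        and h.identity_percent >= MIN_IDENTITY_PERCENT
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda h: h.identity_percent).uniprot_id
```

In this sketch a hit below 25% identity, or a hit without a database model, never yields a model, which mirrors the case where alphafold match retrieves nothing for a chain.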

Running an AlphaFold Prediction

For monomer prediction:

Usage: alphafold predict  sequence

The protein sequence to predict can be given as any of the following:

  1. a chain-spec corresponding to a single chain in an atomic structure open in ChimeraX
  2. the sequence-spec of a sequence in the Sequence Viewer, in the form:  alignment-ID:sequence-ID  (details...)
  3. a UniProt name or accession number
  4. plain text pasted directly into the command line

For multimer (protein complex) prediction:

Usage: alphafold predict  sequence1,sequence2[,sequence3...,sequenceN] [ prokaryote  true | false ]

The sequences of two or more protein chains can be specified either collectively as a chain-spec (for atomic-structure chains already open in ChimeraX), or individually within a comma-separated list using any combination of specifier types #2-4 listed above. The list should not contain any spaces. If the same protein chain occurs multiple times in the complex, its sequence should be repeated that number of times. For example, to predict a homodimer, the same sequence (or its specifier) would need to be given twice. Prediction may only be feasible for smaller complexes (details...).
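
The rules above for the comma-separated list (no spaces, one entry per copy of each chain) can be sketched in Python. The helper multimer_predict_command is hypothetical and not part of ChimeraX, and the identifiers in the example are placeholders for real sequence specifiers.

```python
from typing import Mapping


def multimer_predict_command(copies: Mapping[str, int], prokaryote: bool = False) -> str:
    """Build an 'alphafold predict' command for a complex (hypothetical helper).

    copies maps each sequence specifier to the number of times that
    chain occurs in the complex.
    """
    specs = []
    for spec, count in copies.items():
        # The list must not contain spaces, and commas separate entries.
        if "," in spec or any(ch.isspace() for ch in spec):
            raise ValueError(f"specifier must not contain spaces or commas: {spec!r}")
        if count < 1:
            raise ValueError("each chain must occur at least once")
        # Repeat the specifier once per copy of the chain.
        specs.extend([spec] * count)
    if len(specs) < 2:
        raise ValueError("a complex needs two or more chains")
    command = "alphafold predict " + ",".join(specs)
    if prokaryote:
        command += " prokaryote true"
    return command


# Homodimer: the same (placeholder) specifier is given twice.
print(multimer_predict_command({"P12345": 2}))
# Heterotrimer with two copies of the first chain and prokaryote pairing.
print(multimer_predict_command({"P12345": 2, "Q67890": 1}, prokaryote=True))
```

Expressing the complex as specifier-to-count pairs makes the stoichiometry explicit and avoids miscounting repeated sequences when the list is typed by hand.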

AlphaFold calculations are run using Google Colab. A warning will appear saying that this Colab notebook is from GitHub (that is, it was not authored by Google), with a button to click to run anyway. Users will need to have a Google account and to sign into it via a browser. Once that is done, the sign-in may be remembered depending on the user's browser settings; it is not kept in the ChimeraX preferences.

A single prediction run generally takes on the order of an hour or more. The process includes installing various software packages on a virtual machine, searching sequence databases, generating a multiple sequence alignment, predicting atomic coordinates, and energy-minimizing the best structure. Predicting a multimer (complex) structure may take longer than predicting the structure of a monomer with the same total number of residues. The free version of Colab limits jobs to 12 hours and may terminate them at shorter times at Google's discretion (see the FAQ). Those who want to run longer and/or more frequent calculations may wish to sign up for one of the paid Colab plans.

The model will be opened automatically and colored by confidence value. The model for a sequence that was specified by structure chain will be superimposed on that chain and assigned structure-comparison attributes for further analysis (details...).

Caveats


UCSF Resource for Biocomputing, Visualization, and Informatics / December 2021