Command: similarstructures

– under construction –

The similarstructures command performs several functions related to the Similar Structures tool. The tool searches for similar protein structures and facilitates exploring large sets of results by efficiently showing them in 3D as backbone traces and in 2D as sequence alignment plots and conformation-based scatter plots.

Similar Structures BLAST Search

similarstructures blast  chain-spec  [ database  pdb | afdb ] [ showTable  true | false ] [ evalueCutoff  max-evalue ] [ maxHits  N ] [ trim  true | false | chains,sequence,ligands ] [ alignmentCutoffDistance  dist ] [ saveDirectory  results-folder ]

Run a BLAST search of the PDB or AlphaFold Database with a protein structure chain as the query, using the web service hosted by the UCSF RBVI. The Foldseek (Similar Structures) tool provides a graphical interface to running this command.

The showTable option (default true) indicates whether to show the results in the Similar Structures tool, which facilitates exploring large sets of protein structures by efficiently showing them in 3D as backbone traces and in 2D as sequence alignment schematics or scatter plots based on conformation. The related blastprotein command and Blast Protein tool differ in that they can take a sequence alone as the query, not only a structure chain, and they show results in a different table that may contain several additional features of the hits (such as crystallographic resolution and ligand residue names); if a structure chain was used as the query, such results can be converted to a Similar Structures table with the command similarstructures fromblast.

A BLAST search may take several minutes, during which ChimeraX cannot be used for other tasks. A much faster (~100X) MMseqs2 search can be performed instead with the sequence search command, and a fast 3D structure search with the foldseek command.

The name assigned to a set of results (bl1 or bl2) is reported in the Log when the search is run.

Choices of database to search:

The hits can be limited with the following options:

The trim and alignmentCutoffDistance options do not act immediately when the search is run. Instead, they specify how to process the hit structures if later opened from the Similar Structures tool or with the command similarstructures open. The structures are fetched from the respective database (Protein Data Bank or AlphaFold Database) and processed as follows:

Search results are saved in a similar structures file (suffix .sms, a JSON file format specific to ChimeraX) with filename based on the query name and the database searched. The file will be listed in the File History for easy access, and simply opening it loads the set of results into the Similar Structures interface. The saveDirectory option allows specifying the save location, either directly or as the word browse to specify it interactively in a file browser window (default location ~/Downloads/ChimeraX/BLAST/).

similarstructures fromblast  [ blast-results-name ] [ save  true | false ] [ saveDirectory  results-folder ]
Convert results from a previous search with the Blast Protein tool or blastprotein command (with structure chain as query) to show in the Similar Structures table. If more than one set of Blast Protein results is open, the blast-results-name (as shown in its title bar, e.g. bp1 or bp2) can be given. The save option indicates whether to also save the converted results as a similar structures file (default true). The saveDirectory option allows specifying the save location, either directly or as the word browse to specify it interactively in a file browser window (default location ~/Downloads/ChimeraX/BLAST/).

Similar Structures Analysis

similarstructures open  hit-ID  [ fromSet  set-name ] [ align  true | false ] [ trim  true | false | chains,sequence,ligands ] [ alignmentCutoffDistance  dist ] [ inFileHistory  true | false ] [ log  true | false ]
Fetch and open a hit structure.

Although only one set of results can be shown in the Similar Structures table at a time (the default set), other sets may be open and available for analysis. Another set can be specified with fromSet set-name. The name of a set of results is reported in the Log when a search is run, and the names of open sets can be listed in the Log with similarstructures list. However, the only way to get a set of results that is open but not shown in the table is to use the showTable false option of the search command. Results are closed when a new set of results replaces them in the Similar Structures table, when the tool is closed, or when the command similarstructures close is used to close them.

The hit-ID for a structure from the PDB is a combination of the PDB ID and chain ID (for example, 6cmi_B), and for a structure from the AlphaFold Database, its UniProt accession number. The hit-ID is shown in the first column of the Similar Structures table.
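
The two hit-ID forms described above can be told apart programmatically. The following Python sketch is illustrative only: parse_hit_id is a hypothetical helper, not a ChimeraX function, and it assumes that an underscore appears only in PDB-style identifiers (UniProt accession numbers are alphanumeric).

```python
def parse_hit_id(hit_id: str) -> dict:
    """Split a hit-ID into its parts (hypothetical helper, not part of ChimeraX).

    Assumption: an ID containing "_" is a PDB ID plus chain ID (e.g. 6cmi_B);
    anything else is treated as a UniProt accession from the AlphaFold Database.
    """
    if "_" in hit_id:
        # Split only at the first underscore: PDB ID on the left, chain ID on the right.
        pdb_id, chain_id = hit_id.split("_", 1)
        return {"database": "pdb", "pdb_id": pdb_id, "chain_id": chain_id}
    return {"database": "afdb", "uniprot_accession": hit_id}


if __name__ == "__main__":
    print(parse_hit_id("6cmi_B"))
    print(parse_hit_id("P69905"))
```

A script that builds a comma-separated ofStructures list from such IDs could use a helper like this to group hits by source database first.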

By default, the opened structure will be Cα-aligned with the query using the sequence alignment provided by the search method, with fit iteration as described for the command align. The trim and alignmentCutoffDistance options are as described above. If a large number of structures are opened, it may be useful to omit them from the File History with inFileHistory false, and omit their descriptions from the Log with log false.

similarstructures sequences  [ fromSet  set-name ] [ showConserved  true | false ] [ conservedThreshold  fraction ] [ conservedColor  color-spec ] [ identityColor  color-spec ] [ lddtColoring  true | false ] [ order  evalue | cluster | identity | lddt ]
Show a sequence plot of all hits in the specified set. This plot provides an overview of which parts of the query sequence were matched and the depth of coverage. Each row of pixels in the image represents one hit sequence, and the columns correspond to the residues of the query. The plot is white where there is no residue aligned with the query.

Several options specify how to color the positions with aligned residues:

The LDDT (local distance difference test) indicates the similarity of a hit residue to the aligned query residue in a neighborhood of 15 Å from the query residue α-carbon.

The sequences (rows in the plot) can be in order of:

Reissuing the command with different coloring options does not recolor the already open plot. To change the coloring, first close the plot and then reissue the command with the desired options.

similarstructures traces  [ fromSet  set-name ] [ ofStructures  hit-ID(s) ] [ alignWith  residue-spec ] [ alignmentCutoffDistance  adist ] [ show  all | close ] [ breakSegmentDistance  bdist ] [ minSegmentResidues  N ] [ distance  dist ] [ maxSegmentDistance  mdist ] [ replace  true | false ]
Display “licorice” (spaghetti-like) ribbons superimposed on the query structure for hits from the specified set of results, either all hits or a subset specified as a comma-separated list of hit identifiers with the ofStructures option. These traces are meant to give an overview of the variability of a large number of structures and their coverage of the query. They are calculated from α-carbons only and do not show helix and strand assignments.

Search results from the structure-based Foldseek method automatically include the α-carbon coordinates of the hits. Search results from sequence methods (MMseqs2 and BLAST) do not automatically include α-carbon coordinates, but trying to show traces will raise a dialog asking the user whether to fetch them, which could take several minutes. (Alternatively, similarstructures fetchcoords can be used beforehand to fetch the α-carbon coordinates for the structures of interest.) All of the hit structure α-carbons are loaded as a single atomic model, one chain per structure, with chain ID set to the database ID of the structure. The residue types of the hit are retained, but the residues are renumbered according to the paired residues of the query structure.

Each hit is superimposed on the query structure by fitting the α-carbons of the residues paired by the search method. The fitting is done as described for the align command, with iteration so that only α-carbon pairs within the specified alignmentCutoffDistance adist are used in the final fit (default as per the Similar Structures options, otherwise 2.0 Å). The alignWith option can be used to specify a subset of the query residues to use for superposition, instead of all paired positions.

With show close (default, where “close” means “nearby”), the traces are displayed as follows:

  1. The ribbon is broken into segments where two consecutive aligned α-carbons are > breakSegmentDistance  bdist apart (default 5.0 Å).
  2. Ribbons are shown for ≥ minSegmentResidues  N contiguous α-carbons within a segment (default 5) and within distance  dist of the corresponding query α-carbons (default 4.0 Å).
  3. Ribbons are shown for entire segments in which every α-carbon is within maxSegmentDistance  mdist of its counterpart (default 10.0 Å).
With show all, the traces are instead shown for all residues regardless of how far they are from the query structure.
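
The show close rules above amount to a filter on aligned α-carbon positions. The following Python sketch is one possible reading of those rules, not the ChimeraX implementation. It assumes the hit and query α-carbons are supplied as equal-length lists of already superimposed (x, y, z) tuples, and that rules 2 and 3 combine as a union. The function names split_segments and shown_residues are hypothetical.

```python
import math


def split_segments(hit_ca, break_dist=5.0):
    """Rule 1: start a new segment where consecutive aligned CA atoms are > break_dist apart."""
    if not hit_ca:
        return []
    segments, current = [], [0]
    for i in range(1, len(hit_ca)):
        if math.dist(hit_ca[i - 1], hit_ca[i]) > break_dist:
            segments.append(current)
            current = [i]
        else:
            current.append(i)
    segments.append(current)
    return segments


def shown_residues(hit_ca, query_ca, break_dist=5.0, min_residues=5,
                   dist=4.0, max_segment_dist=10.0):
    """Return sorted indices of aligned positions that would be drawn (assumed union of rules 2 and 3)."""
    shown = set()
    for seg in split_segments(hit_ca, break_dist):
        # Distance of each hit CA in this segment from its paired query CA.
        d = [math.dist(hit_ca[i], query_ca[i]) for i in seg]
        # Rule 3: show the entire segment if every CA is within max_segment_dist.
        if all(x <= max_segment_dist for x in d):
            shown.update(seg)
            continue
        # Rule 2: show runs of >= min_residues contiguous CA atoms within dist.
        run = []
        for i, x in zip(seg, d):
            if x <= dist:
                run.append(i)
            else:
                if len(run) >= min_residues:
                    shown.update(run)
                run = []
        if len(run) >= min_residues:
            shown.update(run)
    return sorted(shown)


if __name__ == "__main__":
    # Query along the x axis; the hit drifts away in y after the sixth residue.
    query = [(3.8 * i, 0.0, 0.0) for i in range(10)]
    hit = [(3.8 * i, y, 0.0) for i, y in enumerate([0, 0, 0, 0, 0, 0, 3, 6, 9, 12])]
    print(shown_residues(hit, query))
```

The defaults mirror the values documented above (5.0 Å, 5 residues, 4.0 Å, 10.0 Å). How the actual command resolves segments that satisfy only one of the two rules may differ from this union.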

The replace option indicates whether to overwrite a pre-existing trace model (true, default). If false, an additional model will be generated.

similarstructures cluster  reference-residues  [ fromSet  set-name ] [ ofStructures  hit-ID(s) ] [ alignWith  residue-spec ] [ alignmentCutoffDistance  adist ] [ clusterCount  N | clusterDistance  cdist ] [ colorBySpecies  true | false ] [ replace  true | false ]
Create a scatter plot based on backbone conformations of hits from the specified set of results, either all hits or a subset specified as a comma-separated list of hit identifiers with the ofStructures option. The plot is generated as follows:
  1. Each hit is superimposed on the query structure by fitting the α-carbons of the paired residues as described above.
  2. After fitting, the x,y,z coordinates of the hit α-carbons paired with the query reference-residues are concatenated into a vector. Any hits without a residue aligned to one or more of the reference residues will be omitted from the plot, so it is best to use a relatively small set (<30) of query residues with highly populated columns in the sequence alignment of hits to the query.
  3. The vector is projected to a point in two dimensions with UMAP (Uniform Manifold Approximation and Projection). When similarstructures cluster is first run, it may take a minute to install the large UMAP Python package umap-learn.
  4. The points (circles) in the plot may be clustered and colored according to the additional parameters described below.
Search results from the structure-based Foldseek method automatically include the α-carbon coordinates of the hits. Search results from sequence methods (MMseqs2 and BLAST) do not automatically include α-carbon coordinates, but trying to show clusters will raise a dialog asking the user whether to fetch them, which could take several minutes. (Alternatively, similarstructures fetchcoords can be used beforehand to fetch the α-carbon coordinates for the structures of interest.)
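
Step 2 of the plot generation (concatenating coordinates and omitting hits that lack a reference residue) can be sketched in Python as follows. This is an illustration under stated assumptions, not the ChimeraX code: conformation_vectors is a hypothetical function, and the input is assumed to be a dictionary that maps each hit-ID to the superimposed α-carbon coordinates of the hit, keyed by the query residue number each one is paired with.

```python
def conformation_vectors(hits, reference_residues):
    """Build one flat coordinate vector per hit (hypothetical helper, not part of ChimeraX).

    hits: {hit_id: {query_residue_number: (x, y, z)}} after superposition.
    reference_residues: query residue numbers chosen as reference positions.
    Hits with no residue aligned to any one of the reference residues are omitted.
    """
    vectors = {}
    for hit_id, coords in hits.items():
        if any(res not in coords for res in reference_residues):
            continue  # hit lacks an aligned residue at a reference position
        vec = []
        for res in reference_residues:
            vec.extend(coords[res])  # append x, y, z in reference-residue order
        vectors[hit_id] = vec
    return vectors


if __name__ == "__main__":
    hits = {
        "6cmi_B": {10: (1.0, 2.0, 3.0), 11: (4.0, 5.0, 6.0)},
        "P69905": {10: (0.0, 0.0, 0.0)},  # nothing aligned to residue 11
    }
    print(conformation_vectors(hits, [10, 11]))
```

Each resulting vector has 3 × (number of reference residues) components. With the umap-learn package installed, such vectors could be projected to two dimensions with something like umap.UMAP(n_components=2).fit_transform(...), although the parameters ChimeraX uses for this step are not stated here.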

Two methods of clustering the points are available. Either one (but not both) of the following can be used:

If neither option is used, clustering will not be done (they do not have default values).

Colors are chosen randomly. If the coloring is unpleasant, simply reissuing the command may give a better set of colors. By default (colorBySpecies false), the circles in the plot are colored by cluster, if clustering was done. With colorBySpecies true, they are colored by source species and the color corresponding to each species is reported in the Log.

The replace option indicates whether to overwrite a pre-existing cluster plot (true, default). If false, an additional plot will be generated.

similarstructures ligands  [ fromSet  set-name ] [ ofStructures  hit-ID(s) ] [ warn  true | false ] [ rmsdCutoff  rmsd ] [ alignmentRange  range ] [ minimumPaired  fraction ] [ combine  true | false ]
Copy ligands, ions, and solvent molecules (nonpolymer residues) from the hit structures onto corresponding locations on the query structure. Either all hits from the specified set of results can be used, or a subset specified as a comma-separated list of hit identifiers with the ofStructures option. Copying the residues requires fetching the full coordinates of each structure. With warn true (default), a dialog will appear to ask the user whether to proceed with the fetch, which could take several minutes to complete.

Each ligand (ion, solvent) residue is evaluated for mapping onto the query structure, as follows:

  1. Protein residues within alignmentRange  range (default 5.0 Å) of the ligand are identified.
  2. If at least minimumPaired  fraction of those nearby protein residues are paired with query residues (default 0.5), the α-carbons of those pairs are fitted.
  3. If the resulting RMSD is ≤ rmsdCutoff  rmsd (default 3.0 Å), the ligand is copied to the corresponding position relative to the query structure.
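
The acceptance test in steps 2 and 3 can be written as a small decision function. The Python sketch below is illustrative and not taken from ChimeraX: paired_fraction and ligand_is_mappable are hypothetical names, the nearby residues are assumed to be given as a list of hit residue numbers, the pairing as a dictionary from hit residue number to query residue number, and the RMSD of the local fit is assumed to have been computed elsewhere.

```python
def paired_fraction(nearby_residues, pairing):
    """Fraction of the ligand's nearby hit residues that are paired with a query residue."""
    if not nearby_residues:
        return 0.0
    paired = sum(1 for res in nearby_residues if res in pairing)
    return paired / len(nearby_residues)


def ligand_is_mappable(nearby_residues, pairing, fit_rmsd,
                       minimum_paired=0.5, rmsd_cutoff=3.0):
    """Apply the two documented thresholds (hypothetical helper, not part of ChimeraX).

    fit_rmsd is the RMSD from fitting the CA atoms of the paired nearby residues;
    it is passed in here, whereas in practice it would only be computed once the
    paired-fraction test has passed.
    """
    if paired_fraction(nearby_residues, pairing) < minimum_paired:
        return False  # too few nearby residues have a counterpart in the query
    return fit_rmsd <= rmsd_cutoff


if __name__ == "__main__":
    pairing = {10: 5, 11: 6}  # hit residue number -> query residue number
    print(ligand_is_mappable([10, 11, 12, 13], pairing, fit_rmsd=2.0))
```

The defaults match the documented values for minimumPaired (0.5) and rmsdCutoff (3.0 Å). Raising the first or lowering the second makes the copy more conservative.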

How many residues were copied and their residue types are reported in the Log. Often thousands of water molecules, ions, and crystallization adjuvants are found, and they can be hidden to get a better view of more interesting ligands (details...).

With combine true (default), the copied ligand, ion, and solvent residues are loaded as a single atomic model, in which the chain ID of a residue is generated from the PDB ID and chain ID of its source structure (e.g., 2cml_B). Pausing the cursor over a residue in the graphics window shows its name and chain ID in a pop-up balloon. With combine false, a separate model is generated for each hit with mappable residues, containing the residues in their mapped positions with their original chain IDs.

Similar Structures Utilities

similarstructures fetchcoords  [ fromSet  set-name ]
similarstructures scrollto  hit-ID  [ fromSet  set-name ]
Scroll to the row for a specified hit in the Similar Structures table and highlight that row.
similarstructures pairing  model-spec  [ fromSet  set-name ]
Draw pseudobonds between the paired α-carbons of the (previously opened) atomic model of the hit and the query structure.
similarstructures seqalign  model-spec  [ fromSet  set-name ]
Show the sequence alignment between the (previously opened) atomic model of the hit and the query structure in the Sequence Viewer.
similarstructures list
List the names of currently available sets of search results in the Log. The name is reported in the Log when the results are generated. Although only one set of results can be shown in the Similar Structures table at a time, additional sets may be open and available for analysis with similarstructures commands. However, the only way to get a set of results that is open but not shown in the table is to use the showTable false option of the search command.
similarstructures close  set-name
Close a specified set of results. The names of currently open sets of results can be listed in the Log with similarstructures list. Results are also closed when a new set of results replaces them in the Similar Structures table or when the tool is closed.

UCSF Resource for Biocomputing, Visualization, and Informatics / November 2024