Command: similarstructures

The similarstructures command performs several functions related to the Similar Structures tool. The tool searches for similar protein structures and facilitates exploring large sets of results by efficiently showing them in 3D as backbone traces and in 2D as sequence alignment plots and conformation-based scatter plots.

Similar structures BLAST search (see also foldseek, sequence search):
- similarstructures blast – BLAST protein with structure chain as query, show results in Similar Structures
- similarstructures fromblast – convert results from a previous Blast Protein search (using a structure chain as the query) to Similar Structures results
Similar structures analysis:
- similarstructures open – open hit structures
- similarstructures sequences – show a sequence plot overview of query sequence coverage by all hits
- similarstructures traces – show α-carbon traces of hit structures
- similarstructures cluster – show a scatter plot (UMAP) of hits based on their atomic positions, colored by cluster
- similarstructures ligands – spatially map ligands, ions, and solvent from hit structures onto the query structure
Similar structures utilities:
- similarstructures fetchcoords – fetch α-carbon coordinates for hits from sequence searches
- similarstructures scrollto – scroll to the row for a specific hit in the Similar Structures table and highlight that row
- similarstructures pairing – draw pseudobonds between the paired α-carbons of the query and a hit structure
- similarstructures seqalign – show the sequence alignment of the query and a hit structure in the Sequence Viewer
- similarstructures list – list the names of available sets of search results
- similarstructures close – close a set of search results

Similar Structures BLAST Search

• similarstructures blast chain-spec [ database pdb | afdb ] [ showTable true | false ] [ evalueCutoff max-evalue ] [ maxHits N ] [ trim true | false | chains,sequence,ligands ] [ alignmentCutoffDistance dist ] [ saveDirectory results-folder ]

Run a BLAST search of the PDB or AlphaFold Database with a protein structure chain as the query, using the web service hosted by the UCSF RBVI. The Foldseek (Similar Structures) tool provides a graphical interface to running this command.
The showTable option (default true) indicates whether to show the results in the Similar Structures tool, which facilitates exploring large sets of protein structures by efficiently showing them in 3D as backbone traces and in 2D as sequence alignment schematics or scatter plots based on conformation. The related blastprotein command and Blast Protein tool are different in that they can use sequence only as the query, not just a structure chain, and they show results in a different table that may contain several additional features of the hits (such as crystallographic resolution, ligand residue names, etc.); if a structure chain was used as the query, such results can be converted to a Similar Structures table with the command similarstructures fromblast.
A BLAST search may take several minutes, during which ChimeraX cannot be used for other tasks. A much faster (~100X) MMseqs2 search can be performed instead with the sequence search command, and a fast 3D structure search with the foldseek command.
The name assigned to a set of results (for example, bl1 or bl2) is reported in the Log when the search is run.
Choices of database to search:

pdb (default) – PDB
afdb – AlphaFold Database

The hits can be limited with the following options:

evalueCutoff max-evalue – maximum E-value of a hit to be retained (default 1e-3)
maxHits N – maximum number of hits retained (default 1000)
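
The two limits above can be pictured as a simple filter over the returned hits. The following is a minimal Python sketch, not the ChimeraX implementation: the Hit class and its field names are hypothetical, and keeping the lowest E-values when more than maxHits pass the cutoff is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    # Hypothetical record; the field names are illustrative only.
    hit_id: str
    evalue: float


def limit_hits(hits, evalue_cutoff=1e-3, max_hits=1000):
    """Keep hits with E-value <= evalue_cutoff, at most max_hits of them."""
    kept = [h for h in hits if h.evalue <= evalue_cutoff]
    # Assumption: when too many hits pass, the best (lowest) E-values win.
    kept.sort(key=lambda h: h.evalue)
    return kept[:max_hits]


hits = [Hit("1abc_A", 1e-20), Hit("2xyz_B", 0.5), Hit("3def_C", 1e-5)]
print([h.hit_id for h in limit_hits(hits)])
```

The defaults mirror the documented defaults of evalueCutoff (1e-3) and maxHits (1000).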

The trim and alignmentCutoffDistance options do not act immediately when the search is run. Instead, they specify how to process the hit structures if later opened from the Similar Structures tool or with the command similarstructures open. The structures are fetched from the respective database (Protein Data Bank or AlphaFold Database) and processed as follows:

The trim option indicates deleting all of the following (if true), or none of them (if false), or a comma-separated list of:

chains – for PDB entries, chains other than the hit chain
sequence – N- and C-terminal segments of the hit chain that were not included in the sequence alignment returned by the search method
ligands – ligands, solvent, and ions > 3 Å from the hit chain
The trim default is the current setting in the Similar Structures options, otherwise (if the tool is not open) true.
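
As an illustration of how the three forms of the trim value relate, here is a small Python sketch that turns a trim value into the set of items to delete. The function name parse_trim is hypothetical and the error handling is an assumption; ChimeraX parses the option internally.

```python
TRIM_PARTS = ("chains", "sequence", "ligands")


def parse_trim(value):
    """Return the set of items to delete for a trim value.

    Accepts true, false, or a comma-separated list of
    chains, sequence, ligands.
    """
    text = str(value).strip().lower()
    if text == "true":
        return set(TRIM_PARTS)
    if text == "false":
        return set()
    parts = {p.strip() for p in text.split(",") if p.strip()}
    unknown = parts - set(TRIM_PARTS)
    if unknown:
        # Assumption: unrecognized items are treated as an error.
        raise ValueError(f"unknown trim item(s): {sorted(unknown)}")
    return parts


print(sorted(parse_trim("chains,ligands")))
```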

The hit chain is superimposed onto the query chain by least-squares fitting the α-carbons of the paired residues and iteratively pruning far-apart pairs as described for the align command. The alignmentCutoffDistance dist is the alignment pruning distance, so that only α-carbon pairs within the specified distance are used in the final fit (default as per the Similar Structures options, otherwise 2.0 Å).

Search results are saved in a similar structures file (suffix .sms, a JSON file format specific to ChimeraX) with filename based on the query name and the database searched. The file will be listed in the File History for easy access, and simply opening it loads the set of results into the Similar Structures interface. The saveDirectory option allows specifying the save location, either directly or as the word browse to specify it interactively in a file browser window (default location ~/Downloads/ChimeraX/BLAST/).

• similarstructures fromblast [ blast-results-name ] [ save true | false ] [ saveDirectory results-folder ]

Convert results from a previous search with the Blast Protein tool or blastprotein command (with structure chain as query) to show in the Similar Structures table. If more than one set of Blast Protein results is open, the blast-results-name (as shown in its title bar, e.g. bp1 or bp2) can be given. The save option indicates whether to also save the converted results as a similar structures file (default true). The saveDirectory option allows specifying the save location, either directly or as the word browse to specify it interactively in a file browser window (default location ~/Downloads/ChimeraX/BLAST/).

Similar Structures Analysis

Fetch and open a hit structure. The hit-ID for a structure from the PDB is a combination of the PDB ID and chain ID (for example, 6cmi_B), and for a structure from the AlphaFold Database, its UniProt accession number. The hit-ID is shown in the first column of the Similar Structures table.
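
The two forms of hit-ID can be told apart mechanically. The Python sketch below assumes that any identifier containing an underscore is a PDB ID plus chain ID and that anything else is a UniProt accession; the function name and the returned dictionary keys are hypothetical.

```python
def parse_hit_id(hit_id):
    """Split a hit-ID into its database and components."""
    if "_" in hit_id:
        # PDB hits look like 6cmi_B: PDB ID, underscore, chain ID.
        pdb_id, chain_id = hit_id.split("_", 1)
        return {"database": "pdb", "pdb_id": pdb_id, "chain_id": chain_id}
    # Assumption: anything without an underscore is a UniProt accession.
    return {"database": "afdb", "uniprot": hit_id}


print(parse_hit_id("6cmi_B"))
```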
By default, the opened structure will be Cα-aligned with the query using the sequence alignment provided by the search method, with fit iteration as described for the command align. The trim and alignmentCutoffDistance options are as described above. If a large number of structures are opened, it may be useful to omit them from the File History with inFileHistory false, and omit their descriptions from the Log with log false.
Although only one set of results can be shown in the Similar Structures table at a time (the default set), other sets may be open and available for analysis. Another set can be specified with fromSet set-name. The name of a set of results is reported in the Log when a search is run, and the names of open sets can be listed in the Log with similarstructures list. However, the only way to get a set of results that is open but not shown in the table is to use the showTable false option of the search command. Results are closed when a new set of results replaces them in the Similar Structures table, when the tool is closed, or when the command similarstructures close is used to close them.

• similarstructures sequences [ showConserved true | false ] [ conservedThreshold fraction ] [ conservedColor color-spec ] [ identityColor color-spec ] [ lddtColoring true | false ] [ order evalue | cluster | identity | lddt ] [ fromSet set-name ]

Show a sequence plot of all hits in the specified set. This plot provides an overview of which parts of the query sequence were matched and the depth of coverage. Each row of pixels in the image represents one hit sequence, and the columns correspond to the residues of the query. The plot is white where there is no residue aligned with the query.

Several options specify how to color the positions with aligned residues:

with showConserved true:

use the specified identityColor for residues of the same amino acid type as the query in a column where at least conservedThreshold fraction (default 0.5) of the residues have that type (and the column contains at least 10 residues)
use the specified conservedColor for residues of the same amino acid type as the query, but not meeting the column criteria above
use black for residues of a different amino acid type than the aligned query residue

with lddtColoring true: color each aligned residue in each structure by its LDDT (color key values: 0, 0.2, 0.4, 0.6, 0.8)

with both of the above true, only the positions that would otherwise be black are colored by LDDT instead
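
The showConserved rules can be expressed as a per-column classification. The Python sketch below is illustrative only: it returns neutral labels rather than colors, represents an unaligned position as None, and reads "fraction of the residues" as the fraction of aligned (non-gap) residues in the column, which is an assumption.

```python
def classify_column(query_aa, column, conserved_threshold=0.5, min_residues=10):
    """Label each hit residue in one alignment column.

    query_aa: one-letter amino acid type of the query residue.
    column:   hit residue types in this column, None where unaligned.
    """
    residues = [aa for aa in column if aa is not None]
    matches = sum(1 for aa in residues if aa == query_aa)
    # Column criteria: enough residues, and enough of them match the query.
    meets_criteria = (len(residues) >= min_residues
                      and matches / len(residues) >= conserved_threshold)
    labels = []
    for aa in column:
        if aa is None:
            labels.append("unaligned")            # drawn white
        elif aa != query_aa:
            labels.append("different")            # drawn black
        elif meets_criteria:
            labels.append("same_column_criteria_met")
        else:
            labels.append("same")
    return labels


column = ["A"] * 6 + ["G"] * 4 + [None]
print(classify_column("A", column))
```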

The LDDT (local distance difference test) indicates the similarity of a hit residue to the aligned query residue in a neighborhood of 15 Å from the query residue α-carbon (see Mariani et al., Bioinformatics 29:2722 (2013)).
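
For intuition, a per-residue LDDT restricted to α-carbons can be sketched as below. This is a simplified Python illustration, not the ChimeraX code: the four tolerance thresholds (0.5, 1, 2, and 4 Å) are taken from the LDDT paper, and using only α-carbons of paired residues is an assumption.

```python
import math

# Distance-difference tolerances in Å, as in the LDDT paper (assumed here).
THRESHOLDS = (0.5, 1.0, 2.0, 4.0)


def ca_lddt(query_xyz, hit_xyz, radius=15.0):
    """Per-residue LDDT over paired α-carbon coordinates.

    query_xyz and hit_xyz are equal-length lists of (x, y, z) tuples,
    where position i of each list is one aligned residue pair.
    """
    n = len(query_xyz)
    scores = []
    for i in range(n):
        preserved = 0
        considered = 0
        for j in range(n):
            if i == j:
                continue
            dq = math.dist(query_xyz[i], query_xyz[j])
            if dq > radius:
                continue  # outside the neighborhood of query residue i
            dh = math.dist(hit_xyz[i], hit_xyz[j])
            considered += len(THRESHOLDS)
            preserved += sum(1 for t in THRESHOLDS if abs(dq - dh) <= t)
        scores.append(preserved / considered if considered else 0.0)
    return scores


query = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
print(ca_lddt(query, query))
```

A score of 1 means every neighborhood distance is preserved within the tightest tolerance; lower values indicate local differences between hit and query.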
The sequences (rows in the plot) can be in order of:

evalue – lowest to highest E-value (default, if clustering by coverage only gives one cluster, see below)
cluster – grouping the sequences by which part of the query they cover (default, if clustering by coverage gives >1 cluster)
identity – percent sequence identity compared to the query
lddt – average LDDT over all residues in a hit structure
Reissuing the command with different coloring options does not recolor the already open plot. To change the coloring, first close the plot and then reissue the command with the desired options.

• similarstructures traces [ ofStructures hit-ID(s) ] [ alignWith residue-spec ] [ alignmentCutoffDistance adist ] [ show all | close ] [ breakSegmentDistance bdist ] [ minSegmentResidues N ] [ distance dist ] [ maxSegmentDistance mdist ] [ replace true | false ] [ fromSet set-name ]

Display “licorice” (spaghetti-like) ribbons superimposed on the query structure for hits from the specified set of results, either all hits or a subset specified as a comma-separated list of hit identifiers with the ofStructures option. These traces are meant to give an overview of the variability of a large number of structures and their coverage of the query. They are calculated from α-carbons only and do not show helix and strand assignments.
Search results from the structure-based Foldseek method automatically include the α-carbon coordinates of the hits. Search results from sequence methods (MMseqs2 and BLAST) do not automatically include α-carbon coordinates, but trying to show traces will raise a dialog asking the user whether to fetch them, which could take several minutes. (Alternatively, similarstructures fetchcoords can be used beforehand to fetch the α-carbon coordinates for the structures of interest.) All of the hit structure α-carbons are loaded as a single atomic model, one chain per structure, with chain ID set to the database ID of the structure. The residue types of the hit are retained, but the residues are renumbered according to the paired residues of the query structure.
Each hit is superimposed on the query structure by fitting the α-carbons of the residues paired by the search method. The fitting is done as described for the align command, with iteration so that only α-carbon pairs within the specified alignmentCutoffDistance adist are used in the final fit (default as per the Similar Structures options, otherwise 2.0 Å). The alignWith option can be used to specify a subset of the query residues to use for superposition, instead of all paired positions.
With show close (default, where “close” means “nearby”), the traces are displayed as follows:

The ribbon is broken into segments where two consecutive aligned α-carbons are > breakSegmentDistance  bdist apart (default 5.0 Å).
Ribbons are shown for ≥ minSegmentResidues  N contiguous α-carbons within a segment (default 5) and within distance  dist of the corresponding query α-carbons (default 4.0 Å).
Ribbons are shown for entire segments in which every α-carbon is within maxSegmentDistance  mdist of its counterpart (default 10.0 Å).
With show all, the traces are instead shown for all residues regardless of how far they are from the query structure.
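
The first two show close rules can be sketched in Python as follows. This is one reading of the rules under stated assumptions, not the ChimeraX implementation: the function names are hypothetical, the coordinates are assumed to be already superimposed and index-paired with the query, and the maxSegmentDistance rule is left out.

```python
import math


def split_segments(hit_xyz, break_distance=5.0):
    """Split consecutive α-carbon indices wherever two neighbors
    are more than break_distance apart."""
    segments, current = [], []
    for i, xyz in enumerate(hit_xyz):
        if current and math.dist(hit_xyz[i - 1], xyz) > break_distance:
            segments.append(current)
            current = []
        current.append(i)
    if current:
        segments.append(current)
    return segments


def close_runs(segment, hit_xyz, query_xyz, distance=4.0, min_residues=5):
    """Within one segment, return runs of at least min_residues contiguous
    indices whose α-carbon lies within distance of the paired query α-carbon."""
    runs, run = [], []
    for i in segment:
        if math.dist(hit_xyz[i], query_xyz[i]) <= distance:
            run.append(i)
        else:
            if len(run) >= min_residues:
                runs.append(run)
            run = []
    if len(run) >= min_residues:
        runs.append(run)
    return runs


hit = [(0, 0, 0), (3.8, 0, 0), (7.6, 0, 0), (20, 0, 0), (23.8, 0, 0)]
print(split_segments(hit))
```

The defaults mirror the documented defaults of breakSegmentDistance (5.0 Å), distance (4.0 Å), and minSegmentResidues (5).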
The replace option indicates whether to overwrite a pre-existing trace model (true, default). If false, an additional model will be generated.

• similarstructures cluster reference-residues [ ofStructures hit-ID(s) ] [ alignWith residue-spec ] [ alignmentCutoffDistance adist ] [ clusterCount N | clusterDistance cdist ] [ colorBySpecies true | false ] [ replace true | false ] [ fromSet set-name ]

Create a scatter plot based on backbone conformations of hits from the specified set of results, either all hits or a subset specified as a comma-separated list of hit identifiers with the ofStructures option. The plot is generated as follows:

Each hit is superimposed on the query structure by fitting the α-carbons of the paired residues as described above.
After fitting, the x,y,z coordinates of the hit α-carbons paired with the query reference-residues are concatenated into a vector. Any hits without a residue aligned to one or more of the reference residues will be omitted from the plot, so it is best to use a relatively small set (<30) of query residues with highly populated columns in the sequence alignment of hits to the query.
The vector is projected to a point in two dimensions with UMAP (Uniform Manifold Approximation and Projection). When similarstructures cluster is first run, it may take a minute to install the large UMAP Python package umap-learn.
The points (circles) in the plot may be clustered and colored according to the additional parameters described below.
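
The vector construction in the second step can be sketched in Python as shown below. The data layout (a dictionary per hit that maps query residue numbers to the coordinates of the paired, already superimposed hit α-carbon) is hypothetical, and the UMAP projection itself is not included.

```python
def conformation_vectors(hits, reference_residues):
    """Build one coordinate vector per hit.

    hits: {hit_id: {query_residue_number: (x, y, z)}} for paired α-carbons.
    Hits lacking a pairing for any reference residue are omitted.
    """
    vectors = {}
    for hit_id, paired in hits.items():
        if any(r not in paired for r in reference_residues):
            continue  # not aligned at every reference residue
        vec = []
        for r in reference_residues:
            vec.extend(paired[r])  # append x, y, z
        vectors[hit_id] = vec
    return vectors


hits = {
    "a": {10: (1.0, 2.0, 3.0), 11: (4.0, 5.0, 6.0)},
    "b": {10: (0.0, 0.0, 0.0)},  # missing residue 11, so omitted
}
print(conformation_vectors(hits, [10, 11]))
```

Each vector has three values per reference residue, which is why a modest number of well-covered reference residues keeps the most hits in the plot.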
Search results from the structure-based Foldseek method automatically include the α-carbon coordinates of the hits. Search results from sequence methods (MMseqs2 and BLAST) do not automatically include α-carbon coordinates, but trying to show clusters will raise a dialog asking the user whether to fetch them, which could take several minutes. (Alternatively, similarstructures fetchcoords can be used beforehand to fetch the α-carbon coordinates for the structures of interest.)
Two methods of clustering the points are available. Either one (but not both) of the following can be used:

If clusterCount N is given (say 5 clusters), the k-means algorithm will be used to produce that number of clusters.
If clusterDistance cdist is given (say 1.5 Å), points within that distance of each other in the 2D projection will be clustered together.
If neither option is used, clustering will not be done (the options do not have default values).
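
One plausible reading of the clusterDistance rule is that points linked by a chain of neighbors within the cutoff end up in the same cluster. The Python sketch below implements that reading with a union-find structure; whether ChimeraX uses this exact method is an assumption, and the function name is hypothetical.

```python
import math


def cluster_by_distance(points, cutoff):
    """Assign a cluster index to each 2D point so that points within
    cutoff of each other (directly or through neighbors) share a cluster."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= cutoff:
                parent[find(i)] = find(j)

    # Number the clusters in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(len(points))]


points = [(0, 0), (1, 0), (10, 0), (10.5, 0), (30, 0)]
print(cluster_by_distance(points, 1.5))
```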
Colors are chosen randomly. If the coloring is unpleasant, simply reissuing the command may give a better set of colors. By default (colorBySpecies false), the circles in the plot are colored by cluster, if clustering was done. With colorBySpecies true, they are colored by source species and the color corresponding to each species is reported in the Log.
The replace option indicates whether to overwrite a pre-existing cluster plot (true, default). If false, an additional plot will be generated.

• similarstructures ligands [ ofStructures hit-ID(s) ] [ warn true | false ] [ rmsdCutoff rmsd ] [ alignmentRange range ] [ minimumPaired fraction ] [ combine true | false ] [ fromSet set-name ]

Copy ligands, ions, and solvent molecules (nonpolymer residues) from the hit structures onto corresponding locations on the query structure. Either all hits from the specified set of results can be used, or a subset specified as a comma-separated list of hit identifiers with the ofStructures option. Copying the residues requires fetching the full coordinates of each structure. With warn true (default), a dialog will appear to ask the user whether to proceed with the fetch, which could take several minutes to complete.
Each ligand (ion, solvent) residue is evaluated for mapping onto the query structure, as follows:

Protein residues within alignmentRange  range (default 5.0 Å) of the ligand are identified.
If at least minimumPaired  fraction of those nearby protein residues are paired with query residues (default 0.5), the α-carbons of those pairs are fitted.
If the resulting RMSD is ≤ rmsdCutoff  rmsd (default 3.0 Å), the ligand is copied to the corresponding position relative to the query structure.
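
The decision made for each ligand can be summarized in a short Python sketch. It is illustrative only: the function and argument names are hypothetical, the nearby residues are assumed to have been found already using alignmentRange, and the α-carbon fit is passed in as a caller-supplied function rather than implemented here.

```python
def ligand_is_mappable(nearby_residues, paired_residues, fit_rmsd,
                       minimum_paired=0.5, rmsd_cutoff=3.0):
    """Decide whether one ligand should be copied onto the query.

    nearby_residues: hit protein residues within alignmentRange of the ligand.
    paired_residues: set of hit residues that are paired with query residues.
    fit_rmsd:        hypothetical callable that fits the α-carbons of the
                     given paired residues and returns the RMSD in Å.
    """
    if not nearby_residues:
        return False
    paired = [r for r in nearby_residues if r in paired_residues]
    if len(paired) / len(nearby_residues) < minimum_paired:
        return False  # too few nearby residues are paired with the query
    return fit_rmsd(paired) <= rmsd_cutoff


# Two of four nearby residues are paired, and the (stand-in) fit gives 1.2 Å.
print(ligand_is_mappable([1, 2, 3, 4], {1, 2}, lambda paired: 1.2))
```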

The number of residues copied and their residue types are reported in the Log. Often thousands of water molecules, ions, and crystallization adjuvants are found, and they can be hidden to get a better view of more interesting ligands.
With combine true (default), the copied ligand, ion, and solvent residues are loaded as a single atomic model, in which the chain ID of a residue is generated from the PDB ID and chain ID of its source structure (e.g., 2cml_B). Pausing the cursor over a residue in the graphics window shows its name and chain ID in a pop-up balloon. With combine false, a separate model is generated for each hit with mappable residues, containing the residues in their mapped positions with their original chain IDs.

Similar Structures Utilities

• similarstructures fetchcoords [ ofStructures hit-ID(s) ] [ minAlignedCoords N ] [ updateTable true | false ] [ rewriteSmsFile true | false ] [ ask true | false ] [ fromSet set-name ]

Search results from the structure-based Foldseek method automatically include the α-carbon coordinates of the hits. Search results from sequence methods (MMseqs2 and BLAST) do not automatically include α-carbon coordinates, but similarstructures fetchcoords can be used to fetch them for hits from the specified set of results, either all hits or a subset specified as a comma-separated list of hit identifiers with the ofStructures option. The minAlignedCoords option allows further excluding hits with fewer than N residues aligned to the query by the search method (default 10).
Fetching the α-carbon coordinates enables:

trace display and cluster plotting
with updateTable true (default), filling in the % Close and % Cover columns in the Similar Structures table of results
with rewriteSmsFile true (default), adding the α-carbon coordinates into the corresponding similar structures file (.sms); when a similar structures search is performed, the resulting .sms file is cached under ~/Downloads/ChimeraX/ (details...), and it is the file in this default location that is rewritten
The ask option (default false) indicates whether to prompt the user to confirm the fetch, since it may take some time. Typically this is set to true when the command is called by other functions, such as plotting clusters.

• similarstructures scrollto hit-ID [ fromSet set-name ]

Scroll to the row for a specified hit in the Similar Structures table and highlight that row.

• similarstructures pairing model-spec [ color color-spec ] [ radius r ] [ halfbondColoring true | false ] [ fromSet set-name ]

Draw dashed pseudobonds between the paired α-carbons of the (previously opened) atomic model of a hit from the specified set and the query structure. The pseudobonds are assigned the specified color and radius r (default 0.1 Å), except that giving halfbondColoring true will instead color the pseudobond halves to match the endpoint atoms (default is halfbondColoring false).

• similarstructures seqalign model-spec [ fromSet set-name ]

Show the pairwise sequence alignment between the (previously opened) atomic model of a hit from the specified set and the query in the Sequence Viewer.

• similarstructures list

List the names of currently available sets of search results in the Log. The name is reported in the Log when the results are generated. Although only one set of results can be shown in the Similar Structures table at a time, additional sets may be open and available for analysis with similarstructures commands. However, the only way to get a set of results that is open but not shown in the table is to use the showTable false option of the search command.

• similarstructures close set-name

Close a specified set of results. The names of currently open sets of results can be listed in the Log with similarstructures list. Results are also closed when a new set of results replaces them in the Similar Structures table or when the tool is closed.

UCSF Resource for Biocomputing, Visualization, and Informatics / November 2024