Superpositions and Alignments Tutorial

In this tutorial, MatchMaker is used to align protein structures (create a superposition) and Match -> Align is used to generate a multiple sequence alignment from the structural superposition. Sequence alignments are displayed in Multalign Viewer, which is covered in more detail in the Sequences and Structures tutorial.

Internet connectivity is required to fetch the structures used in this tutorial: 1tad, 121p, 1r2q, 1j2j, 1puj, 1tnd, 1tag

Background and Setup
Different but related proteins
Different conformations of the same protein

Background and Setup

Protein structures are classified within databases such as SCOP, CATH, and HOMSTRAD. Classifications range from groups of highly similar and closely related proteins to larger, more diverse sets. Depending on what issues are being studied, it may be useful to superimpose structures that are classified together at any level up to fold. Although it is not always clear whether proteins with the same fold are evolutionarily related (homologous), they should still be superimposable. In general, more closely related proteins are easier to superimpose.

G proteins (guanine nucleotide-binding proteins) are used as examples. G proteins are important in signal transduction. They act as molecular switches, changing conformation and interaction partners depending on whether GTP or GDP is bound. Many diverse structures are known. The two main subsets are the small monomeric G proteins, such as Ras, and the larger heterotrimeric G proteins, which act immediately downstream of G-protein-coupled receptors. The α subunits of heterotrimeric G proteins are homologous to the small G proteins.

On Windows/Mac, click the Chimera icon; on UNIX, start Chimera from the system prompt:

unix: chimera

A basic Chimera window should appear after a few seconds; resize it as desired. Open the Command Line (Tools... General Controls... Command Line).

Choose Favorites... Add to Favorites/Toolbar to place some icons on the toolbar. This opens the Tools section of the preferences, which recapitulates Chimera's Tools menu. In the On Toolbar column, check the boxes for:

Model Panel (under General Controls)
Side View (Viewing Controls)
MatchMaker (Structure Comparison)
Match -> Align (Structure Comparison)

You can also specify tools such as the Command Line to Auto Start (start when Chimera is started). If you want these settings to apply to subsequent uses of Chimera, click Save before closing the preferences.

Fetch a structure from the Protein Data Bank:

Command: open 1tad

The structure contains three copies of the α subunit of transducin, a heterotrimeric G protein. Delete solvent and two of the copies, chains B and C:

Command: del solvent
Command: del :.b-c

Move and scale the structures using the mouse and Side View as desired throughout the tutorial. The front and back clipping planes can be adjusted in the Side View.

Different but Related Proteins

[Figure: superimposed G proteins]

We will superimpose a sample of G protein structures using MatchMaker, then create a sequence alignment from the superposition with Match -> Align.

The α subunit of the heterotrimeric G protein transducin was already opened in the setup. Fetch structures for the monomeric G proteins H-Ras, Rab5a, and ADP-ribosylation factor 1, respectively:

Command: open 121p
Command: open 1r2q
Command: open 1j2j

Simplify the display:

Menu: Presets... Interactive 1 (ribbons)

The ribbons preset displays not only ribbons, but also bound molecules, ions, and nearby sidechains.

The relative positions of the structures are not meaningful; each has its own coordinate system. The next step is to superimpose the proteins so that they can be compared. Start MatchMaker by clicking its icon. This tool superimposes pairs of structures by:

generating a sequence alignment using residue types (Ala, His, etc.) and secondary structure assignments (helix, strand, other)
matching the α-carbons of residues in the same columns of the sequence alignment

A variety of parameters control the sequence alignment step:

alignment algorithm: global (Needleman-Wunsch, default) or local (Smith-Waterman)
alignment scoring:
- what substitution matrix should be used in the residue similarity term (default BLOSUM-62)
- whether the score should include a secondary structure term (default: yes, with 30% weighting of this term and thus 70% weighting of the residue similarity term)
- gap penalties
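
The scoring options above can be sketched in a few lines of Python. This is only an illustration of how a residue similarity term and a secondary structure term might be blended 70%/30% inside a global (Needleman-Wunsch) alignment; it is not MatchMaker's implementation. The function names, the toy match/mismatch values (a stand-in for BLOSUM-62), the secondary structure scores, and the single linear gap penalty are all assumptions made for the example.

```python
def residue_score(a, b):
    # Toy stand-in for a substitution matrix such as BLOSUM-62 (NOT the real matrix).
    return 4.0 if a == b else -1.0


def ss_score(s, t):
    # Toy secondary structure term: H = helix, S = strand, O = other (assumed values).
    return 6.0 if s == t else -6.0


def combined_score(res_a, res_b, ss_a, ss_b, ss_fraction=0.3):
    # Weighted blend: 30% secondary structure, 70% residue similarity by default.
    return ((1.0 - ss_fraction) * residue_score(res_a, res_b)
            + ss_fraction * ss_score(ss_a, ss_b))


def needleman_wunsch(seq_a, ss_a, seq_b, ss_b, ss_fraction=0.3, gap=-2.0):
    """Global alignment with a linear gap penalty (a simplifying assumption)."""
    n, m = len(seq_a), len(seq_b)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + gap
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = combined_score(seq_a[i - 1], seq_b[j - 1],
                                  ss_a[i - 1], ss_b[j - 1], ss_fraction)
            score[i][j] = max(score[i - 1][j - 1] + pair,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + combined_score(
                seq_a[i - 1], seq_b[j - 1], ss_a[i - 1], ss_b[j - 1], ss_fraction):
            out_a.append(seq_a[i - 1])
            out_b.append(seq_b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and (j == 0 or score[i][j] == score[i - 1][j] + gap):
            out_a.append(seq_a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(seq_b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

Calling needleman_wunsch("ACDE", "HHHH", "ADE", "HHH") returns the total score and the two gapped sequences. Raising ss_fraction makes matching helices and strands count for more than matching residue types, which is the idea behind increasing the secondary structure weighting for distantly related proteins.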

Also by default, the fit is iterated so that pairs aligned in the sequence alignment but far apart in space are not included in the final match.

Turn off Show alignment(s)... since we will be performing many trial superpositions, but otherwise, keep the default settings. All of the structures should already be chosen as the Structure(s) to match; keep that the same, but choose 1tad as the Reference and click Apply to match all the others to it.

The number of α-carbon pairs and RMSD in the final iteration of each pairwise fit are reported in the Reply Log (under Favorites). However, simple visual inspection of the ribbons, although subjective, is often the most useful indicator of success.
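
For reference, the RMSD over a set of matched α-carbon pairs can be written as a short Python function. This is a minimal sketch that assumes the two structures are already superimposed and that the coordinates are supplied as equal-length lists of (x, y, z) tuples in paired order; the function name is hypothetical and the code does not read Chimera data.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation over paired points, in the input units (e.g. Å)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total_sq = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between the two atoms of this pair.
        total_sq += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total_sq / len(coords_a))
```

Because the value is an average over the pairs kept in the final iteration, a low RMSD over few pairs is not necessarily a better match than a slightly higher RMSD over many pairs, so the two numbers should be read together.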

Another visual indicator is how well analogous ligands superimpose. Hide all of the atoms and bonds except in residues classified as ligand, and label those residues:

Command: show ligand
Command: rlab ligand

Each of these structures includes GTP or an analog of GTP in the binding site. However, some other ligands were simply present in the crystallization solution and are not biologically relevant. CAC is cacodylate and GOL is glycerol; these can be removed:

Command: del :cac,gol
Command: ~rlab

We already used 1tad as the Reference structure. Try using each of the others (click its line, click Apply). With the default alignment parameters (Needleman-Wunsch, BLOSUM-62, 30% secondary structure scoring, etc.), the results are similar and basically correct no matter which structure is used as the reference. Detailed examination of the match statistics and guanine nucleotide positions suggests 1r2q is marginally better than the others as the reference with these settings.

Next, try a structure that is harder to superimpose:

Command: open 1puj
Menu: Presets... Interactive 1 (ribbons)
Command: show ligand
Command: repr sphere ligand & #4

Besides lacking sequence similarity, this protein is circularly permuted compared to the others: its N-terminal part structurally matches the C-terminal part of other G proteins and vice versa.

Change the Structure to match to only 1puj and try the others in turn as the Reference. Again, ligand positions can be used to help gauge the match.

Trials with the default alignment parameters are not successful. When proteins are very distantly related, switching to a lower-number BLOSUM matrix and/or increasing the proportion of secondary structure scoring may improve the results. In this case, using 121p as the reference structure and 90% secondary structure scoring (leaving the other alignment parameters as defaults) gives a reasonable result (see the figure). Keep in mind that when proteins are very distantly related, their backbones may diverge even in the best possible superposition.

When all five proteins are superimposed to your satisfaction, click Reset to defaults and then Cancel the MatchMaker dialog.

Next, we will create a multiple sequence alignment from the superposition using Match -> Align. It does not matter to this tool how the superposition was created, and only the distances between α-carbons are used, not residue types. Start Match -> Align by clicking its icon. All of the A chains should already be chosen in the dialog; the B chain of 1j2j is an unrelated peptide and should not be chosen. Use a cutoff of 5.0 Å, specify Residue aligned in column if within cutoff of [at least one other], and turn on Allow for circular permutation. Click OK to start the calculation.

It may take a minute or two to create the alignment; progress is reported in the status line. When the calculation is finished, the new alignment will be displayed in Multalign Viewer and can be saved to a file from that tool.

Match -> Align can make a multiple sequence alignment, whereas MatchMaker only generates pairwise alignments. Even when there are only two structures, however, the alignment created by Match -> Align after fitting is often better than the initial alignment from MatchMaker. Iteration in MatchMaker discards incorrect columns of the initial alignment from the fit. Subsequently, Match -> Align will only put residues that are close in space in the same column of the new sequence alignment.

The output multiple sequence alignment (example: 5gees.afa) shows that 1puj was correctly recognized as a circular permutation relative to the others. Match -> Align doubled its sequence to allow C-terminal residues (in the first copy of the sequence) to appear before more N-terminal residues (in the second copy) within the alignment.
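
Why doubling a sequence helps can be seen with a toy string example in Python. This sketch only illustrates the idea that a circularly permuted order becomes contiguous once the sequence is written twice; Match -> Align works from α-carbon distances in the superposition, not from string matching, and the function name here is hypothetical.

```python
def find_permutation_offset(reference, permuted):
    """Return where `reference` starts inside the doubled `permuted` string, or -1."""
    doubled = permuted + permuted  # second copy supplies the wrapped-around part
    return doubled.find(reference)
```

For example, "DEABC" is a circular permutation of "ABCDE": no linear alignment of the two puts every letter in register, but the doubled string "DEABCDEABC" contains "ABCDE" as one contiguous stretch that begins in the first copy and ends in the second.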

Keep the sequence alignment, but close most of the structures:

start the Model Panel by clicking its icon
in the Model Panel, choose all of the models except 1tad on the left side and click the close button on the right (not at the bottom of the dialog!)
close the Model Panel

If Multalign Viewer (the sequence alignment window) is hidden behind other windows, it can be brought back to the front by choosing MAV - alignment-name... Raise from near the bottom of the Tools menu. In Multalign Viewer:

use Preferences... Layout to change the font size and sequence wrapping in the display as desired
use Tools... Percent Identity to compute all-by-all pairwise sequence identities (<30% for these examples)
use Edit... Delete Sequences/Gaps to delete the sequence named 2 x 1puj, chain A and any resulting all-gap columns
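
The pairwise identity calculation mentioned above can be sketched in Python as follows. Percent identity depends on the choice of denominator; this example assumes it is the number of columns in which both sequences have a residue, which may differ from the option selected in Multalign Viewer. The function names and gap characters are assumptions for the illustration.

```python
from itertools import combinations

GAP_CHARS = "-."


def percent_identity(aligned_a, aligned_b):
    """Percent of non-gap columns in which two aligned sequences are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a in GAP_CHARS or b in GAP_CHARS:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a.upper() == b.upper():
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def all_by_all(alignment):
    """Map each pair of sequence names to its percent identity."""
    return {(n1, n2): percent_identity(alignment[n1], alignment[n2])
            for n1, n2 in combinations(sorted(alignment), 2)}
```

Here alignment is a dictionary from sequence name to gapped sequence string, such as the rows of an aligned FASTA file.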

Now the alignment clearly shows the large insertion in α-transducin (1tad) relative to the small monomeric G proteins. Select and display residues that are completely conserved in the sequence alignment:

Command: sel :/mavPercentConserved=100
Command: disp sel

Some of the conserved residues are Gly (no sidechain). Clear the selection by Ctrl-clicking in an empty area of the graphics window.

Different Conformations of the Same Protein

(To jump to this section right after performing the setup, delete residues named CAC and open the sequence alignment file 4gees.afa included with this tutorial.)

[Figure: GTP-binding switch (1tagA, 1tndA, morph intermediate)]

Now we will compare different structures of the same protein, transducin-α:

Command: open 1tnd
Command: open 1tag
Command: del solvent

If Multalign Viewer (the sequence alignment window) is hidden, bring it to the front by choosing MAV - alignment-name... Raise from near the bottom of the Tools menu.

In that window, a dashed green line is shown around the sequence name 1tad, chain A to indicate its association with multiple structures. Choose Structure... Match to superimpose the structures using the sequence alignment. One structure (it does not matter which) should be designated as the reference and all three can be designated as the structures to match. Check the option to Iterate by pruning... using a 2.0-Å cutoff and click OK.

Superposition of proteins with the same or nearly the same sequence is generally trivial. We used Multalign Viewer since we already had a sequence alignment, but MatchMaker (or its command equivalent) or the command match could have been used instead. These other methods are used and discussed in the Structure Analysis and Comparison tutorial.

Focus to show the entire structures:

Command: focus

Often structures include additional chains that are not associated with the sequence alignment and not needed for the intended analyses. These chains may be additional copies of the same protein or different macromolecules. Here is a trick for removing such unassociated chains:

With the mouse in the sequence alignment, draw a box that includes at least one associated residue from each structure. That will select the associated residues.
In this case, all three structures are associated with the 1tad sequence, so the box could be as small as one position in that sequence. Hover the cursor over the residue in the sequence to make sure it is associated with all three structures (the associated structure residues are reported near the bottom of the sequence window).
Click into the main graphics window and press the keyboard up arrow key to promote the selection from residues to chains.
Invert the selection to contain unwanted atoms:
Command: sel invert
Unfortunately, ligands and ions bound to a particular protein chain do not always have the same ID as that chain. The following commands are needed to rescue such compounds from deletion:
Command: ~sel ligand & ~ sel z<4
Command: ~sel ions & ~ sel z<4
In other words, deselect ligand/ions within 4 Å of atoms that are already not selected. These commands execute rather slowly because they involve many distance calculations. This step can be omitted if the ligands and ions share IDs with the protein chains to which they are bound, or if you do not care whether they will be deleted.
Finally, delete the selection:
Command: del sel
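
The logic of the two rescue commands in the steps above is easier to follow as set operations. The Python sketch below restates the description "deselect ligand/ions within 4 Å of atoms that are already not selected" and is not how Chimera evaluates the commands; atom identifiers, the coords dictionary, and the function name are hypothetical.

```python
import math


def rescue_bound(selected, candidates, coords, cutoff=4.0):
    """Return `selected` minus candidate atoms lying within `cutoff` of an unselected atom.

    selected   -- set of atom ids currently marked for deletion
    candidates -- set of atom ids classified as ligand (or ions)
    coords     -- dict mapping every atom id to an (x, y, z) tuple
    """
    unselected = set(coords) - selected  # atoms that will be kept
    rescued = set()
    for atom in candidates & selected:
        # One distance test per kept atom; this is why the step is slow on large models.
        if any(math.dist(coords[atom], coords[kept]) < cutoff for kept in unselected):
            rescued.add(atom)
    return selected - rescued
```

Running the function once with the ligand atoms and once with the ion atoms corresponds to the two commands; whatever remains selected afterwards is what the final delete removes.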

Simplify the display and label the ligand residues:

Command: preset apply int 1
Command: rlab ligand
Command: focus ligand

Open the Model Panel and use Shown checkboxes to view the structures individually.

The 1tad structure in white represents the activated form of a G protein; even though it includes GDP, the GDP and ALF (AlF₄⁻) residues together mimic the transition state of GTP hydrolysis. 1tnd (magenta) contains the GTP analog GSP and also represents the activated form. The third structure, 1tag (cyan), includes GDP and represents the nonactivated form.

Use the Model Panel checkboxes to show all three structures together. Remove the labels and focus on the overall structures:

Command: ~rlab
Command: focus

While there is high overall similarity, the nonactivated conformation (cyan) differs from the activated ones (white and magenta) in specific areas, termed switch regions.

Multalign Viewer displays lines of information called headers above the sequences in the alignment. By default, a Consensus sequence and Conservation histogram are shown. Use the Headers menu to hide these two and show RMSD. The RMSD histogram shows the root-mean-square distance among the α-carbons (CA atoms) of structure residues associated with each column in the alignment.
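
One way to picture the per-column value is as the root-mean-square of all pairwise α-carbon distances among the residues in a column. The Python sketch below follows that reading of the description; the exact formula used by Multalign Viewer is an assumption here, and the function name is hypothetical.

```python
import math
from itertools import combinations


def column_rmsd(ca_positions):
    """RMS of pairwise distances among the CA positions of one alignment column.

    ca_positions -- list of (x, y, z) tuples, one per associated structure residue.
    Returns None when fewer than two residues are present in the column.
    """
    pairs = list(combinations(ca_positions, 2))
    if not pairs:
        return None
    mean_sq = sum(sum((p - q) ** 2 for p, q in zip(a, b)) for a, b in pairs) / len(pairs)
    return math.sqrt(mean_sq)
```

With only two structures there is a single pair per column, so the value reduces to the plain CA-CA distance.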

The three most prominent "humps" in the RMSD header correspond to the known G protein switch regions at approximately residues 173-183, 195-215, and 227-238 of transducin-α. The third switch region is unique to heterotrimeric G proteins; it is an insertion relative to the monomeric G proteins. Placing the cursor over a position in the 1tad sequence lists the associated structure residues near the bottom of the sequence window, and dragging a box around residues in the sequence alignment selects the associated parts of the structures.

Close 1tad:

Command: close 0

The RMSD histogram looks much the same; now it simply shows the CA-CA distances between the two remaining structures.

Finally, morph between the two structures. Morphing involves calculating a series of intermediate structures. In Chimera, the series of structures is treated as a trajectory that can be replayed, saved to a coordinate file, or saved as a movie using MD Movie.
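
The idea of a series of intermediate structures can be illustrated with straight-line interpolation of coordinates in Python. This is a deliberately simplified sketch: it assumes the two structures are superimposed and have the same atoms in the same order, and it uses plain linear interpolation, which is not necessarily the interpolation method used by Chimera's morphing tool. The function name is hypothetical.

```python
def interpolate_frames(start, end, steps):
    """Return steps + 1 coordinate sets from `start` to `end`, endpoints included."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    if len(start) != len(end):
        raise ValueError("both structures need the same number of atoms")
    frames = []
    for k in range(steps + 1):
        t = k / steps  # fraction of the way from start (0.0) to end (1.0)
        frames.append([tuple(s + t * (e - s) for s, e in zip(p, q))
                       for p, q in zip(start, end)])
    return frames
```

A there-and-back trajectory like the one created below can be built from the forward frames followed by the same frames in reverse order, omitting the repeated turning point.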

Start the morphing tool:

Command: start Morph Conformations

Click Add... and in the resulting list of models, double-click to choose #2, #1, and #2 again, corresponding to a morph trajectory from the nonactivated structure to the activated and back. Close the model list. In the main Morph Conformations dialog, set the Action on Create to hide Conformations, and then click Create.

The progress of the calculation is reported in the status line. When all the intermediate structures have been calculated, the input structures are hidden, the trajectory is opened as model #0, and the MD Movie tool appears.

The trajectory can be played continuously or one step at a time using the buttons on the tool. If the player dialog becomes obscured by other windows, it can be brought back to the front by choosing MD Movie - trajectory-name... Raise from near the bottom of the Tools menu. If you want to see the original structures again, use the Shown checkboxes in the Model Panel.

When you have finished viewing the morph trajectory, choose File... Quit from the menu to exit from Chimera.

meng@cgl.ucsf.edu / June 2008