LOVOALIGN: Protein Structural Alignment

User guide

LovoAlign can be used to align pairs of proteins, to align a protein to a database of protein structures, or to perform an all-on-all structural alignment of a database.

Index
1. Aligning a pair of proteins.
2. Aligning one structure to a protein database.
3. Performing an all-on-all database comparison.
4. Aligning general structures with customized selections.
5. Advanced and optional parameters.
6. Scripts for analysing results.

Aligning a pair of proteins

The simplest way to align a pair of protein structures with lovoalign is:

lovoalign -p1 pdb1.pdb -p2 pdb2.pdb -o pdbout.pdb

where pdb1.pdb and pdb2.pdb are the PDB files containing the structure of each protein, and pdbout.pdb is the name of the output file, which will contain the structure from pdb1.pdb aligned to the second structure.

If you want to align, for example, chain "A" of the first protein to chain "C" of the second protein, use:

lovoalign -p1 pdb1.pdb -c1 A -p2 pdb2.pdb -c2 C -o pdbout.pdb

If "-c1" is not given on the command line, all chains of the first protein are considered in the alignment. The same applies to "-c2" and the second protein.
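When many pairs have to be aligned, the command lines above can be assembled from a script. The following is a minimal sketch: the helper name build_pair_command is hypothetical, only the options described in this section are used, and it assumes that the lovoalign executable is on the PATH when the command is actually run.

```python
import shlex


def build_pair_command(pdb1, pdb2, out, chain1=None, chain2=None):
    """Return the argument list for a pairwise lovoalign run.

    Only the options described in this guide (-p1, -c1, -p2, -c2, -o) are used.
    """
    cmd = ["lovoalign", "-p1", pdb1]
    if chain1 is not None:
        cmd += ["-c1", chain1]  # omit -c1 to consider all chains of protein 1
    cmd += ["-p2", pdb2]
    if chain2 is not None:
        cmd += ["-c2", chain2]  # omit -c2 to consider all chains of protein 2
    cmd += ["-o", out]
    return cmd


if __name__ == "__main__":
    cmd = build_pair_command("pdb1.pdb", "pdb2.pdb", "pdbout.pdb", "A", "C")
    # Print the command; to execute it, pass cmd to subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Building the command as a list, instead of a single string, avoids quoting problems when file names contain spaces.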

When aligning a pair of proteins, you might want to obtain the RMSF profile of the alignment. To do so, add the option:

-rmsf rmsf.dat

The rmsf.dat file will contain the RMSF plot.

Additionally, you might want the RMSF trend, defined by the profile of the fraction of atoms that are closer than each threshold. To obtain that plot, use:

-rmsftrend rmsftrend.dat
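To make the definition of the trend concrete, the sketch below computes, for a set of thresholds, the fraction of aligned pairs whose deviation is smaller than each threshold. The function name, the sample deviations and the thresholds are illustrative assumptions. The layout of the files written by lovoalign is not described in this guide, so the sketch works on an in-memory list instead of reading rmsf.dat.

```python
def rmsf_trend(deviations, thresholds):
    """For each threshold, return (threshold, fraction of deviations below it)."""
    n = len(deviations)
    if n == 0:
        raise ValueError("no deviations given")
    return [(t, sum(1 for d in deviations if d < t) / n) for t in thresholds]


if __name__ == "__main__":
    # Hypothetical per-pair deviations (same length unit as the coordinates).
    devs = [0.4, 0.9, 1.2, 2.5, 6.0]
    for threshold, fraction in rmsf_trend(devs, [0.5, 1.0, 2.0, 5.0]):
        print(f"{threshold:5.1f} {fraction:.2f}")
```

A curve that rises quickly to 1 indicates that most pairs are closely superimposed, while a slow rise indicates that only part of the structure aligns well.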

Aligning a protein structure to a structure database

The lovoalign package may be used to align a structure to a whole database of protein structures. Aligning one structure with about 300 CA atoms to the whole PDB (~30,000 structures) takes about half an hour on a typical personal computer. A database of protein structures consists of a collection of structure files in PDB format (like the PDB itself). To align a single protein to a whole database, two simple steps are needed:

1. Obtain a list of the files to which the structure will be compared. For example, if the database contains three PDB structures, the list (let's call it list.dat) would be:

file1.pdb
file2.pdb
file3.pdb

Each entry in the list must be a path to the file that is valid from the directory in which lovoalign is going to be run.
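For a large database, the list can be generated instead of typed by hand. The sketch below assumes that the list holds one file path per line and that all structures sit in a single directory with the ".pdb" extension; the function name write_pdb_list and its default arguments are hypothetical.

```python
from pathlib import Path


def write_pdb_list(pdb_dir, list_file="list.dat"):
    """Write the path of every .pdb file in pdb_dir to list_file, one per line.

    Returns the list of paths that were written, sorted by name.
    """
    paths = sorted(str(p) for p in Path(pdb_dir).glob("*.pdb"))
    Path(list_file).write_text("".join(p + "\n" for p in paths))
    return paths
```

For example, write_pdb_list("./database") writes list.dat in the current directory with paths of the form database/file1.pdb, which are valid as long as lovoalign is started from that same directory.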

2. If structure.pdb is the PDB file of the protein that is going to be compared to the database, run lovoalign with:

lovoalign -p1 structure.pdb -pdblist list.dat > align.log

In this case, the align.log output file contains very concise results for each alignment:

PROTS: B1yh1.pdb B1vl6.pdb 334 373
METHOD: 4 1 10.000 14 0.70389E+00 0.59591E+00
SCORE1: 0.19185E+04 267 0.14279E+02 19

The PROTS line contains the protein names and their number of CA atoms.
The METHOD line contains, in order: a specifier of the method used (default: 6 for database comparisons and 4 for a pair alignment); a specification of the type of initial point used (default: 1); the score penalty for gaps; the number of iterations used in the alignment; the time used in this alignment, in seconds; the time used excluding post-processing of the data.
The SCORE1 line contains, in order: the STRUCTAL score obtained for this alignment; the number of atoms in the bijection; the RMSD of the bijected atoms; the number of gaps in the bijection.
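The concise output is easy to read from a script. The sketch below parses records in the field order described above; it treats any whitespace (spaces or line breaks) as a separator, so it does not depend on how the three parts of a record are laid out. The function name and the dictionary keys are hypothetical and chosen only for this example.

```python
import re

# One record: PROTS (2 names, 2 counts), METHOD (6 fields), SCORE1 (4 fields).
RECORD = re.compile(
    r"PROTS:\s+(\S+)\s+(\S+)\s+(\d+)\s+(\d+)\s+"
    r"METHOD:\s+(\d+)\s+(\d+)\s+(\S+)\s+(\d+)\s+(\S+)\s+(\S+)\s+"
    r"SCORE1:\s+(\S+)\s+(\d+)\s+(\S+)\s+(\d+)"
)


def parse_concise_log(text):
    """Return one dictionary per alignment record found in text."""
    records = []
    for match in RECORD.finditer(text):
        g = match.groups()
        records.append({
            "protein1": g[0], "protein2": g[1],
            "n_ca1": int(g[2]), "n_ca2": int(g[3]),
            "method": int(g[4]), "initial_point": int(g[5]),
            "gap_penalty": float(g[6]), "iterations": int(g[7]),
            "time_total": float(g[8]), "time_no_post": float(g[9]),
            "score": float(g[10]), "n_bijection": int(g[11]),
            "rmsd": float(g[12]), "gaps": int(g[13]),
        })
    return records


if __name__ == "__main__":
    sample = (
        "PROTS: B1yh1.pdb B1vl6.pdb 334 373 "
        "METHOD: 4 1 10.000 14 0.70389E+00 0.59591E+00 "
        "SCORE1: 0.19185E+04 267 0.14279E+02 19"
    )
    for rec in parse_concise_log(sample):
        print(rec["protein1"], rec["protein2"], rec["score"], rec["rmsd"])
```

In practice the text would come from the log file, for example parse_concise_log(open("align.log").read()).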

Performing an all-on-all database comparison

Performing an all-on-all database structural alignment with lovoalign is very simple. First, obtain a pdb file for each protein. Second, obtain a file containing a list of the pdb files to be considered, as was explained above. Once you have the list of pdb files, run lovoalign with:

lovoalign -pdblist list.dat > align.log

The output will be similar to the one explained for the single-protein to database comparison.

Customizable alignments

Any selection of atoms (protein atoms or not) can be aligned with lovoalign. There are two ways to do so:

1. Create different files for each atom selection you want to align. Then, align the structures using the "-all" option, with which all the atoms in the structure will be considered.

2. Modify your PDB file by changing the beta-factor and occupancy values. With the options "-beta1", "-beta2", "-ocup1" and "-ocup2", only the atoms with beta-factors or occupancy values different from zero are considered. For example:

lovoalign -p1 prota.pdb -p2 protb.pdb -beta1 -ocup2

Here, the atoms of protein 1 with a beta-factor different from zero and the atoms of protein 2 with an occupancy different from zero are considered. Alignments of general structures can be performed this way. Ranges of residues can be selected with the "-rmin1" ... "-rmax2" options.
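Editing the beta-factor column by hand is error-prone, so it can help to do it with a script. The sketch below marks one residue range of one chain and sets the beta-factor of every other atom to 0.00. It relies on the fixed-column layout of PDB ATOM/HETATM records (chain identifier in column 22, residue number in columns 23-26, beta-factor in columns 61-66). The function name and its arguments are hypothetical. The marked value defaults to 2.00, which is nonzero and also larger than 1, so it satisfies both descriptions of the threshold given in this guide.

```python
def mark_selection(pdb_lines, chain, first_res, last_res, mark=2.0):
    """Return PDB lines with the beta-factor rewritten to flag a selection.

    Atoms of `chain` with residue number in [first_res, last_res] get
    beta-factor `mark`; all other atoms get 0.00. Other lines are unchanged.
    """
    out = []
    for line in pdb_lines:
        line = line.rstrip("\n")
        if line.startswith(("ATOM", "HETATM")):
            line = line.ljust(66)  # make sure the beta-factor columns exist
            selected = (
                line[21] == chain
                and first_res <= int(line[22:26]) <= last_res
            )
            beta = mark if selected else 0.0
            # Columns 61-66 (0-based slice 60:66) hold the beta-factor.
            line = line[:60] + f"{beta:6.2f}" + line[66:]
        out.append(line)
    return out
```

The returned lines can be written to a new file, which is then passed to lovoalign with "-beta1" (or "-beta2" if it is the second protein).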

Advanced usage and optional parameters

The lovoalign package implements 6 methods for structural alignment and has several optional parameters that can be set.

User interactive run:

The method used and other parameters may be set interactively by running lovoalign without command-line arguments.

Setting optional parameters in the command-line:

Parameter	Possible values	Meaning

-print	0 or 1	Concise or extensive output
-g	real number	Penalty for gaps in the bijection
-all	none	Consider all atoms (not only CA) for the alignment
-m	1, 2 or 3	Method used (see below)
-maxtrial	integer	Maximum number of trials to obtain the global optimum alignment (default: 1000 for pairwise alignments, 4 for database comparisons)
-ismall	0 or 1	Value 1 forces the smallest protein to be protein 1 in a protein pair alignment (affects the time of methods 5 and 6).
-maxit	integer	Maximum number of iterations allowed for methods 2 to 6
-f	real number	Fraction of CA atoms to be considered for methods 5 and 6
-pfac	real number	Factor multiplying the internal distance in initial point
-bijeoff	none	Do not output the bijection between atoms
-beta1	none	Consider only atoms of first protein that have beta > 1.
-beta2	none	Consider only atoms of second protein that have beta > 1.
-ocup1	none	Consider only atoms of first protein that have occupancy > 1.
-ocup2	none	Consider only atoms of second protein that have occupancy > 1.
-rmin1	integer	Consider only atoms of first protein with residue number > rmin1
-rmax1	integer	Consider only atoms of first protein with residue number < rmax1
-rmin2	integer	Consider only atoms of second protein with residue number > rmin2
-rmax2	integer	Consider only atoms of second protein with residue number < rmax2
-seqfix	none	Use fixed sequence alignment based on input sequence of residues.
-seqnum	none	Use fixed sequence alignment based on residue numbering of the PDB files.
-fasta	filename	Use fixed sequence alignment based on fasta alignment file provided.
-noini	none	Do not use pseudoprotein initial point.
-nglobal	integer	Number of times the best alignment must be found before it is accepted as the global optimum. Default: 3. Runs faster if nglobal is smaller.
-rmsf	filename	Writes rmsf profile of the alignment to file.
-rmsftrend	filename	Writes rmsf trend profile to file (fraction of pairs with rmsf smaller than threshold).

Methods:

1: Maximize the STRUCTAL score.
2: Maximize the TM-score.
3: Maximize the Triangular score.

Scripts for analysing results

matrix.tcl: This TCL script reads the output file generated by an all-on-all comparison performed with lovoalign and writes a file containing symmetric score and rmsd matrices, which are often used as input in other classification packages.
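For readers who prefer to post-process results in another language, the idea of a symmetric matrix can be sketched as follows. This is not a translation of matrix.tcl and does not reproduce its file format; the function name and the input layout (one tuple of name, name, score, RMSD per aligned pair) are assumptions made for the example. Entries for which no alignment was given, including the diagonal, are left as None.

```python
def symmetric_matrices(pairs):
    """Build symmetric score and RMSD matrices from pairwise results.

    pairs: iterable of (name1, name2, score, rmsd) tuples.
    Returns (names, score_matrix, rmsd_matrix), with names sorted.
    """
    pairs = list(pairs)
    names = sorted({name for p in pairs for name in (p[0], p[1])})
    index = {name: i for i, name in enumerate(names)}
    size = len(names)
    score = [[None] * size for _ in range(size)]
    rmsd = [[None] * size for _ in range(size)]
    for name1, name2, s, d in pairs:
        i, j = index[name1], index[name2]
        score[i][j] = score[j][i] = s  # mirror each value across the diagonal
        rmsd[i][j] = rmsd[j][i] = d
    return names, score, rmsd


if __name__ == "__main__":
    # Hypothetical results for three structures.
    results = [
        ("file1.pdb", "file2.pdb", 1918.5, 14.279),
        ("file1.pdb", "file3.pdb", 900.0, 3.2),
        ("file2.pdb", "file3.pdb", 1200.0, 2.1),
    ]
    names, score, rmsd = symmetric_matrices(results)
    for name, row in zip(names, score):
        print(name, row)
```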