globalS
NAME
globalS - compare two sequences to find best-scoring global
alignments.
SYNOPSIS
globalS db file2 match mismatch alpha beta [ flags ]
DESCRIPTION
globalS finds the best global alignment between sequences in
db and the sequence in file2, given a particular set of
scoring parameters.
The scoring parameters are all positive integers. The
parameters mismatch, alpha and beta are subtracted from the
score; match is added to the score.
match the amount to add for aligning identical letters.
mismatch the amount to subtract for aligning different letters.
alpha the amount to subtract for the first letter of an
insertion or deletion (indel).
beta the amount to subtract for each subsequent letter
of an indel. For example, for a five-letter indel
(k = 5), alpha + beta * (k - 1) = alpha + beta * 4
is subtracted from the score.
flags See manual page seqaln-intro(1) for a full
description of optional flags.
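The indel penalty above can be written as a one-line function. This is a minimal Python sketch of the stated formula only; the name indel_cost is hypothetical and is not part of globalS.

```python
def indel_cost(k: int, alpha: int, beta: int) -> int:
    """Penalty subtracted for one indel of k >= 1 consecutive letters.

    Hypothetical helper illustrating the formula alpha + beta * (k - 1).
    """
    if k < 1:
        raise ValueError("an indel has at least one letter")
    # alpha for the first letter, beta for each of the remaining k - 1 letters
    return alpha + beta * (k - 1)


if __name__ == "__main__":
    # Five-letter indel with alpha = 7 and beta = 15
    print(indel_cost(5, alpha=7, beta=15))
```

With alpha = 7 and beta = 15 (the values used in the first example below), a five-letter indel costs 7 + 15 * 4 = 67, while a one-letter indel costs just alpha = 7.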
The sequence files db and file2 use our standard format, the
Pearson/FASTA format. The first line holds the sequence name
and should be used as a description. Subsequent lines
contain the sequence itself, which may include blanks,
returns, and other whitespace for readability. A sequence
ends at end-of-file or at a line beginning with `>', which
starts a new sequence in the FASTA format. Multiple
sequences are processed only in the first file, db.
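As an illustration of the file layout just described, here is a minimal Python sketch of a FASTA reader. It assumes every sequence starts with a `>' name line and ignores any text before the first such line; the function read_fasta is hypothetical, and globalS's own parser may handle edge cases differently.

```python
import io
from typing import Iterator, List, Optional, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (name line, sequence) pairs from Pearson/FASTA text.

    Hypothetical sketch: assumes each record starts with a '>' line and
    skips any text that appears before the first '>' line.
    """
    name: Optional[str] = None
    parts: List[str] = []
    for line in handle:
        if line.startswith(">"):
            # A '>' line ends the current sequence and starts a new one.
            if name is not None:
                yield name, "".join(parts)
            name = line[1:].strip()
            parts = []
        elif name is not None:
            # Blanks, returns and other whitespace are only for readability.
            parts.append("".join(line.split()))
    if name is not None:
        # End-of-file ends the last sequence.
        yield name, "".join(parts)


if __name__ == "__main__":
    text = ">seq1 first\nCATAT\nGGC\n>seq2\nTAC GAT\nCGGC\n"
    for name, seq in read_fasta(io.StringIO(text)):
        print(name, seq)
```

The demo input yields the records ("seq1 first", "CATATGGC") and ("seq2", "TACGATCGGC"), the two sequences used in the examples below.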
EXAMPLES
The examples below compare the sequence CATATGGC with the
sequence TACGATCGGC, and demonstrate how different choices
for the scoring parameters can produce different alignments.
In both examples, only the best (i.e., highest scoring)
alignment is requested.
example% globalS f1 f2 10 5 7 15
finds 6 matches, 2 mismatches, and 2 one-letter indels, beginning at
position 1 for each sequence:
1 CA-TAT-GGC
   |  || |||
1 TACGATCGGC
for a score of 10(6) - 2(5) - 2(7) = 36.
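The arithmetic above can be checked mechanically by walking the two gapped rows column by column. This is a minimal Python sketch, assuming `-' marks an indel position and that an indel continues only while consecutive gaps stay in the same row; the function score_alignment is hypothetical and is not how globalS reports scores internally.

```python
def score_alignment(top: str, bottom: str, match: int, mismatch: int,
                    alpha: int, beta: int) -> int:
    """Score two equal-length gapped rows ('-' = indel position).

    Hypothetical helper: consecutive gaps in the same row form one indel,
    charged alpha for its first letter and beta for each further letter.
    """
    if len(top) != len(bottom):
        raise ValueError("rows must have the same length")
    score = 0
    prev_gap = None  # row that had a gap in the previous column, if any
    for a, b in zip(top, bottom):
        if a == "-" and b == "-":
            raise ValueError("column with two gaps")
        if a == "-" or b == "-":
            gap_row = "top" if a == "-" else "bottom"
            # beta extends an indel in the same row; otherwise alpha opens one
            score -= beta if gap_row == prev_gap else alpha
            prev_gap = gap_row
        else:
            score += match if a == b else -mismatch
            prev_gap = None
    return score


if __name__ == "__main__":
    print(score_alignment("CA-TAT-GGC", "TACGATCGGC", 10, 5, 7, 15))
```

Applied to the rows of the first example with parameters 10 5 7 15, it returns 36, matching the score reported above.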
example% globalS f1 f2 10 15 7 7
finds 6 matches, 4 one-letter indels, and 1 two-letter indel; the
alignment begins with a gap before the first letter of the first
sequence (shown as position 0) and at position 1 of the second sequence:
0 -CA--TAT-GGC
    |   || |||
1 T-ACG-ATCGGC
for a score of 10(6) - 4(7) - (7+7) = 18.
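This page does not describe how globalS computes its alignments internally. As an illustration only, the sketch below computes the best global alignment score under the same scoring scheme using Gotoh's affine-gap refinement of the Needleman-Wunsch dynamic program (Needleman and Wunsch is listed under REFERENCES). It returns the score only, with no traceback, so it does not reproduce globalS's output format or its choice among equally scoring alignments; the function best_global_score is hypothetical.

```python
NEG = float("-inf")


def best_global_score(s: str, t: str, match: int, mismatch: int,
                      alpha: int, beta: int) -> int:
    """Best global alignment score with indel cost alpha + beta * (k - 1).

    Hypothetical sketch of Gotoh's three-matrix recurrence; end gaps are
    penalized like any other indel, as in a global alignment.
    """
    n, m = len(s), len(t)
    # M: s[i-1] aligned to t[j-1]; X: s[i-1] against a gap; Y: t[j-1] against a gap
    M = [[NEG] * (m + 1) for _ in range(n + 1)]
    X = [[NEG] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = -(alpha + beta * (i - 1))  # leading indel in t
    for j in range(1, m + 1):
        Y[0][j] = -(alpha + beta * (j - 1))  # leading indel in s
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s[i - 1] == t[j - 1] else -mismatch
            M[i][j] = sub + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Opening a new indel costs alpha; extending the same one costs beta.
            X[i][j] = max(M[i - 1][j] - alpha, Y[i - 1][j] - alpha,
                          X[i - 1][j] - beta)
            Y[i][j] = max(M[i][j - 1] - alpha, X[i][j - 1] - alpha,
                          Y[i][j - 1] - beta)
    return int(max(M[n][m], X[n][m], Y[n][m]))


if __name__ == "__main__":
    print(best_global_score("CATATGGC", "TACGATCGGC", 10, 5, 7, 15))
```

For the first example, best_global_score("CATATGGC", "TACGATCGGC", 10, 5, 7, 15) returns 36, the score reported above. The three matrices keep track of whether an alignment prefix ends in a match column or inside an indel, which is what lets beta be charged instead of alpha when a gap is extended; the running time is proportional to the product of the two sequence lengths.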
REFERENCES
Needleman, S.B., and C.D. Wunsch. A general method applicable
to the search for similarities in the amino acid sequence
of two proteins. Journal of Molecular Biology, 48 (1970)
443-453.
Smith, T.F., M.S. Waterman, and W.M. Fitch. Comparative
biosequence metrics. Journal of Molecular Evolution, 18
(1981) 38-46.
Waterman, M.S. Introduction to Computational Biology: Maps,
Sequences and Genomes. Chapman & Hall, London, 1995.
ISBN 0-412-99391-0.
SEE ALSO
seqaln-intro(1), mglobalS(1), globalD(1), mglobalD(1),
sequence-file(5).