globalD
NAME
globalD - report the distance, as a score, between two
sequences.
SYNOPSIS
globalD db file2 mismatch alpha beta [ flags ]
DESCRIPTION
globalD reports the distance between the sequences in db and
the sequence in file2, given a particular set of scoring
parameters.
The scoring parameters are all positive integers. The
parameters mismatch, alpha and beta are added to the score; a
match has a score of zero.
mismatch  the amount to add for a mismatch.
alpha     the amount to add for the first letter of an
          insertion or deletion sequence (indel).
beta      the amount to add for each subsequent letter in an
          indel. For example, for a five-letter indel
          (k = 5), alpha + beta * (k - 1) = alpha + beta * 4
          is added to the score.
flags     See manual page seqaln-intro(1) for a full
          description of optional flags.
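The indel charge described above can be sketched in a few lines of Python. This is only an illustration of the formula alpha + beta * (k - 1); the function name indel_cost is hypothetical and is not part of globalD.

```python
def indel_cost(k, alpha, beta):
    """Cost added to the distance for one indel of k letters (k >= 1).

    The first letter costs alpha; each of the remaining k - 1 letters
    costs beta.
    """
    if k < 1:
        raise ValueError("an indel must contain at least one letter")
    return alpha + beta * (k - 1)


# A five-letter indel with alpha = 11 and beta = 5:
print(indel_cost(5, 11, 5))  # 11 + 5 * 4 = 31
```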
The sequence files db and file2 use our standard format, the
Pearson/FASTA format. The first line of each sequence begins
with `>' and gives the sequence name, which should also serve
as a description. Subsequent lines contain the sequence
itself, which may include blanks, returns, and other
whitespace for readability. A sequence terminates at
end-of-file or when a `>' is read, which begins a new
sequence. Multiple sequences are processed only in the first
file, db.
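As a rough illustration of the file format described above, the following Python sketch splits FASTA text into (name, sequence) pairs and drops whitespace inside the sequences. The function name parse_fasta is hypothetical, and the choice to ignore any text before the first `>' is an assumption of this sketch, not documented globalD behaviour.

```python
def parse_fasta(text):
    """Return a list of (name, sequence) pairs from FASTA-formatted text."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        if line.startswith(">"):
            # A '>' line closes the previous record and opens a new one.
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is not None:
            # Remove blanks and other whitespace used for readability.
            chunks.append("".join(line.split()))
        # Assumption: lines before the first '>' are ignored.
    if name is not None:
        # End-of-file terminates the last sequence.
        records.append((name, "".join(chunks)))
    return records


db_text = ">first demo sequence\nCATA TGGC\n>second\nTACG\nATCGGC\n"
for name, seq in parse_fasta(db_text):
    print(name, seq)
```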
EXAMPLES
The examples below compare the sequence CATATGGC with the
sequence TACGATCGGC, and demonstrate how different choices
for the scoring parameters can produce different alignments.
Matches are not counted in the score, as they do not
represent a distance between two segments, but rather a
similarity. All parameters that would be subtracted from a
similarity score (mismatch, alpha and beta) are added to a
difference score.
example% globalD f1 f2 10 11 5
finds two mismatches and two one-letter indels, beginning at position 1
for both sequences:
  1 CA-TAT-GG
     |  || ||
  1 TACGATCGG
for a score of 2*10 + 11 + 11 = 42.
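The arithmetic above can be checked mechanically. The Python sketch below scores an already gapped pair of rows under the mismatch, alpha and beta rules described in this page; it does not search for an alignment. The function name alignment_distance and the use of `-' as the gap character in its input are assumptions made for the illustration.

```python
def alignment_distance(top, bottom, mismatch, alpha, beta):
    """Distance score of a given gapped alignment (rows of equal length)."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    score = 0
    open_gap = None  # row that currently holds a gap: "top", "bottom" or None
    for x, y in zip(top, bottom):
        if x == "-" and y == "-":
            raise ValueError("a column cannot be a gap in both rows")
        if x == "-" or y == "-":
            row = "top" if x == "-" else "bottom"
            # First letter of an indel costs alpha, later letters cost beta.
            score += beta if open_gap == row else alpha
            open_gap = row
        else:
            if x != y:
                score += mismatch  # matches add nothing
            open_gap = None
    return score


print(alignment_distance("CA-TAT-GG", "TACGATCGG", 10, 11, 5))  # 42
```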
example% globalD f1 f2 30 11 7
finds one 1-letter indel, one 2-letter indel, and one 3-letter indel,
beginning at position 1 in the first sequence, and preceding (i.e.,
position 0 in) the second sequence:
  1 CAT---AT-GGC
      |   || |||
  0 --TACGATCGGC
for a score of 11 + (11 + 7) + (11 + 2*7) = 54.
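For readers who want to experiment with the effect of the parameters, the following Python sketch computes a minimum global distance with a standard three-state affine-gap dynamic program (in the style of Gotoh). It is not taken from globalD and is not claimed to reproduce its implementation or its tie-breaking; the function name global_distance is hypothetical, and the sketch assumes that an indel in one sequence directly followed by an indel in the other is charged alpha twice.

```python
INF = float("inf")


def global_distance(a, b, mismatch, alpha, beta):
    """Minimum distance over all global alignments of a and b."""
    n, m = len(a), len(b)
    # M: last column pairs two letters; X: letter of a over a gap;
    # Y: gap over a letter of b.
    M = [[INF] * (m + 1) for _ in range(n + 1)]
    X = [[INF] * (m + 1) for _ in range(n + 1)]
    Y = [[INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = alpha + beta * (i - 1)  # leading indel of i letters
    for j in range(1, m + 1):
        Y[0][j] = alpha + beta * (j - 1)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = min(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]) + sub
            # Open a new indel (alpha) or extend the current one (beta).
            X[i][j] = min(M[i - 1][j] + alpha, Y[i - 1][j] + alpha, X[i - 1][j] + beta)
            Y[i][j] = min(M[i][j - 1] + alpha, X[i][j - 1] + alpha, Y[i][j - 1] + beta)
    return min(M[n][m], X[n][m], Y[n][m])


for params in ((10, 11, 5), (30, 11, 7)):
    print(params, global_distance("CATATGGC", "TACGATCGGC", *params))
```

Any alignment of the two complete sequences gives an upper bound on the value this sketch returns for the same parameters, so the printed distances should not exceed the scores of such alignments.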
REFERENCES
P.H. Sellers. "On the theory and computation of evolutionary
distances". SIAM Journal on Applied Mathematics,
26 (1974) 787-793.
M.S. Waterman. Introduction to Computational Biology: Maps,
sequences and genomes. Chapman & Hall. London: 1995. ISBN
0-412-99391-0.
SEE ALSO
seqaln-intro(1), mglobalD(1), globalS(1), mglobalS(1),
sequence-file(5).