globalD



NAME

     globalD - report the  distance,  as  a  score,  between  two
     sequences.


SYNOPSIS

     globalD  db file2 mismatch alpha beta [ flags ]


DESCRIPTION

     globalD reports the distance between the sequences in db and
     the  sequence  in  file2,  given a particular set of scoring
     parameters.

     The scoring parameters are all positive integers.  The
     parameters mismatch, alpha and beta are penalties added to
     the score; a match adds zero.

     mismatch  the amount to add for a mismatch.

     alpha     the amount to add  for  the  first  letter  of  an
               insertion or deletion sequence (indel).

     beta      the amount to add for each subsequent letter in an
               indel.  For example, a five-letter indel (k = 5)
               adds alpha + beta * (k - 1) = alpha + 4 * beta to
               the score.


     flags     see manual page seqaln-intro(1) for a full
               description of the optional flags.
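
     As an illustration of the indel charge described for alpha
     and beta above, the following Python sketch computes the
     amount added for a single indel of k letters.  The function
     name indel_cost and the sample values are hypothetical and
     are not part of globalD.

```python
def indel_cost(k: int, alpha: int, beta: int) -> int:
    """Charge for one indel of k letters.

    alpha is charged for the first letter and beta for each
    subsequent letter, i.e. alpha + beta * (k - 1).
    """
    if k < 1:
        raise ValueError("an indel has at least one letter")
    return alpha + beta * (k - 1)


# Five-letter indel with illustrative values alpha = 11, beta = 5.
print(indel_cost(5, 11, 5))  # 11 + 5 * 4 = 31
```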

     The sequence files db and file2 use our standard format, the
     Pearson/FASTA format.  Each sequence begins with a line that
     starts with `>' and gives the sequence name; this line should
     be used as a description.  Subsequent lines contain the
     sequence itself, which may contain blanks, returns, and other
     whitespace for readability.  A sequence ends at end-of-file
     or at the next `>', which begins a new sequence.  Multiple
     sequences are processed only in the first file (db).
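
     The file layout described above can be read with a few lines
     of Python.  This is a minimal sketch, not the parser used by
     globalD; the function name read_fasta is hypothetical, and
     the sketch assumes that whitespace inside a sequence is
     simply discarded.

```python
def read_fasta(text: str):
    """Return a list of (name, sequence) pairs from FASTA text."""
    records = []
    name, parts = None, []
    for line in text.splitlines():
        if line.startswith(">"):
            # A `>' line closes the previous record and opens a new one.
            if name is not None:
                records.append((name, "".join(parts)))
            name, parts = line[1:].strip(), []
        elif name is not None:
            # Drop blanks and other whitespace inside the sequence.
            parts.append("".join(line.split()))
    # End-of-file terminates the last sequence.
    if name is not None:
        records.append((name, "".join(parts)))
    return records


sample = ">seq1 first sequence\nCATA TGGC\n>seq2\nTACGA\nTCGGC\n"
for name, seq in read_fasta(sample):
    print(name, seq)
```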


EXAMPLES

     The examples below compare the sequence  CATATGGC  with  the
     sequence  TACGATCGGC,  and demonstrate how different choices
     for the scoring parameters can produce different alignments.
     Matches  are  not  counted  in  the  score,  as  they do not
     represent a distance between  two  segments,  but  rather  a
     similarity.   All parameters that would be subtracted from a
     similarity score (mismatch, alpha and beta) are added  to  a
     difference score.
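
     This page does not state which algorithm globalD uses.  As a
     sketch only, a minimum distance under this kind of scoring
     (zero for a match, mismatch for a substitution, alpha for the
     first letter of an indel and beta for each further letter)
     can be computed with a standard three-state dynamic program.
     The function name global_distance is hypothetical, and
     charging indels at the ends of the sequences like any other
     indel is an assumption of the sketch.

```python
def global_distance(a: str, b: str, mismatch: int, alpha: int, beta: int):
    """Minimum global alignment distance with affine indel costs."""
    INF = float("inf")
    n, m = len(a), len(b)
    # M[i][j]: best cost with a[i-1] aligned to b[j-1]
    # X[i][j]: best cost with a[i-1] aligned to a gap in b
    # Y[i][j]: best cost with b[j-1] aligned to a gap in a
    M = [[INF] * (m + 1) for _ in range(n + 1)]
    X = [[INF] * (m + 1) for _ in range(n + 1)]
    Y = [[INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                sub = 0 if a[i - 1] == b[j - 1] else mismatch
                M[i][j] = sub + min(M[i - 1][j - 1],
                                    X[i - 1][j - 1],
                                    Y[i - 1][j - 1])
            if i > 0:
                # Open a new indel (alpha) or extend the current one (beta).
                X[i][j] = min(M[i - 1][j] + alpha,
                              Y[i - 1][j] + alpha,
                              X[i - 1][j] + beta)
            if j > 0:
                Y[i][j] = min(M[i][j - 1] + alpha,
                              X[i][j - 1] + alpha,
                              Y[i][j - 1] + beta)
    return min(M[n][m], X[n][m], Y[n][m])


# The two sequences compared in this section, with one parameter set.
print(global_distance("CATATGGC", "TACGATCGGC", 10, 11, 5))
```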

          example% globalD f1 f2 10 11 5


          finds two mismatches and two one-letter indels, beginning at position 1
          for both sequences:

               1 CA-TAT-GG
                  |  || ||
               1 TACGATCGG

          for a score of 2*10 + 11 + 11 = 42.

          example% globalD f1 f2 30 11 7

          finds one one-letter indel, one two-letter indel, and one three-letter
          indel, beginning at position 1 in the first sequence, and preceding
          (i.e., position 0 in) the second sequence:

               1 CAT---AT-GGC
                   |   || |||
               0 --TACGATCGGC

          for a score of 11 + (11 + 7) + (11 + 2*7) = 54.
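
          The arithmetic in both examples can be checked by scoring the
          two gapped rows column by column.  The sketch below is not part
          of globalD; the function name alignment_score is hypothetical,
          and it assumes that `-' marks an indel letter.

```python
def alignment_score(row1: str, row2: str, mismatch: int, alpha: int, beta: int) -> int:
    """Score two equal-length gapped rows; '-' marks an indel letter."""
    if len(row1) != len(row2):
        raise ValueError("rows must have the same length")
    score = 0
    prev_gap_row = None  # row holding '-' in the previous column, if any
    for x, y in zip(row1, row2):
        if x == "-" and y == "-":
            raise ValueError("a column cannot be '-' in both rows")
        if x == "-" or y == "-":
            gap_row = 1 if x == "-" else 2
            # beta extends an indel in the same row; otherwise a new indel opens.
            score += beta if prev_gap_row == gap_row else alpha
            prev_gap_row = gap_row
        else:
            if x != y:
                score += mismatch
            prev_gap_row = None
    return score


print(alignment_score("CA-TAT-GG", "TACGATCGG", 10, 11, 5))        # 42
print(alignment_score("CAT---AT-GGC", "--TACGATCGGC", 30, 11, 7))  # 54
```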


REFERENCES

     P.H. Sellers.  "On the theory and computation of evolutionary
          distances".  SIAM Journal on Applied Mathematics,
          26 (1974) 787-793.

     M.S. Waterman.  Introduction to Computational Biology: Maps,
          Sequences and Genomes.  Chapman & Hall, London, 1995.
          ISBN 0-412-99391-0.


SEE ALSO

     seqaln-intro(1), mglobalD(1), globalS(1), mglobalS(1),
     sequence-file(5).