mglobalD



NAME

     mglobalD - report the distance,  as  a  score,  between  two
     sequences, using a penalty matrix.


SYNOPSIS

     mglobalD  db file2 matrix csub alpha beta [ flags ]


DESCRIPTION

     mglobalD reports the distance between the  sequences  in  db
     and the sequence in file2 (given a particular set of scoring
     parameters).

     matrix    is a lower-diagonal penalty matrix  with  26  rows
               and  columns,  corresponding  to the 26 letters of
               the alphabet.  This allows matrices  to  be  built
               for protein, DNA, and RNA sequences depending upon
               the letters used.  The most  common  use  of  this
               matrix  is to compare amino acid sequences in pro-
               teins, but the flexibility  of  the  matrix  input
               allows  other  types  of sequences to be compared.
               The matrix file name ends in ".mat"; omit this
               suffix when naming the matrix on the command line.
               If the matrix is not found in the current
               directory, the directory given by the environment
               variable MATDIR is searched.
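
     The lookup order described above can be sketched as follows.
     This is only an illustration of the documented behavior, not
     the program's actual code; the function name find_matrix and
     its env parameter are hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_matrix(name: str, env: Optional[Mapping[str, str]] = None) -> Optional[Path]:
    """Locate <name>.mat: current directory first, then $MATDIR.

    `name` is the matrix argument as typed, without the ".mat" suffix.
    `env` defaults to the process environment (hypothetical parameter,
    present only so the sketch is easy to exercise).
    """
    env = os.environ if env is None else env
    filename = name + ".mat"

    # The current directory is always tried first.
    candidates = [Path.cwd() / filename]

    # MATDIR is consulted only when the variable is set.
    matdir = env.get("MATDIR")
    if matdir:
        candidates.append(Path(matdir) / filename)

    for path in candidates:
        if path.is_file():
            return path
    return None  # not found in either location
```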


     csub      is the lower limit for conservative substitutions,
               which  are non-matching substitutions printed with
               a `:' in alignment output.


     alpha     is the amount to subtract for the first letter  of
               an insertion or deletion sequence (indel).

     beta      is the amount to subtract for  subsequent  letters
               in  an  indel.   For  example, if there is a five-
               letter indel, k = 5, then alpha + beta * ( k - 1 )
               =  alpha  + beta * (4) will be subtracted from the
               score.
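
     The indel charge above can be written as a small function.
     This is a minimal sketch of the stated formula only; the
     function name indel_penalty and the values alpha = 10 and
     beta = 2 are illustrative assumptions, not program defaults.

```python
def indel_penalty(k: int, alpha: float, beta: float) -> float:
    """Amount subtracted from the score for an indel of k letters.

    alpha is charged for the first letter, beta for each of the
    remaining k - 1 letters.
    """
    if k < 1:
        raise ValueError("an indel has at least one letter")
    return alpha + beta * (k - 1)


# Five-letter indel with the illustrative values alpha = 10, beta = 2:
print(indel_penalty(5, alpha=10, beta=2))  # 10 + 2 * 4 = 18
```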


     flags     See manual page seqaln-intro(1) for a full
               description of optional flags.

     The sequence files db and file2 use our standard format, the
     Pearson/FASTA format.  The first line is the sequence name,
     and should be used as a description.  Subsequent lines
     contain the sequence itself, which may include blanks,
     returns, and other whitespace for readability.  A sequence
     ends at end-of-file or when `>' is read, which begins a new
     sequence in the FASTA format.  Multiple sequences are
     processed only in the first file (db).
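
     A reader for this format might look like the sketch below.
     It assumes, as in standard FASTA, that each name line begins
     with `>'; the function name read_fasta and the sample data
     are hypothetical, and the program's own parser may differ.

```python
import io
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name = None
    chunks = []
    for line in lines:
        if line.startswith(">"):
            # A `>' line closes the previous sequence and opens a new one.
            if name is not None:
                yield name, "".join(chunks)
            name = line[1:].strip()
            chunks = []
        elif name is not None:
            # Drop blanks, returns, and other whitespace inside the sequence.
            chunks.append("".join(line.split()))
    # End-of-file terminates the last sequence.
    if name is not None:
        yield name, "".join(chunks)


sample = io.StringIO(">seq1 first example\nACGT ACG\nTTA\n>seq2\nGG CC\n")
for name, seq in read_fasta(sample):
    print(name, seq)
```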


REFERENCES

     P.H. Sellers.  "On the theory and computation of  evolution-
          ary  distances".   SIAM Journal on Applied Mathematics,
          26 (1974) 787-793.

     M.S. Waterman.  Introduction to Computational Biology: Maps,
          sequences and genomes.  Chapman & Hall.  London: 1995.
          ISBN 0-412-99391-0.


SEE ALSO

     seqaln-intro(1), globalD(1), globalS(1), mglobalS(1),
     penalty-matrix(5), sequence-file(5).