mfitD



NAME

     mfitD - fit one sequence into another for  minimum  distance
     using a penalty matrix


SYNOPSIS

     mfitD  db file2 matrix csub alpha beta [ flags ]


DESCRIPTION

     mfitD reports the distance between the sequences in  db  and
     the  sequence  in  file2  (given a particular set of scoring
     parameters), using a 26-by-26 letter penalty matrix.

     matrix    is a lower-diagonal penalty matrix  with  26  rows
               and  columns,  corresponding  to the 26 letters of
               the alphabet.  This allows matrices  to  be  built
               for protein, DNA, and RNA sequences depending upon
               the letters used.  The most  common  use  of  this
               matrix  is to compare amino acid sequences in pro-
               teins, but the flexibility  of  the  matrix  input
               allows  other  types  of sequences to be compared.
               The matrix file name ends in ".mat";  omit  this
               suffix  when  naming  the  matrix on the command
               line.  If the matrix is not found in the current
               directory,  the  directory  given by the environ-
               ment variable MATDIR is searched.


     csub      is the lower limit for conservative substitutions,
               which  are non-matching substitutions printed with
               a `:' in alignment output.


     alpha     is the amount to add for the first  letter  of  an
               insertion or deletion sequence (indel).

     beta      is the amount to add for subsequent letters in  an
               indel.   For  example,  if  there is a five-letter
               indel, i.e.  k = 5, then alpha + beta * ( k - 1  )
               = alpha + beta * (4) will be added to the score.


     flags     See  manual  page  seqaln-intro(1)  for   a   full
               description of optional flags.
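
     The indel charge described under alpha and beta can be  sum-
     marized in a few lines.  The following is a minimal sketch in
     Python, not code taken from mfitD;  the  function  name
     indel_penalty is hypothetical and the values used are illus-
     trative only.

```python
def indel_penalty(k, alpha, beta):
    """Return the amount added for an indel of k letters.

    Hypothetical helper, not part of mfitD: alpha is charged for the
    first letter and beta for each of the remaining k - 1 letters.
    """
    if k < 1:
        raise ValueError("indel length must be at least 1")
    return alpha + beta * (k - 1)


# A five-letter indel with illustrative values alpha = 2 and beta = 1:
# 2 + 1 * (5 - 1) = 6
five_letter_cost = indel_penalty(5, 2, 1)
```

     A one-letter indel costs exactly alpha, so beta only matters
     once an indel is extended.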

     The sequence files db and file2 use our standard format, the
     Pearson/FASTA  format.   The first line is the sequence name
     and should serve as a description.  Subsequent lines contain
     the  sequence  itself,  which  may  include blanks, returns,
     and other whitespace for readability.  A sequence terminates
     at  end-of-file  or  when  a  `>' is read, which begins a new
     sequence in the FASTA format.  Multiple sequences  are  pro-
     cessed only in the first file (db).
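
     The reading rules above can be sketched as follows.  This is
     an illustration in Python, not the parser used by mfitD; the
     name read_fasta is hypothetical, and the sketch assumes that
     each name line starts with `>', as is usual for FASTA files.

```python
def read_fasta(text):
    """Split FASTA-style text into (name, sequence) pairs.

    Hypothetical helper, not part of mfitD. Assumes every name line
    starts with '>'. Whitespace inside the sequence is discarded.
    """
    records = []
    name = None
    chunks = []
    for line in text.splitlines():
        if line.startswith(">"):
            # A '>' ends the previous sequence and begins a new one.
            if name is not None:
                records.append((name, "".join(chunks)))
            name = line[1:].strip()
            chunks = []
        elif name is not None:
            # Drop blanks and other whitespace kept for readability.
            chunks.append("".join(line.split()))
    # End-of-file terminates the last sequence.
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


sample = ">seq1 first example\nACD EFG\nHIK\n>seq2\nMNP\n"
parsed = read_fasta(sample)
```

     Here parsed holds two records, one per `>' line,  with  the
     whitespace removed from each sequence.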


REFERENCES

     P.H. Sellers.  "On the theory and computation of  evolution-
          ary  distances".   SIAM Journal on Applied Mathematics,
          26 (1974) 787-793.

     M.S. Waterman.  Introduction to Computational Biology: Maps,
          Sequences  and  Genomes.   Chapman & Hall.  London: 1995.
          ISBN 0-412-99391-0.


SEE ALSO

     seqaln-intro(1),  fitD(1),  fitS(1),   mfitS(1),   pfitS(1),
     penalty-matrix(5), sequence-file(5).