mglobalS



NAME

     mglobalS - compare two sequences to find best-scoring global
     alignments, using a penalty matrix.


SYNOPSIS

     mglobalS  db file2 matrix csub alpha beta [ flags ]


DESCRIPTION

     mglobalS finds the best global similarity alignment  between
     sequences  in db and the sequence in file2, given a particu-
     lar set of scoring parameters.

     matrix    is a lower-diagonal penalty matrix  with  26  rows
               and  columns,  corresponding  to the 26 letters of
               the alphabet.  This allows matrices  to  be  built
               for protein, DNA, and RNA sequences depending upon
               the letters used.  The most  common  use  of  this
               matrix  is to compare amino acid sequences in pro-
               teins, but the flexibility  of  the  matrix  input
               allows  other  types  of sequences to be compared.
               The matrix file name ends in ".mat"; omit this
               suffix on the command line.  If the matrix is not
               found in the current directory, the directory
               named by the environment variable MATDIR is
               searched.


     csub      is the lower limit for conservative substitutions,
               which  are non-matching substitutions printed with
               a `:' in alignment output.


     alpha     is the amount to subtract for the first letter  of
               an insertion or deletion sequence (indel).

     beta      is the amount to subtract for  subsequent  letters
               in  an  indel.   For  example, if there is a five-
               letter indel, k = 5, then alpha + beta * ( k - 1 )
               =  alpha  + beta * (4) will be subtracted from the
               score.


     flags     See seqaln-intro(1) for a full description of  the
               optional flags.
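
     The matrix lookup and the indel penalty described above  can
     be sketched in Python.  This is an illustration only, not the
     code used by mglobalS: the function names resolve_matrix_path
     and indel_penalty are hypothetical, and the handling  of  an
     unset MATDIR and of k < 1 is an assumption.

```python
import os


def resolve_matrix_path(name, env=None):
    """Return the path of the matrix file for `name`, or None if not found.

    `name` is given without the ".mat" suffix. The current directory is
    tried first, then the directory named by MATDIR (assumption: an unset
    or empty MATDIR simply means there is no second place to look).
    """
    env = os.environ if env is None else env
    filename = name + ".mat"
    candidates = [os.path.join(os.curdir, filename)]
    matdir = env.get("MATDIR")
    if matdir:
        candidates.append(os.path.join(matdir, filename))
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


def indel_penalty(k, alpha, beta):
    """Amount subtracted from the score for an indel of k letters."""
    if k < 1:
        # Assumption: an indel has at least one letter.
        raise ValueError("indel length k must be at least 1")
    return alpha + beta * (k - 1)
```

     With the illustrative values alpha = 10  and  beta  =  2,  a
     five-letter indel subtracts 10 + 2 * 4 = 18 from the score.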

     Sequence files db and file2 use  our  standard  format,  the
     Pearson/FASTA format.  The first line gives the sequence name
     and should serve as a description.  Subsequent lines contain
     the sequence itself, which may include blanks, returns,  and
     other whitespace for readability.  A sequence ends  at  end-
     of-file or at the next `>', which begins a new sequence  in
     the FASTA format.  Multiple sequences are processed only  in
     the first file (db).
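
     As a rough illustration of the file format described  above,
     the following Python sketch splits FASTA text into (name,
     sequence) pairs.  The function name parse_fasta is hypotheti-
     cal; dropping the leading `>' from the name and ignoring any
     text before the first `>' are assumptions, not documented
     behavior of mglobalS.

```python
def parse_fasta(text):
    """Split FASTA-formatted text into a list of (name, sequence) pairs."""
    records = []
    name = None
    chunks = []
    for line in text.splitlines():
        if line.startswith(">"):
            # A '>' line closes the previous record and starts a new one.
            if name is not None:
                records.append((name, "".join(chunks)))
            name = line[1:].strip()
            chunks = []
        elif name is not None:
            # Blanks and other whitespace inside the sequence are dropped.
            chunks.append("".join(line.split()))
    if name is not None:
        # End of input terminates the last sequence.
        records.append((name, "".join(chunks)))
    return records
```

     A caller mirroring the rule above would use every record from
     db but only the first record from file2.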


REFERENCES

     S.B. Needleman and C.D. Wunsch.  "A general method  applica-
          ble  to  the  search for similarities in the amino acid
          sequence of two proteins".  Journal of Molecular  Biol-
          ogy, 48, (1970) 443-453.

     T.F. Smith, M.S. Waterman, and W.M. Fitch.  "Comparative
          biosequence metrics".  Journal of Molecular
          Evolution, 18, (1981) 38-46.

     M.S. Waterman.  Introduction to Computational Biology: Maps,
          Sequences and Genomes.  Chapman & Hall, London, 1995.
          ISBN 0-412-99391-0.


SEE ALSO

     seqaln-intro(1),   globalS(1),   globalD(1),    mglobalD(1),
     penalty-matrix(5), sequence-file(5).