globalS



NAME

     globalS - compare two sequences to find best-scoring  global
     alignments.


SYNOPSIS

     globalS  db file2 match mismatch alpha beta [ flags ]


DESCRIPTION

     globalS finds the best global alignment between sequences in
     db  and  the  sequence  in  file2, given a particular set of
     scoring parameters.

     The scoring parameters are all integer values, and all posi-
     tive.   The  parameters  mismatch,  alpha  and beta are sub-
     tracted from the score; match is added to the score.

     match     the score for aligning identical letters.

     mismatch  the amount to subtract for a mismatch.

     alpha     the amount to subtract for the first letter of  an
               insertion or deletion sequence (indel).

     beta      the amount to subtract for each subsequent  letter
               in  an  indel.   For  example,  a five-letter indel
               (k = 5) subtracts alpha + beta * ( k - 1 ) =  alpha
               + beta * 4 from the score.


     flags     See manual page seqaln-intro(1) for a full descrip-
               tion of optional flags.
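
     The scoring rule above can be sketched in a few  lines  of
     Python.   This  is  only  an  illustration  of  the  stated
     arithmetic, not globalS's own code; the function names  and
     the  idea  of passing in precomputed counts are assumptions
     made for the example.

```python
def indel_penalty(k, alpha, beta):
    """Penalty for one indel of k letters: alpha + beta * (k - 1)."""
    if k < 1:
        raise ValueError("an indel has at least one letter")
    return alpha + beta * (k - 1)


def alignment_score(matches, mismatches, indel_lengths,
                    match, mismatch, alpha, beta):
    """Score an alignment from its counts.

    indel_lengths holds one entry per indel (its length in letters).
    Hypothetical helper: the real program works on sequences, not counts.
    """
    penalty = sum(indel_penalty(k, alpha, beta) for k in indel_lengths)
    return match * matches - mismatch * mismatches - penalty


# A five-letter indel with alpha = 7 and beta = 15 costs 7 + 15 * 4.
print(indel_penalty(5, alpha=7, beta=15))

# 6 matches, 2 mismatches, 2 one-letter indels, parameters 10 5 7 15.
print(alignment_score(6, 2, [1, 1], match=10, mismatch=5, alpha=7, beta=15))
```

     With the parameters 10 5 7 15, six matches, two  mismatches
     and  two one-letter indels give 10(6) - 2(5) - 2(7) = 36, the
     score reported for the first alignment under EXAMPLES.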

     Sequence files db and file2 use  our  standard  format,  the
     Pearson/FASTA  format.   The first line holds  the  sequence
     name and should serve as a  description.   Subsequent  lines
     contain  the  sequence  itself,  which  may  include blanks,
     returns, and other whitespace for readability.   A  sequence
     ends  at  end-of-file  or at the next `>', which  begins   a
     new sequence in the FASTA format.   Multiple  sequences  are
     processed only in the first file (db).
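
     As a rough illustration of the format  described  above,  a
     reader  for  such  files might look like the Python sketch
     below.  It assumes that every name line starts with `>' and
     that everything after the `>' is the name; read_fasta is  a
     hypothetical helper, not part of this package.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of text lines.

    Assumes each entry starts with a '>' name line. Whitespace inside
    the sequence lines is dropped, and an entry ends at the next '>'
    or at the end of the input.
    """
    name, parts = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(parts)
            name, parts = line[1:].strip(), []
        elif name is not None:
            # Remove blanks and other whitespace used for readability.
            parts.append("".join(line.split()))
    if name is not None:
        yield name, "".join(parts)


example = [
    ">seq1 first example sequence",
    "CATA TGGC",
    "",
    ">seq2",
    "TACGA",
    "TCGGC",
]
for name, seq in read_fasta(example):
    print(name, seq)
```

     The same function accepts an open file, for  example
     read_fasta(open("f1")), since a file object iterates over
     its lines.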


EXAMPLES

     The examples below compare the sequence  CATATGGC  with  the
     sequence  TACGATCGGC,  and demonstrate how different choices
     for the scoring parameters can produce different alignments.
     In  both  examples,  only  the  best (i.e., highest scoring)
     alignment is requested.

          example% globalS f1 f2 10 5 7 15


          finds 6 matches, 2 mismatches, and 2 one-letter indels, beginning at
          position 1 for each sequence:

               1 CA-TAT-GGC
                  |  || |||
               1 TACGATCGGC

          for a score of 10(6) - 2(5) - 2(7) = 36.

          example% globalS f1 f2 10 15 7 5

          finds 6 matches, 4 one-letter indels, and 1 two-letter indel, beginning
          before the first letter of the first sequence (position 0) and at
          position 1 in the second sequence:

               0 -CA--TAT-GGC
                   |   || |||
               1 T-ACG-ATCGGC

          for a score of 10(6) - 4(7) - (7+5) = 20.
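
     The counts quoted for each alignment  can  be  checked  by
     hand or with a short script.  The Python sketch below tal-
     lies matches, mismatches, and indel lengths  from  the  two
     printed  rows.   It assumes that `-' marks a gap and that a
     gap in one row followed directly by a gap in the other  row
     counts  as two separate indels, as in the second alignment;
     tally is a hypothetical helper, not part of this package.

```python
def tally(row1, row2, gap="-"):
    """Return (matches, mismatches, indel_lengths) for two aligned rows."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have the same length")
    matches = mismatches = 0
    indel_lengths = []
    open_row = None  # row (1 or 2) holding the gap of the current indel
    for a, b in zip(row1, row2):
        if a == gap and b == gap:
            raise ValueError("a column cannot be a gap in both rows")
        if a == gap or b == gap:
            row = 1 if a == gap else 2
            if row == open_row:
                indel_lengths[-1] += 1   # extend the current indel
            else:
                indel_lengths.append(1)  # start a new indel
                open_row = row
        else:
            open_row = None
            if a == b:
                matches += 1
            else:
                mismatches += 1
    return matches, mismatches, indel_lengths


print(tally("CA-TAT-GGC", "TACGATCGGC"))
print(tally("-CA--TAT-GGC", "T-ACG-ATCGGC"))
```

     The first call reports 6  matches,  2  mismatches,  and  2
     one-letter  indels;  the  second  reports 6 matches, no mis-
     matches, 4 one-letter indels, and 1 two-letter indel.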


REFERENCES

     S.B. Needleman and C.D. Wunsch.  "A general method  applica-
          ble  to  the  search for similarities in the amino acid
          sequence of two proteins".  Journal of Molecular  Biol-
          ogy, 48 (1970) 443-453.

     T.F. Smith, M.S. Waterman, and  W.M.  Fitch.   "Comparative
          biosequence  metrics".   Journal of Molecular Evolution,
          18 (1981) 38-46.

     M.S. Waterman.  Introduction to Computational Biology: Maps,
          Sequences  and  Genomes.   Chapman & Hall, London, 1995.
          ISBN 0-412-99391-0.


SEE ALSO

     seqaln-intro(1),  mglobalS(1),   globalD(1),   mglobalD(1),
     sequence-file(5).