overS



NAME

     overS - find overlaps between two sequences.


SYNOPSIS

     overS  db file2 match mismatch alpha beta [ flags ]


DESCRIPTION

     overS finds the best similarity  alignment  of  the  overlap
     between the sequences in db and the sequence in file2, given
     a particular set of scoring parameters.

     The scoring parameters are all positive integers.  match is
     added to the score; mismatch, alpha and beta are subtracted
     from it.

     match     the score for aligning identical letters.

     mismatch  the amount to subtract for a mismatch.

     alpha     the amount to subtract for the first letter of  an
               insertion or deletion sequence (indel).

     beta      the amount to subtract for each subsequent letter
               in an indel.  For example, a five-letter indel
               (k = 5) subtracts alpha + beta * (k - 1) =
               alpha + 4 * beta from the score.


     flags     See manual page seqaln-intro(1) for a full
               description of optional flags.
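
     The indel charge described for alpha and beta can be sketched
     in Python as follows.  This only illustrates the arithmetic;
     the function name indel_penalty and the sample values are
     hypothetical and are not part of overS.

```python
def indel_penalty(k, alpha, beta):
    """Amount subtracted from the score for an indel of k letters (k >= 1).

    alpha is charged for the first letter and beta for each of the
    remaining k - 1 letters.  The function name is hypothetical.
    """
    if k < 1:
        raise ValueError("an indel has at least one letter")
    return alpha + beta * (k - 1)


# A five-letter indel with alpha = 10 and beta = 2 (illustrative values).
print(indel_penalty(5, alpha=10, beta=2))  # prints 18
```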

     Some flags of particular use with the overlap software are:

     -1        Don't report best alignment ending at the  end  of
               the first sequence.

     -2        Don't report best alignment ending at the  end  of
               the second sequence.

     +1        Report best alignment ending at  the  end  of  the
               first sequence.

     +2        Report best alignment ending at  the  end  of  the
               second sequence.

     +3        Report the highest score ending at the ends of both
               the first and second sequences.
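
     To illustrate what "best alignment ending at the end of the
     first sequence" and "ending at the end of the second
     sequence" can mean, here is a minimal Python sketch of an
     overlap score computed with the match, mismatch, alpha and
     beta parameters.  It assumes that unaligned letters at the
     start of either sequence are free and uses a textbook
     three-matrix dynamic program; overS itself may differ in
     detail.  The name overlap_scores is hypothetical.

```python
NEG = float("-inf")


def overlap_scores(a, b, match, mismatch, alpha, beta):
    """Return (best score ending at end of a, best score ending at end of b).

    Assumption: an alignment may start at the beginning of either
    sequence at no cost (first row and first column are 0).
    """
    n, m = len(a), len(b)
    # H: best score of an alignment ending at (i, j).
    # E: best score ending at (i, j) with b[j-1] inside an indel.
    # F: best score ending at (i, j) with a[i-1] inside an indel.
    H = [[0] * (m + 1) for _ in range(n + 1)]
    E = [[NEG] * (m + 1) for _ in range(n + 1)]
    F = [[NEG] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Open an indel (alpha) or extend an existing one (beta).
            E[i][j] = max(H[i][j - 1] - alpha, E[i][j - 1] - beta)
            F[i][j] = max(H[i - 1][j] - alpha, F[i - 1][j] - beta)
            s = match if a[i - 1] == b[j - 1] else -mismatch
            H[i][j] = max(H[i - 1][j - 1] + s, E[i][j], F[i][j])
    end_first = max(H[n])                        # last row: all of a consumed
    end_second = max(row[m] for row in H)        # last column: all of b consumed
    return end_first, end_second


# The suffix "TAC" of the first sequence overlaps the prefix of the second.
print(overlap_scores("ACGTAC", "TACGG", 1, 1, 2, 1))  # prints (3, 2)
```

     In this sketch an empty overlap scores 0, so neither value
     is ever negative.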

     The sequence files db and file2 use our standard format, the
     Pearson/FASTA format.  The first line is the sequence name,
     and should be used as a description.  Subsequent lines
     contain the sequence itself, which may include blanks,
     returns, and other whitespace for readability.  A sequence
     terminates at end-of-file or at a `>', which begins a new
     sequence in the FASTA format.  Multiple sequences are
     processed only in the first file (db).
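
     The file layout described above can be read with a few lines
     of Python.  This is a minimal sketch, not the reader used by
     overS: it assumes every name line begins with `>' (the usual
     FASTA convention) and ignores any text before the first such
     line.  The name read_fasta and the sample data are
     hypothetical.

```python
def read_fasta(text):
    """Parse Pearson/FASTA text into a list of (name, sequence) pairs."""
    records = []
    name, parts = None, []
    for line in text.splitlines():
        if line.startswith(">"):
            # A `>' line ends the previous sequence and names the next one.
            if name is not None:
                records.append((name, "".join(parts)))
            name, parts = line[1:].strip(), []
        elif name is not None:
            # Drop blanks and other whitespace inside the sequence.
            parts.append("".join(line.split()))
    if name is not None:
        records.append((name, "".join(parts)))  # end-of-file ends the last one
    return records


sample = ">seq1 first example\nACGT ACGT\nTTGA\n>seq2\nGG CC\n"
print(read_fasta(sample))
# prints [('seq1 first example', 'ACGTACGTTTGA'), ('seq2', 'GGCC')]
```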


REFERENCES

     M.S. Waterman.  Introduction to Computational Biology: Maps,
     Sequences and Genomes.  Chapman & Hall, London, 1995.  ISBN
     0-412-99391-0.


SEE ALSO

     seqaln-intro(1), moverS(1), sequence-file(5).