aacomp
aacomp is a small tool for analyzing the amino acid composition of
protein sequences.
Feed a protein sequence to aacomp in raw or
FASTA format, and the tool will display various
aspects of the sequence, e.g. an alphabetical listing of all
amino acids with their total number in the sequence, the
relative abundance and the relative amount based on the
molecular weight. You can also get a listing grouped by the
properties of the side chains, or request a specific
subset, e.g. only the aromatic amino
acids. Furthermore, aacomp can print other statistics,
e.g. all amino acids that occur 4 or 27
times in the sequence, regardless of whether this knowledge
is very helpful or not. The output is essentially a
tab-delimited list of the requested amino acids and their
frequency data (see the example
below). Any additional lines are "commented out" with a leading #,
so automatic processing of the result should be fairly
easy.
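To illustrate how such output could be processed automatically, here is a minimal sketch of a line parser in C. It is not part of aacomp; the names struct aa_row and parse_aacomp_line are hypothetical, and the column order (name, number, relcontent, MW-content) is assumed from the "# AA number relcontent MW-content" header in the example output.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One data row of aacomp output. The column order is an assumption
   taken from the "# AA number relcontent MW-content" header line. */
struct aa_row {
    char   name[8];     /* three-letter code, or "all" for a group total */
    long   number;      /* absolute count in the sequence */
    double relcontent;  /* relative abundance */
    double mwcontent;   /* relative amount based on molecular weight */
};

/* Parse one line of output.
   Returns 1 if the line was a data row and *row was filled in,
   0 if it was a "#" comment, a blank line, or anything else. */
int parse_aacomp_line(const char *line, struct aa_row *row)
{
    assert(line != NULL && row != NULL);

    /* Comment lines start with '#'; skip them and empty lines. */
    if (line[0] == '#' || line[0] == '\n' || line[0] == '\0')
        return 0;

    /* Whitespace in the format matches tabs as well as spaces. */
    return sscanf(line, "%7s %ld %lf %lf",
                  row->name, &row->number,
                  &row->relcontent, &row->mwcontent) == 4;
}
```

A caller would typically read the output line by line with fgets and pass each line to parse_aacomp_line, ignoring every line for which it returns 0.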
aacomp is implemented in ANSI C and should compile on any
system with an ANSI C-compliant compiler. It was tested on
Linux and Cygwin/WinNT, using gcc in both cases. The program
is a command-line filter application, i.e. it reads the
sequence input from stdin and outputs the results on
stdout.
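As an illustration of the filter idea, the following sketch counts residues from a character stream in ANSI C. It is not taken from the aacomp sources: the names struct composition, comp_init, comp_feed and comp_read are hypothetical, and the sketch assumes one-letter residue codes, FASTA header lines starting with ">", and that all other non-letter characters can be ignored.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Running composition, indexed by one-letter code (index 0 = 'A'). */
struct composition {
    long counts[26];
    long total;
    int  at_line_start;  /* nonzero if the next character starts a line */
    int  in_header;      /* nonzero while inside a FASTA ">" header line */
};

void comp_init(struct composition *comp)
{
    int i;
    assert(comp != NULL);
    for (i = 0; i < 26; i++)
        comp->counts[i] = 0;
    comp->total = 0;
    comp->at_line_start = 1;
    comp->in_header = 0;
}

/* Feed one input character; letters outside header lines are counted. */
void comp_feed(struct composition *comp, int c)
{
    int upper;
    assert(comp != NULL);

    if (comp->at_line_start && c == '>')
        comp->in_header = 1;          /* a FASTA header begins here */
    comp->at_line_start = (c == '\n');

    if (c == '\n') {
        comp->in_header = 0;          /* a header ends with its line */
        return;
    }
    if (comp->in_header)
        return;

    upper = toupper((unsigned char)c);
    if (upper >= 'A' && upper <= 'Z') {
        comp->counts[upper - 'A']++;
        comp->total++;
    }
}

/* Filter-style driver: consume a whole stream such as stdin.
   Returns the total number of residues counted. */
long comp_read(struct composition *comp, FILE *in)
{
    int c;
    comp_init(comp);
    while ((c = getc(in)) != EOF)
        comp_feed(comp, c);
    return comp->total;
}
```

A main function would call comp_read(&comp, stdin) and print the counts to stdout. Dividing a count by the total gives the relative abundance; in the example below, ASP occurs 32 times in 690 residues, which matches the listed value of 0.046377.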
The software is released under the GNU General Public License
(GPL). This entitles you to redistribute and modify the software
for private or commercial use, subject to certain constraints. See the
GNU General Public License for the small print.
The latest version of aacomp (currently v1.7, 2000/07/11) is
always shipped as aacomp.tar.gz (15 kB).
The package contains the sources, a precompiled binary
for Cygwin (to run on Win95/WinNT), a precompiled binary
for Linux (glibc), and a man page. To build the software from
the sources, running
gcc -O -s -o aacomp aacomp.c should be all it
takes. To "install" the binary, just move it into a directory
that is in your $PATH, such as /usr/local/bin on
Unix systems.
The following example analyzes the sequence of a protein that I
had the pleasure of working with:
markus@~/programming/aacomp# ./aacomp -s -p < husGCalpha.seq
# Molecular weight: 77363.000000
# Total number of amino acids: 690
# Amino acids occurring 0 times: ASX GLX TER UNK
# Amino acids occurring 2 times: TRP
# Amino acids occurring 15 times: HIS
# Amino acids occurring 16 times: TYR
# Amino acids occurring 19 times: MET
# Amino acids occurring 23 times: CYS
# Amino acids occurring 24 times: ASN
# Amino acids occurring 30 times: ARG
# Amino acids occurring 32 times: ASP GLN
# Amino acids occurring 33 times: THR
# Amino acids occurring 36 times: PRO
# Amino acids occurring 37 times: ALA
# Amino acids occurring 39 times: ILE
# Amino acids occurring 40 times: GLY PHE
# Amino acids occurring 48 times: VAL
# Amino acids occurring 51 times: GLU LYS
# Amino acids occurring 54 times: SER
# Amino acids occurring 68 times: LEU
# Acidic side chains
# AA number relcontent MW-content
all 83 0.120290 0.132637
ASP 32 0.046377 0.047579
GLU 51 0.073913 0.085058
# Basic side chains
# AA number relcontent MW-content
all 96 0.139130 0.171471
ARG 30 0.043478 0.060504
HIS 15 0.021739 0.026568
LYS 51 0.073913 0.084399
# Uncharged polar side chains
# AA number relcontent MW-content
all 159 0.230435 0.225885
ASN 24 0.034783 0.035374
GLN 32 0.046377 0.052956
SER 54 0.078261 0.060745
THR 33 0.047826 0.043094
TYR 16 0.023188 0.033717
# Nonpolar side chains
# AA number relcontent MW-content
all 352 0.510145 0.470007
ALA 37 0.053623 0.033969
CYS 23 0.033333 0.030630
GLY 40 0.057971 0.029485
ILE 39 0.056522 0.056978
LEU 68 0.098551 0.099347
MET 19 0.027536 0.032179
PHE 40 0.057971 0.076019
PRO 36 0.052174 0.045150
TRP 2 0.002899 0.004809
VAL 48 0.069565 0.061441
# Other side chains
# AA number relcontent MW-content
all 0 0.000000 0.000000
ASX 0 0.000000 0.000000
GLX 0 0.000000 0.000000
markus@~/programming/aacomp#