aacomp

aacomp is a small tool to analyze the amino acid composition of protein sequences.

Overview

To analyze the amino acid composition of a protein, feed its sequence to aacomp in raw or FASTA format. The tool displays various aspects of the sequence, e.g. an alphabetical listing of all amino acids with their total number in the sequence, their relative abundance, and their relative contribution to the molecular weight. You can get a listing sorted by the properties of the side chains, or you can request a specific group, e.g. only the aromatic amino acids. Furthermore, aacomp can spill other statistical stuff over the screen, e.g. all amino acids that occur 4 or 27 times in the sequence, regardless of whether this knowledge is very helpful or not.
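The per-residue figures boil down to simple ratios: "relcontent" is the count of a residue divided by the total number of residues, and "MW-content" is the mass contributed by that residue divided by the summed residue mass. The following is a minimal C sketch, not taken from the aacomp sources. It assumes counts are stored in an array indexed by one-letter code ('A' to 'Z'), and it uses textbook average residue masses, which may differ from the table aacomp uses, so it will not reproduce the example output digit for digit. The helper names (residue_mass, residue_mass_sum, rel_content, mw_content) are hypothetical.

```c
#include <assert.h>

/* Hypothetical table: average residue masses in Da (free amino acid minus
   one water), indexed by one-letter code. Assumed values; aacomp's own
   table may differ. Ambiguous or unknown codes (B, Z, X, ...) count as 0. */
static double residue_mass(char code)
{
    switch (code) {
    case 'A': return  71.0788;  case 'R': return 156.1875;
    case 'N': return 114.1038;  case 'D': return 115.0886;
    case 'C': return 103.1388;  case 'E': return 129.1155;
    case 'Q': return 128.1307;  case 'G': return  57.0519;
    case 'H': return 137.1411;  case 'I': return 113.1594;
    case 'L': return 113.1594;  case 'K': return 128.1741;
    case 'M': return 131.1926;  case 'F': return 147.1766;
    case 'P': return  97.1167;  case 'S': return  87.0782;
    case 'T': return 101.1051;  case 'W': return 186.2132;
    case 'Y': return 163.1760;  case 'V': return  99.1326;
    default:  return 0.0;
    }
}

/* Sum of all residue masses. MW-content fractions are taken relative to
   this sum, so they add up to 1 over all residue types. (The intact chain
   would additionally carry one water, about 18.02 Da.) */
double residue_mass_sum(const long counts[26])
{
    double sum = 0.0;
    int i;

    for (i = 0; i < 26; i++)   /* assumes 'A'..'Z' are contiguous (ASCII) */
        sum += counts[i] * residue_mass((char)('A' + i));
    return sum;
}

/* "relcontent": share of residues of type `code` among `total` residues. */
double rel_content(const long counts[26], long total, char code)
{
    assert(code >= 'A' && code <= 'Z');
    return total > 0 ? (double)counts[code - 'A'] / total : 0.0;
}

/* "MW-content": share of the summed residue mass contributed by `code`. */
double mw_content(const long counts[26], char code)
{
    double sum = residue_mass_sum(counts);

    assert(code >= 'A' && code <= 'Z');
    return sum > 0.0 ? counts[code - 'A'] * residue_mass(code) / sum : 0.0;
}
```

With counts for 2 Gly and 2 Ala, rel_content(counts, 4, 'G') returns 0.5, and the MW-content values for G and A add up to 1, just as the MW-content column of the example sums to 1 over all side-chain groups.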

The output is essentially a tab-delimited list of the requested amino acids and their frequency data (see the example below). All other lines are "commented out" with a leading #, so automatic processing of the result should be fairly easy.
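Because comment lines start with # and data rows have four whitespace-separated columns (AA, number, relcontent, MW-content), a consumer can pick out the data rows with a single sscanf() call. A minimal sketch, assuming exactly the four-column layout shown in the example; the struct aa_row type and parse_aacomp_line() are hypothetical names, not part of aacomp:

```c
#include <assert.h>
#include <stdio.h>

/* One data row of aacomp output (hypothetical type). */
struct aa_row {
    char   aa[8];        /* three-letter code, or "all" for a group total */
    long   number;       /* absolute count */
    double relcontent;   /* count / total residues */
    double mw_content;   /* share of the molecular weight */
};

/* Parse one output line. Returns 1 and fills *row for a data row;
   returns 0 for comment lines (leading '#'), blank lines, and any line
   that does not contain the four expected columns. */
int parse_aacomp_line(const char *line, struct aa_row *row)
{
    assert(line != NULL && row != NULL);
    if (line[0] == '#' || line[0] == '\n' || line[0] == '\0')
        return 0;
    /* %7s stops at the tab; tabs count as whitespace for sscanf */
    return sscanf(line, "%7s %ld %lf %lf", row->aa, &row->number,
                  &row->relcontent, &row->mw_content) == 4;
}
```

A consumer would read the output line by line with fgets() and keep only the lines for which parse_aacomp_line() returns 1.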

aacomp is implemented in ANSI C and should compile on any system with an ANSI C compliant compiler. It was tested with gcc on Linux and on Cygwin/WinNT. The program is a command-line filter, i.e. it reads the sequence from stdin and writes the results to stdout.
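As a filter, the program essentially has to tally the residue letters arriving on stdin while skipping FASTA header lines, which handles raw and FASTA input alike. The following sketch shows one way to do that in plain ANSI C; it is an assumption about the approach, not aacomp's actual code, and count_residues() is a hypothetical name. It assumes the default "C" locale with ASCII letters, and the caller must zero the counts array beforehand.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Tally one-letter residue codes read from `in` into counts['A'..'Z'].
   Lines starting with '>' (FASTA headers) are skipped entirely; digits,
   whitespace and other non-letters (e.g. '*') are ignored; lower case is
   folded to upper case. Returns the total number of residues counted. */
long count_residues(FILE *in, long counts[26])
{
    long total = 0;
    int c, at_line_start = 1, in_header = 0;

    assert(in != NULL && counts != NULL);
    while ((c = getc(in)) != EOF) {
        if (at_line_start && c == '>')
            in_header = 1;              /* start of a FASTA header */
        at_line_start = (c == '\n');
        if (in_header) {
            if (c == '\n')
                in_header = 0;          /* header ends with its line */
            continue;
        }
        if (isalpha(c)) {               /* only A-Z, a-z in the "C" locale */
            counts[toupper(c) - 'A']++;
            total++;
        }
    }
    return total;
}
```

In a filter, this would simply be called as count_residues(stdin, counts), and the statistics would then be computed from the filled array.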

License information

The software is released under the GNU General Public License (GPL). This entitles you to redistribute and modify the software for private or commercial use, subject to certain conditions. See the GNU General Public License for the small print.

Download information

The latest version of aacomp (currently v1.7, 2000/07/11) is always shipped as aacomp.tar.gz (15 kB).

The package contains the sources, a precompiled binary for Cygwin (to run on Win95/WinNT), a precompiled binary for Linux (glibc), and a man page. To build the software from the sources, running gcc -O -s -o aacomp aacomp.c should be all it takes. To "install" the binary, just move it into a directory that is in your $PATH, such as /usr/local/bin on Unix systems.

Example output

The following example analyzes the sequence of a protein that I had the pleasure of working with:

markus@~/programming/aacomp# ./aacomp -s -p < husGCalpha.seq 
# Molecular weight: 77363.000000
# Total number of amino acids: 690
# Amino acids occurring 0 times: ASX GLX TER UNK
# Amino acids occurring 2 times: TRP
# Amino acids occurring 15 times: HIS
# Amino acids occurring 16 times: TYR
# Amino acids occurring 19 times: MET
# Amino acids occurring 23 times: CYS
# Amino acids occurring 24 times: ASN
# Amino acids occurring 30 times: ARG
# Amino acids occurring 32 times: ASP GLN
# Amino acids occurring 33 times: THR
# Amino acids occurring 36 times: PRO
# Amino acids occurring 37 times: ALA
# Amino acids occurring 39 times: ILE
# Amino acids occurring 40 times: GLY PHE
# Amino acids occurring 48 times: VAL
# Amino acids occurring 51 times: GLU LYS
# Amino acids occurring 54 times: SER
# Amino acids occurring 68 times: LEU

# Acidic side chains
# AA	number	relcontent	MW-content
all	83	0.120290	0.132637
ASP	32	0.046377	0.047579
GLU	51	0.073913	0.085058

# Basic side chains
# AA	number	relcontent	MW-content
all	96	0.139130	0.171471
ARG	30	0.043478	0.060504
HIS	15	0.021739	0.026568
LYS	51	0.073913	0.084399

# Uncharged polar side chains
# AA	number	relcontent	MW-content
all	159	0.230435	0.225885
ASN	24	0.034783	0.035374
GLN	32	0.046377	0.052956
SER	54	0.078261	0.060745
THR	33	0.047826	0.043094
TYR	16	0.023188	0.033717

# Nonpolar side chains
# AA	number	relcontent	MW-content
all	352	0.510145	0.470007
ALA	37	0.053623	0.033969
CYS	23	0.033333	0.030630
GLY	40	0.057971	0.029485
ILE	39	0.056522	0.056978
LEU	68	0.098551	0.099347
MET	19	0.027536	0.032179
PHE	40	0.057971	0.076019
PRO	36	0.052174	0.045150
TRP	2	0.002899	0.004809
VAL	48	0.069565	0.061441

# Other side chains
# AA	number	relcontent	MW-content
all	0	0.000000	0.000000
ASX	0	0.000000	0.000000
GLX	0	0.000000	0.000000
markus@~/programming/aacomp#
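The "all" row of each block is just the sum over the group members: ASP (32) and GLU (51) give the 83 acidic residues, and 83/690 gives the listed relcontent of 0.120290. A minimal sketch of that grouping, using the side-chain classes exactly as listed in the example, written as one-letter codes (B for ASX, Z for GLX). The table and group_total() are hypothetical illustrations, not aacomp's internals, and counts are assumed to be indexed by one-letter code.

```c
#include <assert.h>
#include <stddef.h>

/* Side-chain classes as they appear in the example output. */
struct sc_group {
    const char *name;
    const char *members;   /* one-letter codes of the group members */
};

static const struct sc_group groups[] = {
    { "Acidic side chains",          "DE" },
    { "Basic side chains",           "RHK" },
    { "Uncharged polar side chains", "NQSTY" },
    { "Nonpolar side chains",        "ACGILMFPWV" },
    { "Other side chains",           "BZ" }
};

/* Sum the counts of all members of group g (the "all" row). */
long group_total(const long counts[26], size_t g)
{
    const char *p;
    long sum = 0;

    assert(g < sizeof groups / sizeof groups[0]);
    for (p = groups[g].members; *p != '\0'; p++)
        sum += counts[*p - 'A'];
    return sum;
}
```

Dividing group_total() by the total number of residues then yields the relcontent value of the "all" row.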