sgrep
A text search tool for structured documents:
sgrep is a tool for searching the contents of text files,
similar to grep. Unlike grep, sgrep works on regions, which
are defined either by offsets from the start of the file or
by text patterns that are matched either case-insensitively
or exactly. This makes it possible to search for text
occurring within given delimiters, e.g. to search for a
certain string in the header but not the body of an
HTML document. It can also extract regions within delimiters
that overlap or exclude each other. A special feature is
the nearness condition, which finds text
occurring within a certain offset from the starting
region. A search for "Romeo" and "Juliet" with fewer than
30 characters between them may be more useful than a
search for a file that simply contains both words.
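The two ideas above, searching inside delimited regions and the nearness condition, can be illustrated with a small Python sketch. This is not sgrep's implementation or query syntax; the function names regions_between and near are made up for this illustration, and near only checks one order (the first word followed by the second).

```python
import re


def regions_between(text, start, end, ignore_case=True):
    """Return (begin, finish, content) for each region between start and end.

    Hypothetical helper: mimics the idea of a delimiter-defined region,
    matched case-insensitively by default.
    """
    flags = re.DOTALL | (re.IGNORECASE if ignore_case else 0)
    pattern = re.compile(re.escape(start) + r"(.*?)" + re.escape(end), flags)
    return [(m.start(1), m.end(1), m.group(1)) for m in pattern.finditer(text)]


def near(text, first, second, max_gap=30):
    """Return matches where second follows first with fewer than max_gap
    characters between them (hypothetical helper, one direction only)."""
    pattern = re.compile(
        re.escape(first) + r".{0,%d}?" % (max_gap - 1) + re.escape(second),
        re.DOTALL,
    )
    return [m.group(0) for m in pattern.finditer(text)]


html = (
    "<html><head><title>Romeo and Juliet</title></head>"
    "<body>Romeo went home.</body></html>"
)

# Look for "Romeo" in the header region only, ignoring the body.
header_hits = [
    content
    for _, _, content in regions_between(html, "<head>", "</head>")
    if "Romeo" in content
]
print(header_hits)

# Nearness: "Romeo" followed by "Juliet" with fewer than 30 characters between.
print(near(html, "Romeo", "Juliet"))
```

The occurrence of "Romeo" in the body is reported by neither search: it lies outside the header region, and no "Juliet" follows it within the allowed gap.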
sgrep is very useful for SGML/XML/HTML files, but is by no
means limited to them. It can be used to query program
source code as well as email or usenet news. The tool
comes in very handy if you need to assemble SGML or HTML
files from parts of other files.
sgrep was written by Jani Jaakkola and Pekka Kilpeläinen
at the University of Helsinki. The Cygwin version is based on
sgrep-1.92a.
First run ./configure, which generates the Makefile. Edit the
Makefile as follows:
Change the line:
bin_PROGRAMS = sgrep
to:
bin_PROGRAMS = sgrep.exe
Change the line:
sgrep: $(sgrep_OBJECTS) $(sgrep_DEPENDENCIES)
to:
sgrep.exe: $(sgrep_OBJECTS) $(sgrep_DEPENDENCIES)
Add the -s flag to CFLAGS.
Then make and make install run without problems as long as a
Unix-like directory structure (/usr/local/bin,
/usr/local/man, /usr/local/share) exists.
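The Makefile edits described above could also be scripted rather than done by hand. The following Python sketch is only an illustration: the function name patch_makefile is made up, and it assumes the generated Makefile contains the lines exactly as quoted above plus a single line starting with "CFLAGS =". Check the result before running make.

```python
import re


def patch_makefile(text):
    """Apply the three Cygwin edits to Makefile text and return the result.

    Hypothetical helper; assumes the layout of the generated Makefile
    matches the lines quoted in the instructions.
    """
    # 1. bin_PROGRAMS = sgrep  ->  bin_PROGRAMS = sgrep.exe
    text = re.sub(
        r"^(bin_PROGRAMS[ \t]*=[ \t]*)sgrep[ \t]*$",
        r"\g<1>sgrep.exe",
        text,
        flags=re.M,
    )
    # 2. Rename the link target from sgrep: to sgrep.exe:
    text = re.sub(r"^sgrep:", "sgrep.exe:", text, flags=re.M)

    # 3. Append -s to CFLAGS unless it is already present.
    def add_strip_flag(match):
        line = match.group(0).rstrip()
        return line if "-s" in line.split() else line + " -s"

    return re.sub(r"^CFLAGS[ \t]*=.*$", add_strip_flag, text, flags=re.M)


# Assumed sample of the relevant Makefile lines, for demonstration only.
sample = (
    "CFLAGS = -g -O2\n"
    "bin_PROGRAMS = sgrep\n"
    "sgrep: $(sgrep_OBJECTS) $(sgrep_DEPENDENCIES)\n"
)
patched = patch_makefile(sample)
print(patched)
```

To use it on a real file, read the Makefile into a string, pass it through patch_makefile, and write the result back. Running the function a second time leaves already patched text unchanged.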
System requirements
The following setup worked in my case:
- Windows NT 4.0 SP3 Workstation
- Cygwin B20.1
- A directory /usr/local/bin, and symlinks in /usr/local called man and share, pointing to the cygwin-b20/man and cygwin-b20/share directories, respectively. Equivalent mounts will work as well.
Get the original sgrep sources here.
The precompiled binary for Cygwin B20, together with the data files, is shipped as cygwinb20-sgrep-1.92a.tar.gz (69 KB).