Some general remarks on DTDs and catalogs

For first-time users it is often hard to understand how DTDs, catalog files, OpenJade/onsgmls, and PSGML interact, and to find out what is wrong when they refuse to work together. This section briefly describes the general setup of DTDs before we install the DocBook DTD and the HTML DTDs as examples.

When we installed the SP suite, we created an environment variable called SGML_CATALOG_FILES. Its value is simply the full path of a catalog file, or a list of such full paths. These catalog files map the public identifiers of DTDs to actual files that an SGML-processing application can access. Whether you use one catalog file or several is mainly a matter of taste. A single catalog file keeps all information in one place, but it requires more manual editing whenever you add or update DTDs.
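When a tool cannot find a DTD, a useful first check is whether every file listed in SGML_CATALOG_FILES actually exists. The following minimal Python sketch splits the variable and reports each entry. It assumes that the entries are separated by the platform's path separator (`os.pathsep`, i.e. `;` under native Windows and `:` under Unix or Cygwin); check the documentation of your SP build if your setup differs. The function names are hypothetical.

```python
import os


def catalog_files(env=None):
    """Return the catalog paths listed in SGML_CATALOG_FILES, in order.

    Assumption: entries are separated by os.pathsep. Empty entries are ignored.
    """
    env = os.environ if env is None else env
    value = env.get("SGML_CATALOG_FILES", "")
    return [p for p in value.split(os.pathsep) if p.strip()]


def report_catalogs(env=None):
    """Print each catalog path and whether that file exists."""
    paths = catalog_files(env)
    if not paths:
        print("SGML_CATALOG_FILES is not set or is empty")
    for path in paths:
        status = "ok" if os.path.isfile(path) else "MISSING"
        print(f"{status:8} {path}")
    return paths


if __name__ == "__main__":
    report_catalogs()
```

A line marked MISSING usually means a typo in the variable or a catalog that was moved after the variable was set.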

In the simple example file test.sgml that you created initially, we wrote a small SGML document that carried its document type definition at the beginning of the document file. This is fine for small, custom DTDs, but it is inefficient if many documents use the same DTD. SGML therefore allows you to keep DTDs in separate files and to reference these external files at the beginning of an SGML document. Such a reference may look like this:

      
      <!DOCTYPE HTML SYSTEM "html.dtd">
      
   

This translates to: the document type of this document is HTML, and the DTD which describes this document type is available on the local system in a file called html.dtd. Because the path is relative, html.dtd is expected to be in the same directory as the document. This works, but it has one major drawback: such documents are not easily portable, because every copy of a document needs a copy of the DTD at exactly that location.
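How a relative SYSTEM identifier is resolved can be sketched in a few lines of Python. This is an illustration only, with hypothetical function names and a simplified pattern that recognizes just a double-quoted SYSTEM identifier; it is not how onsgmls or PSGML are implemented.

```python
import os
import re


def doctype_system_id(text):
    """Return (doctype name, system identifier) from a declaration such as
    <!DOCTYPE HTML SYSTEM "html.dtd">, or None if there is no such declaration.

    Simplified: only double-quoted SYSTEM identifiers are recognized.
    """
    m = re.search(r'<!DOCTYPE\s+(\S+)\s+SYSTEM\s+"([^"]*)"', text, re.IGNORECASE)
    return (m.group(1), m.group(2)) if m else None


def resolve_system_id(document_path, system_id):
    """Resolve a relative system identifier against the document's directory."""
    base = os.path.dirname(os.path.abspath(document_path))
    # os.path.join leaves an absolute system identifier unchanged.
    return os.path.normpath(os.path.join(base, system_id))


if __name__ == "__main__":
    name, system_id = doctype_system_id('<!DOCTYPE HTML SYSTEM "html.dtd">')
    path = resolve_system_id("test.sgml", system_id)
    print(name, path, "exists" if os.path.exists(path) else "missing")
```

If test.sgml is moved to another directory without html.dtd, the resolved path points to a file that does not exist, which is exactly the portability problem just described.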

Catalog files overcome this portability problem. The corresponding prolog may look like this:

      
      <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
      
   

This translates to: the document type of this document is HTML, and the DTD has the formal public identifier -//IETF//DTD HTML 2.0//EN. This is where catalog files come into play. One of the catalog files listed in the SGML_CATALOG_FILES environment variable is expected to contain an entry which resolves this identifier to a local file or a URL. The corresponding entry in a catalog file might look like this:

      
      PUBLIC "-//IETF//DTD HTML 2.0//EN" "html/html.dtd"
      
   

This now translates to: the DTD which is referenced by the formal public identifier -//IETF//DTD HTML 2.0//EN is stored locally in the file html.dtd. A relative path like this one is interpreted relative to the directory that contains the catalog file, not relative to the document. In this example, html.dtd must therefore be in the html subdirectory of the directory that contains the catalog file.
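The mapping from public identifier to file can be sketched in Python. This is a deliberately simplified reader for the catalog format shown above, written for illustration and not a replacement for SP's own catalog handling: it understands only PUBLIC entries and `-- ... --` comments, treats every system identifier as a file path (not a URL), and searches several catalogs in the order given. The function names are hypothetical.

```python
import os
import re

# One token per match: a "-- comment --", a double- or single-quoted
# literal, or a bare word. Comments match with all groups set to None.
TOKEN_RE = re.compile(r'--.*?--|"([^"]*)"|\'([^\']*)\'|(\S+)', re.DOTALL)


def normalize_public_id(public_id):
    """Collapse runs of whitespace, since SGML compares normalized public ids."""
    return " ".join(public_id.split())


def read_public_entries(catalog_path):
    """Return {public id: absolute file path} for the PUBLIC entries of one catalog."""
    with open(catalog_path, encoding="utf-8") as f:
        text = f.read()
    tokens = []  # (value, is_bare_word)
    for m in TOKEN_RE.finditer(text):
        dq, sq, bare = m.groups()
        if bare is not None:
            tokens.append((bare, True))
        elif dq is not None or sq is not None:
            tokens.append((dq if dq is not None else sq, False))
    base = os.path.dirname(os.path.abspath(catalog_path))
    entries = {}
    i = 0
    while i < len(tokens):
        value, is_bare = tokens[i]
        if is_bare and value.upper() == "PUBLIC" and i + 2 < len(tokens):
            public_id = normalize_public_id(tokens[i + 1][0])
            system_id = tokens[i + 2][0]
            # Relative paths are resolved against the catalog's directory.
            path = os.path.normpath(os.path.join(base, system_id))
            entries.setdefault(public_id, path)  # this sketch keeps the first entry
            i += 3
        else:
            i += 1  # any other keyword or argument is skipped
    return entries


def resolve_public_id(public_id, catalog_paths):
    """Search the catalogs in order and return the first matching file, or None."""
    wanted = normalize_public_id(public_id)
    for catalog in catalog_paths:
        entries = read_public_entries(catalog)
        if wanted in entries:
            return entries[wanted]
    return None
```

Given a catalog containing the HTML 2.0 entry shown above, `resolve_public_id("-//IETF//DTD HTML 2.0//EN", [catalog])` returns the absolute path of html/html.dtd below the catalog's directory, no matter where the document itself is stored.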

Taken together, the idea of installing a DTD on the local system is to:

  1. Put the DTD file(s) in some directory

  2. Create a new catalog file, or edit an existing one, so that it resolves the public identifier to the local file

  3. Make the catalog file accessible via the SGML_CATALOG_FILES environment variable.
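Step 2 can be scripted once the DTD is in place. The following Python sketch, with a hypothetical function name, appends a PUBLIC entry to a catalog file (creating the file if necessary) and writes the DTD path relative to the catalog's directory, which is how the entry will later be resolved. Step 3, adding the catalog to SGML_CATALOG_FILES, still has to be done in your environment settings.

```python
import os


def add_catalog_entry(catalog_path, public_id, dtd_path):
    """Append a PUBLIC entry that maps public_id to dtd_path.

    The file name is written relative to the catalog's directory and with
    forward slashes, like the html/html.dtd example above. On Windows,
    os.path.relpath raises ValueError if the two paths are on different
    drives. Returns the line that was written.
    """
    catalog_dir = os.path.dirname(os.path.abspath(catalog_path))
    relative = os.path.relpath(os.path.abspath(dtd_path), catalog_dir)
    relative = relative.replace(os.sep, "/")
    line = f'PUBLIC "{public_id}" "{relative}"\n'
    with open(catalog_path, "a", encoding="utf-8") as f:
        f.write(line)
    return line
```

For example, calling `add_catalog_entry("catalog", "-//IETF//DTD HTML 2.0//EN", "html/html.dtd")` in the directory that holds the catalog appends exactly the entry shown earlier.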

The procedure described so far enables OpenJade, onsgmls, and PSGML to access DTDs via catalog files.

Before a DTD can be used, it has to be parsed to create a representation in memory. This parsing can take quite a long time, which is especially a problem with PSGML because its internal parser is implemented in the interpreted language Emacs Lisp. PSGML therefore provides a mechanism to store the memory representation of a parsed DTD in a separate file and to read this file instead of the original DTD. Just like the original DTDs, the pre-parsed DTDs can be accessed via catalog files. As only PSGML uses this kind of DTD, there is no environment variable to locate these catalog files; their locations are set in the _emacs file instead. The basic steps to make use of this PSGML feature are:

  1. Load a document that uses the DTD (or create a new, empty one) and parse the DTD with PSGML

  2. Save the parsed DTD with PSGML into the directory that contains the DTD

  3. Create or edit a catalog file (usually called ecatalog to distinguish it from regular catalog files) which resolves the DTD identifier to the parsed DTD.

  4. Modify your _emacs to tell PSGML where the ecatalog files can be found.

Now, whenever you open or create an SGML document, PSGML first checks whether a parsed version of the given DTD exists and uses it if present. If no parsed version exists, PSGML parses the DTD itself and saves a parsed version for the next time you open a file with this DTD. Whether the additional hassle with ecatalog files pays off depends on the speed of your computer and the size of the DTDs that you use.

Note: This automatic saving of the parsed version may fail if you do not have write access to the directory where it is saved. In that case, log in as an administrator and perform the first two steps of the procedure above for each DTD that you want to access through an ecatalog entry.
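The lookup PSGML performs is essentially a cache: use the saved parse if it exists, otherwise parse the DTD and try to save the result. The Python sketch below only illustrates that logic. PSGML itself is written in Emacs Lisp and uses its own file format, so `parse_dtd`, `load_dtd`, and the use of `pickle` are hypothetical stand-ins, and `parse_dtd` is a crude placeholder that merely collects element names.

```python
import os
import pickle
import re


def parse_dtd(dtd_path):
    """Hypothetical stand-in for an expensive DTD parse.

    It only collects the names declared with <!ELEMENT name ...>.
    """
    with open(dtd_path, encoding="utf-8") as f:
        text = f.read()
    return sorted(set(re.findall(r"<!ELEMENT\s+([^\s>]+)", text, re.IGNORECASE)))


def load_dtd(dtd_path, cache_path):
    """Use the pre-parsed cache if present; otherwise parse and try to save it."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    parsed = parse_dtd(dtd_path)
    try:
        with open(cache_path, "wb") as f:
            pickle.dump(parsed, f)
    except OSError as err:
        # Without write access the cache is simply not created,
        # and the DTD has to be parsed again next time.
        print(f"could not save parsed DTD: {err}")
    return parsed
```

The second call for the same DTD never touches the original file, which is where the time savings come from; the failed save in the `except` branch corresponds to the access-rights problem described in the note.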

We will now go ahead and use this knowledge to install a few useful DTDs.