What is SAX, Part 1

by Richard G. Baldwin
baldwin@austin.cc.tx.us
Baldwin's Home Page

Dateline: 05/23/99

introduction
Several previous articles introduced you to XML. I concluded those articles with the following statement:

"XML by itself really isn't very useful. On the bottom line, XML is nothing more than a specification for how to create structured documents and data. To be useful, the XML document must be combined with a program designed to do something useful with that document.

Java is a strong contender for the writing of programs that do useful things with XML documents."

programmer wanted
XML can be a very useful tool, but in order for it to become useful, lots of computers must be equipped with programs that understand, speak, read, and write XML. (The above link describes some programs that are currently in development.)

Where do these programs come from? Obviously programmers write them. Programmers need programming tools, and SAX is a programming tool.

so, what is SAX, anyway?
Computer programs are written using programming languages, such as C, C++, and Java. There was a time when we started every new program from scratch and reinvented everything on a daily basis. Fortunately, some modern programmers have learned that starting from scratch every time is not always the best approach. Some among us have learned the value of "reusable code."

to OOP or not to OOP
There is a body of technology, often referred to as Object Oriented Programming, that has as one of its major advantages a strong emphasis on the reusability of code. In this technology area, reusable code typically comes in the form of class libraries that make it fairly easy to do difficult tasks the same way every time without reinventing the code each time.

A good example of code reuse through a class library is the creation and rendering of a typical button in a graphical user interface (GUI).

behold, the interface
Object oriented programming (of the Java variety at least) also brings with it another higher-level concept known as an interface. In a nutshell, an interface definition in a programming language such as Java specifies the programming interface to a module of code. And this is where SAX comes in.
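As a minimal sketch of what an interface looks like in Java, consider the following. The names Greeter, EnglishGreeter, and InterfaceDemo are hypothetical, invented only for illustration; they are not part of SAX or of any library.

```java
// Hypothetical names for illustration only; nothing here is part of SAX.
interface Greeter {
    // The interface says what can be called, not how the work is done.
    String greet(String name);
}

// One possible implementation of the interface.
class EnglishGreeter implements Greeter {
    @Override
    public String greet(String name) {
        return "Hello, " + name;
    }
}

class InterfaceDemo {
    // This method depends only on the interface, so it works
    // with any class that implements Greeter.
    static String welcome(Greeter greeter, String name) {
        return greeter.greet(name) + "!";
    }

    public static void main(String[] args) {
        System.out.println(welcome(new EnglishGreeter(), "XML"));
    }
}
```

The welcome method never names EnglishGreeter. That separation between the caller and the implementation is the property that matters for the discussion of SAX.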

SAX is a set of interface definitions
For the most part, SAX is a set of interface definitions. They specify one of the ways that application programs can interact with XML documents.

(There are other ways for programs to interact with XML documents as well. Prominent among them is the Document Object Model, or DOM, which will be the topic for a later article.)

event-driven programming
Another modern programming concept is event-driven programming. In a nutshell, event-driven programming describes a programming style where the program waits for some interesting event to happen, and then takes the appropriate action. An event can be anything interesting for a particular application, such as a change in the price of a stock, or a mouse click on a GUI button.
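The stock-price example can be sketched in a few lines of Java. This is only an illustration of the style: PriceListener, StockTicker, and EventDemo are hypothetical names made up for this sketch, and a real program would receive prices from some outside source instead of publishing them itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical listener interface: the code to run when a price changes.
interface PriceListener {
    void priceChanged(String symbol, double newPrice);
}

// Hypothetical event source: it notifies every registered listener.
class StockTicker {
    private final List<PriceListener> listeners = new ArrayList<>();

    void addPriceListener(PriceListener listener) {
        listeners.add(listener);
    }

    void publish(String symbol, double newPrice) {
        for (PriceListener listener : listeners) {
            listener.priceChanged(symbol, newPrice);
        }
    }
}

class EventDemo {
    // Registers one listener, fires one event, and returns what the listener recorded.
    static List<String> run() {
        List<String> log = new ArrayList<>();
        StockTicker ticker = new StockTicker();
        ticker.addPriceListener((symbol, price) -> log.add(symbol + " is now " + price));
        ticker.publish("IBM", 112.5);
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The listener does nothing until the ticker reports an event. The program reacts to events; it does not go looking for them.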

SAX brings us both
SAX brings both of these important programming concepts, interface definitions and event-driven programming, to the programmer who is interested in writing programs for processing XML documents.

SAX is not a commercial product intended for sale. Rather, SAX is a technical product specification that explains how those who develop software products for XML document processing should go about it.

specific information
SAX is a technical product specification provided by Megginson Technologies Ltd.

As of 5/23/99,

you may already have it
If you are using a SAX-based parser for the development of XML processing programs, the SAX libraries and documentation may be contained in the libraries and documentation for the parser. A separate download of SAX may not be necessary.

In the next article in this series, I will have more to say about a particular SAX-based parser that is available from IBM.

what is a parser?
A parser, in this context, is a software tool that preprocesses an XML document in some fashion, handing the results over to an application program. The primary purpose of the parser is to do most of the hard work up front and to provide the application program with the XML information in a form that is easier to work with.

what does Megginson have to say?
Quoting the folks at Megginson Technologies Ltd.:

SAX is a standard interface for event-based XML parsing, developed collaboratively by the members of the XML-DEV mailing list. SAX 1.0 was released on Monday 11 May 1998, and is free for both commercial and non-commercial use.

SAX implementations are currently available in Java and Python, with more to come. SAX 1.0 support in both parsers and applications is growing fast: see the Parsers and Applications page for details.

what is an API?
API is the common jargon for Application Programming Interface. An API usually contains a variety of features that make it easier for the application programmer to write difficult programs (such as GUI programs).

two types of XML parsers
There are at least two types of XML parser APIs commonly used for the development of programs to process XML documents:

tree-based API
A tree-based API compiles an XML document into an internal tree structure. This makes it possible for an application program to navigate the tree to achieve its objective. The Document Object Model (DOM) working group at the W3C is developing a standard tree-based API for XML.

One of my earlier articles gave you some links regarding a tree-based API from Microsoft. I will pursue the tree-based topic more fully, concentrating on the DOM, in subsequent articles.
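For a rough idea of what navigating a tree looks like, here is a minimal sketch. It assumes the JAXP and DOM classes that ship with current Java releases (DocumentBuilderFactory, Document, NodeList), which may differ from the tree-based APIs available when this article was written. The class name TreeDemo and the sample document are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class TreeDemo {
    // Hypothetical sample document.
    static final String XML = "<book><title>SAX</title><title>DOM</title></book>";

    // Builds the whole tree in memory, then walks it to collect the title text.
    static List<String> titles(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        // The complete document is available, so the program can ask for any part of it.
        NodeList nodes = doc.getElementsByTagName("title");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(titles(XML));
    }
}
```

Note that the entire document is parsed into memory before the application code looks at any of it.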

However, this article isn't about DOM and tree-based APIs. It is about SAX, which provides an event-based API.

event-based API
An event-based API reports parsing events (such as the start and end of elements) to the application using callbacks. The application implements and registers event handlers for the different events. Code in the event handlers is designed to achieve the objective of the application. The process is similar (but not identical) to creating and registering event listeners in the Java Delegation Event Model.
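The callback pattern just described can be sketched as follows. This is a hedged illustration, not the sample program promised for a later article: it assumes the SAX classes bundled with current Java releases (DefaultHandler, SAXParserFactory), which belong to a later version of SAX than the 1.0 release discussed here. The names SaxEventDemo and EventLogger and the sample document are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxEventDemo {
    // Hypothetical handler: records each start-element and end-element event.
    static class EventLogger extends DefaultHandler {
        final List<String> events = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            events.add("start:" + qName);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            events.add("end:" + qName);
        }
    }

    // Hands the handler to the parser; the parser calls it back as it reads the document.
    static List<String> parse(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        EventLogger handler = new EventLogger();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample document.
        for (String event : parse("<greeting><to>World</to></greeting>")) {
            System.out.println(event);
        }
    }
}
```

The application never asks the parser for an element. It supplies the handler, and the parser reports each event in document order.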

event processing can be more efficient
In some cases, an event-based API can be more efficient than a tree-based API. In addition, Java programmers familiar with the use of event-driven programming may find the event-based API to be more familiar ground. Generally, an event-based API provides a simpler, lower-level access to an XML document.

In a subsequent article, I will introduce you to a sample Java program that illustrates how to use the SAX API within an event-based parser.

advantages of using SAX
If I decide to use an event-based parser (instead of a tree-based parser), why should I care whether or not the parser is based on SAX?

There are several advantages to using a parser based on SAX. Foremost among them is standardization. If I learn how to use one SAX-based parser, then I will know how to use most SAX-based parsers.

Another advantage is code portability among parsers. Code written for one SAX-based parser should work with another SAX-based parser with few modifications.
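One way to picture this portability is a method that accepts any parser through a standard SAX interface. The sketch below is written under stated assumptions: it uses the XMLReader interface and DefaultHandler class from the version of SAX bundled with current Java releases, not SAX 1.0, and the names PortabilityDemo, ElementCounter, and defaultReader are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class PortabilityDemo {
    // Hypothetical handler: counts the elements the parser reports.
    static class ElementCounter extends DefaultHandler {
        int count;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            count++;
        }
    }

    // Depends only on SAX interfaces, so any parser that implements
    // XMLReader can be passed in without changing this code.
    static int countElements(XMLReader reader, String xml) throws IOException, SAXException {
        ElementCounter counter = new ElementCounter();
        reader.setContentHandler(counter);
        reader.parse(new InputSource(new StringReader(xml)));
        return counter.count;
    }

    // Assumption: use whichever parser the Java runtime supplies by default.
    static XMLReader defaultReader() throws Exception {
        return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countElements(defaultReader(), "<a><b/><b/></a>"));
    }
}
```

Only defaultReader knows which parser is in use. Swapping in a different SAX-based parser would mean changing that one method and nothing else.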

Richard G. Baldwin

coming attractions...

My next article will continue the discussion of SAX-based parsers, and in particular will introduce you to a software product from IBM named XML for Java or XML4J for short. This product supports both SAX and DOM, and lets the application programmer combine the two approaches in a single application. You can download this parser from IBM free of charge.

Credits: These HTML pages were produced using the WYSIWYG features of Microsoft Word 97. The computer image used on this page was used with permission from the Microsoft Word 97 Clipart Gallery.

Copyright 2000, Richard G. Baldwin

About the author

Richard Baldwin is a college professor and private consultant whose primary focus is a combination of Java and XML. In addition to the many platform-independent benefits of Java applications, he believes that a combination of Java and XML will become the primary driving force in the delivery of structured information on the Web.

Richard has participated in numerous consulting projects involving Java, XML, or a combination of the two.  He frequently provides onsite Java and/or XML training at the high-tech companies located in and around Austin, Texas.  He is the author of Baldwin's Java Programming Tutorials, which has gained a worldwide following among experienced and aspiring Java programmers. He has also published articles on Java Programming in Java Pro magazine.

Richard holds an MSEE degree from Southern Methodist University and has many years of experience in the application of computer technology to real-world problems.


-end-