What is SAX, Part 2

by Richard G. Baldwin
baldwin@austin.cc.tx.us
Baldwin's Home Page

Dateline: 06/07/99

prolog

In Part 1 of this series of articles, I promised that I would continue the discussion of SAX in a future article. I also promised to introduce you to a software product from IBM named XML Parser for Java (XML4J). This product supports both SAX and DOM, and lets the application programmer combine the two approaches in a single application.

what is XML4J?

IBM's XML Parser for Java is a validating XML parser written in 100% pure Java. As of this writing, it can be downloaded free of charge from the IBM alphaWorks site. This is the parser that I will use (in most cases) in the sample programs that I will provide for the articles on SAX and DOM.

Version 1 of IBM's XML Parser for Java was the highest rated Java XML parser in Java Report's February 1999 review of XML parsers.

support for standards

Very important to application programmers is the fact that the parser supports established XML standards, among them SAX 1.0 and DOM.

Support of SAX is of particular interest in this article. Support for DOM will become important in subsequent articles.

what does SAX support really mean?

What does it mean to say that the parser supports SAX 1.0? The SAX specification consists primarily of a set of interface definitions. There are few, if any, concrete class and method definitions in SAX. In effect, SAX is a specification of how an event-based parser should behave from the viewpoint of the programming interface.

Therefore, a parser that implements SAX 1.0 will implement the interfaces defined in SAX 1.0, providing concrete class and method definitions for the methods declared in the various SAX interfaces. Exactly how those interfaces get implemented is up to the designer of the parser product.

Furthermore, those classes and methods will be implemented in such a way that application programmers can gain access to the various event-based capabilities of the parser using the method signatures declared in the SAX interface definitions.

One vendor may implement those methods differently from another vendor insofar as the names of the classes and the inner workings of the methods are concerned. However, the programming interface and the resulting behavior of the methods will be as defined in SAX.
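Here is a minimal sketch of that separation, not a definitive implementation. It assumes the SAX 1.0 types that still ship (marked as deprecated) in the org.xml.sax package of current JDKs. The class name SaxInterfaceSketch and its two methods are hypothetical names that I made up for illustration, and the parser bundled with the JDK, obtained through the JAXP SAXParserFactory class, stands in for a vendor product such as XML4J. Only jdkParser() knows where the parser came from. The countElements() method is written purely against names declared by SAX.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;
import org.xml.sax.Parser;

@SuppressWarnings("deprecation") // SAX 1.0 types are deprecated in current JDKs
class SaxInterfaceSketch {

    /** Vendor-specific part: the only place that knows which parser is in use. */
    static Parser jdkParser() {
        try {
            // Assumption: the JDK's bundled parser stands in for a vendor parser.
            return SAXParserFactory.newInstance().newSAXParser().getParser();
        } catch (Exception e) {
            throw new IllegalStateException("could not create a SAX parser", e);
        }
    }

    /** Vendor-neutral part: uses only names declared by SAX 1.0. */
    static int countElements(Parser parser, String xml) {
        final int[] count = {0};
        parser.setDocumentHandler(new HandlerBase() {
            @Override
            public void startElement(String name, AttributeList attributes) {
                count[0]++; // one event per element start tag
            }
        });
        try {
            parser.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        System.out.println(countElements(jdkParser(), "<a><b/><b/></a>"));
    }
}
```

Swapping in a different SAX 1.0 parser should require a change only to jdkParser(), because countElements() never mentions a vendor class.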

example event handler

For example, SAX declares a method named startDocument(). This is the declaration for an event-handler method that will be invoked when the parser encounters the beginning of the XML document.

the programmer's responsibility

It is the responsibility of the application programmer to override this method to provide the desired behavior when the parser begins parsing a document.

If the terminology "override this method" is new to you, see my online Java tutorials for an explanation of this and other Object-Oriented Programming concepts.

the parser's responsibility

It is the responsibility of the parser software to invoke this overridden method when the parser begins parsing a document. This invocation causes the behavior of the overridden method to manifest itself.

It is also the responsibility of the parser software to provide a default version of this method that will be invoked when the application programmer chooses not to override it.
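The division of labor described above can be sketched as follows. This is only an illustration under stated assumptions: it uses the SAX 1.0 helper class org.xml.sax.HandlerBase, which supplies do-nothing versions of the event-handler methods, and it uses the parser bundled with the JDK (through JAXP) rather than XML4J. The names StartDocumentSketch, RecordingHandler, AnnouncingHandler, QuietHandler, and run() are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;

@SuppressWarnings("deprecation") // SAX 1.0 types are deprecated in current JDKs
class StartDocumentSketch {

    /** Common base: records the end of the document so both logs are non-empty. */
    static class RecordingHandler extends HandlerBase {
        final List<String> log = new ArrayList<>();

        @Override
        public void endDocument() {
            log.add("endDocument");
        }
    }

    /** The programmer overrides startDocument() to get the desired behavior. */
    static class AnnouncingHandler extends RecordingHandler {
        @Override
        public void startDocument() {
            log.add("startDocument (overridden)");
        }
    }

    /** No override: the inherited default startDocument() runs and does nothing. */
    static class QuietHandler extends RecordingHandler {
    }

    /** The parser, not the application, decides when each handler method is called. */
    static List<String> run(RecordingHandler handler, String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return handler.log;
    }

    public static void main(String[] args) {
        System.out.println(run(new AnnouncingHandler(), "<bookOfPoems/>"));
        System.out.println(run(new QuietHandler(), "<bookOfPoems/>"));
    }
}
```

Note that the application never calls startDocument() itself. It only hands the handler object to the parser and lets the parser invoke the methods as events occur.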

other capabilities of IBM's XML4J

In addition to supporting SAX and DOM, the IBM parser also provides a number of other capabilities.

According to IBM:

The rich generating and validating capabilities allow the XML4J Parser to be used for:

  • Building XML-savvy Web servers
  • The next generation of vertical applications, which will use XML as their data format
  • On-the-fly validation for creating XML editors
  • Ensuring the integrity of e-business data expressed in XML
  • Building truly internationalized XML applications

As of this writing, a FAQ is available inside the XML4J download package that should answer many of the questions that you may have about the parser.

other SAX and DOM compliant parsers

The IBM parser is not the only SAX- and DOM-compliant parser available. Sun provides a parser that you can download free of charge. You will have to become a registered member of the Java Developer Connection to download this parser, but registration is free. This is what Sun has to say about its parser:

This package features a fast validating and non-validating XML parser that fully conforms to the W3C XML 1.0 recommendation and SAX 1.0 API, and supports the W3C DOM-compliant object model tree for manipulating and writing XML structured data.

Another interesting parser is the OpenXML parser that you can download free of charge from OpenXML. The documentation indicates that this parser is also SAX and DOM compliant.

coming attractions...

My plan for the next article is to continue the discussion of SAX-based parsers. I will show you how to write a simple Java program that uses XML4J to parse the XML document shown below.

As the parser traverses the XML document, it will deliver a series of events to the appropriate event handler methods in the program. The event handler methods will extract and display information about the XML document.

<?xml version="1.0"?>

<bookOfPoems>

<poem PoemNumber="1" 
      DummyAttribute="dummy value">
<line>Roses are red,</line>
<line>Violets are blue.</line>
<line>Sugar is sweet,</line>
<line>and so are you.</line>
</poem>

<poem PoemNumber="2"
      DummyAttribute="dummy value">
<line>Twas the night before Christmas,</line>
<line>And all through the house,</line>
<line>Not a creature was stirring,</line>
<line>Not even a mouse.</line>
</poem>

</bookOfPoems>
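
As a preview, here is a rough sketch of what such an event-driven program might look like. It is not the program that I will present in the next article. It assumes the parser bundled with the JDK (through JAXP) in place of XML4J, it uses the SAX 1.0 HandlerBase class, and it embeds a shortened, one-poem variant of the document as a string. The names PoemEventSketch, TraceHandler, and trace() are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;

@SuppressWarnings("deprecation") // SAX 1.0 types are deprecated in current JDKs
class PoemEventSketch {

    // Assumption: a shortened, well-formed variant of the poem document.
    static final String XML =
          "<?xml version=\"1.0\"?>\n"
        + "<bookOfPoems>\n"
        + "<poem PoemNumber=\"1\" DummyAttribute=\"dummy value\">\n"
        + "<line>Roses are red,</line>\n"
        + "<line>Violets are blue.</line>\n"
        + "</poem>\n"
        + "</bookOfPoems>\n";

    /** Records one entry per event delivered by the parser. */
    static class TraceHandler extends HandlerBase {
        final List<String> events = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();

        @Override public void startDocument() { events.add("startDocument"); }
        @Override public void endDocument()   { events.add("endDocument"); }

        @Override
        public void startElement(String name, AttributeList atts) {
            flushText();
            StringBuilder sb = new StringBuilder("start " + name);
            for (int i = 0; i < atts.getLength(); i++) {
                sb.append(' ').append(atts.getName(i)).append('=').append(atts.getValue(i));
            }
            events.add(sb.toString());
        }

        @Override
        public void endElement(String name) {
            flushText();
            events.add("end " + name);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length); // text may arrive in several chunks
        }

        /** Emits accumulated text, skipping whitespace-only runs between tags. */
        private void flushText() {
            String s = text.toString().trim();
            if (!s.isEmpty()) {
                events.add("text " + s);
            }
            text.setLength(0);
        }
    }

    static List<String> trace(String xml) {
        TraceHandler handler = new TraceHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return handler.events;
    }

    public static void main(String[] args) {
        trace(XML).forEach(System.out::println);
    }
}
```

The handler buffers character data because a SAX parser is free to deliver the text of a single element in more than one characters() call.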

the XML octopus

Trying to wrap your brain around XML is sort of like trying to put an octopus in a bottle. Every time you think you have it under control, a new tentacle shows up. XML has many tentacles, reaching out in all directions. But that's what makes it fun. As your XML host, I will do my best to lead you to the information that you need to keep the XML octopus under control.

Copyright 2000, Richard G. Baldwin

About the author

Richard Baldwin is a college professor and private consultant whose primary focus is a combination of Java and XML. In addition to the many platform-independent benefits of Java applications, he believes that a combination of Java and XML will become the primary driving force in the delivery of structured information on the Web.

Richard has participated in numerous consulting projects involving Java, XML, or a combination of the two.  He frequently provides onsite Java and/or XML training at the high-tech companies located in and around Austin, Texas.  He is the author of Baldwin's Java Programming Tutorials, which has gained a worldwide following among experienced and aspiring Java programmers. He has also published articles on Java Programming in Java Pro magazine.

Richard holds an MSEE degree from Southern Methodist University and has many years of experience in the application of computer technology to real-world problems.

-end-