by Richard G. Baldwin
Baldwin's Home Page
In Part 1 of this series of articles, I promised that I would continue the discussion of SAX in a future article. I also promised to introduce you to a software product from IBM named XML Parser for Java (XML4J). This product supports both SAX and DOM, and lets the application programmer combine the two approaches in a single application.
IBM's XML Parser for Java is a validating XML parser written in 100% pure Java. As of this writing, it can be downloaded free of charge from the IBM alphaWorks site. This is the parser that I will use (in most cases) in the sample programs that I will provide for the articles on SAX and DOM.
Version 1 of IBM's XML Parser for Java was the highest-rated Java XML parser in Java Report's February 1999 review of XML parsers.
Very important to application programmers is the fact that the parser supports the following standards:
Support of SAX is of particular interest in this article. Support for DOM will become important in subsequent articles.
What does it mean to say that the parser supports SAX 1.0? The SAX specification consists primarily of a set of interface definitions. There are few, if any, concrete class and method definitions in SAX. SAX is therefore best understood as a specification of how an event-based parser should behave, as seen from the programming interface.
A parser that implements SAX 1.0 will implement the interfaces defined in SAX 1.0, providing concrete class and method definitions for the methods declared in the various SAX interfaces. Exactly how those interfaces get implemented is up to the designer of the parser product.
Furthermore, those classes and methods will be implemented in such a way that application programmers can gain access to the various event-based capabilities of the parser using the method signatures declared in the SAX interface definitions.
One vendor may implement those methods differently from another vendor insofar as the names of the classes and the inner workings of the methods are concerned. However, the programming interface and the resulting behavior of the methods will be as defined in SAX.
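Here is a minimal sketch of what this vendor independence looks like in code. It rests on assumptions that go beyond this article: it uses the JAXP factory and the parser bundled with a current JDK rather than XML4J, and it uses the later XMLReader interface (SAX 1.0 declares the corresponding role as org.xml.sax.Parser). The names VendorNeutralDemo and obtainReader are hypothetical.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

class VendorNeutralDemo {
    // Returns a parser typed only as the SAX interface. The application
    // never names the vendor's concrete class.
    static XMLReader obtainReader() throws Exception {
        return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    }

    public static void main(String[] args) throws Exception {
        XMLReader reader = obtainReader();
        // The interface name is fixed by SAX.
        System.out.println("Interface: " + XMLReader.class.getName());
        // The class name differs from one parser product to another.
        System.out.println("Vendor class: " + reader.getClass().getName());
    }
}
```

The application code compiles against the interface only, so switching to a different compliant parser should not require changes to the code that uses it.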
For example, SAX declares a method named startDocument(). This is the declaration for an event-handler method that will be invoked when the parser encounters the beginning of the XML document.
It is the responsibility of the application programmer to override this method to provide the desired behavior when the parser begins parsing a document.
If the terminology "override this method" is new to you, see my online Java tutorials for an explanation of this and other Object-Oriented Programming concepts.
It is the responsibility of the parser software to invoke this overridden method when the parser begins parsing a document. This invocation causes the behavior of the overridden method to manifest itself.
It is also the responsibility of the parser software to provide a default version of this method that will be invoked when the application programmer chooses not to override it.
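The division of responsibilities described above can be sketched as follows. This is only an illustration under stated assumptions: it uses the parser bundled with the JDK rather than XML4J, and it extends the SAX 2 helper class DefaultHandler to obtain the default, do-nothing versions of the event-handler methods (the SAX 1.0 counterpart is HandlerBase). The names StartDocumentDemo, MyHandler, and parse are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class StartDocumentDemo {
    // Hypothetical handler: overrides only startDocument() and inherits
    // the default versions of every other event-handler method.
    static class MyHandler extends DefaultHandler {
        final List<String> log = new ArrayList<>();

        @Override
        public void startDocument() {
            // Invoked by the parser when it begins parsing the document.
            log.add("startDocument");
        }
    }

    // Parses the given XML text and returns the events that were logged.
    static List<String> parse(String xml) throws Exception {
        MyHandler handler = new MyHandler();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.log;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parse("<greeting>hello</greeting>"));
    }
}
```

Only the overridden method records anything. The element and character events still occur, but they are absorbed silently by the inherited default methods.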
In addition to supporting SAX and DOM, the IBM parser also provides a number of other capabilities.
According to IBM:
The rich generating and validating capabilities allow the XML4J Parser to be used for:
As of this writing, a FAQ is available inside the XML4J download package that should answer many of the questions that you may have about the parser.
The IBM parser is not the only SAX and DOM compliant parser available. Sun provides a parser that you can download free of charge. You will have to become a registered member of the Java Developer Connection to download this parser, but registration is free. This is what Sun has to say about their parser:
This package features a fast validating and non-validating XML parser that fully conforms to the W3C XML 1.0 recommendation and SAX 1.0 API, and supports the W3C DOM-compliant object model tree for manipulating and writing XML structured data.
Another interesting parser is the OpenXML parser that you can download free of charge from OpenXML. The documentation indicates that this parser is also SAX and DOM compliant.
My plan for the next article is to continue the discussion of SAX-based parsers. I will show you how to write a simple Java program that uses XML4J to parse the XML document shown below.
The program will deliver a series of events to the appropriate event handler methods as the parser traverses the XML document. The event handler methods will extract and display information about the XML document.
<?xml version="1.0"?>
<bookOfPoems>
  <poem PoemNumber="1" DummyAttribute="dummy value">
    <line>Roses are red,</line>
    <line>Violets are blue.</line>
    <line>Sugar is sweet,</line>
    <line>and so are you.</line>
  </poem>
  <poem PoemNumber="2" DummyAttribute="dummy value">
    <line>Twas the night before Christmas,</line>
    <line>And all through the house,</line>
    <line>Not a creature was stirring,</line>
    <line>Not even a mouse.</line>
  </poem>
</bookOfPoems>
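As a rough preview of that event flow, here is a minimal sketch. It is not the program promised for the next article, and it rests on several assumptions: it uses the parser bundled with the JDK and the SAX 2 DefaultHandler class rather than XML4J, it embeds only the first poem to stay short, and the names PoemEventsDemo, PoemHandler, POEM, and trace are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class PoemEventsDemo {
    // Shortened copy of the document: first poem only (an assumption
    // made to keep the sketch small).
    static final String POEM =
        "<?xml version=\"1.0\"?>"
        + "<bookOfPoems>"
        + "<poem PoemNumber=\"1\" DummyAttribute=\"dummy value\">"
        + "<line>Roses are red,</line>"
        + "<line>Violets are blue.</line>"
        + "<line>Sugar is sweet,</line>"
        + "<line>and so are you.</line>"
        + "</poem>"
        + "</bookOfPoems>";

    // Hypothetical handler that records one line of text per event.
    static class PoemHandler extends DefaultHandler {
        final StringBuilder out = new StringBuilder();

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            out.append("start ").append(qName);
            String number = atts.getValue("PoemNumber");
            if (number != null) {
                out.append(" PoemNumber=").append(number);
            }
            out.append('\n');
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // A parser may deliver text in more than one chunk.
            String text = new String(ch, start, length).trim();
            if (!text.isEmpty()) {
                out.append("text ").append(text).append('\n');
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            out.append("end ").append(qName).append('\n');
        }
    }

    // Parses the XML text and returns the recorded event trace.
    static String trace(String xml) throws Exception {
        PoemHandler handler = new PoemHandler();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(trace(POEM));
    }
}
```

Each start tag, run of character data, and end tag produces one call into the handler, in document order. The handler never sees the document as a whole, which is the essential difference between the event-based approach and a tree-based approach such as DOM.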
Trying to wrap your brain around XML is sort of like trying to put an octopus in a bottle. Every time you think you have it under control, a new tentacle shows up. XML has many tentacles, reaching out in all directions. But, that's what makes it fun. As your XML host, I will do my best to lead you to the information that you need to keep the XML octopus under control.
Copyright 2000, Richard G. Baldwin