by Richard G. Baldwin
Baldwin's Home Page
In Part 2 of this series of articles on SAX, I promised to show you how to write a Java program that uses XML4J to parse a simple XML document.
I promised that the program would deliver a series of events to the appropriate event handler methods as the parser traverses the XML document, and that the event handler methods would extract and display information about the XML document.
I will continue the discussion in the next article where I will show the actual Java code used to parse the XML file and to respond to and handle SAX parser events.
The XML file is shown below:
<?xml version="1.0"?>
<bookOfPoems>
  <poem PoemNumber="1" DummyAttribute="dummy value">
    <line>Roses are red,</line>
    <line>Violets are blue.</line>
    <line>Sugar is sweet,</line>
    <line>and so are you.</line>
  </poem>
  <poem PoemNumber="2" DummyAttribute="dummy value">
    <line>Twas the night before Christmas,</line>
    <line>And all through the house,
    <line>Not a creature was stirring,</line>
    <line>Not even a mouse.</line>
  </poem>
</bookOfPoems>
As you can see from the above listing, the XML file used with this sample program represents the rudimentary aspects of a book of poems. It contains one verse each from two well-known poems.
Sometimes I find it easier to visualize the overall element structure of an XML document by removing everything but the tags. The following is a representation of the element structure with the attributes and the content of each element removed.
<?xml version="1.0"?>
<bookOfPoems>
  <poem>
    <line></line>
    <line></line>
    <line></line>
    <line></line>
  </poem>
  <poem>
    <line></line>
    <line>
    <line></line>
    <line></line>
  </poem>
</bookOfPoems>
The XML markup for the first poem was correct from a syntax viewpoint.
A syntax error was purposely introduced into the second poem to illustrate the error-handling capability of SAX and the IBM parser.
The error is in the second line of the second poem (And all through the house,). That <line> element is missing its end tag (</line>).
This program uses the IBM Parser for Java (XML4J) along with the XML file shown above.
The purpose of the program was to illustrate the trapping and handling of parser events, along with customized error handling.
As mentioned earlier, the first poem had the correct XML syntax. The second poem was purposely missing an end tag midway through the poem.
The program was tested using JDK 1.2 from Sun under Win95 using the XML4J version 2.0 parser from IBM.
I manually inserted some line breaks to force the output material shown below to fit in this format. I also deleted some blank lines to reduce the overall size of the output listing.
The first part of the output from the program is shown below. This part deals only with the beginning of the Document element, the beginning of the bookOfPoems element, and the first poem element. A later section of output deals with the remainder of the XML file.
If you compare this output with the raw XML document shown above, you will see that the first poem was parsed and displayed successfully. The output produced by the program was as follows:
Start Document
Start element: bookOfPoems
Start element: poem
Attribute: PoemNumber, Value = 1, Type = CDATA
Attribute: DummyAttribute, Value = dummy value, Type = CDATA
Start element: line
Roses are red,
End element: line
Start element: line
Violets are blue.
End element: line
Start element: line
Sugar is sweet,
End element: line
Start element: line
and so are you.
End element: line
End element: poem
Each portion of output was the result of an event handler being invoked by the parser. Each event handler extracted and displayed information about that portion of the XML document with which it was concerned when it was invoked.
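To make the idea of event handlers concrete before the actual program is presented, here is a minimal sketch of a handler that produces output in the same style. It is not the program used in this article. It assumes the SAX2 classes and the JAXP parser that ship with current JDKs (DefaultHandler, SAXParserFactory) rather than XML4J version 2.0, and the names SaxEventSketch, EchoHandler, and parse are hypothetical names chosen for illustration only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: not the program described in this article.
class SaxEventSketch {

    /** Records one text line per parser event. */
    static class EchoHandler extends DefaultHandler {
        final List<String> lines = new ArrayList<>();

        @Override
        public void startDocument() {
            lines.add("Start Document");
        }

        @Override
        public void endDocument() {
            lines.add("End Document");
        }

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            lines.add("Start element: " + qName);
            // Attributes arrive together with the start-element event.
            for (int i = 0; i < atts.getLength(); i++) {
                lines.add("Attribute: " + atts.getQName(i)
                        + ", Value = " + atts.getValue(i)
                        + ", Type = " + atts.getType(i));
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            lines.add("End element: " + qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // A parser may deliver text in several chunks; this sketch
            // simply records each non-blank chunk as its own line.
            String text = new String(ch, start, length).trim();
            if (!text.isEmpty()) {
                lines.add(text);
            }
        }
    }

    /** Parses the XML text and returns the recorded event lines. */
    static List<String> parse(String xml) {
        try {
            EchoHandler handler = new EchoHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.lines;
        } catch (Exception e) {
            throw new IllegalStateException("Parse failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<bookOfPoems><poem PoemNumber=\"1\">"
                + "<line>Roses are red,</line></poem></bookOfPoems>";
        parse(xml).forEach(System.out::println);
    }
}
```

The handler never asks the parser for anything. The parser calls each method as it encounters the corresponding part of the document, and the method decides what to record.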
For example, the first line of output, which reads Start Document, was the result of the parser detecting the beginning of the document and invoking the appropriate event handler.
Except for the Document element and the bookOfPoems element, the result of detecting the beginning and the end of each element was included in the output shown above.
The endings of the Document and bookOfPoems elements are not shown above because, as mentioned earlier, this output does not describe the entire document. This output only describes the beginning of the Document, the beginning of the bookOfPoems element, and the first poem element. Additional output is shown later.
As mentioned earlier, a syntax error was purposely introduced into the second poem in the XML file. The second poem was displayed as shown below. (This output is a continuation of the output shown above.)
The line with the missing end tag is the one that reads And all through the house, in the following output. Note that it is followed by another Start element: line entry rather than by an End element: line entry.
Attribute: PoemNumber, Value = 2, Type = CDATA
Attribute: DummyAttribute, Value = dummy value, Type = CDATA
Start element: line
Twas the night before Christmas,
End element: line
Start element: line
And all through the house,
Start element: line
Not a creature was stirring,
End element: line
Start element: line
Not even a mouse.
End element: line
systemID: file:/G:/Baldwin/AA-School/JavaProg/Combined
/Java/Sax01.xml
[Fatal Error] Sax01.xml:17:7: "</line>" expected.
Terminating
Note that a fatal error occurred at the point where the parser was able to determine that the end tag was missing from one of the lines in the poem.
The error was detected, and error processing began, following the last line in the second poem. The output from error processing began with the line that reads systemID:.
As you can see by comparing the position of the line with the missing end tag against the position of the systemID: line, this determination was not made until several lines beyond the actual missing tag. A customized error message was produced showing the line number and character number where the error was detected, along with the nature of the error.
This delay in detecting the problem resulted from the fact that no DTD was provided and a non-validating parser was used. (Actually, the XML4J parser was used in its non-validating mode.) Therefore, the parser initially believed that the appearance of a start tag ahead of an expected end tag indicated a nesting condition. It wasn't until the parser was later able to determine that this was not an allowable nesting condition that it was able to determine that there was a missing end tag.
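The following is a minimal sketch of how a customized fatal-error report of this kind might be produced, and of how late the error is detected. It is not the program used in this article. It assumes the SAX2 classes and the non-validating JAXP parser bundled with current JDKs rather than XML4J, so the wording of the parser's message will differ from the message shown above. The names SaxErrorSketch, ReportingHandler, and parse, and the system ID string, are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: not the program described in this article.
class SaxErrorSketch {

    /** Builds a custom report when the parser signals a fatal error. */
    static class ReportingHandler extends DefaultHandler {
        String report = ""; // stays empty if no fatal error occurs
        int line = -1;      // line where the parser detected the error

        @Override
        public void fatalError(SAXParseException e) throws SAXException {
            line = e.getLineNumber();
            report = "systemID: " + e.getSystemId() + "\n"
                    + "[Fatal Error] " + e.getLineNumber() + ":"
                    + e.getColumnNumber() + ": " + e.getMessage() + "\n"
                    + "Terminating";
            throw e; // stop parsing; a fatal error is not recoverable
        }
    }

    /** Parses the XML text and returns the handler holding any report. */
    static ReportingHandler parse(String xml) {
        ReportingHandler handler = new ReportingHandler();
        try {
            InputSource source = new InputSource(new StringReader(xml));
            // Assumed system ID, used only to label the report.
            source.setSystemId("file:/hypothetical/Sax01.xml");
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(source, handler);
        } catch (SAXParseException e) {
            // Already recorded by fatalError above.
        } catch (Exception e) {
            throw new IllegalStateException("Unexpected failure", e);
        }
        return handler;
    }

    public static void main(String[] args) {
        // The end tag is missing on line 2, but the nested <line> that
        // follows is legal as far as a non-validating parser can tell.
        String xml = "<poem>\n"
                + "<line>And all through the house,\n"
                + "<line>Not a creature was stirring,</line>\n"
                + "</poem>";
        System.out.println(parse(xml).report);
    }
}
```

In this sketch the end tag is omitted on the second line of the input, yet the reported line number is that of the closing poem tag, because that is the first point at which the nesting can no longer be reconciled.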
Presumably, if there had been a DTD specifying that <line> tags may not be nested inside of <line> tags, a validating parser would have recognized the error as soon as it occurred. If I have the time, I will try to demonstrate this in a subsequent article.
My plan for the next article is to continue the discussion of this program. I will show you the actual Java code in the program that was used to produce the output shown above.
Trying to wrap your brain around XML is sort of like trying to put an octopus in a bottle. Every time you think you have it under control, a new tentacle shows up. XML has many tentacles, reaching out in all directions. But, that's what makes it fun. As your XML host, I will do my best to lead you to the information that you need to keep the XML octopus under control.
This HTML page was produced using the WYSIWYG features of Microsoft Word 97. The images on this page were used with permission from the Microsoft Word 97 Clipart Gallery.
Copyright 2000, Richard G. Baldwin