Richard G Baldwin (512) 223-4758, baldwin@austin.cc.tx.us, http://www2.austin.cc.tx.us/baldwin/

XML Document Processing using a SAX-Based XML Parser

Java Programming, Lecture Notes # 823, Revised 5/21/99.


Preface

Students in Prof. Baldwin's Advanced Java Programming classes at ACC will be responsible for knowing and understanding all of the material in this lesson beginning with the summer semester of 1999.

This lesson was originally written on April 7, 1999 and has been updated several times since then.

What is SAX?

General information

SAX is a product of Megginson Technologies Ltd., http://www.megginson.com/index.html.

As of 5/7/99,

You may already have it

However, if you are using a SAX-based parser, the SAX libraries and documentation are probably contained in the libraries and documentation for the parser, and a separate download of SAX may not be necessary. I will have more to say about a particular SAX-based parser later in this lesson.

What does Megginson have to say?

Quoting the folks at Megginson Technologies Ltd.

SAX is a standard interface for event-based XML parsing, developed collaboratively by the members of the XML-DEV mailing list. SAX 1.0 was released on Monday 11 May 1998, and is free for both commercial and non-commercial use.

SAX implementations are currently available in Java and Python, with more to come. SAX 1.0 support in both parsers and applications is growing fast: see the Parsers and Applications page for details.

Two types of XML parsers

There are at least two types of XML parser APIs:

Tree-based API

A tree-based API compiles an XML document into an internal tree structure. This makes it possible for an application to navigate the tree to achieve its objective. The Document Object Model (DOM) working group at the W3C is developing a standard tree-based API for XML.

An earlier lesson on XML gave you a glimpse of the use of a tree-based API from Microsoft. I will pursue this topic more fully, concentrating on the DOM, in subsequent lessons.

Event-based API

This lesson deals with the use of the SAX event-based API.

An event-based API reports parsing events (such as the start and end of elements) to the application using callbacks. The application implements and registers event handlers for the different events. Code in the event handlers is designed to achieve the objective. The process is similar (but not identical) to creating and registering event listeners in the Java Delegation Event Model.

Event processing can be more efficient

In some cases, an event-based API is more efficient than a tree-based API. In addition, Java programmers familiar with the use of event-driven programming may find the event-based API to be somewhat more familiar ground. Generally, an event-based API provides a simpler, lower-level access to an XML document.

A sample program later in this lesson will illustrate how to use the SAX API within an event-based parser.
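The difference between the two styles can be sketched with a small counting task. The following is only an illustration and is not part of the sample program: it assumes the JAXP classes bundled with current JDKs (javax.xml.parsers) rather than the IBM parser, and the class and method names are hypothetical. The tree-based version builds the whole document before asking a question of it; the event-based version counts as the start tags are reported and keeps no tree.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;

// Hypothetical demo class; the SAX 1.0 types are deprecated in current JDKs.
@SuppressWarnings("deprecation")
class TreeVersusEvents {
  static final String XML =
      "<bookOfPoems><poem><line>Roses are red,</line>"
      + "<line>Violets are blue.</line></poem></bookOfPoems>";

  // Tree-based: build the whole document in memory, then query the tree.
  static int countLinesWithTree(String xml) {
    try {
      Document doc = DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
      return doc.getElementsByTagName("line").getLength();
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  // Event-based: count as the parser reports each start tag.
  static int countLinesWithEvents(String xml) {
    final int[] count = {0};
    HandlerBase handler = new HandlerBase() {
      @Override
      public void startElement(String name, AttributeList atts) {
        if (name.equals("line")) {
          count[0]++;
        }
      }
    };
    try {
      SAXParserFactory.newInstance().newSAXParser()
          .parse(new InputSource(new StringReader(xml)), handler);
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
    return count[0];
  }

  public static void main(String[] args) {
    System.out.println("tree:   " + countLinesWithTree(XML));
    System.out.println("events: " + countLinesWithEvents(XML));
  }
}
```

Both methods report the same count for the same document; they differ in how much of the document must be held in memory to get it.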

Advantages of using SAX

So, if I decide to use an event-based parser, why should I care whether or not the parser is based on SAX?

There are several advantages to using a parser based on SAX. Foremost among them is standardization. If I learn how to use one SAX-based parser, then I will know how to use most SAX-based parsers.

Another advantage is code portability among parsers. Code written for one SAX-based parser should be compatible with another SAX-based parser with few or no modifications.

Using parsers from different vendors

Although I haven't attempted to confirm this, I believe that the sample program in this lesson could be used with any parser that faithfully implements SAX 1.0 simply by changing the following line of code:

static final String parserClass = 
                           "com.ibm.xml.parsers.SAXParser";

This line of code identifies the parser class, and therefore the vendor of the parser software, which in this case is IBM. Had I used a command-line argument to specify the parser class, even this line of code wouldn't require changing in order to use a SAX-based parser from a different vendor.
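The command-line approach mentioned above might look something like the following sketch. It is not part of the sample program; the class name ParserChooser and the method chooseParserClass() are hypothetical, and the IBM class name is used only as the fallback when no argument is given.

```java
import org.xml.sax.Parser;
import org.xml.sax.helpers.ParserFactory;

// Hypothetical demo class; the SAX 1.0 types are deprecated in current JDKs.
@SuppressWarnings("deprecation")
class ParserChooser {
  // Default taken from the lesson; used when no argument is supplied.
  static final String DEFAULT_PARSER_CLASS =
                           "com.ibm.xml.parsers.SAXParser";

  // Return the first command-line argument if present, else the default.
  static String chooseParserClass(String[] args) {
    return (args != null && args.length > 0)
        ? args[0] : DEFAULT_PARSER_CLASS;
  }

  public static void main(String[] args) throws Exception {
    String parserClass = chooseParserClass(args);
    try {
      Parser parser = ParserFactory.makeParser(parserClass);
      System.out.println("Loaded " + parser.getClass().getName());
    } catch (ClassNotFoundException e) {
      // The named parser is not on the class path.
      System.out.println("Parser class not found: " + parserClass);
    }
  }
}
```

Run with no arguments, the program tries the IBM parser; run with a class name as the first argument, it tries that class instead.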

What is IBM's XML Parser for Java?

XML4J is the acronym

IBM's XML Parser for Java is a validating XML parser written in 100% pure Java. As of 5/7/99, it can be downloaded free of charge from http://www.alphaworks.ibm.com/formula/xml. This is the parser that I will use in the sample programs in this and the next several lessons.

To throw in a little plug for IBM, "Version 1 of XML Parser for Java was the highest rated Java XML parser in Java Report's February 1999 review of XML parsers."

Support for standards

Very important to us is the fact that the parser supports the following:

Support of SAX is of particular importance in this lesson. Support for DOM will become important in subsequent lessons.

What does SAX support really mean?

What does it mean to say that the parser supports SAX 1.0? The SAX specification consists primarily of a set of interface definitions. There are few, if any, concrete class and method definitions in SAX. Therefore, SAX is really a definitive specification of how an event-based parser should behave from the programming interface viewpoint.

Therefore, a parser that implements SAX 1.0 will implement the interfaces defined in SAX 1.0, providing concrete class and method definitions for the methods declared in the various SAX interfaces. Exactly how those interfaces get implemented is up to the designer of the parser product.

Furthermore, those classes and methods will be implemented in such a way that application programmers can gain access to the various event-based capabilities of the parser using the method signatures declared in the SAX interface definitions.

One vendor may implement those methods differently from another vendor insofar as the names of the classes and the inner workings of the methods are concerned. However, the programming interface and the resulting behavior of the methods will be as defined in SAX.

Example event handler

For example, SAX declares a method named startDocument(). This is an event-handler method.

The programmer's responsibility

It is the responsibility of the application programmer to override this method to provide the desired behavior when the parser begins parsing a document.

The parser's responsibility

It is the responsibility of the parser software to invoke this overridden method when the parser begins parsing a document in order to cause the behavior of the overridden method to manifest itself.

It is also the responsibility of the parser software to provide a default version of this method that will be invoked when the application programmer chooses not to override it.
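This division of responsibility can be sketched as follows. The example is an illustration only: it assumes the JAXP parser bundled with current JDKs in place of the IBM parser, and the class names are hypothetical. The handler overrides startDocument() and nothing else, so every other event falls through to the do-nothing defaults inherited from the SAX class HandlerBase.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;

// Hypothetical demo class; the SAX 1.0 types are deprecated in current JDKs.
@SuppressWarnings("deprecation")
class DefaultsDemo {
  // Overrides only startDocument(); all other callbacks keep
  // the default versions inherited from HandlerBase.
  static class StartOnly extends HandlerBase {
    final List<String> seen = new ArrayList<>();

    @Override
    public void startDocument() {
      seen.add("startDocument");
    }
  }

  // Parse the text and return the names of the events the handler saw.
  static List<String> run(String xml) {
    StartOnly handler = new StartOnly();
    try {
      SAXParserFactory.newInstance().newSAXParser()
          .parse(new InputSource(new StringReader(xml)), handler);
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
    return handler.seen;
  }

  public static void main(String[] args) {
    System.out.println(run("<a><b>text</b></a>"));
  }
}
```

The parser still reports the elements and the text, but only the overridden method does anything with its event.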

Other capabilities of IBM's XML4J

In addition to supporting SAX and DOM, the IBM parser also provides a number of other capabilities.

According to IBM:

The rich generating and validating capabilities allow the XML4J Parser to be used for:

  • Building XML-savvy Web servers
  • The next generation of vertical applications which will use XML as their data format.
  • On-the-fly validation for creating XML editors
  • Ensuring the integrity of e-business data expressed in XML
  • Building truly internationalized XML applications.

As of 5/7/99, a FAQ is available inside the XML4J download package that should answer many of the questions that you may have about the parser.

The XML File

A book of poems

As you can see from the following listing, the XML file used with this sample program represents the rudimentary aspects of a book of poems containing two well-known poems. The XML markup for the first poem is correct from a syntax viewpoint.

A syntax error was purposely introduced into the second poem to illustrate the error-handling capability of SAX and the IBM parser. The second <line> element of that poem (And all through the house,) is missing its end tag (</line>).

<?xml version="1.0"?>

<bookOfPoems>

  <poem PoemNumber="1" DummyAttribute="dummy value">
    <line>Roses are red,</line>
    <line>Violets are blue.</line>
    <line>Sugar is sweet,</line>
    <line>and so are you.</line>
  </poem>

  <poem PoemNumber="2" DummyAttribute="dummy value">
    <line>Twas the night before Christmas,</line>
    <line>And all through the house,
    <line>Not a creature was stirring,</line>
    <line>Not even a mouse.</line>
  </poem>

</bookOfPoems>

The manner in which the sample program processes this XML file is described in the next section.

Sample Program

Handling parser events and errors

This program uses the IBM parser along with the XML file from the previous section to illustrate the trapping and handling of parser events along with customized error handling.

Purpose of the program

This program processes an XML file containing two poems. The purpose of the processing is to display the elements, the attributes, and the text of the poems.

The first poem has the correct syntax. The second poem is missing an end tag midway through the poem.

Processing results

The first poem is parsed and displayed successfully along with the element names and attribute values.

The second poem is also displayed but a fatal error occurs at the point where the parser is able to determine that the end tag is missing. Note, however, that this determination is not made until several lines beyond the actual missing tag.

A non-validating parser was used

This delay in detecting the problem results from the fact that no DTD was provided and a non-validating parser was used. Therefore, the parser initially believes that the appearance of a start tag ahead of an expected end tag indicates a nesting condition. It isn't until the parser is later able to determine that this is not a valid nesting condition that it is able to determine that there is a missing end tag.

Presumably, if there had been a DTD specifying that <line> tags may not be nested inside of <line> tags, a validating parser would have recognized the error as soon as it occurred. If I have the time, I will try to demonstrate this in a subsequent lesson.
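The delayed detection can be observed with a sketch along the following lines. It is not part of the sample program: it assumes the non-validating JAXP parser bundled with current JDKs rather than the IBM parser, the class and method names are hypothetical, and the XML is a shortened copy of the second poem. The method collects the text reported before the problem and returns the line number carried by the exception.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical demo class; the SAX 1.0 types are deprecated in current JDKs.
@SuppressWarnings("deprecation")
class LateDetection {
  // The end tag is missing on line 3 of this text.
  static final String XML =
      "<poem>\n"
      + "<line>Twas the night before Christmas,</line>\n"
      + "<line>And all through the house,\n"
      + "<line>Not a creature was stirring,</line>\n"
      + "<line>Not even a mouse.</line>\n"
      + "</poem>\n";

  // Returns the line number at which the parser reported the problem,
  // or -1 if the document parsed cleanly. Text reported before the
  // problem is appended to the StringBuilder supplied by the caller.
  static int parse(String xml, StringBuilder text) {
    HandlerBase handler = new HandlerBase() {
      @Override
      public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
      }
    };
    try {
      SAXParserFactory.newInstance().newSAXParser()
          .parse(new InputSource(new StringReader(xml)), handler);
      return -1;
    } catch (SAXParseException e) {
      // The default fatalError() of HandlerBase throws this exception.
      return e.getLineNumber();
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    StringBuilder text = new StringBuilder();
    int line = parse(XML, text);
    System.out.println("Problem reported at line " + line);
    System.out.println("Text delivered before the report:");
    System.out.println(text);
  }
}
```

The text of the lines that follow the missing end tag is delivered before the problem is reported, because up to that point the document still looks like a legitimately nested one.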

Program output

The program was tested using JDK 1.2 under Win95 using the XML4J version 2.0 parser from IBM.

The output from the program is as shown below. Note that the error messages were sent to System.out instead of System.err so that they could be captured and reproduced here. Note also that some line breaks were manually inserted to force the material to fit in this format. I have highlighted the place in the output where the error actually occurred. I have also deleted blank lines to reduce the overall size of the listing.

Start Document
Start element: bookOfPoems

Start element: poem
Attribute: PoemNumber, Value = 1, Type = CDATA
Attribute: DummyAttribute, Value = dummy value, 
           Type = CDATA

Start element: line
Roses are red,
End element: line

Start element: line
Violets are blue.
End element: line

Start element: line
Sugar is sweet,
End element: line

Start element: line
and so are you.
End element: line

End element: poem


Start element: poem
Attribute: PoemNumber, Value = 2, Type = CDATA
Attribute: DummyAttribute, Value = dummy value, 
           Type = CDATA

Start element: line
Twas the night before Christmas,
End element: line

Start element: line
And all through the house,

Start element: line
Not a creature was stirring,
End element: line

Start element: line
Not even a mouse.
End element: line

systemID: file:/G:/Baldwin/AA-School/JavaProg/Combined
    /Java/Sax01.xml
[Fatal Error] Sax01.xml:17:7: "</line>" expected.
Terminating

The next section discusses some of the interesting code fragments that make up this program. A complete listing of the program is provided near the end of the lesson.

Interesting Code Fragments

Some required import directives

The first fragment shows import directives simply to illustrate that the program imports packages that are part of the IBM parser library and are not part of the standard Java API.

import org.xml.sax.*;
import org.xml.sax.helpers.ParserFactory;

Identifying the parser package

The next fragment shows the controlling class and the beginning of the main method.

The class begins by defining a String that identifies the class from which the parser will be instantiated. The particular string used here identifies the IBM parser. As mentioned earlier, I believe that this is the only statement that would need to be modified in order to use this program with a SAX-based parser from a different vendor.

class Sax01 {
  static final String parserClass = 
                           "com.ibm.xml.parsers.SAXParser";

  public static void main (String args[])throws Exception{
    Parser parser = ParserFactory.makeParser(parserClass);

A SAX factory method

The first statement inside the main method shown above uses a SAX factory method along with the identification of the parser vendor to create an object of type Parser. More precisely, this is an object of a vendor class that implements the interface org.xml.sax.Parser.

All SAX parsers must implement this interface. It allows applications to register handlers for different types of events and to initiate a parse from a URI or a character stream.

A short side trip

Completely as an aside, in case you, like many others, are having difficulty separating URI, URL, URN, and URC in your mind, here is a quote from a W3C document that explains the differences in the terms.

URI -- Uniform Resource Identifier. The generic set of 
all names/addresses that are short strings that refer 
to resources. (specified 1994; ratified as Internet 
Draft Standard 1998) 

URL -- Uniform Resource Locator. The set of URI schemes
that have explicit instructions on how to access the
resource on the internet. Full definition is given in 
the URL specification. 

URN -- Uniform Resource Name.
1.An URI that has an institutional commitment to
persistence, availability, etc. Note that this sort of
URI may also be a URL. See, for example, PURLs.

2.A particular scheme which is currently (1991,2,3,4,5,6,7)
under development in the IETF (see discussion forums
below), which should provide for the resolution using
internet protocols of names which have a greater
persistence than that currently associated with internet
host names or organizations. When defined, a URN(2) will
be an example of a URI.

URC -- Uniform Resource Citation, or Uniform Resource
Characteristics. A set of attribute/value pairs 
describing a resource. Some of the values may be URIs 
of various kinds. Others may include, for example,
authorship, publisher, datatype, date, copyright status
and shoe size. Not normally discussed as a short string,
but a set of fields and values with some defined free
formatting.

Now back to the main road

Here is what the documentation has to say about the class named org.xml.sax.helpers.ParserFactory.

Java-specific class for dynamically loading SAX parsers.

This class is not part of the platform-independent definition of SAX; it is an additional convenience class designed specifically for Java XML application writers. SAX applications can use the static methods in this class to allocate a SAX parser dynamically at run-time based either on the value of the `org.xml.sax.parser' system property or on a string containing the class name.

Here is what Clifford J. Berg, author of Advanced Java Development for Enterprise Applications, has to say about factory methods in general:

A class you have defined that has a method createInstance() -- or any method -- that has the function of creating an instance based on runtime or configuration criteria such as property settings..

The bottom line on makeParser()

The bottom line is that the makeParser() method of the ParserFactory class creates an instance (object) of a class that implements the Parser interface.

The object is based on a String that specifies the fully qualified name of the parser class provided by the vendor of the SAX-based parser software.

This parser object can then be used to perform the routine processing of the XML file, generating a series of document events and potentially error events based on the information in the file.

A DocumentHandler object

The next fragment instantiates an object of the DocumentHandler type to handle events and errors. Note that DocumentHandler is an interface and is not a class.

I will explain how this object performs its work in conjunction with a discussion of the EventHandler class later.

    DocumentHandler handler = new EventHandler();

Just to confuse you

The two statements in the next fragment can be confusing to persons who have become used to Java Beans design patterns. Generally the design patterns indicate:

However, the two statements in the next fragment invoke methods that begin with the word set to register two different listeners on the Parser object.

One of those handlers listens for document events such as the start or end of an element. The other handler listens for events caused by errors in the XML data.

Different interfaces for events and errors

Document event methods and error event methods are declared in two different interfaces. The handler object instantiated above is an instance of the class EventHandler. A superclass of that class implements both interfaces, making it possible for an object of that type to listen for both types of events. However, because the reference to the object is stored in a variable of type DocumentHandler, it must be cast to type ErrorHandler before it can be registered as the error handler on the parser object.

    parser.setDocumentHandler(handler);
    parser.setErrorHandler((ErrorHandler)handler);
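The cast is a consequence of the declared type of the reference, not of the object itself. As a sketch of an alternative (not the approach used in the sample program), the reference can be declared as the handler class, in which case both registrations compile without a cast. The example assumes the JAXP parser bundled with current JDKs, obtained through getParser(), and the class names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;
import org.xml.sax.Parser;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical demo class; the SAX 1.0 types are deprecated in current JDKs.
@SuppressWarnings("deprecation")
class RegisterBoth {
  // One class that handles both document events and error events.
  static class Recorder extends HandlerBase {
    final List<String> events = new ArrayList<>();

    @Override
    public void startElement(String name, AttributeList atts) {
      events.add("start:" + name);
    }

    @Override
    public void fatalError(SAXParseException ex)
                                      throws SAXException {
      events.add("fatal");
      throw ex; // hand the problem back to the caller of parse()
    }
  }

  static List<String> run(String xml) {
    Recorder handler = new Recorder(); // declared as the subclass
    try {
      Parser parser =
          SAXParserFactory.newInstance().newSAXParser().getParser();
      parser.setDocumentHandler(handler); // no cast required
      parser.setErrorHandler(handler);    // no cast required
      parser.parse(new InputSource(new StringReader(xml)));
    } catch (SAXParseException e) {
      // already recorded by fatalError()
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
    return handler.events;
  }

  public static void main(String[] args) {
    System.out.println(run("<a><b/></a>"));
    System.out.println(run("<a><b></a>"));
  }
}
```

The same object receives the start-element events and the fatal error, which is the arrangement used in the sample program.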

Generating events

The single executable statement in the next fragment is what makes it all happen. This statement executes the parse() method on the object of type Parser to make a pass through the XML document specified by the parameter.

While making the pass through the document, this method generates a variety of document events and error events as the various tags, attributes, and data values in that document are encountered.

Handling events

This, in turn, causes event and error handling methods overridden by the application programmer to be executed, providing the functional behavior of the program.

This statement ends the main() method and also ends the controlling class.

    parser.parse("Sax01.xml");

The EventHandler class

The next fragment begins the definition of the class containing overridden methods for handling document events and error events.

This class extends the class named HandlerBase. The class named HandlerBase, which is the default base class for handlers, implements the default behavior for four different SAX interfaces:

  • DocumentHandler
  • ErrorHandler
  • DTDHandler
  • EntityResolver

The first two of these interfaces are of interest to us in this lesson. We will pursue the other two interfaces in subsequent lessons.

Use of HandlerBase is optional

The use of the HandlerBase class is optional. Application writers can extend this class when they need to implement only part of an interface.

Parser writers can instantiate this class to provide default handlers when the application has not supplied its own.

Overriding methods to provide functionality

The following EventHandler class overrides the event handling methods of the DocumentHandler interface and the ErrorHandler interface to provide the desired functionality for the program.

The next fragment shows the beginning of the class along with the first two overridden event-handling methods.

Start and end document events

The Parser object invokes these two overridden methods when the parse process encounters the beginning and the end of the XML document.

The default versions of these two methods return quietly doing nothing. Application writers can override the startDocument() method to take specific actions at the beginning of a document (such as creating an output file).

Similarly the application writer can override endDocument() to take specific action at the end of a document (such as closing a file).

Note that these methods don't receive any parameters.

In this sample program, these overridden methods simply announce the beginning and the end of the document.

class EventHandler extends HandlerBase{

  public void startDocument(){//handle startDocument event
    System.out.println("Start Document");
  }//end startDocument()
    
  public void endDocument(){//handle endDocument event
    System.out.println("End Document");
  }//end endDocument()

The start element event

The next overridden handler method is more complicated than most in this lesson. This method is invoked at the start of every element. For review, the beginning of an element might look like this in an XML document:

<poem PoemNumber="1" DummyAttribute="dummy value">

The boldface portions are commonly referred to as attributes. An element can contain none, one, or more attributes.

In this case, the element named poem contains two attributes named PoemNumber and DummyAttribute (the name of the attribute is unrelated to the name of the element).

Each attribute also has a value, which is enclosed in double quotation marks. In this case, the values for the two attributes are 1 and dummy value.

The startElement() event handler method

The event handler method that gets called when the parser encounters a new element is startElement(), as shown in the next fragment.

This method receives two parameters. The first parameter is a String containing the name of the element. The second parameter is a reference to an object of type AttributeList containing information about the attributes.

The code in the following fragment iterates through the AttributeList object, extracting and displaying information about each of the attributes described by that object.

  public void startElement(String name,AttributeList atts){
    System.out.println("Start element: " + name);
    if (atts != null) {
      int len = atts.getLength();
      //process all attributes
      for (int i = 0; i < len; i++) {
        String attName = atts.getName(i);
        String type = atts.getType(i);
        String value = atts.getValue(i);
        System.out.println("Attribute: " + attName 
              + ", Value = " + value + ", Type = " + type);
      }//end for loop on attributes
    }//end if
  }//end startElement()

The AttributeList interface

AttributeList is an interface that is implemented by the parser vendor.

An AttributeList object includes only attributes that have been specified or defaulted: #IMPLIED attributes are not included.

Here is what an earlier lesson had to say about attributes of the #IMPLIED type:

The XML document may provide a value for the attribute but is not required to do so. In this case, if no value is provided, an application-dependent value will be used. For example, for an IMPLIED attribute named backgroundColor, an XML processor might accept a value if provided in the XML document, and might cause the background color to be green if an attribute value is not provided. A different XML processor might cause the same default background color to be red. That is what I mean by "application-dependent value."

Getting information from the AttributeList

There are two ways for the application to obtain information from the AttributeList. First, it can iterate through the entire list as in the above fragment. Alternatively, the application can request the value or type of specific attributes as in the following code that is not used in this sample program:

public void startElement (String name, AttributeList atts){
  String identifier = atts.getValue("id");
  String label = atts.getValue("label");
   [...]
}
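A runnable sketch of the lookup-by-name approach follows. It is an illustration only: it assumes the JAXP parser bundled with current JDKs rather than the IBM parser, and the class name, the method name, and the attribute name NoSuchAttribute are hypothetical. It also shows that getValue() returns null when the element has no attribute with the requested name.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;

// Hypothetical demo class; the SAX 1.0 types are deprecated in current JDKs.
@SuppressWarnings("deprecation")
class AttributeLookup {
  // Returns { value of PoemNumber, value of an absent attribute }
  // for the poem element in the XML text.
  static String[] lookup(String xml) {
    final String[] found = new String[2];
    HandlerBase handler = new HandlerBase() {
      @Override
      public void startElement(String name, AttributeList atts) {
        if (atts != null && name.equals("poem")) {
          // Copy the values during the callback.
          found[0] = atts.getValue("PoemNumber");
          found[1] = atts.getValue("NoSuchAttribute");
        }
      }
    };
    try {
      SAXParserFactory.newInstance().newSAXParser()
          .parse(new InputSource(new StringReader(xml)), handler);
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
    return found;
  }

  public static void main(String[] args) {
    String[] values = lookup(
        "<poem PoemNumber=\"1\" DummyAttribute=\"dummy value\"/>");
    System.out.println("PoemNumber = " + values[0]);
    System.out.println("NoSuchAttribute = " + values[1]);
  }
}
```

Because a missing attribute produces null rather than an empty string, code that looks up attributes by name should be prepared to test for null.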

A portion of the program output

The output produced for the first element and the attributes of that element for each of the poems in this lesson is shown in the following box. Note that line breaks and spaces were manually inserted to format the information for cosmetic purposes.

Start element: poem
Attribute: PoemNumber,Value = 1,Type = CDATA
Attribute: DummyAttribute,Value = dummy value,Type = CDATA
...

Start element: poem
Attribute: PoemNumber,Value = 2,Type = CDATA
Attribute: DummyAttribute,Value = dummy value,Type = CDATA

What is type CDATA?

The name and value of the attribute are pretty obvious, but what about the type CDATA? Here is what a previous lesson had to say about this, although I doubt that this will mean much when taken out of context:

CDATA means that the value of this attribute may be any string of characters (as well as an empty string) and should be ignored by the parser. CDATA is used in situations where it is impossible to force more strict limitations on the attribute value with one of the following keywords...

The earlier lesson goes on to explain that there are three allowable types for an attribute:

I'm going to drop this discussion at this point. If you would like to pursue it further, go back and read the earlier lesson that contains detailed information about DTDs.

The endElement() handler method is much simpler

Because it doesn't need to deal with attributes, the overridden endElement() event handler is much simpler. This method is invoked when the parser encounters an end tag for an element.

This method receives a single parameter that is the name of the element. This overridden version simply announces that the event has occurred and displays the name of the element.

  public void endElement (String name){
    System.out.println("End element: " + name);
  }//end endElement()

The content of an element

The content of an XML element is the text that appears between the beginning and ending tags. The next fragment shows the event handler that is invoked by the parser when the parser encounters content. The name of the content handler method is characters().

Here is what the documentation has to say about this method:

public void characters(char[] ch,
                         int start,
                         int length)
                  throws SAXException

Receive notification of character data. 

The Parser will call this method to report each chunk of
character data. SAX parsers may return all contiguous
character data in a single chunk, or they may split it
into several chunks; however, all of the characters in 
any single event must come from the same external entity,
so that the Locator provides useful information.

The application must not attempt to read from the array
outside of the specified range.

Note that some parsers will report whitespace using the
ignorableWhitespace() method rather than this one
(validating parsers must do so).

Parameters:
     ch - The characters from the XML document.
     start - The start position in the array.
     length - The number of characters to read from
       the array.

The characters() method in a nutshell

In a nutshell, this method receives a character array containing content of an element, along with the start position and the length of the range to read. The overridden version of the method in this sample program simply converts that range of the array to a String object and displays it.

  public void characters(char[] ch,int start,int length){
    System.out.println(new String(ch, start, length));
  }//end characters()

This overridden method produced the boldface lines in the following output for the first poem.

Start element: line
Roses are red,
End element: line

Start element: line
Violets are blue.
End element: line

Start element: line
Sugar is sweet,
End element: line

Start element: line
and so are you.
End element: line
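As the documentation quoted earlier points out, a parser may deliver the content of one element in several chunks. A handler that needs the complete text of an element can therefore accumulate the chunks and use them when the end tag arrives. The following is a minimal sketch of that idea, not part of the sample program. It assumes the JAXP parser bundled with current JDKs, and the class name LineCollector is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;

// Hypothetical demo class; the SAX 1.0 types are deprecated in current JDKs.
@SuppressWarnings("deprecation")
class LineCollector extends HandlerBase {
  private final List<String> lines = new ArrayList<>();
  private final StringBuilder current = new StringBuilder();
  private boolean inLine = false;

  @Override
  public void startElement(String name, AttributeList atts) {
    if (name.equals("line")) {
      inLine = true;
      current.setLength(0); // start a fresh buffer for this line
    }
  }

  @Override
  public void characters(char[] ch, int start, int length) {
    if (inLine) {
      current.append(ch, start, length); // may be called repeatedly
    }
  }

  @Override
  public void endElement(String name) {
    if (name.equals("line")) {
      lines.add(current.toString()); // the text is complete now
      inLine = false;
    }
  }

  // Parse the text and return the content of each line element.
  static List<String> collect(String xml) {
    LineCollector handler = new LineCollector();
    try {
      SAXParserFactory.newInstance().newSAXParser()
          .parse(new InputSource(new StringReader(xml)), handler);
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
    return handler.lines;
  }

  public static void main(String[] args) {
    System.out.println(collect(
        "<poem><line>Roses are red,</line>\n"
        + "<line>Violets are blue.</line></poem>"));
  }
}
```

A side effect of this design is that the whitespace between elements, which the sample program prints as blank lines, is not collected.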

Additional event handler methods

That completes my discussion of overridden methods of the DocumentHandler interface. The above examples have shown all of the methods of this interface except for the following:

I will leave it as an exercise for the student to investigate the first two methods in this list. The third method will be used later in this sample program.

The ErrorHandler interface

That brings us to the methods that are declared in the interface named ErrorHandler. This interface, which declares three different handler methods, is the basic interface for SAX error handlers.

A SAX application that needs to implement customized error handling must implement this interface. Then it must register an object of the interface type with the SAX parser using the parser's setErrorHandler() method. The parser will then report all errors and warnings through this interface.

Avoiding exceptions

When the handler object is registered on the parser, the parser will use this interface instead of throwing an exception. It is then up to the application to decide what to do about the problem, including whether to throw an exception for different types of errors and warnings.

Note that there is no requirement for the parser to continue to provide useful information after a call to the fatalError() method.
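The following sketch illustrates an application making that decision for itself. It is not part of the sample program and rests on several assumptions: it uses the JAXP parser bundled with current JDKs with validation switched on, it supplies a small DTD in the document itself, and the class names are hypothetical. The handler implements ErrorHandler directly, counts warnings and recoverable errors without stopping, and rethrows only fatal errors.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.Parser;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical demo class; the SAX 1.0 types are deprecated in current JDKs.
@SuppressWarnings("deprecation")
class RecoverableErrors {
  // Well-formed, but invalid: the DTD does not declare verse.
  static final String XML =
      "<?xml version=\"1.0\"?>\n"
      + "<!DOCTYPE poem [\n"
      + "  <!ELEMENT poem (line+)>\n"
      + "  <!ELEMENT line (#PCDATA)>\n"
      + "]>\n"
      + "<poem><verse>Roses are red,</verse></poem>\n";

  // Implements the interface directly instead of extending HandlerBase.
  static class Counts implements ErrorHandler {
    int warnings;
    int errors;
    int fatals;
    boolean finished; // true if parse() returned normally

    public void warning(SAXParseException ex) {
      warnings++;
    }

    public void error(SAXParseException ex) {
      errors++; // recoverable: note it and let the parse continue
    }

    public void fatalError(SAXParseException ex)
                                      throws SAXException {
      fatals++;
      throw ex; // not recoverable: stop the parse
    }
  }

  static Counts run(String xml) {
    Counts counts = new Counts();
    try {
      SAXParserFactory factory = SAXParserFactory.newInstance();
      factory.setValidating(true);
      Parser parser = factory.newSAXParser().getParser();
      parser.setErrorHandler(counts);
      parser.parse(new InputSource(new StringReader(xml)));
      counts.finished = true;
    } catch (SAXParseException e) {
      // a fatal error was counted and rethrown by the handler
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
    return counts;
  }

  public static void main(String[] args) {
    Counts c = run(XML);
    System.out.println("errors: " + c.errors + ", fatal: "
        + c.fatals + ", finished: " + c.finished);
  }
}
```

Because the validity problems are reported through error() and the handler chooses not to throw, the parse runs to the end of the document.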

A default error handling implementation

The HandlerBase class provides a default implementation of this interface, ignoring warnings and recoverable errors and throwing a SAXParseException for fatal errors. An application can extend that class, as was done in this sample program, rather than implementing the complete interface itself.

Overridden error handler methods

The overridden versions of all three of the error handler methods are shown in the next fragment. All three of the methods make a call to the method named getLocationString() to determine the location of the problem in the XML document and display that information along with the nature of the message.

In addition, the fatalError() method terminates the program after displaying a termination message.

  public void warning(SAXParseException ex){
    System.out.println("[Warning] " +
              getLocationString(ex)+": "+ ex.getMessage());
  }//end warning()
  //-----------------------------------------------------//

  public void error(SAXParseException ex) {
    System.out.println("[Error] "+
              getLocationString(ex)+": "+ ex.getMessage());
  }//end error()
  //-----------------------------------------------------//

  public void fatalError(SAXParseException ex)
                                      throws SAXException {
    System.out.println("[Fatal Error] "+
              getLocationString(ex)+": "+ ex.getMessage());
    System.out.println("Terminating");
    System.exit(1);
  }//end fatalError()

The getLocationString() method

The next fragment shows the beginning of a private utility method named getLocationString().

This method is called by each of the error handling methods to determine the location in the XML file where the error was detected by the parser.

Constructing an information String

The method declares a StringBuffer object that is later used to construct a String containing the desired information to return to the calling method.

  private String getLocationString(SAXParseException ex){
    StringBuffer str = new StringBuffer();

Getting the name of the XML file

The first task undertaken by this method is to determine the name of the XML file being processed when the error occurred. This information, and other useful information as well, is contained in the SAXParseException object received by the error handler and passed on to this method as a parameter.

Some methods of the SAXParseException class

The following methods of the SAXParseException class are of interest in this lesson:

Error message output

I'm going to begin the discussion by showing you the output produced on my computer by purposely omitting an end tag from one of the lines:

systemID: file:/G:/Baldwin/AA-School/JavaProg/Combined
    /Java/Sax01.xml
[Fatal Error] Sax01.xml:17:7: "</line>" expected.
Terminating

The beginning portions of the code that produced this output are shown below. However, the complete output was produced by a combination of this method and the fatalError() method shown earlier. The portion produced by this method is shown in boldface above. The portion of the output that is not in boldface was produced by the fatalError() method using the String object returned by this method.

getSystemId() returns a URL

As you can see, the String that was returned by the getSystemId() method is the URL for the XML file on the local drive (G:).

Although there is quite a bit of code in the next fragment, all that it does is display the URL, extract the filename from the end of the URL, and append the filename at the beginning of the StringBuffer object being constructed for return to the calling method.

    String systemId = ex.getSystemId();
      if(systemId != null){
        System.out.println("systemID: " + systemId);
        //get file name from end of systemID
        int index = systemId.lastIndexOf('/');
        if(index != -1){
          systemId = systemId.substring(index + 1);
        }//end if(index..
        str.append(systemId);
      }//end if(systemID...
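The filename extraction can be tried in isolation. The following is a minimal sketch, not part of the Sax01 program; the class name FileNameDemo and the method name fileNameOf() are hypothetical names chosen for illustration. It applies the same lastIndexOf() and substring() logic to a plain String so that no parser is needed.

```java
// Hypothetical demo class; not part of the Sax01 program.
class FileNameDemo {
  // Return the text that follows the last '/' in a system ID.
  // If the system ID contains no '/', return it unchanged.
  static String fileNameOf(String systemId) {
    int index = systemId.lastIndexOf('/');
    if (index != -1) {
      return systemId.substring(index + 1);
    }
    return systemId;
  }

  public static void main(String[] args) {
    // The URL from the output shown earlier, with the manual
    // line break removed.
    String url = "file:/G:/Baldwin/AA-School/JavaProg/Combined"
               + "/Java/Sax01.xml";
    System.out.println(fileNameOf(url));         // prints Sax01.xml
    System.out.println(fileNameOf("Sax01.xml")); // prints Sax01.xml
  }
}
```

Note that a system ID ending in a slash would produce an empty filename with this approach.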

Getting line and column numbers

The next fragment completes the construction of the StringBuffer object by getting the line and column number of the location of the problem in the XML file using the two methods described earlier.

This information is appended onto the StringBuffer object with some colons added for cosmetic purposes.

Returning the String

Then the StringBuffer object is converted to a String object and returned to the calling error handler method where it is displayed on the screen.

      str.append(':');
      str.append(ex.getLineNumber());
      str.append(':');
      str.append(ex.getColumnNumber());

      return str.toString();

    }//end getLocationString()
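The complete method can also be exercised without a parser by constructing a SAXParseException by hand. The following is a minimal sketch under that assumption; the class name LocationDemo and the method name locationOf() are hypothetical, the system ID display is omitted, and StringBuilder (available in later JDK releases) is substituted for StringBuffer. The exception values are copied from the output shown earlier.

```java
import org.xml.sax.SAXParseException;

// Hypothetical demo class; not part of the Sax01 program.
class LocationDemo {
  // Build a "filename:line:column" String from the exception,
  // following the same steps as getLocationString().
  static String locationOf(SAXParseException ex) {
    StringBuilder str = new StringBuilder();
    String systemId = ex.getSystemId();
    if (systemId != null) {
      // keep only the file name from the end of the system ID
      int index = systemId.lastIndexOf('/');
      if (index != -1) {
        systemId = systemId.substring(index + 1);
      }
      str.append(systemId);
    }
    str.append(':').append(ex.getLineNumber());
    str.append(':').append(ex.getColumnNumber());
    return str.toString();
  }

  public static void main(String[] args) {
    // Arguments: message, public ID, system ID, line, column
    SAXParseException ex = new SAXParseException(
        "\"</line>\" expected.", null,
        "file:/G:/Baldwin/AA-School/JavaProg/Combined/Java/Sax01.xml",
        17, 7);
    // prints: [Fatal Error] Sax01.xml:17:7: "</line>" expected.
    System.out.println("[Fatal Error] " + locationOf(ex)
                       + ": " + ex.getMessage());
  }
}
```

If the system ID is null, the returned String begins with a colon, just as it would in the original method.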

Once again: What is SAX?

So, there you have it; the answer to the burning question: What is SAX? Now you know what SAX is, and why it is important to Java programmers writing applications to process XML documents.

Subsequent lessons will provide more useful illustrations of the capability provided by SAX.

Subsequent lessons will also provide a similar treatment for DOM.

Program Listing

A complete listing of the program and the XML file is contained in this section.

<?xml version="1.0"?>

<bookOfPoems>

  <poem PoemNumber="1" DummyAttribute="dummy value">
    <line>Roses are red,</line>
    <line>Violets are blue.</line>
    <line>Sugar is sweet,</line>
    <line>and so are you.</line>
  </poem>

  <poem PoemNumber="2" DummyAttribute="dummy value">
    <line>Twas the night before Christmas,</line>
    <line>And all through the house,
    <line>Not a creature was stirring,</line>
    <line>Not even a mouse.</line>
  </poem>

</bookOfPoems>


/*File Sax01.java
Illustrates parser events and customized error handling.

An XML file contains two poems.  The first has the correct
syntax. The second is missing an end tag midway through the
poem.

The first poem is parsed and displayed successfully along 
with the element names and attribute values.

The second poem is also displayed but a fatal error occurs
at the point where the parser is able to determine that the
end tag is missing.  Note, however, that this determination
is not made until several lines beyond the actual missing
tag.

The program was tested using JDK 1.2 under Win95.

Output from the program is as shown below.  Note that the
error messages were sent to System.out instead of 
System.err so that they could be captured and reproduced
here.  Note also that some line breaks were manually
inserted to force the material to fit in this format.

Note that the end tag was missing following the line in
the poem that reads "And all through the house,"
  
Start Document
Start element: bookOfPoems



Start element: poem
Attribute: PoemNumber, Value = 1, Type = CDATA
Attribute: DummyAttribute, Value = dummy value, 
           Type = CDATA

  
Start element: line
Roses are red,
End element: line

  
Start element: line
Violets are blue.
End element: line

  
Start element: line
Sugar is sweet,
End element: line

  
Start element: line
and so are you.
End element: line


End element: poem



Start element: poem
Attribute: PoemNumber, Value = 2, Type = CDATA
Attribute: DummyAttribute, Value = dummy value, 
           Type = CDATA

  
Start element: line
Twas the night before Christmas,
End element: line

  
Start element: line
And all through the house,
  
Start element: line
Not a creature was stirring,
End element: line

  
Start element: line
Not even a mouse.
End element: line


systemID: file:/G:/Baldwin/AA-School/JavaProg/Combined
    /Java/Sax01.xml
[Fatal Error] Sax01.xml:17:7: "</line>" expected.
Terminating
**********************************************************/
import org.xml.sax.*;
import org.xml.sax.helpers.ParserFactory;

class Sax01 {
  static final String parserClass = 
                           "com.ibm.xml.parsers.SAXParser";

  public static void main (String args[])throws Exception{
    Parser parser = ParserFactory.makeParser(parserClass);
    //Instantiate an event and error handler
    DocumentHandler handler = new EventHandler();
    //Register the event handler and the error handler
    parser.setDocumentHandler(handler);
    parser.setErrorHandler((ErrorHandler)handler);
    //Parse the document to create the events.
    parser.parse("Sax01.xml");
  }//end main
}//end class Sax01
//=======================================================//

//Methods of this class are listeners for document events
// and error events.  Note that HandlerBase implements
// the ErrorHandler interface.
class EventHandler extends HandlerBase{
  public void startDocument(){//handle startDocument event
    System.out.println("Start Document");
  }//end startDocument()
    
  public void endDocument(){//handle endDocument event
    System.out.println("End Document");
  }//end endDocument()

  //handle startElement event displaying attributes
  public void startElement(String name,AttributeList atts){
    System.out.println("Start element: " + name);
    if (atts != null) {
      int len = atts.getLength();
      //process all attributes
      for (int i = 0; i < len; i++) {
        String attName = atts.getName(i);
        String type = atts.getType(i);
        String value = atts.getValue(i);
        System.out.println("Attribute: " + attName 
              + ", Value = " + value + ", Type = " + type);
      }//end for loop on attributes
    }//end if
  }//end start element

  //handle endElement event
  public void endElement (String name){
    System.out.println("End element: " + name);
  }//end endElement()
      
  //handle characters event
  public void characters(char[] ch,int start,int length){
    System.out.println(new String(ch, start, length));
  }//end characters()
    
  //Begin error handlers here.  These methods are declared
  // in the ErrorHandler interface that is implemented by
  // the HandlerBase class and extended by this class.
  
  //Handle a warning
  public void warning(SAXParseException ex){
    System.out.println("[Warning] " +
              getLocationString(ex)+": "+ ex.getMessage());
  }//end warning()

  //handle an error
  public void error(SAXParseException ex) {
    System.out.println("[Error] "+
              getLocationString(ex)+": "+ ex.getMessage());
  }//end error()

  //handle a fatal error
  public void fatalError(SAXParseException ex)
                                      throws SAXException {
    System.out.println("[Fatal Error] "+
              getLocationString(ex)+": "+ ex.getMessage());
    System.out.println("Terminating");
    System.exit(1);
  }//end fatalError()
  
  //Private method called by error handlers to return
  // information regarding the point in the document where
  // the error was detected by the parser.  
  private String getLocationString(SAXParseException ex){
    StringBuffer str = new StringBuffer();

    //get SystemId, display it, and use it to get the 
    // name of the file being parsed
    String systemId = ex.getSystemId();
      if(systemId != null){
        System.out.println("systemID: " + systemId);
        //get file name from end of systemID
        int index = systemId.lastIndexOf('/');
        if(index != -1){
          systemId = systemId.substring(index + 1);
        }//end if(index..
        str.append(systemId);
      }//end if(systemID...
      //now get and append location information
      str.append(':');
      str.append(ex.getLineNumber());
      str.append(':');
      str.append(ex.getColumnNumber());

      return str.toString();

    }//end getLocationString()

}//end class EventHandler
//=======================================================//
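The listing above is written against SAX 1.0 (Parser, ParserFactory, HandlerBase, AttributeList), and later SAX releases deprecated those types. For readers working with a newer JDK, the following is a rough sketch of a comparable handler, assuming the JAXP classes SAXParserFactory and DefaultHandler that ship with the JDK. The class name Sax2Sketch, the parse() helper, and the log field are hypothetical names chosen for illustration. The sketch parses a String instead of a file, omits the characters() handler, and rethrows the exception from fatalError() instead of calling System.exit().

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch; not part of the original Sax01 program.
class Sax2Sketch extends DefaultHandler {
  // Collects the event messages so they can be inspected later.
  final StringBuilder log = new StringBuilder();

  @Override
  public void startElement(String uri, String localName,
                           String qName, Attributes atts) {
    log.append("Start element: ").append(qName).append('\n');
    for (int i = 0; i < atts.getLength(); i++) {
      log.append("Attribute: ").append(atts.getQName(i))
         .append(", Value = ").append(atts.getValue(i))
         .append('\n');
    }
  }

  @Override
  public void endElement(String uri, String localName,
                         String qName) {
    log.append("End element: ").append(qName).append('\n');
  }

  @Override
  public void fatalError(SAXParseException ex)
                                      throws SAXException {
    log.append("[Fatal Error] line ")
       .append(ex.getLineNumber()).append('\n');
    throw ex; // stop parsing without terminating the JVM
  }

  // Parse the XML text and return the log of events.
  static String parse(String xml) {
    Sax2Sketch handler = new Sax2Sketch();
    try {
      SAXParser parser =
          SAXParserFactory.newInstance().newSAXParser();
      parser.parse(new InputSource(new StringReader(xml)),
                   handler);
    } catch (SAXParseException e) {
      // already recorded by fatalError()
    } catch (SAXException | ParserConfigurationException
             | IOException e) {
      log(handler, "Unable to parse: " + e.getMessage());
    }
    return handler.log.toString();
  }

  private static void log(Sax2Sketch handler, String message) {
    handler.log.append(message).append('\n');
  }

  public static void main(String[] args) {
    // The second line element is missing its end tag.
    System.out.print(parse(
        "<poem><line>a<line>b</line></poem>"));
  }
}
```

The wording of the error message and the reported location depend on the parser in use, so the sketch records only the line number.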

-end-