Richard G Baldwin (512) 223-4758, baldwin@austin.cc.tx.us, http://www2.austin.cc.tx.us/baldwin/

XML Document Processing using the Document Object Model (DOM)

Java Programming, Lecture Notes # 829, Revised 6/12/99.


Preface

Students in Prof. Baldwin's Advanced Java Programming classes at ACC will be responsible for knowing and understanding all of the material in this lesson beginning with the summer semester of 1999.

This lesson was originally written on June 11, 1999 and has been updated several times since then.

Introduction

You know about SAX

Previous lessons have explained SAX and have shown you how to use SAX for manipulating the data in XML files.

DOM, an alternate approach

This lesson illustrates another approach to manipulating XML files: the Document Object Model (DOM).

The sample program in this lesson parses an XML file and extracts the data contained in the file into an object of type Document. Many methods are available to manipulate the information in a Document object. However, since this program is intended to be as simple as possible, while illustrating the essentials of DOM, manipulation of the data is not performed in this program.

The data in the Document object is then used to produce an output XML file named junk.xml. Since no changes were made to the data while it was in the Document object, the output XML file should replicate the input XML file.

A more general approach

The manner in which the output XML file is produced in this lesson is much more general than the methodology used in a previous lesson on SAX. In that lesson, the methodology was tied to a specific XML format. The methodology used in this lesson is generally independent of the format of the XML file.

If you are familiar with the processing of trees in general and recursion in particular, you should find the material in this lesson to be very straightforward. Otherwise, you may find it to be quite difficult, and it may be necessary for you to do some prior study before tackling this lesson.
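As a warm-up for the tree processing used later, the following is a minimal sketch of recursion over a DOM tree. It assumes the JDK's built-in JAXP parser (javax.xml.parsers) rather than the XML4J parser used in this lesson, and the class name RecursionSketch and its helper methods are hypothetical names chosen for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration class; not part of the lesson's program.
class RecursionSketch {

  // Parses XML text into a DOM Document using the JDK's built-in parser
  // (an assumption; the lesson itself uses XML4J).
  static Document parse(String xml) {
    try {
      return DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
    } catch (Exception e) {
      throw new IllegalArgumentException("Could not parse XML", e);
    }
  }

  // Counts a node plus all of its descendants.
  static int countNodes(Node node) {
    int count = 1; // the node itself
    NodeList children = node.getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
      count += countNodes(children.item(i)); // recursive call per child
    }
    return count;
  }

  public static void main(String[] args) {
    Document doc = parse("<a><b>hi</b><c/></a>");
    System.out.println(countNodes(doc));
  }
}
```

Each call handles one node and then calls itself once for every child, which is the same pattern that the writeNode() method follows later in this lesson.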

The XML File

A book of poems

The XML file used with this sample program represents the rudimentary aspects of a book of poems. The first listing below shows a schematic of the element structure of the XML file along with the attributes of the elements. The second listing shows the actual XML file that was used to test the program, including the content of each of the elements.

<?xml version="1.0"?>
<bookOfPoems>
  <poem PoemNumber="1" DumAtr="dum val">
    <line>...</line>
    <line>...</line>
    <line>...</line>
    <line>...</line>
  </poem>
  <poem PoemNumber="2" DumAtr="dum val">
  ...
  </poem>
</bookOfPoems>

XML file structure

The XML file uses the following types of elements, as shown in the listing above:

The bookOfPoems element

The XML document contains a single bookOfPoems element.

The bookOfPoems element contains one or more poem elements.

The poem elements

The poem elements have an attribute that specifies the poem number associated with each poem element. There is also a dummy attribute that has no real significance other than to illustrate that an element can have none, one, or more attributes.

Each poem element contains an arbitrary number of line elements.

The line elements

The line elements contain the actual text of the poem as the element content.
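The element structure described above can be explored with a short sketch such as the following. It assumes the JDK's built-in JAXP parser instead of XML4J and uses an abbreviated copy of the poem file; the class name PoemStructureSketch and the describe() method are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration class; not part of the lesson's program.
class PoemStructureSketch {

  // Abbreviated version of the poem file, held in a string for convenience.
  static final String XML =
      "<?xml version=\"1.0\"?>\n"
      + "<bookOfPoems>\n"
      + "  <poem PoemNumber=\"1\" DumAtr=\"dum val\">\n"
      + "    <line>Roses are red,</line>\n"
      + "    <line>Violets are blue.</line>\n"
      + "  </poem>\n"
      + "</bookOfPoems>\n";

  // Parses XML text with the JDK's built-in parser (an assumption).
  static Document parse(String xml) {
    try {
      return DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
    } catch (Exception e) {
      throw new IllegalArgumentException("Could not parse XML", e);
    }
  }

  // Returns one string per line element, prefixed with its poem number.
  static List<String> describe(Document doc) {
    List<String> result = new ArrayList<>();
    NodeList poems = doc.getElementsByTagName("poem");
    for (int i = 0; i < poems.getLength(); i++) {
      Element poem = (Element) poems.item(i);
      String number = poem.getAttribute("PoemNumber");
      NodeList lines = poem.getElementsByTagName("line");
      for (int j = 0; j < lines.getLength(); j++) {
        result.add("poem " + number + ": " + lines.item(j).getTextContent());
      }
    }
    return result;
  }

  public static void main(String[] args) {
    for (String s : describe(parse(XML))) {
      System.out.println(s);
    }
  }
}
```

Unlike the format-independent approach taken by the sample program, this sketch is tied to the bookOfPoems structure. It is meant only to make the roles of the poem and line elements and the PoemNumber attribute concrete.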

The XML file

A listing of the actual XML file used to test the sample program in this lesson is shown below.

As you can see, the bookOfPoems element contains two poem elements, each of which contains four line elements. The content of each line element is the text between its start and end tags. Although this is a relatively simple XML file, it is sufficient to illustrate the concepts of this lesson.

<?xml version="1.0"?>
<bookOfPoems>
  <poem PoemNumber="1" DumAtr="dum val">
    <line>Roses are red,</line>
    <line>Violets are blue.</line>
    <line>Sugar is sweet,</line>
    <line>and so are you.</line>
  </poem>
  <poem PoemNumber="2" DumAtr="dum val">
    <line>Roses are pink,</line>
    <line>Dandelions are yellow,</line>
    <line>If you like Java,</line>
    <line>You are a good fellow.</line>
  </poem>
</bookOfPoems>

The manner in which the sample program processes this XML file is described in the following sections.

Sample Program

The parser

This program uses the IBM parser (XML4J) along with the XML file from the previous section.

Purpose of the program

This program consists of three source files: Dom01.java, Dom01Parser.java, and Dom01Writer.java.

Complete listings of all three files are provided near the end of the lesson.

Taken together, the files illustrate parsing an XML file into an object of type Document, and using an object of type Document to write an output XML file.

Miscellaneous comments

No particular effort was expended to make this program robust. In particular, if it encounters an XML file with a format error, it may throw an exception, or it may continue to run, but it probably will not work properly.

The program was tested using JDK 1.2 under Win95. It also requires the IBM XML4J (or some suitable substitute) parser.

The program was tested with the XML file named Dom01.xml listed earlier.

Processing results

Print statements were included in the writeNode() method of the Dom01Writer class to help in understanding the recursive process being used. As a result, the program produced the following output on the screen.

As you can see, a message is displayed each time control enters or leaves an instance (invocation) of the writeNode() method. Each time the method is entered, the name and value of the node being processed are displayed. Each time the method is exited, the name of the node being processed is displayed.

I will have more to say about this output later when discussing the recursive process.

Enter writeNode for node:#document-null-
Enter writeNode for node:bookOfPoems-null-
Enter writeNode for node:#text-
  -
Exit writeNode for node:#text
Enter writeNode for node:poem-null-
Enter writeNode for node:#text-
    -
Exit writeNode for node:#text
Enter writeNode for node:line-null-
Enter writeNode for node:#text-Roses are red,-
Exit writeNode for node:#text
Exit writeNode for node:line
Enter writeNode for node:#text-
    -
Exit writeNode for node:#text
Enter writeNode for node:line-null-
Enter writeNode for node:#text-Violets are blue.-
Exit writeNode for node:#text
Exit writeNode for node:line
Enter writeNode for node:#text-
    -
Exit writeNode for node:#text
Enter writeNode for node:line-null-
Enter writeNode for node:#text-Sugar is sweet,-
Exit writeNode for node:#text
Exit writeNode for node:line
Enter writeNode for node:#text-
    -
Exit writeNode for node:#text
Enter writeNode for node:line-null-
Enter writeNode for node:#text-and so are you.-
Exit writeNode for node:#text
Exit writeNode for node:line
Enter writeNode for node:#text-
  -
Exit writeNode for node:#text
Exit writeNode for node:poem
Enter writeNode for node:#text-
  -
Exit writeNode for node:#text
Enter writeNode for node:poem-null-
Enter writeNode for node:#text-
    -
Exit writeNode for node:#text
Enter writeNode for node:line-null-
Enter writeNode for node:#text-Roses are pink,-
Exit writeNode for node:#text
Exit writeNode for node:line
Enter writeNode for node:#text-
    -
Exit writeNode for node:#text
Enter writeNode for node:line-null-
Enter writeNode for node:#text-Dandelions are yellow,-
Exit writeNode for node:#text
Exit writeNode for node:line
Enter writeNode for node:#text-
    -
Exit writeNode for node:#text
Enter writeNode for node:line-null-
Enter writeNode for node:#text-If you like Java,-
Exit writeNode for node:#text
Exit writeNode for node:line
Enter writeNode for node:#text-
    -
Exit writeNode for node:#text
Enter writeNode for node:line-null-
Enter writeNode for node:#text-You are a good fellow.-
Exit writeNode for node:#text
Exit writeNode for node:line
Enter writeNode for node:#text-
  -
Exit writeNode for node:#text
Exit writeNode for node:poem
Enter writeNode for node:#text-
-
Exit writeNode for node:#text
Exit writeNode for node:bookOfPoems
Exit writeNode for node:#document

Program output

The program produced an output file named junk.xml containing the following. Since the program did not modify the data after parsing the original XML file, this output should be a replica of the original XML file. The most obvious difference is that the attributes for each element have been sorted by name in the output. This sorting happened automatically with no effort on my part.

<?xml version="1.0"?>
<bookOfPoems>
  <poem DumAtr="dum val" PoemNumber="1">
    <line>Roses are red,</line>
    <line>Violets are blue.</line>
    <line>Sugar is sweet,</line>
    <line>and so are you.</line>
  </poem>
  <poem DumAtr="dum val" PoemNumber="2">
    <line>Roses are pink,</line>
    <line>Dandelions are yellow,</line>
    <line>If you like Java,</line>
    <line>You are a good fellow.</line>
  </poem>
</bookOfPoems>

 

Interesting Code Fragments

The entire program consists of a driver file named Dom01.java and several helper files as listed earlier. I'm going to begin with the driver file since it is the simplest.

Dom01.java

I will discuss the driver file named Dom01.java by breaking it into fragments. A complete listing is provided near the end of the lesson.

This is a driver program that illustrates how to use a DOM parser, and how to traverse a Document object created by the DOM parser.

The program requires access to the class files for the Dom01Parser and Dom01Writer classes, which are discussed below.

Dom01Parser

The program instantiates a DOM parser object of the class named Dom01Parser. The parser object is based on the IBM XML4J parser operating in its non-validating mode.

The program uses the parse() method of the parser object to parse an XML file specified on the command line.

The parse() method returns an object of type Document that represents the parsed XML file.

The program passes the Document object to a method named writeXmlFile() on an object of a class named Dom01Writer.

Dom01Writer

The purpose of the writeXmlFile() method of the Dom01Writer class is to write an XML file corresponding to the information contained in the Document object. A program intended to manipulate the data in the XML file would typically manipulate the data in the Document object before passing the object to writeXmlFile().

The writeXmlFile() method also generates output on the screen designed to help the student understand the recursive process involved in processing the DOM tree to write the file.

In this case, the XML file is written to a file named junk.xml, but the name of the output file could easily be made into an input parameter.

The program was tested using JDK 1.2 and the IBM XML4J parser.

Required import directive

The first fragment shows a required import directive simply to alert you to the fact that it is necessary to import this package when using the DOM features of XML4J.

The fragment also shows the beginning of the class definition along with a String constant that specifies the name of the output XML file.

import org.w3c.dom.*;

public class Dom01 {
  private static final String xmlFile = "junk.xml";

Test for proper command-line input

The next fragment shows the beginning of the main() method including a test to confirm that the user has provided valid input information on the command line. The input requirement is to provide the URI where the input XML file can be located.

  public static void main(String argv[]) {
    if (argv.length != 1) {
      System.err.println("usage: java Dom01 uri");
      System.exit(1);
    }//end if

Get a parser object

The next fragment gets a parser object. You might want to note that this is considerably less complex than the code required to get a parser object in the previous lessons on SAX that required the use of a factory method, etc.

    Dom01Parser parser = new Dom01Parser();    

Parse the file

The next fragment invokes the parse() method on the parser object to parse the input XML file and return a Document object that represents the contents of the XML file.

As mentioned earlier, if you were going to manipulate the content of the XML document, this is where you would insert the code to do that. You wouldn't manipulate the XML document directly. Rather, you would manipulate the Document object that represents the contents of the XML file.

    try{
      Document document = parser.parse(argv[0]);
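To make the idea of manipulating the Document object concrete, here is a minimal sketch that appends a new line element to the first poem. It assumes the JDK's built-in JAXP parser in place of Dom01Parser, and the class name ModifyDocumentSketch and the addLine() method are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical illustration class; not part of the lesson's program.
class ModifyDocumentSketch {

  // Parses XML text with the JDK's built-in parser (an assumption).
  static Document parse(String xml) {
    try {
      return DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
    } catch (Exception e) {
      throw new IllegalArgumentException("Could not parse XML", e);
    }
  }

  // Appends a new line element to the first poem in the document and
  // returns the number of line elements that poem then contains.
  static int addLine(Document doc, String text) {
    Element poem = (Element) doc.getElementsByTagName("poem").item(0);
    Element line = doc.createElement("line");   // new, unattached element
    line.appendChild(doc.createTextNode(text)); // give it text content
    poem.appendChild(line);                     // attach it to the tree
    return poem.getElementsByTagName("line").getLength();
  }

  public static void main(String[] args) {
    Document doc = parse(
        "<bookOfPoems><poem PoemNumber=\"1\">"
        + "<line>Roses are red,</line></poem></bookOfPoems>");
    System.out.println(addLine(doc, "Violets are blue."));
  }
}
```

All changes are made to the in-memory tree. The original XML file is not touched until a writer such as Dom01Writer serializes the modified Document object.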

Write the XML file

The next fragment instantiates an anonymous object of type Dom01Writer and invokes the writeXmlFile() method on that object. The name of the output XML file is passed to the constructor for the object. A reference to the Document object is passed to the method.

      new Dom01Writer(xmlFile).writeXmlFile(document);

Cleanup code

The remaining code in this class is not germane to what I am illustrating here, so I haven't included it as a fragment. You can view it in the complete listing of the program near the end of the lesson.

The Dom01Parser class

This class is defined in a file named Dom01Parser.java. This is a wrapper class for the IBM XML4J DOM parser operating in a non-validating mode.

An object of this class functions as a DOM parser with SAX error handling capability.

The parse() method of this class returns a Document object representing the parsed XML file.

Required import directives

The next fragment shows the beginning of the class definition along with several required import directives.

Note that the class implements the ErrorHandler interface in order to provide SAX error handling. SAX error handling has been discussed in earlier lessons.

import org.w3c.dom.Document;

import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class Dom01Parser implements ErrorHandler{

Create the actual parser object

The single statement in the next fragment creates a non-validating parser using the IBM XML4J class library. Note that this statement is not included in a method. Therefore, it is a declaration of an initialized instance variable of the class.

  com.ibm.xml.parsers.NonValidatingDOMParser parser = 
    new com.ibm.xml.parsers.NonValidatingDOMParser();

Constructor

The constructor for the object is shown in the next fragment. All it does is to cause the object to be its own error handler. This requires that the class define the methods declared in the ErrorHandler interface.

  public Dom01Parser() {//constructor
    parser.setErrorHandler(this);
  }//end constructor

The parse() method

The method defined in the next fragment invokes the parse() method on the parser object to parse the XML file specified as a parameter.

Then it gets and returns the object of type Document created by parsing the input file.

  public Document parse(String uri) throws Exception {
    parser.parse(uri);
    return parser.getDocument();
  }//end parse()

Error handling

There is quite a bit more code defined in this class to override the error handling methods of the ErrorHandler interface. However, that code is essentially the same as code discussed in previous lessons. Therefore, I won't discuss it further here. You can view that code in the complete listing of the program near the end of this lesson.
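For readers without the earlier lessons at hand, the following is a minimal sketch of SAX-style error handling attached to a DOM parser. It is not the code from Dom01Parser. It assumes the JDK's built-in JAXP DocumentBuilder, and the class name ErrorHandlerSketch and the isWellFormed() method are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical illustration class; not the lesson's Dom01Parser.
class ErrorHandlerSketch implements ErrorHandler {

  // Warnings and recoverable errors are reported but do not stop parsing.
  public void warning(SAXParseException e) {
    System.err.println("Warning: " + e.getMessage());
  }

  public void error(SAXParseException e) {
    System.err.println("Error: " + e.getMessage());
  }

  // Fatal errors (such as malformed XML) are rethrown to abort the parse.
  public void fatalError(SAXParseException e) throws SAXException {
    throw e;
  }

  // Returns true if the text parses without a fatal error.
  static boolean isWellFormed(String xml) {
    try {
      DocumentBuilder builder =
          DocumentBuilderFactory.newInstance().newDocumentBuilder();
      builder.setErrorHandler(new ErrorHandlerSketch());
      builder.parse(new InputSource(new StringReader(xml)));
      return true;
    } catch (SAXException e) {
      return false; // raised by fatalError() above
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(isWellFormed("<a><b/></a>"));
    System.out.println(isWellFormed("<a><b></a>"));
  }
}
```

The three methods are the ones declared in the org.xml.sax.ErrorHandler interface. How each one responds (log, count, or throw) is a design choice, and this sketch picks the simplest option for each.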

Producing the output XML file

We're now about two-thirds of the way through our discussion of the sample program. There is one major area left to cover -- the creation of an XML file based on the contents of the Document object.

The Dom01Writer class

This is accomplished by the class named Dom01Writer that is defined in the file named Dom01Writer.java. The purpose of this class is to provide a utility capability to convert the data stored in the Document object to XML format and to write that data into a new XML file.

This class provides a utility method named writeXmlFile() that receives a DOM Document object as a parameter and writes an output XML file that matches the data contained in the Document object.

The method also produces screen output designed to help in understanding the recursive process.

The name of the output XML file is established as a parameter to the constructor of the class.

The constructor

The following fragment shows the beginning of the class along with some necessary import directives and the constructor for the class.

The constructor simply uses standard stream I/O classes to get a reference to a PrintWriter object linked to the required output file.

import java.io.PrintWriter;
import java.io.FileOutputStream;

import org.w3c.dom.*;

public class Dom01Writer {
  private PrintWriter out;

  //-----------------------------------------------------//

  public Dom01Writer(String xmlFile) {//constructor
    try {
      out = new PrintWriter(new FileOutputStream(xmlFile));
    }catch (Exception e) {
      e.printStackTrace(System.err);
    }//end catch
  }//end constructor

The writeXmlFile() method

This method converts an incoming Document object to an output XML file. This is accomplished by passing a reference to the Document object to a recursive utility method named writeNode(). The writeNode() method expects to receive a reference to an object of type Node as an input parameter. It manipulates the data in that object to format and write the output XML file.

The Document object can be treated as a Node object because the Document interface extends the Node interface.

  public void writeXmlFile(Document document) {
    try {
      writeNode(document);
    }catch (Exception e) {
      e.printStackTrace(System.err);
    }//end catch
  }//end writeXmlFile()

The writeNode() method

This method is fairly complex, particularly if you aren't accustomed to tree processing using recursion. The method is used to recursively convert Node data into XML format and to write the XML formatted data to the output file.

The next fragment shows the beginning of the method along with a test to confirm that the incoming parameter is a valid Node object. If not, the method simply displays a message and returns quietly.

  public void writeNode(Node node) {
    if (node == null) {
      System.err.println("Nothing to do, node is null");
      return;
    }//end if

Display node information

Because some of you may not be strong in the concept of recursion, I have provided a pair of print statements that display information on the screen which I hope will help you to track your way through the recursion process. One of these statements, as shown in the following fragment, is executed when control enters the method. The other statement, which will be shown later, is executed when control exits the method.

This statement displays the name of the node along with the value of the node. In order to make it easier to recognize the situation where the value of the node is a series of newline and space characters, the value is purposely bracketed on each end by a dash ("-") character.

    System.out.println("Enter writeNode for node:" 
                            + node.getNodeName() + "-" 
                              + node.getNodeValue() + "-");

Earlier in the lesson, I provided a listing of the screen output for a particular XML file. I promised to discuss it further later.

The Document tree

To begin with, the Document object represents the XML file as a hierarchical tree structure consisting of nodes and leaves (very similar to the structure of files and directories on a typical hard drive). In that situation, a directory or folder would be considered a node and a file would be considered a leaf. In general, nodes can have children; leaves cannot have children.

Node types

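Node objects also have a type, which I will exploit later in converting the node data to XML data. The Node interface defines the following node type constants: ELEMENT_NODE, ATTRIBUTE_NODE, TEXT_NODE, CDATA_SECTION_NODE, ENTITY_REFERENCE_NODE, ENTITY_NODE, PROCESSING_INSTRUCTION_NODE, COMMENT_NODE, DOCUMENT_NODE, DOCUMENT_TYPE_NODE, DOCUMENT_FRAGMENT_NODE, and NOTATION_NODE.

Although the general tree concept says that nodes can have children, some of these node types (such as TEXT_NODE) cannot have children according to the DOM specification.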
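The node type can be inspected at run time with getNodeType(). The following minimal sketch maps three of the type constants to their names. It assumes the JDK's built-in JAXP parser rather than XML4J, and the class name NodeTypeSketch and the typeName() method are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical illustration class; not part of the lesson's program.
class NodeTypeSketch {

  // Parses XML text with the JDK's built-in parser (an assumption).
  static Document parse(String xml) {
    try {
      return DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
    } catch (Exception e) {
      throw new IllegalArgumentException("Could not parse XML", e);
    }
  }

  // Maps a few of the node type constants to readable names.
  static String typeName(Node node) {
    switch (node.getNodeType()) {
      case Node.DOCUMENT_NODE: return "DOCUMENT_NODE";
      case Node.ELEMENT_NODE:  return "ELEMENT_NODE";
      case Node.TEXT_NODE:     return "TEXT_NODE";
      default:                 return "other (" + node.getNodeType() + ")";
    }
  }

  public static void main(String[] args) {
    Document doc = parse("<line>Roses are red,</line>");
    Node asNode = doc;                    // Document extends Node
    Node root = doc.getDocumentElement(); // the line element
    Node text = root.getFirstChild();     // the text inside it
    System.out.println(typeName(asNode));
    System.out.println(typeName(root));
    System.out.println(typeName(text));
  }
}
```

A switch on getNodeType() is the same dispatch mechanism that writeNode() uses to decide how each node should be written.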

The XML file

A small portion of the beginning of the XML file is reproduced below. I have shown just enough of the file to illustrate the parent-child structure of the file.

<?xml version="1.0"?>
<bookOfPoems>
  <poem PoemNumber="1" DumAtr="dum val">
    <line>Roses are red,</line>
    <line>Violets are blue.</line>
    ...

The entire document is represented by a node of type DOCUMENT_NODE. The document has a child node of type ELEMENT_NODE named bookOfPoems.

In this fragment of the XML file, the bookOfPoems node has one child node of type ELEMENT_NODE named poem. (It has additional nodes in the entire XML file.)

In this fragment, the poem node has two child nodes of type ELEMENT_NODE, each named line.

The first line node has a child node of type TEXT_NODE having the value "Roses are red," and the second line node has a child node of type TEXT_NODE having the value "Violets are blue."

Not immediately obvious

What isn't immediately obvious is that several of the nodes also have child nodes of type TEXT_NODE whose value consists of the newline at the end of one line of text and the spaces that provide the indentation for the next line of text.

Blocks of text that appear between the beginning and ending tags of an element constitute the content of that element and are treated as nodes of type TEXT_NODE by the parser, even if that text was placed there strictly for cosmetic purposes (such as indentation).
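The whitespace-only text nodes can be made visible with a short sketch such as the following, which counts them among the children of an element. It assumes the JDK's built-in JAXP parser with default settings, and the class name WhitespaceNodesSketch and its methods are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration class; not part of the lesson's program.
class WhitespaceNodesSketch {

  // A poem element whose children are separated by newlines and spaces.
  static final String XML =
      "<poem>\n"
      + "  <line>Roses are red,</line>\n"
      + "  <line>Violets are blue.</line>\n"
      + "</poem>";

  // Parses XML text with the JDK's built-in parser (an assumption).
  static Document parse(String xml) {
    try {
      return DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
    } catch (Exception e) {
      throw new IllegalArgumentException("Could not parse XML", e);
    }
  }

  // Counts the direct children that are text nodes holding only whitespace.
  static int countWhitespaceText(Node parent) {
    int count = 0;
    NodeList children = parent.getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
      Node child = children.item(i);
      if (child.getNodeType() == Node.TEXT_NODE
          && child.getNodeValue().trim().isEmpty()) {
        count++;
      }
    }
    return count;
  }

  public static void main(String[] args) {
    Node poem = parse(XML).getDocumentElement();
    System.out.println("children: " + poem.getChildNodes().getLength());
    System.out.println("whitespace-only: " + countWhitespaceText(poem));
  }
}
```

In this sample the poem element has five children: two line elements and three text nodes that contain nothing but a newline and indentation.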

The screen output

A small portion of the screen output is shown below. I have provided just enough of the output to illustrate the parent-child structure of the tree. This output matches the XML fragment shown above.

The most important aspect of this output, insofar as understanding the recursive nature of the process is concerned, is the sequence of Enter and Exit words. Each Enter line is eventually matched by an Exit line for the same node.

The key point

Once control enters an instance (invocation) of the writeNode() method to process a given node, it does not exit that instance of the method until that node and all of its child nodes have been processed.

A new instance of the method is created (figuratively speaking) and entered to process each child node (actually, the same method is re-entered).

The outermost structure

For example, control enters an instance of the method for node #document at the beginning of the processing sequence and doesn't exit that instance of the method until the end of the processing sequence.

The #text nodes

Since the #text nodes don't have children, we see that control enters and then immediately exits an instance of the method to process each of the nodes of the #text type. Some of the nodes of the #text type have a value that is a newline and some spaces. Recall that I purposely surrounded each node value with dash ("-") characters to make these whitespace values easier to recognize.

Enter writeNode for node:#document-null-
Enter writeNode for node:bookOfPoems-null-
Enter writeNode for node:#text-
  -
Exit writeNode for node:#text
Enter writeNode for node:poem-null-
Enter writeNode for node:#text-
    -
Exit writeNode for node:#text
Enter writeNode for node:line-null-
Enter writeNode for node:#text-Roses are red,-
Exit writeNode for node:#text
Exit writeNode for node:line
Enter writeNode for node:#text-
    -
Exit writeNode for node:#text
Enter writeNode for node:line-null-
Enter writeNode for node:#text-Violets are blue.-
Exit writeNode for node:#text
Exit writeNode for node:line
...
Exit writeNode for node:bookOfPoems
Exit writeNode for node:#document
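Another way to visualize the nesting of invocations is to indent each node name by the current recursion depth. The following minimal sketch does that. It assumes the JDK's built-in JAXP parser, and the class name IndentedTraceSketch and the trace() method are hypothetical, not part of Dom01Writer.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration class; not part of the lesson's program.
class IndentedTraceSketch {

  // Parses XML text with the JDK's built-in parser (an assumption).
  static Document parse(String xml) {
    try {
      return DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
    } catch (Exception e) {
      throw new IllegalArgumentException("Could not parse XML", e);
    }
  }

  // Appends the node name, indented two spaces per level of recursion,
  // then recurses into each child one level deeper.
  static void trace(Node node, int depth, StringBuilder out) {
    for (int i = 0; i < depth; i++) {
      out.append("  ");
    }
    out.append(node.getNodeName()).append('\n');
    NodeList children = node.getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
      trace(children.item(i), depth + 1, out);
    }
  }

  public static void main(String[] args) {
    StringBuilder out = new StringBuilder();
    trace(parse("<a><b>hi</b></a>"), 0, out);
    System.out.print(out);
  }
}
```

The depth parameter equals the number of invocations that are still active above the current one, so the indentation mirrors the Enter and Exit nesting in the listing above.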

Processing nodes by type

The method contains a switch statement to process each node based on its type.

The next fragment gets the type of the node being processed and then enters the switch statement.

Type DOCUMENT_NODE

The code in the switch statement for case DOCUMENT_NODE begins simply by writing a required line of text in the XML file that specifies the XML version, etc.

After that, it invokes the getDocumentElement() method on the node object to get the root element of the document. According to the documentation, "This is a convenience attribute that allows direct access to the child node that is the root element of the document."

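(The DOM specification is written in a language-neutral interface definition language in which documentElement is an attribute. The Java binding exposes that attribute as the getDocumentElement() method.)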

The method returns a reference to an object of type Element, which is an interface that extends the Node interface.

For this sample case, this method will return a reference to an object that represents the bookOfPoems element.

The critical recursive method call

Then the code makes a recursive call to the same writeNode() method, passing the newly acquired node as a parameter.

Important: The statement that reads out.flush() and all of the remaining code in this method will not be executed until the recursive call to writeNode() returns.

    int type = node.getNodeType();
    
    switch (type) {
      //Process the Document node
      case Node.DOCUMENT_NODE: {
        out.print("<?xml version=\"1.0\"?>\n");
        writeNode( ((Document)node).getDocumentElement() );
        out.flush();
        break;
      }//end case Node.DOCUMENT_NODE

Type ELEMENT_NODE

The next fragment begins the processing for nodes of type ELEMENT_NODE. This is a fairly long and complex body of code, so I will subdivide it into fragments.

The next fragment is simple enough.

      case Node.ELEMENT_NODE: {
        out.print('<');//begin the start tag
        out.print(node.getNodeName());

Attribute processing

An element can have zero, one, or more attributes. As indicated in the following text taken from the XML4J documentation, attributes have a special status relative to the Document tree.

"The Attr interface represents an attribute in an Element. Typically the allowable values for the attribute are defined in a document type definition.

Attr objects inherit the Node interface, but since they are not actually child nodes of the element they describe, the DOM does not consider them part of the document tree. Thus, the Node attributes parentNode, previousSibling, and nextSibling have a null value for Attr objects. The DOM takes the view that attributes are properties of elements rather than having a separate identity from the elements they are associated with; this should make it more efficient to implement such features as default attributes associated with all elements of a given type. Furthermore, Attr nodes may not be immediate children of a DocumentFragment. However, they can be associated with Element nodes contained within a DocumentFragment. In short, users and implementors of the DOM need to be aware that Attr nodes have some things in common with other objects inheriting the Node interface, but they also are quite distinct."

The Attr interface extends the Node interface.
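The statement that attributes are not child nodes can be checked with a minimal sketch like the one below. It assumes the JDK's built-in JAXP parser and uses getOwnerElement(), which comes from a later DOM level than the text quoted above. The class name AttrNotChildSketch and the isOutsideTree() method are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical illustration class; not part of the lesson's program.
class AttrNotChildSketch {

  // Parses XML text with the JDK's built-in parser (an assumption).
  static Document parse(String xml) {
    try {
      return DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
    } catch (Exception e) {
      throw new IllegalArgumentException("Could not parse XML", e);
    }
  }

  // Returns true if the named attribute exists, has no parent node,
  // and is nevertheless owned by the given element.
  static boolean isOutsideTree(Element element, String attrName) {
    Attr attr = element.getAttributeNode(attrName);
    return attr != null
        && attr.getParentNode() == null
        && attr.getOwnerElement() == element;
  }

  public static void main(String[] args) {
    Element poem = parse("<poem PoemNumber=\"1\"/>").getDocumentElement();
    System.out.println(isOutsideTree(poem, "PoemNumber"));
    System.out.println(poem.getChildNodes().getLength());
  }
}
```

Because attributes never appear in the list returned by getChildNodes(), a tree walker such as writeNode() has to fetch them separately through getAttributes().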

Get the attributes

The next fragment gets the attributes belonging to the element and writes them into the output file in the correct XML format.

The code begins by getting the attributes and storing them in an array of type Attr. Although the code to do this is rather convoluted, you should have no difficulty understanding how it works once you review the documentation for getAttributes() and review the code for getAttrArray() presented later in this lesson.

The getAttributes() method

getAttributes() returns a reference to an object of type NamedNodeMap containing the attributes of the node (if it is an Element) and null otherwise.

Objects implementing the NamedNodeMap interface are used to represent collections of nodes that can be accessed by name.

With one exception, the code to process the array, getting the name and value of each attribute and writing them into the output XML file is straightforward. All it really amounts to is creating the correct sequence of text, spaces, and punctuation characters.

The exception has to do with the call to the method named strToXML(). This method is used to replace extraneous angle brackets, ampersands, and quotation marks in the text with the corresponding XML entities. I have discussed this method in several previous lessons and will not repeat that discussion here.

        Attr attrs[] = getAttrArray(node.getAttributes());

        for (int i = 0; i < attrs.length; i++) {
          Attr attr = attrs[i];
          out.print(' ');//write a space
          out.print(attr.getNodeName());
          out.print("=\"");//write ="
          out.print(strToXML(attr.getNodeValue()));
          out.print('"');//write the closing quotation mark
        }//end for loop

        out.print('>');//write the end of the start tag
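As a companion to the fragment above, the following minimal sketch reads a NamedNodeMap by index and formats the attributes in name order. It is not the lesson's getAttrArray() method. It assumes the JDK's built-in JAXP parser, sorts explicitly with a TreeMap so that the result does not depend on the parser's internal ordering, and omits the escaping that strToXML() provides. The class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical illustration class; not the lesson's getAttrArray().
class SortedAttributesSketch {

  // Parses XML text with the JDK's built-in parser (an assumption).
  static Document parse(String xml) {
    try {
      return DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
    } catch (Exception e) {
      throw new IllegalArgumentException("Could not parse XML", e);
    }
  }

  // Formats the element's attributes as name="value" pairs, each preceded
  // by a space, in alphabetical order by name. Values are NOT escaped.
  static String formatAttributes(Element element) {
    NamedNodeMap map = element.getAttributes();
    Map<String, String> sorted = new TreeMap<>();
    for (int i = 0; i < map.getLength(); i++) {
      Node attr = map.item(i);
      sorted.put(attr.getNodeName(), attr.getNodeValue());
    }
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<String, String> entry : sorted.entrySet()) {
      sb.append(' ').append(entry.getKey())
        .append("=\"").append(entry.getValue()).append('"');
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    Element poem = parse("<poem PoemNumber=\"1\" DumAtr=\"dum val\"/>")
        .getDocumentElement();
    System.out.println("<poem" + formatAttributes(poem) + ">");
  }
}
```

The DOM does not define an order for the items in a NamedNodeMap, so code that needs a predictable attribute order should sort the attributes itself, as this sketch does.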

What about children?

At this point, I have to deal with the possibility that this node may have children, and must process them if they exist.

I begin by invoking the getChildNodes() method on the current node to get an object of type NodeList containing a collection of the children of this node.

The NodeList interface provides the abstraction of an ordered collection of nodes, without defining or constraining how this collection is implemented.

Getting the nodes in the list

The items in the NodeList are accessible via an integral index, starting from 0. The NodeList method named item(int index) is used for this purpose.

According to the DOM specification, getChildNodes() returns an empty NodeList rather than null when a node has no children, so the null test in the code below is only a defensive check. When the list is empty, the loop simply does nothing. The code to process the children is straightforward:

        NodeList children = node.getChildNodes();

        if (children != null) {//there are nested elements
          int len = children.getLength();

          for (int i = 0; i < len; i++) {
            writeNode(children.item(i));
          }//end for loop

        }//end if
        break;
      }//end case Node.ELEMENT_NODE

That ends the processing for the case ELEMENT_NODE.
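The behavior of getChildNodes() for an element without children can be checked with a minimal sketch like the following. The DOM specification describes the result as a NodeList containing no nodes, so the null test above is best read as a defensive check. The sketch assumes the JDK's built-in JAXP parser, and the class name EmptyChildListSketch and the childCount() method are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration class; not part of the lesson's program.
class EmptyChildListSketch {

  // Parses XML text with the JDK's built-in parser (an assumption).
  static Document parse(String xml) {
    try {
      return DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
    } catch (Exception e) {
      throw new IllegalArgumentException("Could not parse XML", e);
    }
  }

  // Returns the number of children, or -1 if the parser returned null
  // instead of a list (not expected from a conforming implementation).
  static int childCount(Node node) {
    NodeList children = node.getChildNodes();
    return children == null ? -1 : children.getLength();
  }

  public static void main(String[] args) {
    Node empty = parse("<line/>").getDocumentElement();
    System.out.println(childCount(empty));
    System.out.println(empty.hasChildNodes());
  }
}
```

When only a yes-or-no answer is needed, hasChildNodes() avoids fetching the list at all.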

ENTITY_REFERENCE_NODE

The next fragment simply sandwiches the name of a node of type ENTITY_REFERENCE_NODE between an ampersand and a semicolon and writes the combination into the output file. This produces an entity reference in the output XML file.

In a nutshell, an entity reference is a reference to something that has been defined elsewhere. Since this lesson is not intended to teach entities, I will drop it at that. The sample XML file used to test this program didn't contain any entity references.

      case Node.ENTITY_REFERENCE_NODE: {
        out.print('&');
        out.print(node.getNodeName());
        out.print(';');
        break;
      }//end case Node.ENTITY_REFERENCE_NODE

Type TEXT_NODE

Earlier in this lesson, I discussed the inclusion of text inside an XML element. Without going into detail as to why, a block of text can be represented by either of two node types: TEXT_NODE or CDATA_SECTION_NODE.

(Only one of those types appears in my XML test file.)

The processing is simple

      case Node.CDATA_SECTION_NODE: 
      case Node.TEXT_NODE: {
        out.print(strToXML(node.getNodeValue()));
        break;
      }//end case Node.TEXT_NODE
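The two text-bearing node types can be seen side by side in the following minimal sketch. It assumes the JDK's built-in JAXP parser with coalescing turned off so that CDATA sections are kept as separate nodes; the class name TextKindsSketch and the describe() method are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration class; not part of the lesson's program.
class TextKindsSketch {

  // Parses XML text with the JDK's built-in parser (an assumption),
  // keeping CDATA sections as their own nodes.
  static Document parse(String xml) {
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setCoalescing(false); // do not merge CDATA into text nodes
      return factory.newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
    } catch (Exception e) {
      throw new IllegalArgumentException("Could not parse XML", e);
    }
  }

  // Describes each child as KIND[value], separated by '|'.
  static String describe(Node parent) {
    StringBuilder sb = new StringBuilder();
    NodeList children = parent.getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
      Node child = children.item(i);
      String kind;
      if (child.getNodeType() == Node.CDATA_SECTION_NODE) {
        kind = "CDATA";
      } else if (child.getNodeType() == Node.TEXT_NODE) {
        kind = "TEXT";
      } else {
        kind = "OTHER";
      }
      if (i > 0) {
        sb.append('|');
      }
      sb.append(kind).append('[').append(child.getNodeValue()).append(']');
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    Document doc = parse("<a><![CDATA[x < y]]>plain</a>");
    System.out.println(describe(doc.getDocumentElement()));
  }
}
```

Both node types return their characters from getNodeValue(), which is why a single case body can handle them. Writing a CDATA value through an escaping method produces ordinary escaped text rather than a CDATA section, which is equivalent in content but not identical in form.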

PROCESSING_INSTRUCTION_NODE

If you know about XML processing instructions, the code in the next fragment will be obvious to you. Otherwise, you will need to review the XML specifications regarding processing instructions.

This fragment contains the end of the switch statement being used to process nodes on the basis of their type.

      case Node.PROCESSING_INSTRUCTION_NODE: {
        out.print("<?");
        out.print(node.getNodeName());
        String data = node.getNodeValue();
        if (data != null && data.length() > 0) {
          out.print(' ');
          out.print(data);
        }//end if
        out.print("?>");
        break;
      }//end case Node.PROCESSING_INSTRUCTION_NODE
    }//end switch
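For readers who want to see what the node name and node value of a processing instruction contain, here is a minimal sketch. It assumes the JDK's built-in JAXP parser, and the processing instruction named render is made up for illustration, as are the class name ProcessingInstructionSketch and the format() method.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical illustration class; not part of the lesson's program.
class ProcessingInstructionSketch {

  // Parses XML text with the JDK's built-in parser (an assumption).
  static Document parse(String xml) {
    try {
      return DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
    } catch (Exception e) {
      throw new IllegalArgumentException("Could not parse XML", e);
    }
  }

  // Rebuilds the text of a processing instruction from its node name
  // (the target) and its node value (the data).
  static String format(Node pi) {
    StringBuilder sb = new StringBuilder("<?");
    sb.append(pi.getNodeName());
    String data = pi.getNodeValue();
    if (data != null && data.length() > 0) {
      sb.append(' ').append(data);
    }
    return sb.append("?>").toString();
  }

  public static void main(String[] args) {
    // "render" is a made-up target used only for this example.
    Document doc = parse("<a><?render mode=\"fast\"?></a>");
    Node pi = doc.getDocumentElement().getFirstChild();
    System.out.println(pi.getNodeName());
    System.out.println(pi.getNodeValue());
    System.out.println(format(pi));
  }
}
```

The target and the data are separated by whitespace in the source text. The parser drops that separator, so the writer has to put a single space back between them.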

Close the element

There is one more thing that needs to be done before exiting the writeNode() method being used to process a node. If the node being processed is an element, the end tag for the element needs to be created and written to the output file. That is accomplished in a straightforward manner in the next fragment.

    if (type == Node.ELEMENT_NODE) {
      out.print("</");
      out.print(node.getNodeName());
      out.print('>');
    }//end if

More screen output

You will recall that in order to help you understand the recursive process being used, I placed print statements at the entry to and the exit from the writeNode() method.

The next fragment displays information about the node being processed immediately prior to exiting the instance of the method.

Remember that the reason that I refer to the instance of the method is because this method is called recursively. At any point in time, several instances (invocations) of the method may be active.

In other words, control may have temporarily left an invocation of the method partway through and entered the method again. (Methods that will accommodate this type of behavior have long been known as reentrant methods.)

Instances may not be the best terminology to use here, but it is the best that I could come up with to try to explain what is happening. If you already understood recursion, the terminology shouldn't matter.

If you didn't previously understand recursion, I hope that this has helped to give you some understanding of that very important programming concept.

The next fragment also terminates the writeNode() method.

    //Display screen output to help in understanding the 
    // recursive process being used.
    System.out.println("Exit writeNode for node:" 
                                     + node.getNodeName());

  }//end writeNode(Node)
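
If the flat list of entry and exit messages is hard to follow, it may help to indent each message according to the depth of the recursion. The following sketch does that for a tiny document. It is not part of the program in this lesson. It assumes the JAXP parser that ships with current JDKs, and the names TraceSketch, walk, and trace are made up for the illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TraceSketch {
  //Record an entry message, recurse into every child,
  // then record an exit message.  The depth parameter
  // counts how many invocations are currently active
  // above this one.
  static void walk(Node node, int depth, List<String> log) {
    StringBuilder indent = new StringBuilder();
    for (int i = 0; i < depth; i++) {
      indent.append("  ");
    }//end for loop
    log.add(indent + "Enter " + node.getNodeName());
    NodeList children = node.getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
      walk(children.item(i), depth + 1, log);
    }//end for loop
    log.add(indent + "Exit " + node.getNodeName());
  }//end walk()

  //Parse XML text and return the trace for its root.
  static List<String> trace(String xml) throws Exception {
    Document doc = DocumentBuilderFactory.newInstance()
                  .newDocumentBuilder()
                  .parse(new InputSource(new StringReader(xml)));
    List<String> log = new ArrayList<String>();
    walk(doc.getDocumentElement(), 0, log);
    return log;
  }//end trace()

  public static void main(String[] args) throws Exception {
    String xml = "<poem><line>Roses are red,</line></poem>";
    for (String s : trace(xml)) {
      System.out.println(s);
    }//end for loop
  }//end main()
}//end class TraceSketch
```

For the one-line poem in main(), the trace shows three invocations active at the deepest point: one for poem, one for line, and one for the text inside line.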

Utility methods

That brings us to some utility methods that are invoked by the code that we discussed above.

The strToXML() method

The purpose of this method is to return a copy of a String object in which angle brackets, ampersands, and quotation marks have been replaced with the corresponding XML entities.

As mentioned earlier, I have discussed this method in previous lessons, so I won't discuss it further here. You can view the method in the program listing near the end of the lesson.
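
As a quick reminder of what the method does, the following sketch wraps a copy of it in a small test class. The method body is taken from the program listing, except that it is declared static here so that it can be called from main(). The class name EscapeSketch and the sample string are made up for the illustration.

```java
public class EscapeSketch {
  //Copy of strToXML() from the program listing, made
  // static for use in this stand-alone sketch.
  static String strToXML(String s) {
    StringBuffer str = new StringBuffer();
    int len = (s != null) ? s.length() : 0;
    for (int i = 0; i < len; i++) {
      char ch = s.charAt(i);
      switch (ch) {
        case '<': str.append("&lt;"); break;
        case '>': str.append("&gt;"); break;
        case '&': str.append("&amp;"); break;
        case '"': str.append("&quot;"); break;
        default: str.append(ch);
      }//end switch
    }//end for loop
    return str.toString();
  }//end strToXML()

  public static void main(String[] args) {
    //Prints a&lt;b &amp; &quot;c&quot;
    System.out.println(strToXML("a<b & \"c\""));
    //A null input produces an empty string.
    System.out.println(strToXML(null).length());//prints 0
  }//end main()
}//end class EscapeSketch
```

The apostrophe is not replaced. That causes no problem in this program because attribute values are always written inside double quotation marks.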

The getAttrArray() method

Earlier in this lesson, we saw code that invoked this method and I promised to discuss it later.

This method converts an object of type NamedNodeMap into an array of type Attr.

The procedure is straightforward:

  private Attr[] getAttrArray(NamedNodeMap attrs) {
    int len = (attrs != null) ? attrs.getLength() : 0;
    Attr array[] = new Attr[len];
    for (int i = 0; i < len; i++) {
      array[i] = (Attr)attrs.item(i);
    }//end for loop
    
    return array;
  }//end getAttrArray()
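
The output file shown in the program listing has the attributes of each poem element in a different order from the input file (DumAtr before PoemNumber). The DOM does not promise any particular order for the items in a NamedNodeMap, so the order produced by getAttrArray() depends on the parser. If a predictable order matters, one possibility is to sort the array by attribute name, as in the following sketch. It assumes the JAXP parser from current JDKs, and the names AttrOrderSketch, sortedAttrs, and names are made up for the illustration.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.Comparator;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.xml.sax.InputSource;

public class AttrOrderSketch {
  //Same conversion as getAttrArray(), followed by a sort
  // on the attribute name.
  static Attr[] sortedAttrs(NamedNodeMap attrs) {
    int len = (attrs != null) ? attrs.getLength() : 0;
    Attr[] array = new Attr[len];
    for (int i = 0; i < len; i++) {
      array[i] = (Attr)attrs.item(i);
    }//end for loop
    Arrays.sort(array, new Comparator<Attr>() {
      public int compare(Attr a, Attr b) {
        return a.getNodeName().compareTo(b.getNodeName());
      }//end compare()
    });
    return array;
  }//end sortedAttrs()

  //Parse XML text and list the attributes of the root
  // element as name=value pairs separated by spaces.
  static String names(String xml) throws Exception {
    Element root = DocumentBuilderFactory.newInstance()
                  .newDocumentBuilder()
                  .parse(new InputSource(new StringReader(xml)))
                  .getDocumentElement();
    StringBuilder sb = new StringBuilder();
    for (Attr a : sortedAttrs(root.getAttributes())) {
      if (sb.length() > 0) sb.append(' ');
      sb.append(a.getNodeName()).append('=')
        .append(a.getNodeValue());
    }//end for loop
    return sb.toString();
  }//end names()

  public static void main(String[] args) throws Exception {
    //Prints DumAtr=dum val PoemNumber=1
    System.out.println(
        names("<poem PoemNumber=\"1\" DumAtr=\"dum val\"/>"));
  }//end main()
}//end class AttrOrderSketch
```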

That ends the discussion of the class named Dom01Writer.

Program Listings

A complete listing of the programs and the XML file is contained in this section.

<?xml version="1.0"?>
<bookOfPoems>
  <poem PoemNumber="1" DumAtr="dum val">
    <line>Roses are red,</line>
    <line>Violets are blue.</line>
    <line>Sugar is sweet,</line>
    <line>and so are you.</line>
  </poem>
  <poem PoemNumber="2" DumAtr="dum val">
    <line>Roses are pink,</line>
    <line>Dandelions are yellow,</line>
    <line>If you like Java,</line>
    <line>You are a good fellow.</line>
  </poem>
</bookOfPoems>

.

/*File Dom01.java Copyright 1999 R.G.Baldwin
This is a driver program that illustrates how to use a DOM
parser, and how to traverse a Document object created
by the DOM parser.

The program requires access to the following class files:
Dom01Parser.class
Dom01Writer.class

The program instantiates a DOM parser object of the class
named Dom01Parser.  The parser object is based on the
IBM XML4J non-validating parser.

The program uses the parse() method of the parser object 
to parse an XML file specified on the command line.  

The parse method returns an object of type Document that
represents the parsed XML file.

The program passes the Document object to a method named
writeXmlFile() on an object of a class named Dom01Writer.
The purpose of this method and this class is to write
an XML file corresponding to the information contained
in the Document object.

The method also generates output on the screen designed to
help the student understand the recursive process involved
in processing the DOM tree to write the file.

In this case, the XML file is written to a file named 
junk.xml, but the name of the output file could easily
be made into an input parameter.

Tested using JDK 1.2 and the IBM XML4J parser

When tested using an XML file that read as follows:
  
<?xml version="1.0"?>
<bookOfPoems>
  <poem PoemNumber="1" DumAtr="dum val">
    <line>Roses are red,</line>
    <line>Violets are blue.</line>
    <line>Sugar is sweet,</line>
    <line>and so are you.</line>
  </poem>
  <poem PoemNumber="2" DumAtr="dum val">
    <line>Roses are pink,</line>
    <line>Dandelions are yellow,</line>
    <line>If you like Java,</line>
    <line>You are a good fellow.</line>
  </poem>
</bookOfPoems>
  
The following output was produced on the screen:
  
Enter writeNode for node:#document-null-
Enter writeNode for node:bookOfPoems-null-
Enter writeNode for node:#text-
  -
Exit writeNode for node:#text
Enter writeNode for node:poem-null-
Enter writeNode for node:#text-
    -
Exit writeNode for node:#text
Enter writeNode for node:line-null-
Enter writeNode for node:#text-Roses are red,-
Exit writeNode for node:#text
Exit writeNode for node:line
Enter writeNode for node:#text-
    -
Exit writeNode for node:#text
Enter writeNode for node:line-null-
Enter writeNode for node:#text-Violets are blue.-
Exit writeNode for node:#text
Exit writeNode for node:line
Enter writeNode for node:#text-
    -
Exit writeNode for node:#text
Enter writeNode for node:line-null-
Enter writeNode for node:#text-Sugar is sweet,-
Exit writeNode for node:#text
Exit writeNode for node:line
Enter writeNode for node:#text-
    -
Exit writeNode for node:#text
Enter writeNode for node:line-null-
Enter writeNode for node:#text-and so are you.-
Exit writeNode for node:#text
Exit writeNode for node:line
Enter writeNode for node:#text-
  -
Exit writeNode for node:#text
Exit writeNode for node:poem
Enter writeNode for node:#text-
  -
Exit writeNode for node:#text
Enter writeNode for node:poem-null-
Enter writeNode for node:#text-
    -
Exit writeNode for node:#text
Enter writeNode for node:line-null-
Enter writeNode for node:#text-Roses are pink,-
Exit writeNode for node:#text
Exit writeNode for node:line
Enter writeNode for node:#text-
    -
Exit writeNode for node:#text
Enter writeNode for node:line-null-
Enter writeNode for node:#text-Dandelions are yellow,-
Exit writeNode for node:#text
Exit writeNode for node:line
Enter writeNode for node:#text-
    -
Exit writeNode for node:#text
Enter writeNode for node:line-null-
Enter writeNode for node:#text-If you like Java,-
Exit writeNode for node:#text
Exit writeNode for node:line
Enter writeNode for node:#text-
    -
Exit writeNode for node:#text
Enter writeNode for node:line-null-
Enter writeNode for node:#text-You are a good fellow.-
Exit writeNode for node:#text
Exit writeNode for node:line
Enter writeNode for node:#text-
  -
Exit writeNode for node:#text
Exit writeNode for node:poem
Enter writeNode for node:#text-
-
Exit writeNode for node:#text
Exit writeNode for node:bookOfPoems
Exit writeNode for node:#document
  
And an output file was produced containing the following:
  
<?xml version="1.0"?>
<bookOfPoems>
  <poem DumAtr="dum val" PoemNumber="1">
    <line>Roses are red,</line>
    <line>Violets are blue.</line>
    <line>Sugar is sweet,</line>
    <line>and so are you.</line>
  </poem>
  <poem DumAtr="dum val" PoemNumber="2">
    <line>Roses are pink,</line>
    <line>Dandelions are yellow,</line>
    <line>If you like Java,</line>
    <line>You are a good fellow.</line>
  </poem>
</bookOfPoems>

**********************************************************/

import org.w3c.dom.*;

public class Dom01 {
  private static final String xmlFile = "junk.xml";
  //-----------------------------------------------------//

  public static void main(String argv[]) {
    if (argv.length != 1) {
      System.err.println("usage: java Dom01 uri");
      System.exit(1);
    }//end if

    //Get a parser object
    Dom01Parser parser = new Dom01Parser();    

    try{
      //Parse the XML file to get a Document object
      // that represents the XML file.
      Document document = parser.parse(argv[0]);
      //Write the XML file
      new Dom01Writer(xmlFile).writeXmlFile(document);
    }catch(Exception e){
      e.printStackTrace(System.err);
    }//end catch

  }// end main()
} // class Dom01

.

/*File Dom01Parser.java, 

This is a wrapper class for the IBM XML4J non-validating
DOM parser.

An object of this class functions as a DOM parser with
SAX error handling capability.

The parse() method of this class returns a Document object
representing the parsed XML file.

Tested using JDK 1.2 and the IBM XML4J parser
**********************************************************/
import org.w3c.dom.Document;

import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;


public class Dom01Parser implements ErrorHandler{
  //Create the actual parser object
  com.ibm.xml.parsers.NonValidatingDOMParser parser = 
    new com.ibm.xml.parsers.NonValidatingDOMParser();
  //-----------------------------------------------------//
  
  public Dom01Parser() {//constructor
    parser.setErrorHandler(this);
  }//end constructor
  //-----------------------------------------------------//

  //Parse the specified URI and return a Document object
  // that represents the XML file being parsed.
  public Document parse(String uri) throws Exception {
    parser.parse(uri);
    return parser.getDocument();
  }//end parse()
  //-----------------------------------------------------//

  // The following methods handle SAX errors.


  public void warning(SAXParseException ex) {
    System.err.println("[Warning] "+
              getLocationString(ex)+": "+ ex.getMessage());
  }//end warning()
  //-----------------------------------------------------//

  public void error(SAXParseException ex) {
    System.err.println("[Error] "+
              getLocationString(ex)+": "+ ex.getMessage());
  }//end error()
  //-----------------------------------------------------//

  public void fatalError(SAXParseException ex)
                                      throws SAXException {
    System.err.println("[Fatal Error] "+
              getLocationString(ex)+": "+ ex.getMessage());
    throw ex;
  }//end fatalError()
  //-----------------------------------------------------//

  //This method provides location information that is 
  // displayed in the error messages generated by the
  // error handling methods above.
  private String getLocationString(SAXParseException ex){
    StringBuffer str = new StringBuffer();

    String systemId = ex.getSystemId();
    if (systemId != null){
      int index = systemId.lastIndexOf('/');
      if (index != -1){
        systemId = systemId.substring(index + 1);
      }//end if
      str.append(systemId);
    }//end if
    str.append(':');
    str.append(ex.getLineNumber());
    str.append(':');
    str.append(ex.getColumnNumber());

    return str.toString();

  }//end getLocationString()
  //-----------------------------------------------------//
} // class Dom01Parser
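
If the XML4J classes are not available to you, a wrapper along the following lines might serve the same purpose. This is only a sketch and not the class used in this lesson. It assumes the JAXP classes (javax.xml.parsers) that ship with current JDKs, and the class name JaxpParserSketch and the method parseText() are made up for the illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class JaxpParserSketch implements ErrorHandler {
  private final DocumentBuilder builder;

  public JaxpParserSketch() throws Exception {//constructor
    DocumentBuilderFactory factory =
                      DocumentBuilderFactory.newInstance();
    factory.setValidating(false);//non-validating parser
    builder = factory.newDocumentBuilder();
    builder.setErrorHandler(this);
  }//end constructor

  //Parse the specified URI and return a Document object.
  public Document parse(String uri) throws Exception {
    return builder.parse(uri);
  }//end parse()

  //Parse XML held in a String.  (Convenience method
  // added for this sketch.)
  public Document parseText(String xml) throws Exception {
    return builder.parse(
                  new InputSource(new StringReader(xml)));
  }//end parseText()

  //The following methods handle SAX errors.
  public void warning(SAXParseException ex) {
    System.err.println("[Warning] " + ex.getMessage());
  }//end warning()

  public void error(SAXParseException ex) {
    System.err.println("[Error] " + ex.getMessage());
  }//end error()

  public void fatalError(SAXParseException ex)
                                      throws SAXException {
    System.err.println("[Fatal Error] " + ex.getMessage());
    throw ex;
  }//end fatalError()

  public static void main(String[] args) throws Exception {
    Document doc = new JaxpParserSketch().parseText(
                      "<bookOfPoems><poem/></bookOfPoems>");
    //Prints bookOfPoems
    System.out.println(
                  doc.getDocumentElement().getNodeName());
  }//end main()
}//end class JaxpParserSketch
```

Because parse() returns the same org.w3c.dom.Document type, an object of this class could be passed to the same kind of writer code that processes the Document object in this lesson.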

.

/*File Dom01Writer.java Copyright 1999 R.G.Baldwin

This class provides a utility method named writeXmlFile()
that receives a DOM Document object as a parameter and
writes an output XML file that matches the information
contained in the Document object.

The method also produces screen output designed to help
in understanding the recursive process involved.

The name of the XML file is established as a parameter
to the constructor of the class.

Tested using JDK 1.2 and the IBM XML4J parser
**********************************************************/

import java.io.PrintWriter;
import java.io.FileOutputStream;

import org.w3c.dom.*;

public class Dom01Writer {
  private PrintWriter out;

  //-----------------------------------------------------//

  public Dom01Writer(String xmlFile) {//constructor
    try {
      out = new PrintWriter(new FileOutputStream(xmlFile));
    }catch (Exception e) {
      e.printStackTrace(System.err);
    }//end catch
  }//end constructor
  //-----------------------------------------------------//

  //This method converts an incoming Document object to
  // an output XML file
  public void writeXmlFile(Document document) {
    try {
      //Write the contents of the Document object into
      // an output file in XML file format
      writeNode(document);
    }catch (Exception e) {
      e.printStackTrace(System.err);
    }//end catch
  }//end writeXmlFile()
  //-----------------------------------------------------//

  //This method is used recursively to convert node data
  // to XML format and write the XML format data to the
  // output file.
  public void writeNode(Node node) {
    if (node == null) {
      System.err.println("Nothing to do, node is null");
      return;
    }//end if
    
    //Display screen output to help in understanding the 
    // recursive process being used.
    System.out.println("Enter writeNode for node:" 
                            + node.getNodeName() + "-" 
                              + node.getNodeValue() + "-");

    //Process the node based on its type.
    int type = node.getNodeType();
    
    switch (type) {
      //Process the Document node
      case Node.DOCUMENT_NODE: {
        //Write a required line for an XML document
        out.print("<?xml version=\"1.0\"?>\n");
        //Get and write the root element of the Document
        // Note that this is a recursive call.
        writeNode( ((Document)node).getDocumentElement() );
        out.flush();
        break;
      }//end case Node.DOCUMENT_NODE

      //Write an element with attributes
      case Node.ELEMENT_NODE: {
        out.print('<');//begin the start tag
        out.print(node.getNodeName());
        
        //Get and write the attributes belonging
        // to the element.  First get the attributes in
        // the form of an array.
        Attr attrs[] = getAttrArray(node.getAttributes());

        //Now process all of the attributes in the array.
        for (int i = 0; i < attrs.length; i++) {
          Attr attr = attrs[i];
          out.print(' ');//write a space
          out.print(attr.getNodeName());
          out.print("=\"");//write ="
          //Convert <,>,&, or quotation char to entities
          // and write the text containing the entities
          out.print(strToXML(attr.getNodeValue()));
          out.print('"');//write the closing quotation mark
        }//end for loop
        out.print('>');//write the end of the start tag
        
        //Deal with the possibility that there may be
        // other elements nested in this element.
        NodeList children = node.getChildNodes();
        if (children != null) {//there are nested elements
          int len = children.getLength();
          //Iterate on the NodeList of child nodes
          for (int i = 0; i < len; i++) {
            //Write each of the nested elements recursively
            writeNode(children.item(i));
          }//end for loop
        }//end if
        break;
      }//end case Node.ELEMENT_NODE

      //Handle entity reference nodes
      case Node.ENTITY_REFERENCE_NODE: {
        out.print('&');
        out.print(node.getNodeName());
        out.print(';');
        break;
      }//end case Node.ENTITY_REFERENCE_NODE

      //Handle text
      case Node.CDATA_SECTION_NODE: 
      case Node.TEXT_NODE: {
        //Replace <,>,& and quotation marks with entities
        // and write the resulting text
        out.print(strToXML(node.getNodeValue()));
        break;
      }//end case Node.TEXT_NODE

      //Handle processing instruction
      case Node.PROCESSING_INSTRUCTION_NODE: {
        out.print("<?");
        out.print(node.getNodeName());
        String data = node.getNodeValue();
        if (data != null && data.length() > 0) {
          out.print(' ');
          out.print(data);
        }//end if
        out.print("?>");
        break;
      }//end case Node.PROCESSING_INSTRUCTION_NODE
    }//end switch

    //Now write the end tag for element nodes
    if (type == Node.ELEMENT_NODE) {
      out.print("</");
      out.print(node.getNodeName());
      out.print('>');
    }//end if
    
    //Display screen output to help in understanding the 
    // recursive process being used.
    System.out.println("Exit writeNode for node:" 
                                     + node.getNodeName());

  }//end writeNode(Node)
  //-----------------------------------------------------//

  //The following methods are utility methods

  //This method inserts entities in place of <,>,&,
  // and quotation mark
  private String strToXML(String s) {
    StringBuffer str = new StringBuffer();

    int len = (s != null) ? s.length() : 0;
    
    for (int i = 0; i < len; i++) {
      char ch = s.charAt(i);
      switch (ch) {
        case '<': {
          str.append("&lt;");
          break;
        }//end case '<'
        case '>': {
          str.append("&gt;");
          break;
        }//end case '>'
        case '&': {
          str.append("&amp;");
          break;
        }//end case '&'
        case '"': {
          str.append("&quot;");
          break;
        }//end case '"'
        default: {
          str.append(ch);
        }//end default
      }//end switch
    }//end for loop

    return str.toString();

  }//end strToXML()
  //-----------------------------------------------------//

  //This method converts a NamedNodeMap into an array 
  // of type Attr 
  private Attr[] getAttrArray(NamedNodeMap attrs) {
    int len = (attrs != null) ? attrs.getLength() : 0;
    Attr array[] = new Attr[len];
    for (int i = 0; i < len; i++) {
      array[i] = (Attr)attrs.item(i);
    }//end for loop
    
    return array;
  }//end getAttrArray()
  
  //-----------------------------------------------------//

} // class Dom01Writer

-end-