XML and IE5, Part 1

by Richard G. Baldwin
baldwin@austin.cc.tx.us
Baldwin's Home Page

Dateline: 12/30/99

WYSIWYG HTML Editor: In my previous article, I mentioned that I have started using the WYSIWYG HTML editor that is included in Microsoft Word 2000. I also mentioned that the new editor has many connections to XML.

For example, I have discovered that, with one set of Word 2000 options, when I include references to local image files in my HTML document, the document produced by Word 2000 contains something similar to the following inside the <head> element.

A different set of options produces the following:

You can see this by viewing the page source for this document.

Editor creates a subdirectory: In addition, under one set of options, the editor creates a subdirectory as a child of the current directory. The name of the subdirectory matches the name following “./” in the above HTML statement, which, in turn, is based on the name of the HTML file. In this case, the name of the subdirectory is aa123099_files.

Editor creates an XML file: With either set of options, the editor creates several new files and places them either in the current directory or in the new subdirectory.

One of the new files created by Word 2000 is an XML file named filelist.xml. The editor also creates duplicates of the image files and places them in the directory. Then it refers to the duplicate copies of the files in the HTML rather than to the original files with HTML statements such as the following:

<v:imagedata src="graphics/aa123099_image001.gif" o:title="JoeComputer"/>

In this case, the name of the original image file is graphics/JoeComputer.gif, and the name of the duplicate file is graphics/aa123099_image001.gif.

Why does it create the duplicate files? At this point in time, I honestly don’t know. In fact, I’m not even certain that I like it, because it makes it more difficult to publish the new HTML file on the web site and it also consumes lots of disk space.

So far I haven’t figured out how to prevent it. This may simply be part of the price of having access to a good WYSIWYG HTML editor. If anyone has any insights on this, I would welcome hearing from them.

Apparently not IE5 compatible: The really peculiar thing is that when I use Word 2000 to create this document and the files that it creates, and I post all of those files on my web site, everything works OK with Netscape 4.5. However, when I view the page on the web site with IE5.0, none of the images are visible. (The images are visible when I view the HTML file as a local file using IE5.0.)

So, even though this set of files is being produced using Microsoft Word, if you are using IE5.0 to view this HTML file, you probably won’t be able to see any of the images. (No one ever said that web site maintenance would be easy!)

If you know how to solve this problem, please let me know.

Even if you are using IE5, you can still perform the experiment described below and achieve the primary objective of this article, which involves the viewing of XML pages using IE5.

UPDATE: I believe that I was able to resolve this apparent IE5 compatibility problem by downloading and installing an HTML Filter from Microsoft. This makes it possible to export a compact HTML file from Word 2000, which seems to be compatible with IE5.

Why did I mention this at all? Actually, this ties very closely to the main topic of this article. In this article, I plan to introduce you to the use of IE5 for viewing XML files. One of the XML files that I will use as an illustration is the file named filelist.xml that is automatically created by Word 2000.

Screenshot of filelist.xml: Just to start things off, I am going to show you a screenshot of what that file looks like when viewed with IE5. Note that this is not the final version of the file. The following image is an intermediate version that I captured during the development of this article.

I realize that the contents of this file won’t mean much at this point. My primary purpose is to illustrate the default viewing format produced by IE5.

Those of you using IE5 who can’t see the above screenshot in this document should be able to view the XML file directly by pointing your browser here. However, you probably shouldn’t expect this to provide anything interesting with other browsers. (Actually, on my computer, when I click on the above link with Netscape, a dialog box appears that allows me to save or open the file. When I select open, Netscape uses IE5.0 to open the file.)

Now let’s talk about IE5: According to Microsoft,

“You can use Microsoft® Internet Explorer 5 to view XML documents in the browser just as you would view HTML pages. There are several ways to navigate to an XML document on a Web site. You can follow a link to that document, type the URL of the XML document into the address bar, select an XML document from your history or favorites, double-click an XML document from the desktop, and so on. To view the XML source, select View Source from the File menu.”

What does the future hold? This is the first in a series of articles in which I will explore the XML-browsing capabilities that are offered by IE5. I will show the default viewing format for an XML file in this article, and will expand on that viewing format in subsequent articles.

An example XML file: The primary purpose of this article is to make it possible for you to experiment with IE5 for viewing a simple XML file in the default viewing format. If you don’t already have IE5 installed, you will need to download and install it in order to perform the following experiment.

Create the XML file: Create a new text file with an extension of .xml. Then copy and paste the following simple XML text into that file.

<book>
<chap number="1">
<page number="1">
Text for Chapter 1, Page 1
</page>
<page number="2">
Text for Chapter 1, Page 2
</page>
<page number="3">
Text for Chapter 1, Page 3
</page>
</chap>
<chap number="2">
<page number="1">
Text for Chapter 2, Page 1
</page>
<page number="2">
Text for Chapter 2, Page 2
</page>
<page number="3">
Text for Chapter 2, Page 3
</page>
</chap>
</book>
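
If you would rather generate the sample file than paste it, the following is a minimal sketch in Python, assuming the standard xml.etree.ElementTree module. The function name build_book and its parameters are hypothetical, introduced only for this illustration. Note that the generated text has no line breaks around the page text, so it is equivalent in structure to the sample above but not identical character for character.

```python
import xml.etree.ElementTree as ET


def build_book(chapters=2, pages=3):
    """Return a <book> document shaped like the sample, as an XML string.

    build_book, chapters, and pages are illustrative names, not part of
    the original article.
    """
    book = ET.Element("book")
    for c in range(1, chapters + 1):
        # Each chapter carries its number as an attribute, as in the sample.
        chap = ET.SubElement(book, "chap", number=str(c))
        for p in range(1, pages + 1):
            page = ET.SubElement(chap, "page", number=str(p))
            page.text = f"Text for Chapter {c}, Page {p}"
    # encoding="unicode" makes tostring return a str instead of bytes.
    return ET.tostring(book, encoding="unicode")
```

Writing the returned string to a text file whose name ends in .xml should give you a file that can be opened in IE5 in the same way as the hand-made one.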

View the XML file: Now, open your new XML file in IE5. If everything works properly, you should see your XML file displayed in the following default tree format.

Collapse the XML tree: Note that this is not a static display. If you click on the minus-sign characters on the left, you can cause the tree to collapse in a manner similar to Windows Explorer. After partially collapsing your XML tree, your view should look something like the following.

Note that a plus-sign character now appears on the left near the bottom. You can click on it to expand the element to which it refers. Thus, the default viewing format of IE5 is a tree that can be collapsed and expanded.
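
To make the idea of a collapsible tree concrete, here is a rough sketch in Python that prints an XML document as an indented outline with minus-sign and plus-sign markers. It only imitates the general shape of the display described above and is not how IE5 produces its view. The function name outline and the max_depth parameter are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET


def outline(element, depth=0, max_depth=None):
    """Return indented lines describing the element tree.

    An element with children is marked "-" when expanded. If max_depth is
    given, elements at that depth that have children are marked "+" and
    their children are omitted, imitating a collapsed node.
    """
    attrs = "".join(f' {k}="{v}"' for k, v in element.attrib.items())
    has_children = len(element) > 0
    collapsed = has_children and max_depth is not None and depth >= max_depth
    marker = "+" if collapsed else ("-" if has_children else " ")
    lines = [f"{'  ' * depth}{marker} <{element.tag}{attrs}>"]
    if not collapsed:
        for child in element:
            lines.extend(outline(child, depth + 1, max_depth))
    return lines


sample = (
    '<book><chap number="1"><page number="1">Text</page></chap>'
    '<chap number="2"><page number="1">Text</page></chap></book>'
)
root = ET.fromstring(sample)
print("\n".join(outline(root)))               # fully expanded
print("\n".join(outline(root, max_depth=1)))  # chapters collapsed
```

Calling outline with max_depth=1 plays the role of clicking the minus sign next to every chapter: the chapters remain visible, but their pages are hidden behind a plus sign.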

Must be well-formed: In order to produce this output, your XML file must be well-formed. As another part of the experiment, remove one of the end tags so that the file won’t be well-formed, and view the file again using IE5. Now you should see an error message that reads something like the following:

The XML page cannot be displayed

Cannot view XML input using XSL style sheet. Please correct
the error and then click the Refresh button, or try again later.

The following tags were not closed: book
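
You can run the same well-formedness experiment without a browser. The following is a minimal sketch in Python, assuming the standard xml.etree.ElementTree parser. The helper name check_well_formed is hypothetical, and the wording of the parser's error message will differ from the IE5 message shown above.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return (True, None) if xml_text parses, else (False, error message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return False, str(err)
    return True, None


good = '<book><chap number="1"><page number="1">Text</page></chap></book>'
bad = good.replace("</book>", "")  # remove the final end tag

print(check_well_formed(good))
print(check_well_formed(bad))
```

The first call reports success. The second reports a failure along with the parser's own description of the problem, because the book element is never closed.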

Coming attractions...

I plan to explore some of the technical details behind all of this in the next lesson. In subsequent lessons, I will discuss the use of style sheets to cause the display format to be something other than the default tree format shown above.

The XML octopus

Trying to wrap your brain around XML is sort of like trying to put an octopus in a bottle. Every time you think you have it under control, a new tentacle shows up. XML has many tentacles, reaching out in all directions. But, that's what makes it fun. As your XML host, I will do my best to lead you to the information that you need to keep the XML octopus under control.

Credits

This HTML page was partially produced using the WYSIWYG features of Microsoft Word 2000. The images on this page were used with permission from the Microsoft Word 97 Clipart Gallery.

About the author

Richard Baldwin is a college professor and private consultant whose primary focus is a combination of Java and XML. In addition to the many platform-independent benefits of Java applications, he believes that a combination of Java and XML will become the primary driving force in the delivery of structured information on the Web.

Richard has participated in numerous consulting projects involving Java, XML, or a combination of the two. He frequently provides onsite Java and/or XML training at the high-tech companies located in and around Austin, Texas. He is the author of Baldwin's Java Programming Tutorials, which has gained a worldwide following among experienced and aspiring Java programmers. He has also published articles on Java Programming in Java Pro magazine.

Richard holds an MSEE degree from Southern Methodist University and has many years of experience in the application of computer technology to real-world problems.


-end-
