A Layman's View of XML, Part 1

by Richard G. Baldwin
baldwin@austin.cc.tx.us
Baldwin's Home Page

Dateline: 09/18/99

Prolog

A few days ago, I received an email message that read "How can I find out about XML in layman's language?" Initially, the technical snob in me responded by thinking that this message didn't deserve much consideration. After all, this site contains links to more than 800 high-quality resources on XML. Certainly the author of the message should be able to find something among these 800+ links that would answer his question.

A similar situation closer to home

During the same time period, I was trying to work my way through the maze of tax material on a State of Texas web site to find information on sales tax in conjunction with a small business that I operate.

After spending several hours on the site, unable to find the answer to an especially important question on the remittance of sales tax, I found myself calling a 1-800 information number and (after being put on hold for a very long period of time) asking essentially the same question, "How can I find out about the sales tax laws in layman's language?"

After all, I said to myself, I shouldn't have to be an accountant, or have to hire an accountant, simply to collect (and send to state tax authorities) 8.25 percent of the sales price on the electronic Java books that I sell to residents of Texas.

Bingo! Then it hit me

Suddenly the similarity of the two situations hit me. I realized that the email message that I had received contained a perfectly valid question from someone just being introduced to XML. And telling that person to go to my web site and search through 800+ links on XML (which unfortunately is what I did) is not a very satisfactory answer to the question.

After all, why should a person have to have a degree in computer science (or the persistence of a pit bull) just to understand the rudimentary aspects of something that is supposed to be available for the use of the publishing masses, so to speak?

So, I decided to attempt to write a series of articles explaining XML in layman's language, being particularly careful to avoid the use of technical jargon.

A brief definition

XML gives us a way to create and maintain structured documents in plain text that can be rendered in a variety of different ways.

Oops! There is the jargon creeping in again. Since I can't avoid the jargon entirely, I had better at least explain the meaning of the jargon.

What do I mean by a "structured document"?

I will attempt to answer this question by providing an example. A book is a structured document. In its simplest form, a book may be composed of chapters. The chapters may be composed of sections. The sections may contain illustrations and tables. The tables are composed of rows and columns. Thus, it would be possible to draw a picture that illustrates the structure of a book.
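
For readers who would rather see that picture than imagine it, here is a minimal sketch in Python. The tag names (book, chapter, section, and so on) and the titles are made up for illustration, since XML leaves the choice of such names to the author of the document. The sketch uses the xml.etree.ElementTree module from Python's standard library to read a tiny book and print its structure as an indented outline.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up book written as plain text. The tag names are
# illustrative assumptions, not part of any standard.
BOOK = """\
<book>
  <chapter title="Getting Started">
    <section title="Why XML?">
      <illustration caption="A book as a tree"/>
      <table>
        <row><cell>A</cell><cell>65</cell></row>
        <row><cell>B</cell><cell>66</cell></row>
      </table>
    </section>
  </chapter>
</book>
"""

def outline(element, depth=0):
    """Return one indented line per element, showing what contains what."""
    label = element.get("title") or element.get("caption") or ""
    line = "  " * depth + element.tag + (f" ({label})" if label else "")
    lines = [line]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines

book_outline = outline(ET.fromstring(BOOK))
print("\n".join(book_outline))
```

Each level of indentation in the printed outline corresponds to one level of containment: the book contains a chapter, the chapter contains a section, and so on down to the cells of the table.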

What do I mean by "plain text"?

Characters such as the letters of the alphabet and punctuation marks are represented in the computer by numeric values, similar to a simple substitution code that a child might devise. For example, in one popular encoding scheme (ASCII), the upper-case version of the character "A" is represented by the value 65, a "B" is represented by the value 66, a "C" is represented by 67, etc.
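
If you have Python handy, you can check this substitution code for yourself. The following is only a small sketch. Python's built-in ord() function actually reports Unicode values, but for the characters shown here those values agree with ASCII.

```python
# ord() gives the numeric value behind a character; chr() goes the other way.
codes = {letter: ord(letter) for letter in "ABC"}
print(codes)  # {'A': 65, 'B': 66, 'C': 67}

# The same substitution applied to a whole word, and then reversed.
word = "XML"
numbers = [ord(ch) for ch in word]
restored = "".join(chr(n) for n in numbers)
print(numbers, restored)
```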

The actual correspondence between the characters and the specific numeric values representing the characters has been described by several different encoding schemes over the years. One of the most common and enduring schemes was published in the 1960s by the organization now known as the American National Standards Institute (ANSI). Its full name is the American Standard Code for Information Interchange, and it is commonly known as the ASCII code. Here is what one author has to say about the ASCII code (the author gives a slightly different expansion of the name).

"This stands for American Standards Committee on Information Interchange. What it means in practice is plain text, that is to say text which is readable directly without using any special software. The advantage of ASCII is that it is a lowest common denominator which can be displayed on any platform. The disadvantage is that it is rather limited and somewhat boring. The text cannot display bold, italics or underlined fonts, and there is no scope for graphics or hypertext. However, it is simple, ... and is almost idiot-proof as a means of information exchange. To see a short example of ASCII click HERE, or to see a journal article in ASCII click HERE."

To be accurate, I must point out that XML is not confined to the use of the ASCII encoding scheme. Several different encoding schemes can be used. However, all of them have been selected to make it possible to read a raw XML document without the requirement for any special software.
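
To make the idea of "several different encoding schemes" a little more concrete, here is a hedged sketch in Python. The two note documents are made up, and UTF-8 and ISO-8859-1 are simply two well-known encodings chosen for illustration. The sketch shows that text using only ASCII characters comes out byte-for-byte the same in ASCII and UTF-8, while an accented character needs an encoding that goes beyond ASCII.

```python
# Text that uses only ASCII characters is stored identically in ASCII and UTF-8.
plain = "<note>Hello</note>"
same_bytes = plain.encode("ascii") == plain.encode("utf-8")
print(same_bytes)  # True

# "\u00e9" is the letter e with an acute accent, which ASCII cannot represent.
accented = "<note>caf\u00e9</note>"
utf8_bytes = accented.encode("utf-8")
latin1_bytes = accented.encode("iso-8859-1")
print(len(utf8_bytes), len(latin1_bytes))  # the two encodings differ in size

try:
    accented.encode("ascii")
    fits_in_ascii = True
except UnicodeEncodeError:
    fits_in_ascii = False
print(fits_in_ascii)  # False
```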

What do I mean by a raw XML document?

Here I am referring to the string of sequential characters that makes up the document, before any specific rendering has been applied to the document.

What do I mean by rendering?

In its most common modern use, the word rendering probably means presenting something for human consumption (notwithstanding the fact that my grandmother, who lived in the mountains of Virginia, used to render pork fat to produce lard or something like that).

When we speak of rendering a drawing or an image, we usually mean that we are going to present it in a way that makes it look like a drawing or an image to a human observer. When we speak of rendering a document, we usually mean that we are going to present it in a way that a human will recognize it as a book, newspaper, or other document style, and can read it.

Consider a newspaper, for example

These days, there are at least two different ways to render a newspaper. One way is to print the information (daily news), mostly in black and white, on large sheets of low-grade paper commonly known as newsprint. This is the rendering format that ends up on my driveway each morning.

Another way to render a newspaper is to present the information on a computer screen, usually in full color, with the information content trying to fight its way through dozens of animated advertisements. This is the rendering format that ends up on my computer screen each day when I check for email messages.

The base information for the newspaper doesn't (or shouldn't) change for these two renderings. After all, news is news and the content of the news shouldn't depend on how it is presented. What does change is the manner in which that information is presented.

A newspaper is a structured document

A newspaper is a structured document consisting of pages, columns, etc. When the information content of a newspaper is created and maintained in XML, that same information content can be rendered either on newsprint paper, or on your computer screen without having to rewrite the information content.
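
As a rough illustration of one set of content and two renderings, consider the following Python sketch. Everything in it is an assumption made for the example: the story, the tag names, and the two rendering functions. It is not meant to show how any real newspaper does this. It simply produces a plain-text version (standing in for newsprint) and an HTML version (standing in for the screen) from the same small XML document.

```python
import xml.etree.ElementTree as ET
from html import escape

# One made-up news story, written once as plain text.
STORY = """\
<story>
  <headline>City Council Approves New Park</headline>
  <byline>Staff Writer</byline>
  <paragraph>The council voted on Tuesday.</paragraph>
  <paragraph>Construction begins next year.</paragraph>
</story>
"""

def render_for_print(story):
    """Plain-text rendering: upper-case headline, byline, then the paragraphs."""
    lines = [story.findtext("headline").upper(),
             "By " + story.findtext("byline"),
             ""]
    lines += [p.text for p in story.findall("paragraph")]
    return "\n".join(lines)

def render_for_screen(story):
    """HTML rendering of the very same content."""
    parts = ["<h1>" + escape(story.findtext("headline")) + "</h1>",
             "<p><em>By " + escape(story.findtext("byline")) + "</em></p>"]
    parts += ["<p>" + escape(p.text) + "</p>" for p in story.findall("paragraph")]
    return "\n".join(parts)

story = ET.fromstring(STORY)
print_version = render_for_print(story)
screen_version = render_for_screen(story)
print(print_version)
print(screen_version)
```

Note that the story itself is never rewritten. Only the rendering functions differ.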

Not necessarily boring

If you visit the above link to the journal article rendered solely in ASCII, you will probably agree that from a presentation viewpoint it is pretty boring (no offense intended to the author of the article). However, through the use of XML, documents created and maintained in plain text need not necessarily be boring. Rather, it is possible to render those documents in rich and exciting ways.

Speaking of being bored...

Since you are probably getting pretty bored with this article by now, I am going to cut it off at this point and continue the discussion in my next article. Come back then to learn how XML uses boring plain text to render documents in ways that can be very interesting and informative.

Coming attractions...

In my next article I will discuss, in layman's language, the mechanism by which XML uses plain text to display richly-formatted structured documents.

The XML octopus

Trying to wrap your brain around XML is sort of like trying to put an octopus in a bottle. Every time you think you have it under control, a new tentacle shows up. XML has many tentacles, reaching out in all directions. But that's what makes it fun. As your XML host, I will do my best to lead you to the information that you need to keep the XML octopus under control.

Credits

This HTML page was produced using the WYSIWYG features of Microsoft Word 97. The images on this page were used with permission from the Microsoft Word 97 Clipart Gallery.

Copyright 2000, Richard G. Baldwin

About the author

Richard Baldwin is a college professor and private consultant whose primary focus is a combination of Java and XML. In addition to the many platform-independent benefits of Java applications, he believes that a combination of Java and XML will become the primary driving force in the delivery of structured information on the Web.

Richard has participated in numerous consulting projects involving Java, XML, or a combination of the two.  He frequently provides onsite Java and/or XML training at the high-tech companies located in and around Austin, Texas.  He is the author of Baldwin's Java Programming Tutorials, which has gained a worldwide following among experienced and aspiring Java programmers. He has also published articles on Java Programming in Java Pro magazine.

Richard holds an MSEE degree from Southern Methodist University and has many years of experience in the application of computer technology to real-world problems.

-end-