baldwin.jpg

Programming with Adobe Flex

XML - Well-Formed and Valid Documents

Published: December 12, 2009
Validated with Amaya

by Richard G. Baldwin

Flex Programming Notes # 0084


Preface

General

This tutorial lesson is part of a continuing series dedicated to programming with Adobe Flex.

Flex is a programming language based on XML. Therefore, in order to program with Flex, you must first understand XML. The previous lesson, this lesson, and lessons that follow provide a brief introduction to XML.

Viewing tip

I recommend that you open another copy of this document in a separate browser window and use the following links to easily find and view the listings while you are reading about them.

Listings

Supplemental material

I recommend that you also study the other lessons in my extensive collection of online programming tutorials.  You will find a consolidated index at www.DickBaldwin.com.

Well-formed and valid documents

In previous lessons, I have discussed tags, elements, content, and attributes in detail. The time has come to take up the following topics:

Valid documents and the DTD

What is a DTD?

According to the XML FAQ:

"A DTD is usually a file (or several files to be used together) which contains a formal definition of a particular type of document. This sets out what names can be used for elements, where they may occur, and how they all fit together. For example, if you want a document type to describe <LIST>s which contain <ITEM>s, part of your DTD would contain something like

<!ELEMENT item (#pcdata)>
<!ELEMENT list (item)+>

This defines items containing text, and lists containing items.

It's a formal language which lets processors automatically parse a document and identify where every element comes and how they relate to each other, so that stylesheets, navigators, browsers, search engines, databases, printing routines, and other applications can be used."

The bad news - DTDs are complicated

I included the above quotation to emphasize one very important point – DTDs are complicated. The bad news is that the creation of a DTD of any significance is a very complex task.

The good news!

The good news is that many of you will never need to worry about having to create a DTD for two reasons:

  1. In the most fundamental sense, XML does not require the use of a DTD.
  2. Even when it is advisable to use a DTD with XML, someone else may already have created the DTD on your behalf.

A validating XHTML editor

For example, I am writing this HTML document using a validating XHTML editor named Amaya. Even though the editor uses a DTD to confirm that my document is a valid XHTML document (and warns me if it isn't), it wasn't necessary for me to write the DTD. The people who wrote the editor also wrote the DTD.

Three Parts

It is reasonable to think of an XML document of consisting of three parts, some of which are optional. I’m gong to refer to the parts as files just so I will have something to call them (but they don't have to be separate physical files).

One file contains the information content of the document (words, pictures, etc.). This is the part containing tags, elements, content, and attributes that the author wants to expose to the client. I have discussed this part in previous lessons.

A second file is the DTD, which meets the definition given above.

A third file is a stylesheet that establishes how the content that conforms to the DTD is to be rendered on the output device. This is how the author wants the material to be presented to the client.

Rendering

For example a tag with an attribute of "red" might cause something to be presented bright red according to one stylesheet and dull red according to another stylesheet. (It might even be presented as some shade of green according to still another stylesheet.)

With XML, the DTD is optional but the stylesheet (or some processing mechanism that substitutes for a stylesheet) is required. Something has to be able to render the content in the manner that the author intended it to be rendered.

A DTD can be very complex

Once again, according to the XML FAQ:

"... the design and construction of a DTD can be a complex and non-trivial task, so XML has been designed so it can be used either with or without a DTD. DTDless operation means you can invent markup without having to define it formally.

To make this work, a DTDless file in effect `defines’ its own markup, informally, by the existence and location of elements where you create them. But when an XML application such as a browser encounters a DTDless file, it needs to be able to understand the document structure as it reads it, because it has no DTD to tell it what to expect, so some changes have been made to the rules."

Without the technical jargon please

In other words, it is entirely possible to create an XML document without the requirement for a DTD.

What is a valid document?

In the normal sense of the word, if something is invalid, that usually means that it is not any good. However, that is not the case for XML. An invalid XML document can be a perfectly good and useful document.

A valid XML document is one that conforms to an existing DTD in every respect.

In other words, unless the DTD allows a tag with the name "color", an XML document being validated against that DTD containing a tag with that name is not valid.

However, because XML does not require a DTD, an XML processor cannot require validation of the document. Many very useful XML documents are not valid, simply because they were not constructed according to an existing DTD.

An XHTML document

The document that you are now reading is a valid XML document. It is a special flavor of XML known as XHTML. The document was created using W3C's Editor/Browser named Amaya.

What you are probably seeing is a rendered version of the document. However, if you were to look at the raw XML code at the beginning of the document, you should see something like the XML code shown in Listing 1.

Listing 1. Raw XHTML code.

<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="content-type" content="text/html; charset=iso-8859-1" />
<title>Flex Programming by Richard G. Baldwin</title>
<meta name="generator"
content="Amaya, see http://www.w3.org/Amaya/" />
</head>

Note that some extra line breaks were inserted in Listing 1 to force it to fit into this narrow publication format.

The DTD

Note in particular the code that is highlighted in yellow. This code specifies the DTD that is used to validate the code. If I inadvertently enter some XML code that causes the document to become invalid, a red warning appears in the bottom right corner of the editor.

A download site

If you examine the material with the yellow highlight in Listing 1 carefully, you will see that it actually specifies a location on the Internet from which you can download the DTD file. You can download it and open it in a text editor such as Windows Notepad to see a sample of a really complicated DTD.

Listing 2 shows a small portion of the XHTML DTD downloaded from the address highlighted in yellow in Listing 1.

Listing 2. A small portion of the XHTML DTD.

<!--
   Extensible HTML version 1.0 Transitional DTD

   This is the same as HTML 4 Transitional except for
   changes due to the differences between XML and SGML.

   Namespace = http://www.w3.org/1999/xhtml

   For further information, see: http://www.w3.org/TR/
xhtml1

   Copyright (c) 1998-2002 W3C (MIT, INRIA, Keio),
   All Rights Reserved. 

   This DTD module is identified by the PUBLIC and 
SYSTEM identifiers:

   PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
   SYSTEM "http://www.w3.org/TR/xhtml1/DTD/xhtml1-
transitional.dtd"

   $Revision: 1.2 $
   $Date: 2009-12-14$

-->

<!--====== Character mnemonic entities =============-->

<!ENTITY % HTMLlat1 PUBLIC
   "-//W3C//ENTITIES Latin 1 for XHTML//EN"
   "xhtml-lat1.ent">
%HTMLlat1;

<!ENTITY % HTMLsymbol PUBLIC
   "-//W3C//ENTITIES Symbols for XHTML//EN"
   "xhtml-symbol.ent">
%HTMLsymbol;

<!ENTITY % HTMLspecial PUBLIC
   "-//W3C//ENTITIES Special for XHTML//EN"
   "xhtml-special.ent">
%HTMLspecial;

Note that I inserted some line breaks into the text in Listing 2 to force it to fit into this narrow publication format.

Well-formed documents

XML derives from an earlier more complicated markup language known as SGML. Being well-formed is not a property of SGML. The concept of being well-formed was introduced as a requirement of XML, apparently to deal with the situation where a DTD is not available.

Why do we need well-formed XML documents?

Once again, according to the XML FAQ:

"For example, HTML's <IMG> element is defined as `EMPTY': it doesn't have an end-tag. Without a DTD, an XML application would have no way to know whether or not to expect an end-tag for an element, so the concept of `well-formed' has been introduced.

This makes the start and end of every element, and the occurrence of EMPTY elements completely unambiguous."

All XML documents must be well-formed

XML documents need not be valid, but

ALL XML DOCUMENTS MUST BE WELL-FORMED.

To be well-formed...

A well-formed XML document must meet several different criteria.

To begin with, in a well-formed XML document, all elements that can contain character data must have both start and end tags.

What is character data?

For purposes of this explanation, let’s just say that the content that we discussed earlier comprises character data.

Attribute values must be in quotes

All attribute values must be in quotes (apostrophes or double quotes). You can surround the value with apostrophes (single quotes) if the attribute value contains a double quote. An attribute value that is surrounded by double quotes can contain apostrophes.

Dealing with empty elements

EMPTY elements (those that contain no character data) must be written in one of the two ways shown in Listing 3, and for several reasons, the first way is usually considered preferable.

Listing 3. Required syntax for an empty element.

<foo/>
<foo></foo>

Don't forget that even an EMPTY element can contain one or more attributes along with namespace information inside the start tag.

No markup characters are allowed

For a document to be well-formed, it must not have markup characters such as (< or &) in the text data. If such characters are needed, you can represent them using &lt; and &amp; instead.

These special combinations of characters that represent other characters, such as &lt; that represents the left angle bracket < are called entities.

Nesting

Elements must nest properly. If one element contains another element, the entire second element must be defined inside the start and end tags of the first element. In the next lesson, you will learn that every element in an XML document, other than the root element, is nested inside another element.

Validity and well-formed requirements recap

Valid XML files are those that have a DTD and that conform to the DTD.

All XML files must be well-formed, but there is no requirement for them to be valid.

A DTD is not required in which case validity is impossible to establish. However, if XML documents do have a DTD, they must conform to it, which makes them valid.

Why use a DTD if it is not required?

There are many reasons to use a DTD, in spite of the fact that XML doesn’t require one. One reason is that the use of a DTD makes it possible to enforce format specifications.

Enforcing format specifications

For example, by creating this document using Amaya and the DTD referred to earlier, I am required to produce a document that conforms to the DTD for XHTML documents. Otherwise, I will get several warnings from the editor and will be required to acknowledge that the document doesn't conform to the DTD in order to save it.

On one hand, that sounds like a lot of hassle. On the other hand, by creating a document that conforms to the DTD for XHTML, I can be sure that it will render properly in any browser that is guaranteed to properly render XHTML documents.

Resources


Copyright

Copyright 2009, Richard G. Baldwin.  Reproduction in whole or in part in any form or medium without express written permission from Richard Baldwin is prohibited.

About the author

Richard Baldwin is a college professor (at Austin Community College in Austin, TX) and private consultant whose primary focus is object-oriented programming using Java and other OOP languages.

Richard has participated in numerous consulting projects and he frequently provides onsite training at the high-tech companies located in and around Austin, Texas. He is the author of Baldwin's Programming Tutorials, which have gained a worldwide following among experienced and aspiring programmers.

In addition to his programming expertise, Richard has many years of practical experience in Digital Signal Processing (DSP). His first job after he earned his Bachelor's degree was doing DSP in the Seismic Research Department of Texas Instruments. (TI is still a world leader in DSP.) In the following years, he applied his programming and DSP expertise to other interesting areas including sonar and underwater acoustics.

Richard holds an MSEE degree from Southern Methodist University and has many years of experience in the application of computer technology to real-world problems.

Baldwin@DickBaldwin.com

-end-