Data Structures in Java: Part 1, Getting Started

Baldwin introduces you to the Java Collections Framework.  Once you learn how to use the framework, it is unlikely that you will need to reinvent common data structures, search algorithms, or sorting algorithms again, because those capabilities are neatly packaged within the framework.

Published: May 7, 2001
By Richard G. Baldwin

Java Programming, Lecture Notes # 1350


Preface

This is the first lesson in a new miniseries on Java data structures and the Java Collections Framework.

The purpose of this miniseries is to help you learn the essential features of Object-Oriented data structures in Java using the Collections Framework.

Viewing tip

You may find it useful to open another copy of this lesson in a separate browser window.  That will make it easier for you to scroll back and forth among the different listings while you are reading about them.

Recommended supplementary reading

I recommend that you also study the other lessons in my extensive collection of online Java tutorials.  You will find those lessons published at Gamelan.com.  However, as of the date of this writing, Gamelan doesn't maintain a consolidated index of my Java tutorial lessons, and sometimes my lessons are difficult to locate there.  You will find a consolidated index at Baldwin's Java Programming Tutorials.

The index on my site provides links to the lessons at Gamelan.com.

Preview

This lesson provides a brief introduction to the use of the Java Collections Framework.  The framework is designed to encourage you to reuse rather than to reinvent collections and maps.

A collection represents a group of objects, known as its elements.  Some collections allow duplicate elements while others do not. Some collections are ordered and others are not.  (Maps will be discussed in subsequent lessons.)

The Collections Framework is defined by a set of interfaces and associated contracts, and provides concrete implementations of the interfaces for the most common data structures.  In addition, the framework also provides several abstract implementations, which are designed to make it easier for you to create new and different implementations while still maintaining the structural polymorphic integrity of the framework.

Introduction

A little quiz

Let's begin with a little quiz to establish your baseline knowledge of the Collections Framework.  Take a look at the program in Listing 6 near the end of this lesson.  Which of the following is the output produced by that program?

If your answer was 1234 then you already know quite a lot about the use of the Collections Framework.  If not, keep reading to begin learning about the framework.

Elements of the Framework are easy to use

This simple introductory program is not intended to do anything particularly useful.  Rather, it was designed to illustrate several important features of the framework, including the ease with which elements of the framework can be reused in your programs.

Don't reinvent the wheel

As many of you already know, I am a college professor.  I specialize in teaching OOP using Java.  In the past, many college courses in Data Structures (often referred to as CS2 courses) have emphasized the concept of reinventing the wheel.  Students were required to learn how to reinvent a variety of complex data structures in order to successfully complete the course.

Hopefully a change is in the works

Hopefully, with the conversion of these CS2 courses to Java OOP, the emphasis will change to reuse instead of reinvent.

Unfortunately, most of the CS2 Java textbooks that I have seen so far look like warmed-over textbooks from the days of Pascal, C, and C++.  Many look like they were written using cut-and-paste technology to update an old Pascal, C, or C++ textbook to incorporate a little Java OOP.  Most of the textbooks that I have seen include only enough Java OOP to avoid violating truth in advertising requirements.  They still emphasize reinvent instead of reuse.

Collections Framework encourages reuse

The Java Collections Framework is designed to encourage programmers to reuse existing interfaces and classes instead of inventing new ones.  In the event that it is necessary to invent a new class or interface, the programmer is encouraged to integrate it into the Framework in a polymorphic manner.

Sample Program

I am going to provide a brief discussion of the sample program (shown in Listing 6) in this lesson.  Later, I will provide more detailed discussions of many of the features used in that program.

Interesting Code Fragments

I will break this program down and discuss it in fragments.

An object of the TreeSet class

The code fragment in Listing 1 instantiates an object of the TreeSet class and stores the object's reference in a reference variable of type Collection.
 
class Worker{
  public void doIt(){
    Collection ref = new TreeSet();

Listing 1

Collection is an interface

The TreeSet class implements the SortedSet interface, which extends the Set interface, which in turn extends the Collection interface.  Thus, a TreeSet object is a Collection.  Therefore, a reference to a TreeSet object can be stored in a reference variable of type Collection, and can be treated as the generic type Collection.

What is a TreeSet object?

Among other things, in CS2 courses, we worry about the time and memory cost of a collection.  According to Sun, the TreeSet class guarantees that the sorted set will be in ascending element order, and provides guaranteed log(n) time cost for the basic operations (add, remove and contains).

What does ascending element order mean?

Again, according to Sun, the elements will be sorted according to the natural order of the elements (see the Comparable interface) or by a comparator (see the Comparator interface) provided at the time the set is created.  This depends on which overloaded constructor is used.  I will have more to say about these alternatives in a subsequent lesson.

What does log(n) time cost mean?

I'm not going to try to explain the details of log(n) time cost here.  Suffice it to say that the add, remove, and contains methods execute very fast.  (I will have more to say about this is a subsequent lesson.)

A TreeSet object is a Set

An object of the TreeSet class also is a Set.  One of the characteristics of a Java Set (an object that implements the Set interface) is that it can contain no duplicate elements.  Therefore, a TreeSet object can contain no duplicate elements.  If the add() method of a TreeSet object is invoked in an attempt to add a duplicate element, the element will not be added.

A TreeSet object is a SortedSet

The TreeSet class also implements the SortedSet interface.  This guarantees that the contents of a TreeSet object will be in ascending element order, regardless of the order in which the elements are added.  (In a subsequent lesson, I will discuss how comparisons are made to enforce the ordering of the elements.)

A TreeSet object is a Collection

Because an object of the TreeSet class is a Collection, a reference to such an object can be passed to any method that requires an incoming parameter of type Collection.  The receiving method can invoke any method on that reference that is declared in the Collection interface. (I will discuss such methods in detail in subsequent lessons.)

Populate the Collection

The statement in Listing 2 below passes the TreeSet object's reference to a method named fillIt(), which is a class method of the Populator class.  (The Populator class is a class of my own design whose only purpose is to illustrate the polymorphic behavior achieved using the Collections Framework.)  The behavior of this method is to add elements to the incoming Collection object without regard for the actual type of the object (the class from which the object was instantiated).
 
    Populator.fillIt(ref);

Listing 2

At this point, I am going to discuss the fillIt() method of the Populator class invoked in the above fragment.  The entire class definition of the Populator class is shown in Listing 3 below.
 
class Populator{
  public static void fillIt(
                       Collection ref){
    ref.add(new Integer(4));
    ref.add(new Integer(4));
    ref.add(new Integer(3));
    ref.add(new Integer(2));
    ref.add(new Integer(1));
  }//end fillIt()
}//end class populator

Listing 3

Don't know, don't care

As you can see in the above fragment, the fillIt() method receives the reference to the TreeSet object as type Collection.  This method doesn't know, and doesn't care, what the actual type of the object is.  All it cares about is that the object is a Collection object.  (Otherwise, the object's reference couldn't be passed in as a parameter.  A type mismatch would occur.)

Because the incoming parameter is a reference to a Collection object, the fillIt() method can invoke the add() method on the object, with confidence that the behavior of the add() method will be appropriate for the specific type of object involved.  (For example, the behavior of the add() method for an object of the TreeSet class will probably be different from the behavior of the add() method for an object of some other class that implements the Collection interface.)

Polymorphism is great!

The great thing about polymorphic behavior is that the author of the fillIt() method doesn't need to be concerned about the implementation details of the add() method.

Add five elements with some duplicates

The code in the fillIt() method adds five elements to the object.  Each element is a reference to a new object of type Integer.  Two of the objects encapsulate the int value 4, and thus are duplicates.

The int values encapsulated in the Integer objects are not in ascending order.  Rather, they are added to the object in descending order.  (They could be added in any order and the end result would be the same.)

Filter out the duplicates

Unknown to the author of the fillIt() method, the add() method filters out the duplicate element in order to satisfy the contract of the Collection interface.  In this case, the author didn't care what happens in the case of duplicate elements.

Notification of duplicates

If the author of the fillIt() method does care what happens in the case of duplicates, she can find out when an object is a duplicate.

According to the contract of the Collection interface, the add() method must return true if the invocation of the method modifies the contents of the object and must return false if the collection does not permit duplicates and the collection already contains the specified element.

Sort the elements

Also unknown to the author of the fillIt() method, even though the elements are added in descending order (or could be added in any other order for that matter), they are stored and maintained in the TreeSet object in such a way that they can later be accessed in ascending order.

The TreeSet object is now populated

When the fillIt() method returns, the TreeSet object contains four (not five) elements with no duplicates.  Each element is a reference to an object of type Integer.  Those references are maintained in such a way as to make them accessible in ascending order, based on the int values encapsulated in each of the Integer objects.

Get an Iterator object

Returning now to the flow in the doIt() method, the following statement invokes the iterator() method on the TreeSet object's reference that is stored in the reference variable of type Collection.  This is shown in Listing 4 below.
 
    Iterator iter = ref.iterator();

Listing 4

Invocation of the iterator() method on any Collection object returns an instance of a class that implements the Iterator interface.  The Iterator object can be used to traverse the collection, gaining access to each element in order.  (The concept of in order means different things for different kinds of collections.  For a collection instantiated from the TreeSet class, in order means in ascending order.)

Again, don't know, don't care

Again, the author of the method that uses the Collection object doesn't need to know or care about the internal implementation of the collection, or the implementation of the methods of the Iterator object.  They simply do what they do, and can be used for their intended purpose.

An Iterator object acts as a doorkeeper

The Iterator interface declares three methods:

You might say that an Iterator object acts as a doorkeeper for the collection object that it represents, providing access to the contents of the collection in a very specific manner.

The code fragment in Listing 5 below shows how the first two of the above methods can be used to

As mentioned earlier, when the collection is an object instantiated from the TreeSet class, access to the elements is provided in ascending order.
 
    while(iter.hasNext()){
      System.out.print(iter.next());
    }//end while loop

Listing 5

Four elements with no duplicates

At this point, the TreeSet object contains four elements, with no duplicates.  Each of the elements is a reference to an object of type Integer.  The code in the above loop causes each of those elements to be accessed and displayed in ascending order.  This causes the following text to appear on the screen:

1234

An editorial opinion

In my opinion, this is the kind of knowledge that a computer science student in a modern data structures course should be learning.  This is a far departure from courses of the past where CS2 students were required to memorize the intricate details of how to implement various data structures.

What kind of knowledge is needed?

Does an architect need to understand the detailed inner workings of an air conditioning compressor in order to design a cooling system into a building?  Of course not!

However, the architect does need to know the tradeoffs among the available cooling systems in terms of initial cost, operating cost, size, efficiency, etc.

Does an audio technician need to understand the detailed inner workings of an electronic audio equalizer in order to construct an integrated audio system?  Absolutely not!  If that were a requirement, there would likely be very few audio systems in existence.

However, the audio technician does need to understand the tradeoffs among the various available audio equalizers.

The same concept applies to software design

Does an OOP software designer need to know the detailed inner workings of the various kinds of collection objects in order to use them effectively?  No!

However, the software designer does need to know the tradeoffs among the various types of collection objects in terms of their operational behavior.

Modern CS2 students should be learning about the performance and operational differences among the different types of collections, and how to use available frameworks to create and use those collections.  They should not be wasting their time learning how to reinvent them.  They have more important ways to spend their valuable time, and they have more important things to learn.

An analogy

Frankly, I don't care how the programmers at Sun implemented the TreeSet class, so long as the behavior of objects instantiated from that class meets the published specifications.

As an analogy, I also don't care how they implemented the Random class, so long as objects instantiated from the Random class provide the pseudo random values that I need in my programs.

I see no conceptual differences between the TreeSet class and the Random class from a software reuse viewpoint.


Its time to reinvent the CS2 curriculum

I'm confident that the future employers of most students share my opinion on this.  I don't know of any employer who wants their programmers to expend time and dollars reinventing the classical data structures.  What those employers are looking for is a staff of programmers who understand the tradeoffs among the data structures, and when it is appropriate to use each of the different structures.

It is time to reinvent the curriculum in CS2 courses by

Summary

In this lesson, I have provided a brief introduction to the use of the Java Collections Framework.  The framework is designed to encourage you to reuse rather than to reinvent collections and maps (I will have more to say about maps in a subsequent lesson).

A collection represents a group of objects, known as its elements.

While some collections allow duplicate elements, others do not. Some collections are ordered and others are not ordered.

The Collections Framework is defined by a set of interfaces and associated contracts.  The framework provides concrete implementations of the interfaces (classes) for the most common data structures.  In addition, the framework also provides several abstract implementations, which are designed to make it easier for you to create new and different concrete implementations.

The TreeSet class is a concrete implementation of the SortedSet interface.  The SortedSet interface extends Set, which extends Collection.  Thus, a TreeSet object is a SortedSet.  Also it is a Set, and it is a Collection.

The TreeSet class guarantees that the sorted set will be in ascending element order, and provides guaranteed log(n) time cost for the basic operations (add, remove and contains).

TreeSet objects can be treated as the generic type Collection.  Methods declared in the Collection interface can be invoked on a TreeSet object without regard for the actual class from which the object was instantiated. (This is polymorphic behavior.)

When such methods are invoked, the author of the program can have confidence that the behavior of the method will be appropriate for an object of the class from which the object was instantiated.  In my opinion, this is the true essence of object-oriented behavior.

What's Next?

This is the first lesson in a miniseries on the Collection Framework.  Subsequent lessons will teach you how to use the framework for creating and using various types of collections and maps.

Once you learn how to use the framework, it is unlikely that you will need to reinvent classical data structures, search algorithms, or sorting algorithms, because those capabilities are neatly packaged within the framework.

Complete Program Listing

A complete listing of the program is provided in Listing 6 below.
 
import java.util.TreeSet;
import java.util.Collection;
import java.util.Iterator;

public class AP400{
  public static void main(
                        String args[]){
    new Worker().doIt();
  }//end main()
}//end class AP400

class Worker{
  public void doIt(){
    Collection ref = new TreeSet();
    Populator.fillIt(ref);
    Iterator iter = ref.iterator();
    while(iter.hasNext()){
      System.out.print(iter.next());
    }//end while loop
    System.out.println();
    
  }//end doIt()
}// end class Worker

class Populator{
  public static void fillIt(
                       Collection ref){
    ref.add(new Integer(4));
    ref.add(new Integer(4));
    ref.add(new Integer(3));
    ref.add(new Integer(2));
    ref.add(new Integer(1));
  }//end fillIt()
}//end class populator

Listing 6


Copyright 2000, Richard G. Baldwin.  Reproduction in whole or in part in any form or medium without express written permission from Richard Baldwin is prohibited.

About the author

Richard Baldwin is a college professor and private consultant whose primary focus is a combination of Java and XML. In addition to the many platform-independent benefits of Java applications, he believes that a combination of Java and XML will become the primary driving force in the delivery of structured information on the Web.

Richard has participated in numerous consulting projects involving Java, XML, or a combination of the two.  He frequently provides onsite Java and/or XML training at the high-tech companies located in and around Austin, Texas.  He is the author of Baldwin's Java Programming Tutorials, which has gained a worldwide following among experienced and aspiring Java programmers. He has also published articles on Java Programming in Java Pro magazine.

Richard holds an MSEE degree from Southern Methodist University and has many years of experience in the application of computer technology to real-world problems.

baldwin.richard@iname.com

-end-