Richard G Baldwin (512) 223-4758, baldwin@austin.cc.tx.us, http://www2.austin.cc.tx.us/baldwin/

Security, Introduction to Message Digests

Java Programming, Lecture Notes # 710, Revised 5/14/99.


Preface

Students in Prof. Baldwin's Advanced Java Programming classes at ACC will be responsible for knowing and understanding all of the material in this lesson beginning with the spring semester of 1999.

This lesson was originally written on April 16, 1999 and has been updated several times since.

The programs in this lesson were tested using JDK 1.2 under Win95.

Disclaimer

I claim absolutely no expertise in the area of security. I am simply a college professor attempting to gather information about Java on one hand and present it to my students on the other. I disclaim any responsibility for any security problems that may occur as a result of anyone using any of the material in any of my tutorial lessons.

You are responsible for your own actions. With regard to security, you should study not only the material that I will present, but also material provided by others who possess expertise in the security area. Hopefully my material will be useful in getting you started in that direction.

Two good books on security published by O'Reilly & Associates are:

I highly recommend both of these books.

Introduction

The three legs of the security stool

An earlier lesson suggested that when exchanging data electronically, the parties to the communication might be interested in the following three aspects of that communication:

This lesson deals primarily with the third item, integrity.

Message digests

The lesson introduces you to message digests. This is a good place to begin learning about the JDK 1.2 security APIs.

Message digests

What is a message digest?

A message digest is a value consisting of a fixed number of bytes that represents a message of arbitrary length.

It is often referred to as the fingerprint of the message.

When computed two or more times using the same algorithm, the same message will always produce the same digest value.

What about duplicate fingerprints?

It is extremely unlikely that two different messages will produce the same digest value, even if the two messages are almost identical.

One prominent author states that you are more likely to win the lottery than you are to discover two different messages that will produce the same digest value.

What is the message digest good for?

Under proper security management (when authentication is achieved), a digest can be used to confirm that a message has not been modified electronically since the digest value was computed. Thus the digest is a tool that can be used to confirm message integrity.

How is authentication achieved?

Sometimes, achieving authentication is no more difficult than calling the sender of the message and asking the sender to read off the characters that make up the original digest value.

The printed version of a digest value is reasonably short, which makes this a practical approach to authentication. (It is much more difficult for a hacker to gain control over a person's telephone than it is for the same hacker to intercept and modify an electronic message.)

How do you verify against the digest?

If you have a copy of the message and a copy of the digest that was originally computed for that message, you can re-compute the digest and compare your version of the digest with the original version of the digest. If they are the same, you can have a very high level of confidence that the message was not modified after the original digest was computed.
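The re-compute-and-compare step described above can be sketched as follows. This is only a minimal sketch and is not part of the Security06 program discussed later. The class name DigestVerifySketch and its two methods are hypothetical, and the code assumes a JDK newer than JDK 1.2 (it uses StandardCharsets) and asks for the algorithm by its standard name "SHA-1" from whatever provider is installed by default.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class DigestVerifySketch {

  // Compute the SHA-1 digest of a message (hypothetical helper)
  static byte[] sha1(byte[] message) {
    try {
      return MessageDigest.getInstance("SHA-1").digest(message);
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
  }

  // Re-compute the digest of the received message and compare it
  // with the digest that was originally computed for the message
  static boolean isUnmodified(byte[] receivedMessage,
                              byte[] originalDigest) {
    return MessageDigest.isEqual(sha1(receivedMessage),
                                 originalDigest);
  }

  public static void main(String[] args) {
    byte[] original =
        "Pay Bob $100".getBytes(StandardCharsets.UTF_8);
    byte[] originalDigest = sha1(original);
    byte[] tampered =
        "Pay Bob $900".getBytes(StandardCharsets.UTF_8);

    // The untouched message matches its digest
    System.out.println(isUnmodified(original, originalDigest));
    // The modified message does not
    System.out.println(isUnmodified(tampered, originalDigest));
  }
}
```

The comparison is only as trustworthy as the copy of the original digest, which is why the digest itself must reach you through a channel that you trust.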

What about switching message digests?

Of course, you must be confident that someone didn't switch digests when they modified the message, and the telephone call described above isn't always practical.

Digital signatures and certificates

This leads to the topics of digital signatures and digital certificates, which are primary tools for network security. I will be discussing signatures and certificates in detail in subsequent lessons.

Message digests are used for digital signatures and digital certificates. Therefore, they form the basis for some of the most important security tools that exist on the Internet today.

Discussion

According to Sun

The MessageDigest class provides the functionality of a message digest algorithm, such as MD5 or SHA. Message digests are secure one-way hash functions that take arbitrary-sized data and output a fixed-length hash value.

Alphabet soup

So, what are MD5 and SHA?

This is a good starting point for your introduction to the JDK security and cryptographic APIs. There are several aspects of these two APIs that depend on the application of specific algorithms to your data. For example, MD5 is a specific algorithm for computing message digests. As of 4/16/99, you can learn more about the algorithm at http://www.sw.com.sg/CIE/RFC/1321/index.htm.

SHA-1 is another specific algorithm that is used to compute message digests. As of 4/16/99, you can learn more about it at http://w3c1.inria.fr/TR/1998/PR-DSig-label-19980403/SHA1-1_0.
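To make the idea of selecting a specific algorithm by name a little more concrete, here is a small sketch. It is not part of the sample program for this lesson. The class name AlgorithmNameSketch and the method digestLength() are hypothetical, and the code assumes that the installed providers supply both "MD5" and "SHA-1", which is normally the case.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class AlgorithmNameSketch {

  // Return the number of bytes in the digest produced by the
  // named algorithm for a short fixed message
  static int digestLength(String algorithm) {
    try {
      MessageDigest md = MessageDigest.getInstance(algorithm);
      return md.digest("ABCDEFG".getBytes()).length;
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    // MD5 produces a 16-byte digest, SHA-1 a 20-byte digest
    System.out.println("MD5:   " + digestLength("MD5"));
    System.out.println("SHA-1: " + digestLength("SHA-1"));
  }
}
```

The only thing that changes between the two calls is the algorithm name passed to the factory method.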

The SHA-1 algorithm

Here is what Knudsen (Java Cryptography from O'Reilly) has to say about the SHA-1 algorithm.

"SHA-1 stands for Secure Hash algorithm. It was developed by the NIST (National Institute of Standards and Technology) in conjunction with NSA. ..."

What is the NSA?

The term NSA stands for National Security Agency, which is an agency of the United States Federal government. It was described in at least one TV program as an agency that is very interested in the electronic communications of others. (I knew that all those hours spent watching the Discovery channel and The Learning Channel on cable TV would pay off someday.)

Knudsen goes on to discuss the pros and cons of the algorithm in comparison with other algorithms such as MD5, MD4, SHA, and SHA-0.

A little information about security providers

This raises an important point about the APIs. For a variety of reasons, you may prefer to use algorithms developed by one provider rather than the same algorithms developed by a different provider.

Also some of the algorithms that you need may not be available from a single provider.

Although the APIs include a number of algorithms developed by Sun, the API is structured to make it possible for you to install algorithms developed by other providers as well.

Hopefully I will have time in some future lesson to discuss the mechanics of installing new providers.
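In the meantime, you can at least see which providers are already installed. The following is a minimal sketch, not a discussion of how to install a provider. The class name ProviderListSketch and the method defaultSha1Provider() are hypothetical, and the names that are printed depend entirely on the JDK that you run it on.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.Provider;
import java.security.Security;

class ProviderListSketch {

  // Name of the provider that supplies the SHA-1 implementation
  // when no provider is requested explicitly
  static String defaultSha1Provider() {
    try {
      return MessageDigest.getInstance("SHA-1")
                          .getProvider().getName();
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    // List every provider installed in this Java environment
    Provider[] providers = Security.getProviders();
    for (int i = 0; i < providers.length; i++) {
      System.out.println(providers[i].getName());
    }
    System.out.println("SHA-1 comes from: "
                       + defaultSha1Provider());
  }
}
```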

Using a MessageDigest object

A MessageDigest object is initialized and ready for use when it is made available to your program.

The message is processed through the MessageDigest object by repeatedly invoking the update() method on the object and passing successive portions of the message to the method one byte at a time or as arrays of bytes.

When all of the message data has been fed into the update() method, you invoke the digest() method on the MessageDigest object to finish the job and to get the digest value as an array of bytes.
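The following sketch illustrates feeding a message to update() in successive portions. It is an illustration only. The class name ChunkedUpdateSketch and the method digestInChunks() are hypothetical, and the code requests "SHA-1" from the default provider. The point is that the digest does not depend on how the message is divided into chunks.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

class ChunkedUpdateSketch {

  // Feed the message to update() in pieces of at most chunkSize
  // bytes (chunkSize must be positive), then call digest()
  static byte[] digestInChunks(byte[] message, int chunkSize) {
    try {
      MessageDigest md = MessageDigest.getInstance("SHA-1");
      for (int offset = 0; offset < message.length;
           offset += chunkSize) {
        int len = Math.min(chunkSize, message.length - offset);
        md.update(message, offset, len);
      }
      return md.digest();
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    byte[] message = "ABCDEFGH".getBytes();
    byte[] threeAtATime = digestInChunks(message, 3);
    byte[] allAtOnce = digestInChunks(message, message.length);
    // Same message, same digest, regardless of the chunking
    System.out.println(Arrays.equals(threeAtATime, allAtOnce));
  }
}
```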

Very typical operation

I am discussing this in such detail because it is typical of the operations that you will perform throughout the security and cryptographic APIs.

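There are many different situations where some specific data needs to be processed using some specific algorithm. Generally in those cases, you will get an object of the appropriate class, initialize the object, feed the data to the object (possibly in several portions), and then call a method that completes the processing and returns the result.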

MessageDigest doesn't require separate initialization

In the case of the MessageDigest object, the separate initialization step is not required.

The necessary initialization is accomplished by passing parameters to the factory method that you call to get an object of the class.

Using the reset() method of MessageDigest class

Another characteristic of the MessageDigest class that is common throughout the API is that the reset() method can be called on the object to re-initialize it and get it ready to process another message.

Also, calling the digest() method on the object re-initializes it and makes it ready to process another message. Thus, both reset() and digest() reset the object to its initialized state.
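Here is a small sketch of both ways of getting back to the initialized state. It is an illustration only. The class name ResetSketch and its methods are hypothetical, and the code requests "SHA-1" from the default provider.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

class ResetSketch {

  static MessageDigest newSha1() {
    try {
      return MessageDigest.getInstance("SHA-1");
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
  }

  // Feed some unwanted data, call reset() to discard it, and
  // then digest the real message with the same object
  static byte[] digestAfterReset(byte[] unwanted, byte[] message) {
    MessageDigest md = newSha1();
    md.update(unwanted);
    md.reset();
    md.update(message);
    return md.digest();
  }

  // Process two messages in a row with one object, relying on
  // digest() to re-initialize the object after the first one
  static byte[][] digestTwo(byte[] first, byte[] second) {
    MessageDigest md = newSha1();
    md.update(first);
    byte[] firstDigest = md.digest();
    md.update(second);
    byte[] secondDigest = md.digest();
    return new byte[][] {firstDigest, secondDigest};
  }

  public static void main(String[] args) {
    byte[] message = "ABCDEFG".getBytes();
    byte[] fresh = newSha1().digest(message);

    // The data fed before reset() has no effect on the result
    System.out.println(Arrays.equals(
        fresh, digestAfterReset("junk".getBytes(), message)));

    // The second digest is not contaminated by the first message
    byte[][] pair = digestTwo(message, message);
    System.out.println(Arrays.equals(pair[0], pair[1]));
  }
}
```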

Two important aspects of security

There are two major aspects of Java and security on the Internet: the mechanics of using the Java tools and the Java API, and overall security procedures.

What do I mean by overall security procedures? One obvious example is that you should not allow your secret cryptographic keys to be compromised to the enemy (hackers, crackers, virus writers, etc.). However, there are many less obvious and more subtle operational procedures that are very important to overall security. These procedures require a great deal of thought about who can do what to you and how can they do it (probably bad grammar).

Will concentrate on mechanics

For the most part, my tutorial lessons will concentrate on the mechanics of using the Java tools and the Java API. I won't attempt to give advice on overall security procedures. Rather, I will leave that to others who have given a great deal of thought to the topic of who can do what to you and how can they do it.

Sample Program

This program, named Security06, provides an introduction to the use of the MessageDigest class in Java.

What does the program do?

The program creates and displays the digest values for three short strings. Two of the strings are the same. The third string differs from the other two in only one character.

The program demonstrates that the digest value differs significantly for two strings that are almost identical.

It also shows that the digest value that is computed for a given string is the same when re-computed using the same algorithm.

The digest values are displayed in Base 64 format, which I will explain later.

Program output

The program was tested using JDK 1.2 and Win95. It produces the following output, showing the digest value for the two different input string values. Note that the latter two input strings are the same as the first string, except for the addition of the 'H' character on the end of the string.

ABCDEFG
k75GEsQdI68YkdrF/Q1TVzb/xOM=

ABCDEFGH
Zdt/hoQdVTax8nL1MO5w/TY9guw=

ABCDEFGH
Zdt/hoQdVTax8nL1MO5w/TY9guw=

The length of the digest value for a given algorithm is always the same regardless of the length of the message.

As you can see, the digest values are sufficiently short that they could easily be confirmed over the telephone.
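The fixed length is easy to check for yourself. The following is a minimal sketch that is separate from the Security06 program. The class name FixedLengthSketch and the method sha1Length() are hypothetical, and the code requests "SHA-1" from the default provider. A SHA-1 digest is 20 bytes long, which is why each Base 64 value shown above contains 28 characters.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class FixedLengthSketch {

  // Digest a message consisting of messageLength zero bytes and
  // return the length of the resulting SHA-1 digest in bytes
  static int sha1Length(int messageLength) {
    try {
      byte[] message = new byte[messageLength];
      MessageDigest md = MessageDigest.getInstance("SHA-1");
      return md.digest(message).length;
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    int[] lengths = {0, 7, 100000};
    for (int i = 0; i < lengths.length; i++) {
      // Prints 20 for every message length
      System.out.println(lengths[i] + " byte message: "
          + sha1Length(lengths[i]) + " byte digest");
    }
  }
}
```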

Interesting Code Fragments

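The first fragment shows the beginning of the controlling class and the beginning of the main method. This fragment creates a byte array containing the message, displays the message, calls digestIt() to get the digest for the message, and calls displayBase64() to display the digest.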

The method named digestIt() is the key to the entire program. I will discuss digestIt() and displayBase64() later.

class Security06 {
  public static void main(String[] args) {
    byte[] dataBuffer = "ABCDEFG".getBytes();
    System.out.println(new String(dataBuffer));
    byte[] theDigest = digestIt(dataBuffer);
    displayBase64(theDigest);

Do it two more times

The next fragment completes the main() method for the program.

This fragment simply repeats the process using the same string value twice. Note that the string value differs from the previous fragment. It has an 'H' character appended onto the end of the string.

    //Do it again for a slightly different byte array
    dataBuffer = "ABCDEFGH".getBytes();
    System.out.println(new String(dataBuffer));
    theDigest = digestIt(dataBuffer);
    displayBase64(theDigest);
    
    //Do it one more time for the same byte array
    dataBuffer = "ABCDEFGH".getBytes();
    System.out.println(new String(dataBuffer));
    theDigest = digestIt(dataBuffer);
    displayBase64(theDigest);

  }//end main

The digestIt() method

The next fragment shows the beginning of the method named digestIt() that computes and returns the digest value for an incoming array of bytes.

Getting a MessageDigest object, factory methods, etc.

The interesting code in this fragment is the statement that calls the getInstance() factory method on the MessageDigest class to get an initialized object of the MessageDigest class.

SHA and SUN

The parameters (SHA and SUN) that are passed to the factory method cause the object to be initialized to perform the SHA message digest algorithm as provided by SUN on byte data that is later passed to the update() method of the object.

This is where initialization takes place for the MessageDigest class.

  static byte[] digestIt(byte[] dataIn){
    byte[] theDigest = null;
    try{
      MessageDigest messageDigest = 
                   MessageDigest.getInstance("SHA", "SUN");

Feeding a hungry update() method

The next fragment feeds the byte array data to the update() method of the object.

Although this method can accommodate repetitive calls passing new data with each call, only one call was needed in this simple program because all of the message data was contained in a single byte array.

      messageDigest.update(dataIn);

Completing the digestion, the digest() method

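The next fragment calls the digest() method to complete the computation and to get the digest value as an array of bytes. It then displays any exception that was thrown along the way and returns the digest value to the calling method.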

      theDigest = messageDigest.digest();
    }catch(Exception e){System.out.println(e);}
    return theDigest;
  }//end digestIt()

Another version of the digest() method

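There is another version of the digest() method, which, during a single call, will perform a final update using an array of bytes passed as a parameter and then complete the digest computation.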

Could have done it differently

Therefore, for this simple case, where the entire message was available in a single array of bytes, I could have used that version of the digest() method, and could have skipped the use of the update() method altogether.

However, I didn't do that because I wanted to use this simple program to introduce you to the general API structure involving a pair of cooperating methods that work together to produce the desired result.

The first method in the pair can be called repeatedly to apply the algorithm to the data in chunks, and the second method can be called only once to complete the task.
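The two approaches can be compared side by side with a short sketch. It is an illustration only. The class name OneShotDigestSketch and its methods are hypothetical, and the code requests "SHA-1" from the default provider rather than the SHA and SUN parameters used in digestIt().

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

class OneShotDigestSketch {

  static MessageDigest newSha1() {
    try {
      return MessageDigest.getInstance("SHA-1");
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
  }

  // The pair of cooperating methods: update() then digest()
  static byte[] twoStep(byte[] message) {
    MessageDigest md = newSha1();
    md.update(message);
    return md.digest();
  }

  // The version of digest() that accepts the data directly
  static byte[] oneStep(byte[] message) {
    return newSha1().digest(message);
  }

  public static void main(String[] args) {
    byte[] message = "ABCDEFG".getBytes();
    // Both approaches produce the same digest value
    System.out.println(
        Arrays.equals(twoStep(message), oneStep(message)));
  }
}
```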

What is Base 64 format?

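Base 64 is a system for representing an array of eight-bit bytes as printable ASCII characters drawn from an alphabet of 64 characters (A through Z, a through z, 0 through 9, '+', and '/').

Basically, it hooks a group of 8-bit bytes together end-to-end, and then moves through them extracting the bits six bits at a time. Each six-bit group selects one of the 64 characters, so every three bytes of input become four characters of output. The '=' character is used as padding when the number of input bytes is not a multiple of three.

The reverse process converts each character back to its six-bit value, hooks the six-bit groups together end-to-end, and then extracts the bits, eight bits at a time.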
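To see the bit regrouping in action, here is a minimal sketch that encodes exactly three bytes by hand, taking the 24 bits six at a time and using each six-bit value as an index into the standard 64-character Base 64 alphabet. The class name SextetSketch and the method encodeThreeBytes() are hypothetical. The sketch ignores padding and line breaks, so it is not a replacement for a real encoder.

```java
class SextetSketch {

  // The standard Base 64 alphabet, indexed by six-bit value
  static final String ALPHABET =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
      + "abcdefghijklmnopqrstuvwxyz"
      + "0123456789+/";

  // Encode exactly three bytes as four Base 64 characters
  static String encodeThreeBytes(byte b0, byte b1, byte b2) {
    // Hook the three 8-bit bytes together end-to-end (24 bits)
    int bits = ((b0 & 0xFF) << 16)
             | ((b1 & 0xFF) << 8)
             | (b2 & 0xFF);
    // Extract the bits six at a time, most significant first
    StringBuilder sb = new StringBuilder();
    for (int shift = 18; shift >= 0; shift -= 6) {
      sb.append(ALPHABET.charAt((bits >> shift) & 0x3F));
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // The three bytes of "ABC" become the four characters QUJD
    System.out.println(
        encodeThreeBytes((byte) 'A', (byte) 'B', (byte) 'C'));
  }
}
```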

Why Base 64 format?

This is useful for passing data through a system that can only accommodate seven-bit ASCII data.

It is also useful for displaying byte data because not all of the possible bit combinations in an eight-bit byte have a display rendering that we like to see.

This is a useful way to display a digest value if you are going to discuss it with someone over the telephone. With the Base 64 representation, only characters having common, well-known names appear. This isn't true if you display the raw eight-bit data. If you need to display and discuss the raw eight-bit data, you will probably need to display it in hexadecimal format.
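If you do want the hexadecimal form, a conversion along the following lines would do the job. This is only a sketch. The class name HexDisplaySketch and its methods are hypothetical, and the code requests "SHA-1" from the default provider. Note that a 20-byte digest needs 40 hexadecimal characters, compared with 28 characters in Base 64.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class HexDisplaySketch {

  // Convert each byte to two lowercase hexadecimal characters
  static String toHex(byte[] data) {
    StringBuilder sb = new StringBuilder(data.length * 2);
    for (int i = 0; i < data.length; i++) {
      int value = data[i] & 0xFF; // treat the byte as unsigned
      sb.append(Character.forDigit(value >> 4, 16));
      sb.append(Character.forDigit(value & 0x0F, 16));
    }
    return sb.toString();
  }

  static byte[] sha1(byte[] message) {
    try {
      return MessageDigest.getInstance("SHA-1").digest(message);
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    // Display the digest of a short message in hexadecimal
    System.out.println(toHex(sha1("ABCDEFG".getBytes())));
  }
}
```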

As of 4/16/99, you can learn more about Base 64 at http://rfc.fh-koeln.de/rfc/html/rfc1521.html.

The encodeBuffer() method

The next fragment illustrates the use of the encodeBuffer() method of the sun.misc.BASE64Encoder class to encode an array of bytes into Base 64 format and to display the result.

  static void displayBase64(byte[] data){
    BASE64Encoder encoder = new BASE64Encoder();
    String encoded = encoder.encodeBuffer(data);
    System.out.println(encoded);
  }//end displayBase64()

The companion sun.misc.BASE64Decoder class has a decodeBuffer() method that can be used to reverse the process.
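The sun.misc classes are internal to Sun's implementation and are not part of the supported Java API. If you are running Java 8 or later, which is much newer than the JDK 1.2 used for this lesson, the java.util.Base64 class provides both directions of the conversion. The following is a minimal sketch under that assumption. The class name Base64RoundTripSketch and its methods are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

class Base64RoundTripSketch {

  // Encode an array of bytes into a Base 64 string
  static String encode(byte[] data) {
    return Base64.getEncoder().encodeToString(data);
  }

  // Decode a Base 64 string back into the original bytes
  static byte[] decode(String text) {
    return Base64.getDecoder().decode(text);
  }

  public static void main(String[] args) {
    byte[] original = "ABC".getBytes(StandardCharsets.US_ASCII);
    String encoded = encode(original);
    System.out.println(encoded); // QUJD
    // Decoding recovers the original bytes
    System.out.println(Arrays.equals(original, decode(encoded)));
  }
}
```

Unlike encodeBuffer(), encodeToString() does not append a line terminator to the encoded text.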

Program Listing

A complete listing of the program is contained in this section.

/*File Security06.java Copyright 1999, R.G.Baldwin
Rev 4/16/99

This program provides an introduction to the use of the
MessageDigest class in Java.

The program creates and displays the digests for three
short strings.  Two of the strings are the same.

The program shows that the digest differs significantly
for two strings that are different in only one character.
It also shows that the digest that is computed for a given
string is always the same.

The results are displayed in Base 64 format.

Tested using JDK 1.2 and Win95.

This program produces the following output:
  
ABCDEFG
k75GEsQdI68YkdrF/Q1TVzb/xOM=

ABCDEFGH
Zdt/hoQdVTax8nL1MO5w/TY9guw=

ABCDEFGH
Zdt/hoQdVTax8nL1MO5w/TY9guw=
**********************************************************/

import java.io.*;
import java.security.*;
import sun.misc.*;

class Security06 {

  public static void main(String[] args) {
    //Create a simple byte array containing data
    byte[] dataBuffer = "ABCDEFG".getBytes();
    //Display the contents of the byte array
    System.out.println(new String(dataBuffer));
    //Get a digest for the byte array
    byte[] theDigest = digestIt(dataBuffer);
    //Display the digest in Base64 format
    displayBase64(theDigest);

    //Do it again for a slightly different byte array
    dataBuffer = "ABCDEFGH".getBytes();
    System.out.println(new String(dataBuffer));
    theDigest = digestIt(dataBuffer);
    displayBase64(theDigest);
    
    //Do it one more time for the same byte array
    dataBuffer = "ABCDEFGH".getBytes();
    System.out.println(new String(dataBuffer));
    theDigest = digestIt(dataBuffer);
    displayBase64(theDigest);

  }//end main
  //-----------------------------------------------------//

  //This method generates and returns a digest for an
  // incoming array of bytes.
  static byte[] digestIt(byte[] dataIn){
    byte[] theDigest = null;
    try{
      //Create a MessageDigest object implementing
      // the SHA algorithm, as supplied by SUN
      MessageDigest messageDigest = 
                   MessageDigest.getInstance("SHA", "SUN");
      //Feed the byte array to the digester.  Can 
      // accommodate multiple calls if needed
      messageDigest.update(dataIn);
      //Complete the digestion and save the result
      theDigest = messageDigest.digest();
    }catch(Exception e){System.out.println(e);}
    //Return the digest value to the calling method 
    return theDigest;
  }//end digestIt()
  //-----------------------------------------------------//

  //Method to display an array of bytes in base 64 format
  static void displayBase64(byte[] data){
    BASE64Encoder encoder = new BASE64Encoder();
    String encoded = encoder.encodeBuffer(data);
    System.out.println(encoded);
  }//end displayBase64()
  //-----------------------------------------------------//
}//end class Security06

-end-