Learn to Program using Python

Strings, Part I

by Richard G. Baldwin
baldwin.richard@iname.com

File Pyth0012.htm

April 6, 2000


Preface

This document is part of a series of online tutorial lessons designed to teach you how to program using the Python scripting language.

Something for everyone

Beginners start at the beginning, and experienced programmers jump in further along. Lesson 1 provides an overall description of this online programming course.

Introduction

I am taking it slow and easy for the first few lessons.  My informal discussion is designed to familiarize you with the Python interactive programming environment while teaching you some important programming concepts at the same time.

This lesson provides an introduction to the use of strings.

What Is A String

The common interpretation of the word string in computer programming jargon is that a string is a sequence of characters that is treated as a unit.  For example, a person's first and last names are often treated as two different strings.

A person's first name usually consists of several characters, and these characters are treated as a unit to produce a name.

What is a literal?

Perhaps the best way to describe a literal is to describe what it is not.

A literal is not a variable.  In other words, the value of a literal doesn't change with time as the program executes.  You might say that it is taken at face value.

An expression using variables

For example, the following expression describes the sum of two variables named var1 and var2:

sum = var1 + var2

The result of this expression can vary depending on the values stored in var1 and var2 at the instant in time that the expression is evaluated.

An expression using literals

On the other hand, the following expression describes the sum of two literal numeric values:

sum = 6 + 8

No matter when this expression is evaluated, it will always produce a sum of 14.
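Here is a minimal sketch of that difference, assuming a Python 3 interpreter. The sample values are arbitrary, and the name total is used in place of sum only to avoid hiding Python's built-in sum function.

```python
# An expression using variables: the result depends on what the
# variables hold at the moment the expression is evaluated.
var1 = 2
var2 = 3
total_from_variables = var1 + var2   # 5 for these values

var1 = 10                            # change one variable ...
total_after_change = var1 + var2     # ... and the same expression now gives 13

# An expression using literals: the result is the same every time.
total_from_literals = 6 + 8          # always 14
```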

String literals

Literal values can also be used for strings.

For example, the interactive code fragment in Figure 1 shows three entries made at the interactive prompt.

The first two entries are valid string literals.  As you can see, in the first two cases, the interpreter displays my name in the output.

Note that in the first two cases, my name is surrounded by either quotes (sometimes called double quotes) or apostrophes (sometimes called single quotes).

A syntax error

However, the third entry is not a valid string literal, and the interactive interpreter produced a syntax error message.  In the third case, my name is not surrounded by either double quotes or single quotes, and that is what produced the error.

So, what is a valid string literal?

According to the Python Reference Manual,
 
String literals can be enclosed in matching single quotes (') or double quotes (").

This explains why the first two input lines in the above interactive code fragment were accepted and the third line produced an error.

Proper syntax

In the first line, my name was surrounded by matching double quotes.  In the second input line, my name was surrounded by matching single quotes.

Bad syntax

However, in the third input line, my name was not surrounded by quotes of either type and this produced a syntax error.
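The three cases can be tried out with a short script such as the following sketch, which assumes Python 3. The built-in compile function is used only so that the syntax error can be caught instead of stopping the script, and the "<example>" label is an arbitrary placeholder.

```python
# Two valid string literals: matching double quotes and matching single quotes.
double_quoted = "Richard Baldwin"
single_quoted = 'Richard Baldwin'
same_value = double_quoted == single_quoted   # both spell the same string

# The same text without quotes is not a string literal.
try:
    compile("Richard Baldwin", "<example>", "eval")
    unquoted_is_valid = True
except SyntaxError:
    unquoted_is_valid = False

print(same_value, unquoted_is_valid)   # True False
```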

More examples

Figure 2 shows two more examples of valid string literals with the input value highlighted in boldface.

(Note that I purposely colored the "\012" in red to make it stand out.  It was not that color in the original interpreter output.  I will explain what it means later.)

What does """...""" mean?

This syntax is explained by the following excerpt from the Python Reference Manual:
 
Strings can also be enclosed in matching groups of three single or double quotes (these are generally referred to as triple-quoted strings). 

The backslash (\) character is used to escape characters that otherwise have a special meaning, such as newline, backslash itself, or the quote character. 
...
In triple-quoted strings, unescaped newlines and quotes are allowed (and are retained), except that three unescaped quotes in a row terminate the string. (A "quote" is the character used to open the string, i.e. either ' or ".)

Use of triple quoted strings

One of the main advantages of using triple-quoted strings is that this makes it possible to continue a string on a new line and to preserve the line break in the string.

This is illustrated in Figure 3, which shows my name, surrounded by matching triple quotes and split onto two consecutive lines of input.

The newline (\012) character

When the interpreter displayed this triple-quoted, multiple-line input, the display included "\012".

This is a numeric representation of the newline character.  (I will show you another representation later.)  It appeared in the output at the point representing the end of the first line of input.  This indicates that the interpreter knows and remembers that the input string was split across two lines.
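The following sketch reproduces the idea, assuming Python 3. One difference to expect: current interpreters display the newline as \n rather than \012, although both spellings denote the same character.

```python
# A triple-quoted string split across two lines of input.
name = """Richard
Baldwin"""

# repr() gives the form that the interactive interpreter displays.
displayed = repr(name)
print(displayed)                 # 'Richard\nBaldwin'

# The line break was remembered as a newline character inside the string.
has_newline = "\012" in name     # True
```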

Why "represent" the newline character?

As the name implies, a newline character is a character that means, "Go to the beginning of the next line."

The newline character is sort of like the wind.  You can't see the wind, but you can see the result of the wind blowing through a tree.

Similarly, you can't see a newline character, but you can see what it does.  Therefore, we must represent it by something else, like \012 if we want to be able to see where it appears within a string.

An escape sequence

The \012 is what we call an escape sequence.  I will discuss escape sequences in detail a little later.

One more syntax option

The Python Reference Manual describes one more syntax option for strings as shown below.  I am going to let this one lie for the time being.  I will come back and address it in a subsequent lesson if I have the time.  I am including it here simply for completeness.
 
String literals may optionally be prefixed with a letter 'r' or 'R'; such strings are called raw strings and use different rules for backslash escape sequences.
...
Unless an 'r' or 'R' prefix is present, escape sequences in strings are interpreted according to rules similar to those used by Standard C.
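Purely as a brief illustration of the quoted rule, and assuming Python 3, the sketch below compares an ordinary string with a raw string. The sample text is arbitrary.

```python
normal = "a\nb"    # backslash-n is interpreted as a single newline character
raw = r"a\nb"      # raw string: the backslash and the n are kept as two characters

print(len(normal), len(raw))   # 3 4
```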

What Are Escape Sequences

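Escape sequences are special sequences of characters used to represent other characters that either cannot be entered directly into a string, or would be misinterpreted if entered directly (such as a quote character that would terminate the string).

The newline character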

An example of the first category is the newline character.  Except when using triple quoted strings, you cannot enter the newline character directly into a string.

Why?  Because when you press the Enter key in an attempt to enter a newline, that simply terminates your input for that line.  It doesn't enter the newline character into the string.

Using the newline character

The interactive code fragment in Figure 4 illustrates the use of an escape sequence to enter the newline character into a string.  Note the \012 between my first and last names.

What does print mean?

This fragment uses a print statement.  I haven't explained that statement to you before, but you can probably guess what it means.

When print is used interactively, it is a request to have its right operand (the expression to its right) printed on the next line.  In this case, it is a request to have my name printed on the next line.

Including the newline character

In this fragment, I entered the newline escape sequence between my first and last names when I constructed the string.  Then, when the string was printed, the cursor advanced to a new line following my first name and printed my last name on the new line.  That is what escape sequences are all about.

print renders according to meaning

Note also that the print statement rendered the newline character according to its meaning.

What I mean by this is that the print statement did not print something that represented the newline character (\012) as we have seen before.  Rather, it actually did what a newline character is supposed to do  -- go to the beginning of the next line.
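A sketch of the same experiment, assuming Python 3, where print is a function and its argument goes inside parentheses rather than simply to its right:

```python
name = "Richard\012Baldwin"   # \012 is the newline escape sequence

# print renders the newline according to its meaning:
# Richard appears on one line and Baldwin on the next.
print(name)

# Splitting at line boundaries confirms that there are two lines.
lines = name.splitlines()     # ['Richard', 'Baldwin']
```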

Escaping the quote character

Suppose that you are constructing a string that is surrounded by double quotes, and you want to use a pair of double quotes inside the string.  If you were to simply enter the double quote when you construct the string, that quote would terminate the string.

The interactive code fragment in Figure 5 shows how to escape the double quote character -- precede it with a backslash character.

What I mean by this is that if you want to include a double quote inside a string that is surrounded by double quotes, you must enter the double quote inside the string as follows:  \"
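For instance, assuming Python 3 and using a made-up sample sentence:

```python
# Each double quote inside the string is preceded by a backslash.
quoted = "He said \"Hello\" to me."
print(quoted)                        # He said "Hello" to me.

# The backslashes are not part of the string; only the quotes are.
quote_count = quoted.count("\"")     # 2
backslash_count = quoted.count("\\") # 0
```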

Avoiding the quote problem

Because this is such a common problem, and because the escape solution is so ugly and difficult to read, Python gives us another way to deal with quotes inside of quotes.  This solution, shown in Figure 6, is the use of single and double quotes in combination.

In Python, double quotes can be included directly in strings that are surrounded by single quotes, and single quotes can be included directly in strings that are surrounded by double quotes.  This is much easier to read than the solution that requires you to place a lot of backslash characters inside your string.
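A small sketch of the combination, assuming Python 3; the sample sentences are invented for illustration:

```python
# Double quotes inside a string surrounded by single quotes.
with_double = 'He said "Hello" to me.'

# A single quote inside a string surrounded by double quotes.
with_single = "It's easy to read."

# The escaped form produces exactly the same string, just less readably.
escaped = "He said \"Hello\" to me."
same = with_double == escaped        # True
```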

List of escape sequences

A complete list of the escape sequences supported by Python is available in the Python Reference Manual.

More Ways to Span Lines

Just when you thought that you had seen it all, I am going to show you three more ways to span multiple lines with strings.  One of them is shown in Figure 7.

End the line with a backslash

As shown in Figure 7, the use of a backslash at the end of the line makes it possible to continue the string on a new line.  However, the backslash is not included in the output, and there is no newline character in the output.

Not restricted to strings

Actually, the backslash can be used at the end of a line to cause that line to be continued on the next line whether inside a string or not.  This is illustrated in the review section.
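Both uses of the trailing backslash can be sketched as follows, assuming Python 3. The backslash must be the very last character on its line.

```python
# Inside a string: the string continues on the next line, and neither
# the backslash nor a newline ends up in the result.
name = "Richard \
Baldwin"

# Outside a string: an ordinary expression continued on the next line.
total = 6 + \
        8

print(name)    # Richard Baldwin
print(total)   # 14
```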

A form of concatenation

When used in this way with a string, the backslash at the end of the line becomes a form of string concatenation.  The portions of the strings on each of the input lines are concatenated to produce a single line containing both parts of the string in the output.

I will have more to say about string concatenation later in this lesson.

Use the \n escape sequence

As shown in Figure 8, the inclusion of "\n" inside the string produces the same result as the inclusion of the numeric representation of the newline character, "\012", shown earlier.

This is the common form of the newline escape sequence typically used in C, C++, and Java.
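The equivalence can be checked directly, as in this sketch that assumes Python 3. The digits in \012 are octal, and octal 12 is decimal 10, the character code of the newline.

```python
# Both escape sequences produce the same one-character newline.
same_string = "Richard\nBaldwin" == "Richard\012Baldwin"   # True

code = ord("\n")       # 10, the character code of the newline
octal_twelve = 0o12    # 10, written as an octal number in Python 3
```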

Combine backslash and \n

The code in Figure 9 shows how to combine the backslash at the end of the line with a newline character placed there to cause the output to closely resemble the input.
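A sketch of that combination, assuming Python 3: the \n puts a newline into the string, and the backslash that follows it lets the input itself continue on the next line.

```python
name = "Richard\n\
Baldwin"

# The output has the same two-line shape as the input.
print(name)

# The result matches a string written on one line with \n in the middle.
matches = name == "Richard\nBaldwin"   # True
```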

String Concatenation

To concatenate two strings means to hook them together end-to-end, thus producing a new string that is the combination of the two.

Literal string concatenation

You can cause literal strings to be concatenated just by writing one adjacent to the other as shown in Figure 10.

Note that you can mix the different quote types and it doesn't matter if there is whitespace in between.

Creating whitespace

However, if you want any space between the substrings in the output, you must include that space inside the quotes that delimit the individual strings as shown in Figure 11.
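Here is a short sketch of adjacent-literal concatenation, assuming Python 3:

```python
# Quote types can be mixed, and whitespace between the literals is ignored.
no_space = "Richard"     'Baldwin'       # 'RichardBaldwin'

# A space appears in the result only if it is inside the quotes.
with_space = "Richard " 'Baldwin'        # 'Richard Baldwin'

print(no_space)
print(with_space)
```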

Using + for concatenation

The plus operator (+) can be used to concatenate strings as illustrated in Figure 11.

This fragment assigns string literal values to two variables, and then uses the plus operator to concatenate the contents of those variables with another string literal.

Of course, it could also have been used to concatenate the contents of the two variables without the string literal in between.

Whitespace is included in the quotes

Note that the string literals contain space characters.  There is a space after the d in my first name and before the B in my last name.  That is what I meant earlier when I said that if you want any space between the substrings in the output, you must include that space inside the quotes that delimit the individual strings.
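The following sketch, which assumes Python 3, shows the plus operator at work. The variable names and the middle literal "G." are assumptions made for illustration, not necessarily what the figure used.

```python
first = "Richard "     # note the space after the d
last = " Baldwin"      # note the space before the B

# Two variables concatenated with a string literal in between.
full = first + "G." + last     # 'Richard G. Baldwin'

# The two variables concatenated directly; both spaces are kept.
direct = first + last          # 'Richard  Baldwin'

print(full)
```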

More on Strings

I will have more to say about strings in a future lesson.  Before that, however, we need to learn how to create and execute script files, and we also need to learn a little more about Python syntax.

Review

1.  Describe the common meaning of the word string in your own words, and give some examples.

Ans:  The common interpretation of the word string in computer programming jargon is that a string is a sequence of characters that is treated as a unit.  For example, a person's first and last names are often treated as two different strings.

2.  Describe the common meaning of the word literal in your own words.

Ans:  Perhaps one way to describe the meaning of the word literal would be that the literal item is taken at face value, and its value is not subject to change as the program executes.

3.  Describe three different ways to format string literals (without spanning lines) and show examples.

Ans:  Surround with matching pairs of single quotes, double quotes, or triple quotes as shown in Figure 12.

4.  What is one of the advantages of using triple quoted strings?  Show an example.

Ans:  The use of triple quoted strings, as shown in Figure 13, makes it possible for you to continue a string on a new line, and to preserve the line break in the string.

5.  Show two different representations of the newline character.

Ans: \012 and \n as shown in Figure 14.  Of the two, the latter is probably the more commonly used, perhaps because it is easier to remember.

6.  Describe, in your own words, the purpose of an escape sequence.  Show two examples.

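Ans:  Escape sequences are special sequences of characters used to represent other characters that either cannot be entered directly into a string, or would be misinterpreted if entered directly (such as a quote character that would terminate the string).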

Examples are shown in Figure 15.

7.  Show two different ways to include a double quote character in a string.

Ans:  Surround with single quotes, or use an escape character as shown in Figure 16.

8.  Show the escape sequence for the tab character.

Ans:  The escape sequence for the tab character is \t as shown in Figure 17.
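A minimal sketch of the tab escape sequence, assuming Python 3 and an arbitrary sample string:

```python
columns = "Richard\tBaldwin"   # \t is the tab escape sequence

# print renders the tab according to its meaning, as a jump to the next tab stop.
print(columns)

parts = columns.split("\t")    # ['Richard', 'Baldwin']
tab_code = ord("\t")           # 9, the character code of the tab
```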
 
 

Copyright 2000, Richard G. Baldwin.  Reproduction in whole or in part in any form or medium without  express written permission from Richard Baldwin is prohibited. 

About the author

Richard Baldwin is a college professor and private consultant whose primary focus is a combination of Java and XML. In addition to the many platform-independent benefits of Java applications, he believes that a combination of Java and XML will become the primary driving force in the delivery of structured information on the Web.

Richard has participated in numerous consulting projects involving Java, XML, or a combination of the two.  He frequently provides onsite Java and/or XML training at the high-tech companies located in and around Austin, Texas.  He is the author of Baldwin's Java Programming Tutorials, which has gained a worldwide following among experienced and aspiring Java programmers. He has also published articles on Java Programming in Java Pro magazine.

Richard holds an MSEE degree from Southern Methodist University and has many years of experience in the application of computer technology to real-world problems.


-end-