Ch 5 -- Perl
UNIX Unleashed, Internet Edition
- 5 -
Perl
by David Till
The following sections tell you what Perl is and how you can get it, and provide
a short example of a working Perl program.
Features of Perl covered in this chapter include:
- Scalar variables, and string and integer interchangeability
- Arithmetic, logical, bitwise, and string operators
- List, array, and associative array manipulation
- Control structures for handling program flow
- File input and output capability
- Subroutines
- Formatted output
- References
- Object-oriented capability
- Built-in functions
Overview of Perl
Perl is a simple yet useful programming language that provides the convenience
of shell scripts and the power and flexibility of high-level programming languages.
Perl programs are interpreted and executed directly, just as shell scripts are; however,
they also contain control structures and operators similar to those found in the
C programming language. This gives you the ability to write useful programs in a
very short time.
Where Can I Get Perl?
Perl is freeware: It can be obtained by file transfer (ftp) from the
Free Software Foundation at prep.ai.mit.edu (in the directory pub/gnu).
Perl is also available from several other sites on the Internet, including any site
that archives the newsgroup comp.sources.unix.
The Perl artistic license gives you the right to obtain Perl and its source, provided
others have the right to obtain them from you. For more details on the Perl licensing
policy, refer to the Perl source distribution.
A Simple Sample Program
To show how easy it is to use Perl, Listing 5.1 is a simple program that echoes
(writes out) a line of input typed in at a terminal.
Listing 5.1. A sample Perl program.
#!/usr/bin/perl
$inputline = <STDIN>;
print ("$inputline");
To run this program, do the following:
1. Type in the program and save it in a file. (In subsequent steps, assume
the file is named foo).
2. Tell the system that this file contains executable statements. To do this,
enter the command chmod +x foo.
3. Run the program by entering the command foo.
If you receive the error message foo not found or some equivalent, either
enter the command ./foo or add the current directory . to your
PATH environment variable.
At this point, the program waits for you to type in an input line. After you have
done so, the program echoes your input line and exits.
The following sections describe each of the components of this simple program
in a little more detail.
Using Comments
The first line of this program is an example of a Perl comment. In Perl, any time
a # character is recognized, the rest of the line is treated as a comment:
# this is a comment that takes up the whole line
$count = 0; # this part of the line is a comment
A comment appearing as the first line of a program is special. This header comment
indicates the location of the program interpreter to use. In this example, the string
#!/usr/bin/perl indicates that this file is a Perl program.
The Perl interpreter should be located in /usr/bin/perl on your system.
If it is not, replace /usr/bin/perl in the header comment with the location
of the Perl interpreter on your system.
Reading from Standard Input
Like C, Perl recognizes the existence of the UNIX standard input file, standard
output file, and standard error file. In C, these files are called stdin,
stdout, and stderr; in Perl, they are called STDIN, STDOUT,
and STDERR.
The Perl construct <STDIN> refers to a line of text read in from
the standard input file. This line of text includes the closing newline character.
Storing Values in Scalar Variables
The construct $inputline is an example of a scalar variable. A scalar
variable is a variable that holds exactly one value. This value can be a string,
integer, or floating-point number.
All scalar variables start with a dollar sign, $. This distinguishes
them from other Perl variables. In a scalar variable, the character immediately following
the dollar sign must be a letter or an underscore. Subsequent characters can be letters,
digits, or underscores. Scalar variable names can be as long as you like.
For more information on scalar variables and their values, see the section titled
"Working with Scalar Variables," later in this chapter.
Assigning a Value to a Scalar Variable
The statement $inputline = <STDIN>; contains the = character,
which is the Perl assignment operator. This statement tells Perl that the line of
text read from standard input, represented by <STDIN>, is to become
the new value of the scalar variable $inputline.
Perl provides a full set of useful arithmetic, logical, and string operators.
For details, refer to the sections titled "Working with Scalar Variables"
and "Using Lists and Array Variables," later in this chapter.
CAUTION: A scalar variable that has not yet been assigned a value holds the
undefined value, which behaves like the null string, "", in a string context and
like 0 in a numeric context. Therefore, a Perl program can be run even when a scalar
variable is used before a value has been assigned to it. Consider the statement
$b = $a;
This statement assigns the value of the variable $a to $b. If
$a has not been seen before, it is assumed to have the value "",
and "" is assigned to $b. Because this behavior is legal
in Perl, you must check your programs for "undefined" variables yourself.
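One way to make this check is to turn on warnings and test a variable with the built-in defined() function before using it. The following is a minimal sketch, assuming Perl 5; the helper name value_or_default is made up for this illustration and is not part of Perl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the value if one has been assigned,
# or the supplied fallback if the value is still undefined.
sub value_or_default {
    my ($value, $default) = @_;
    return defined($value) ? $value : $default;
}

my $unset;              # declared, but never assigned
my $set = "hello";

print value_or_default($unset, "(no value)"), "\n";   # prints (no value)
print value_or_default($set,   "(no value)"), "\n";   # prints hello
```

Note that defined() distinguishes a variable that was never assigned from one that holds 0 or the null string, both of which count as defined.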
Scalar Variables Inside Character Strings
The final statement of the program, print ("$inputline");,
contains a character string, which is a sequence of characters enclosed in double
quotes. In this case, the character string is "$inputline".
The string "$inputline" contains the name of a scalar variable,
$inputline. When Perl sees a variable inside a character string, it replaces
the variable with its value. In this example, the string "$inputline"
is replaced with the line of text read from the standard input file.
Writing to Standard Output
The built-in function print() writes its arguments (the items enclosed
in parentheses) to the standard output file. In this example, the statement print
("$inputline"); sends the contents of the scalar variable $inputline
to the standard output file.
The print() function can also be told to write to the standard error
file or to any other specified file. See the section titled "Reading from and
Writing to Files" later in this chapter for more details.
Working with Scalar Variables
Now that you know a little about Perl, it's time to describe the language in a
little more detail. This section begins by discussing scalar variables and the values
that can be stored in them.
Understanding Scalar Values
In Perl, a scalar value is any value that can be stored in a scalar variable.
The following are scalar values:
- Integers
- Double- and single-quoted character strings
- Floating-point values
The following assignments are all legal in Perl:
$variable = 1;
$variable = "this is a string";
$variable = 3.14159;
The following assignments are not legal:
$variable = 67M;
$variable = ^803;
$variable = $%$%!;
Using Octal and Hexadecimal Representation
Normally, integers are assumed to be in standard base-ten notation. Perl also
supports base-eight (octal) and base-sixteen (hexadecimal) notation.
To indicate that a number is in base-eight, put a zero in front of the number:
$a = 0151; # 0151 octal is 105
To indicate base-sixteen, put 0x (or 0X) in front of the number:
$a = 0x69; # 69 hex is also 105
The letters A through F (in either upper- or lowercase) represent the values 10
through 15:
$a = 0xFE; # equals 16 * 15 + 1 * 14, or 254
NOTE: Strings containing a leading 0
or 0x are not treated as base-eight or base-sixteen:
$a = "0151";
$a = "0x69";
These strings are treated as character strings whose first character is 0.
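If you do need to interpret such a string as a base-eight or base-sixteen number, the built-in functions oct() and hex() perform the conversion explicitly. A short sketch, assuming Perl 5 (the variable names are arbitrary):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $from_octal = oct("0151");   # read the string as base-eight
my $from_hex   = hex("0x69");   # read the string as base-sixteen
my $also_hex   = oct("0x69");   # oct() also understands a leading 0x

print "$from_octal $from_hex $also_hex\n";   # prints 105 105 105
```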
Using Double- and Single-Quoted Strings
So far, all of the strings you have seen have been enclosed by the "
(double quotation mark) characters:
$a = "This is a string in double quotes";
Perl also allows you to enclose strings using the ' (single quotation
mark) character:
$a = 'This is a string in single quotes';
There are two differences between double-quoted strings and single-quoted strings.
The first difference is that variables are replaced by their values in double-quoted
strings, but not in single-quoted strings:
$x = "a string";
$y = "This is $x"; # becomes "This is a string"
$z = 'This is $x'; # remains 'This is $x'
Also, double-quoted strings recognize escape sequences for special characters.
These escape sequences consist of a backslash (\) followed by one or more
characters. The most common escape sequence is \n, representing the newline
character:
$a = "This is a string terminated by a newline\n";
Table 5.1 lists the escape sequences recognized in double-quoted strings.
Table 5.1. Escape sequences in double-quoted strings.
Escape sequence | Meaning
\a | Bell (beep)
\b | Backspace
\cn | The control-n character
\e | Escape
\E | Cancel the effect of \L, \U, or \Q
\f | Form feed
\l | Force the next letter to lowercase
\L | All following letters are lowercase
\n | Newline
\Q | Do not look for special pattern characters
\r | Carriage return
\t | Tab
\u | Force the next letter to uppercase
\U | All following letters are uppercase
\v | Vertical tab
\L and \U can be turned off by \E:
$a = "T\LHIS IS A \ESTRING"; # same as "This is a STRING"
To include a backslash or double quote in a double-quoted string, precede it with
another backslash:
$a = "A quote \" in a string";
$a = "A backslash \\ in a string";
You can specify the character code for a character in base-eight (octal) notation
using \nnn, where each n is an octal digit:
$a = "\377"; # this is the character 255
You can also use hexadecimal to specify the character code for a character. To do
this, use the sequence \xnn, where each n is a hexadecimal
digit:
$a = "\xff"; # this is also 255
None of these escape sequences is supported in single-quoted strings, except for
\' and \\, which represent the single quote character and the backslash,
respectively:
$a = '\b is not a bell';
$a = 'a single quote \' in a string';
$a = 'a backslash \\ in a string';
NOTE: In Perl, strings are not terminated
by a null character (ASCII 0) as they are in C. In Perl, the null character
can appear anywhere in a string:
$a = "This string \000 has a null character in it";
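The differences between the two quoting styles can be checked by measuring the resulting strings. The following sketch assumes Perl 5 and uses made-up variable names; length() is the built-in function that returns the number of characters in a string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $name   = "world";
my $double = "hello, $name\n";   # $name is replaced; \n is one newline character
my $single = 'hello, $name\n';   # kept literally: dollar sign, name, backslash, n

my $mixed  = "T\LHIS IS A \ESTRING";   # \L ... \E lowercases the text in between

print length($double), "\n";   # prints 13
print length($single), "\n";   # prints 14
print "$mixed\n";              # prints This is a STRING
```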
Using Floating-Point Values
Perl supports floating-point numbers in both conventional and scientific notation.
The letter E (or e) represents the power of 10 to which a number
in scientific notation is to be raised.
$a = 11.3; # conventional notation
$a = 1.13E01; # 11.3 in scientific notation
$a = -1.13e-01; # the above divided by -100
CAUTION: Perl uses your machine's floating-point
representation. This means that only a certain number of digits (in mathematical
terms, a certain precision) is supported. For example, consider the following very
short program:
#!/usr/bin/perl
$pi = 3.14159265358979233;
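print ("pi is $pi\n");
This program prints output similar to the following (the exact number of digits
shown depends on your machine and your version of Perl):
pi is 3.1415926535897922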
This is because there just isn't room to keep track of all of the digits of pi
specified by the program.
This problem is made worse when arithmetic operations are performed on floating-point
numbers; see "Performing Comparisons" for more information on this problem.
Note that most programming languages, including C, have this problem.
Interchangeability of Strings and Numeric Values
In Perl, as you have seen, a scalar variable can be used to store a character
string, an integer, or a floating-point value. In scalar variables, a value that
was assigned as a string can be used as an integer whenever it makes sense to do
so, and vice versa. For example, consider the program in file LIST 5_2 on
this book's CD-ROM, which converts distances from miles to kilometers and vice versa.
In this example, the scalar variable $originaldist contains the character
string read in from the standard input file. The contents of this string are then
treated as a number, multiplied by the miles-to-kilometers and kilometers-to-miles
conversion factors, and stored in $miles and $kilometers.
This program also contains a call to the function chop(). This function
throws away the last character in the specified string. In this case, chop()
gets rid of the newline character at the end of the input line.
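The listing itself is on the CD-ROM and is not reproduced here. The following is only a sketch of what such a conversion might look like, not the author's listing. The factor 0.6214 appears in a statement quoted later in this chapter; the factor 1.609 and the helper name convert_distance are assumptions made for this illustration. A fixed sample line stands in for the line read from standard input so that the sketch runs without waiting for input.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: take one input line (with its trailing newline)
# and return the distance converted to miles and to kilometers.
sub convert_distance {
    my ($originaldist) = @_;
    chop($originaldist);                        # throw away the trailing newline
    my $miles      = $originaldist * 0.6214;    # treat the input as kilometers
    my $kilometers = $originaldist * 1.609;     # treat the input as miles
    return ($miles, $kilometers);
}

# In a real program this line would be: $originaldist = <STDIN>;
my ($miles, $kilometers) = convert_distance("10\n");
print "10 kilometers = $miles miles\n";
print "10 miles = $kilometers kilometers\n";
```

Because chop() removes the last character whatever it is, Perl 5 programs often use chomp() instead, which removes the last character only if it is a newline.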
If a string does not begin with a number, it is converted to 0:
# this assigns 0 to $a, because "hello" becomes 0
$a = "hello" * 5;
In cases like this, Perl does not tell you that anything has gone wrong (unless you
have turned on warnings, for example with the -w option), and your results might not
be what you expect.
Also, strings containing misprints yield unexpected results:
$a = "12O34" + 1; # the letter O, not the number 0
When Perl sees a string in the middle of an arithmetic expression, it converts the
string to a number. To do this, it starts at the left of the string and continues until
it sees a character that cannot be part of a number. In this case, "12O34" is
converted to the integer 12, not 12034.
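The conversion rules can be confirmed with a few lines of code. This sketch assumes Perl 5; it switches off the "isn't numeric" warning category on purpose so that the silent behavior described here is what you see. The variable names are arbitrary.

```perl
#!/usr/bin/perl
use strict;
use warnings;
no warnings 'numeric';   # silence "isn't numeric" warnings for this demonstration

my $no_digits = "hello" * 5;      # "hello" has no leading number, so it counts as 0
my $misprint  = "12O34" + 1;      # conversion stops at the letter O
my $leading   = "3 apples" + 2;   # only the leading 3 is used

print "$no_digits $misprint $leading\n";   # prints 0 13 5
```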
Using Scalar Variable Operators
The statement $miles = $originaldist * 0.6214; uses two scalar variable
operators: =, the assignment operator, which assigns a value to a variable,
and *, the multiplication operator, which multiplies two values.
Perl provides the complete set of operators found in C, plus a few others. These
operators are described in the following sections.
Performing Arithmetic
To do arithmetic in Perl, use the arithmetic operators. Perl supports the following
arithmetic operators:
$a = 15; # assignment: $a now has the value 15
$a = 4 + 5.1; # addition: $a is now 9.1
$a = 17 - 6.2; # subtraction: $a is now 10.8
$a = 2.1 * 6; # multiplication: $a is now 12.6
$a = 48 / 1.5; # division: $a is now 32
$a = 2 ** 3; # exponentiation: $a is now 8
$a = 21 % 5; # remainder (modulo): $a is now 1
$a = - $b; # arithmetic negation: $a is now $b * -1
Non-integral values are converted to integers before a remainder operation is
performed:
$a = 21.4 % 5.1; # identical to 21 % 5
Performing Comparisons
To compare two scalar values in Perl, use the logical operators. Logical operators
are divided into two classes: numeric and string. The following numeric logical operators
are defined:
11.0 < 16 # less than
16 > 11 # greater than
15 == 15 # equals
11.0 <= 16 # less than or equal to
16 >= 11 # greater than or equal to
15 != 14 # not equal to
$a || $b # logical OR: true if either is non-zero
$a && $b # logical AND: true only if both are non-zero
! $a # logical NOT: true if $a is zero
In each case, the result of the operation performed by a logical operator is non-zero
if true and zero if false, just like in C.
The expression on the left side of a || (logical OR) operator
is always tested before the expression on the right side, and the expression on the
right side is used only when necessary. For example, consider the following expression:
$x == 0 || $y / $x > 5
Here, the expression on the left side of the ||, $x == 0, is
tested first. If $x is zero, the result is true, regardless of the value
of $y / $x > 5, so Perl doesn't bother to compute this value. $y
/ $x > 5 is evaluated only if $x is not zero. This ensures that
division by zero can never occur.
Similarly, the expression on the right side of a && operator
is tested only if the expression on the left side is true:
$x != 0 && $y / $x > 5
Once again, a division-by-zero error is impossible, because $y / $x > 5
is only evaluated if $x is non-zero.
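The same guard can be wrapped in a subroutine (subroutines are covered later in this chapter). This is a minimal sketch, assuming Perl 5; the name ratio_exceeds is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return 1 if $y / $x is greater than $limit, 0 otherwise.
# The division is attempted only when $x is non-zero, because && stops
# evaluating as soon as its left side is false.
sub ratio_exceeds {
    my ($y, $x, $limit) = @_;
    return ($x != 0 && $y / $x > $limit) ? 1 : 0;
}

print ratio_exceeds(30, 5, 5), "\n";   # 30 / 5 is 6, so this prints 1
print ratio_exceeds(30, 0, 5), "\n";   # no division takes place; prints 0
```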
Perl also defines the <=> operator, which returns 0 if
the two values are equal, 1 if the left value is larger, and -1
if the right value is larger:
4 <=> 1 # returns 1
3 <=> 3.0 # returns 0
1 <=> 4.0 # returns -1
CAUTION: Be careful when you use floating-point
numbers in comparison operations, because the result might not be what you expect.
Consider the following code fragment:
$val1 = 14.3;
$val2 = 100 + 14.3 - 100;
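print "val1 is $val1, val2 is $val2\n";
On first examination, $val1 and $val2 appear to contain the same value, 14.3, and
a version of Perl that prints only 15 significant digits shows 14.3 for both. When
the stored values are displayed to 17 significant digits, however, the difference appears:
val1 is 14.300000000000001, val2 is 14.299999999999997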
Adding and subtracting 100 affects the value stored in $val2
because of the way floating-point values are calculated and stored on the machine.
As a result, $val1 and $val2 are not the same, and $val1 ==
$val2 is not true.
This problem occurs in most programming languages (including C).
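A common way around this problem is to treat two floating-point values as equal when they differ by less than some small tolerance. The following sketch assumes Perl 5 and a machine with ordinary 64-bit floating-point numbers; the helper name nearly_equal and the default tolerance of 1e-9 are arbitrary choices made for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true when the two numbers differ by less than
# the given tolerance (1e-9 if no tolerance is supplied).
sub nearly_equal {
    my ($left, $right, $tolerance) = @_;
    $tolerance = 1e-9 unless defined $tolerance;
    return abs($left - $right) < $tolerance;
}

my $val1 = 14.3;
my $val2 = 100 + 14.3 - 100;

printf("%.17g %.17g\n", $val1, $val2);   # show the stored digits
print "exactly equal\n" if $val1 == $val2;
print "nearly equal\n"  if nearly_equal($val1, $val2);
```

On such a machine only the second message is expected to appear. The right tolerance depends on the size of the numbers your program works with.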
Besides the preceding numeric logical operators, Perl also provides logical operators
that work with strings:
"aaa" lt "bbb" # less than
"bbb" gt "aaa" # greater than
"aaa" eq "aaa" # equals
"aaa" le "bbb" # less than or equal to
"bbb" ge "aaa" # greater than or equal to
"aaa" ne "bbb" # not equal to
Perl also defines the cmp operator, which, like the numeric operator
<=>, returns 1, 0, or -1:
"aaa" cmp "bbb" # returns -1
"aaa" cmp "aaa" # returns 0
"bbb" cmp "aaa" # returns 1
This behavior is identical to that of the C function strcmp().
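Both <=> and cmp are most often seen inside the built-in sort() function, which uses the returned value to decide which of two items comes first. The sketch below assumes Perl 5 and uses an array variable, which is described later in this chapter; inside the braces, $a and $b are the two items that sort() is currently comparing.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @values = (40, 8, 100);

my @as_numbers = sort { $a <=> $b } @values;   # numeric comparison
my @as_strings = sort { $a cmp $b } @values;   # string comparison

print "@as_numbers\n";   # prints 8 40 100
print "@as_strings\n";   # prints 100 40 8
```

The two orderings differ because the string comparison looks at the values one character at a time, so "100" sorts before "40".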
Note that the logical string operators perform string comparisons, not numeric
comparisons. For example, "40" lt "8" is true; if the
two strings are sorted in ascending order, "40" appears before
"8".
Manipulating Bits
Any integer can always be represented in binary or base-two notation. For example,
the number 38 is equivalent to the binary value 100110: 32
plus 4 plus 2. Each 0 or 1 in this binary value
is called a bit.
If a Perl scalar value happens to be an integer, Perl allows you to manipulate
the bits that make up that integer. To do this, use the Perl bitwise operators.
The following bitwise operators are supported in Perl:
- The & (bitwise AND) operator
- The | (bitwise OR) operator
- The ^ (bitwise EXOR, or exclusive OR) operator
- The ~ (bitwise NOT) operator
- The << (left-shift) and >> (right-shift) operators
If a scalar value is not an integer, it is converted to an integer before a bitwise
operation is performed:
$a = 24.5 & 11.2 # identical to $a = 24 & 11
The & operator works as follows: First, it examines the values on
either side of the &. (These values are also known as the operands of
the & operator.) These values are examined in their binary representations.
For example, consider the following bitwise operation:
$a = 29 & 11;
In this case, 29 is converted to 11101, and 11 is converted
to 01011. (A binary representation can have as many leading zeroes as you
like.)
Next, Perl compares each bit of the first operand with the corresponding bit in
the second operand:
11101
01011
In this case, only the second and fifth bits (from the left) of the two operands
are both 1; therefore, the binary representation of the result is 01001,
or 9.
The | operator works in much the same way. The bits of the two operands
are compared one at a time; if a bit in the first operand is 1 or its corresponding
bit in the second operand is 1, the bit in the result is set to 1.
Consider this example:
$a = 25 | 11;
Here, the binary representations are 11001 and 01011. In this
case, only the third bits are both 0, and the result is 11011, or
27.
The ^ operator sets a result bit to 1 if exactly one of the
corresponding bits in the two operands is 1. If both bits are 1 or both
are 0, the result bit is set to 0. In the example $a = 25 ^
11; the binary representations of the operands are 11001 and 01011,
and the result is 10010, or 18.
The ~ operator works on one operand. Every 0 bit in the operand
is changed to a 1, and vice versa. For example, consider the following:
$a = ~ 25;
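Here, the binary representation of 25 is 11001. Looking only at these five bits,
the result is 00110, or 6. In practice, Perl integers are wider than five bits
(typically 32 or 64), and ~ also changes every leading 0 bit to a 1, so the value
stored in $a is much larger: 4294967270 on a system with 32-bit integers. To keep
only the low five bits, combine ~ with &, as in ~25 & 31, which yields 6.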
The << operator shifts the bits of the left operand the number
of places specified by the right operand, and fills the vacated places with zeroes:
$a = 29 << 2;
The value 29, whose binary representation is 11101, is shifted
left two positions. This produces the result 1110100, or 116.
Similarly, the >> operator shifts the bits rightward, with the
rightmost bits being lost:
$a = 29 >> 2;
In this case, 29, or 11101, is shifted right two places. The
01 on the end is thrown away, and the result is 111, or 7.
Shifting left 1 bit is equivalent to multiplying by 2:
$a = 54 << 1; # this result is 108
$a = 54 * 2; # this result is also 108
Shifting right 1 bit is equivalent to dividing by 2:
$a = 54 >> 1; # this result is 27
$a = 54 / 2; # this result is also 27
Similarly, shifting left or right n bits is equivalent to multiplying or
dividing by 2**n.
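The worked examples in this section can be verified with a short program. This sketch assumes Perl 5 (the %b format used to print a binary representation requires Perl 5.6 or later); the variable names are arbitrary. The ~ result is combined with & so that only the low five bits are kept.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $and   = 29 & 11;       # 11101 & 01011 = 01001
my $or    = 25 | 11;       # 11001 | 01011 = 11011
my $xor   = 25 ^ 11;       # 11001 ^ 01011 = 10010
my $not5  = ~25 & 0x1F;    # flip every bit, then keep only the low five
my $left  = 29 << 2;       # 11101 becomes 1110100
my $right = 29 >> 2;       # 11101 becomes 111

print "$and $or $xor $not5 $left $right\n";   # prints 9 27 18 6 116 7
printf("%b\n", $left);                        # prints 1110100
```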
Using the Assignment Operators
The most common assignment operator is the = operator, which you've already
seen:
$a = 9;
Here, the value 9 is assigned to the scalar variable $a.
Another common assignment operator is the += operator, which combines
the operations of addition and assignment:
$a = $a + 1; # this adds 1 to $a
$a += 1; # this also adds 1 to $a
Other assignment operators exist that correspond to the other arithmetic and bitwise
operators:
$a -= 1; # same as $a = $a - 1
$a *= 2; # same as $a = $a * 2
$a /= 2; # same as $a = $a / 2
$a %= 2; # same as $a = $a % 2
$a **= 2; # same as $a = $a ** 2
$a &= 2; # same as $a = $a & 2
$a |= 2; # same as $a = $a | 2
$a ^= 2; # same as $a = $a ^ 2
Using Autoincrement and Autodecrement
Another way to add 1 to a scalar variable is with the ++, or
the autoincrement, operator:
++$a; # same as $a += 1 or $a = $a + 1
This operator can appear either before or after its operand:
$a++; # also equivalent to $a += 1 and $a = $a + 1
The ++ operator can also be part of a more complicated sequence of operations.
(A code fragment consisting of a sequence of operations and their values is known
as an expression.) Consider the following statements:
$b = ++$a;
$b = $a++;
In the first statement, the ++ operator appears before its operand. This
tells Perl to add 1 to $a before assigning its value to $b:
$a = 7;
$b = ++$a; # $a and $b are both 8
If the ++ operator appears after the operand, Perl adds 1 to $a
after assigning its value to $b:
$a = 7;
$b = $a++; # $a is now 8, and $b is now 7
Similarly, the --, or autodecrement, operator subtracts 1 from the value
of a scalar variable either before or after assigning the value:
$a = 7;
$b = --$a; # $a and $b are both 6
$a = 7;
$b = $a--; # $a is now 6, and $b is now 7
The ++ and -- operators provide a great deal of flexibility,
and are often used in loops and other control structures.
CAUTION: Do not use the ++ and
-- operators on the same variable more than once in the same expression:
$b = ++$a + $a++;
The value assigned to $b depends on which of the operands of the +
operator is evaluated first. On some systems, the first operand (++$a) is
evaluated first. On others, the second operand ($a++) is evaluated first.
You can ensure that you get the result you want by using multiple statements and
the appropriate assignment operator:
$b = ++$a;
$b += $a++;
Concatenating and Repeating Strings
Perl provides three operators that operate on strings: the . operator,
which joins two strings together; the x operator, which repeats a string;
and the .= operator, which joins and then assigns.
The . operator joins the second operand to the first operand:
$a = "be" . "witched"; # $a is now "bewitched"
This join operation is also known as string concatenation.
The x operator (the letter x) makes n copies of a string,
where n is the value of the right operand:
$a = "t" x 5; # $a is now "ttttt"
The .= operator combines the operations of string concatenation and assignment:
$a = "be";
$a .= "witched"; # $a is now "bewitched"
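The three operators are often used together to build up output. The following sketch, assuming Perl 5 and made-up variable names, underlines a title with a row of dashes of the same length:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $title = "Monthly Report";
my $rule  = "-" x length($title);        # one dash per character in the title

my $heading = $title . "\n" . $rule;     # join the title, a newline, and the rule
$heading .= "\n";                        # append a final newline

print $heading;
```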
Using Other C Operators
Perl also supports the following operators found in the C programming language:
the , (comma) operator, and the ? and : (conditional)
operator combination.
The , operator ensures that one portion of an expression is evaluated
first:
$x += 1, $y = $x;
The , operator breaks this expression into two parts:
$x += 1
$y = $x
The part before the comma is performed first. Thus, 1 is added to $x
and then $x is assigned to $y.
The ? and : combination allows you to test the value of a variable
and then perform one of two operations based on the result of the test. For example,
in the expression $y = $x == 0 ? 15 : 8, the variable $x is compared
with 0. If $x equals 0, $y is assigned 15; if $x is not
0, $y is assigned 8.
Matching Patterns
Perl allows you to examine scalar variables and test for the existence of a particular
pattern in a string. To do this, use the =~ (pattern-matching) operator:
$x =~ /jkl/
The character string enclosed by the / characters is the pattern to be
matched, and the scalar variable on the left of the =~ operator is the variable
to be examined. This example searches for the pattern jkl in the scalar
variable $x. If $x contains jkl, the expression is true;
if not, the expression is false. In the statement $y = $x =~ /jkl/;, $y
is assigned a non-zero value if $x contains jkl, and is assigned
zero if $x does not contain jkl.
The !~ operator is the negation of =~:
$y = $x !~ /jkl/;
Here, $y is assigned zero if $x contains jkl, and a
non-zero value otherwise.
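In practice, the matching operators usually appear as the condition of an if statement (control structures are covered later in this chapter). A minimal sketch, assuming Perl 5; the subroutine name contains_jkl is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return 1 if the text contains jkl, 0 otherwise.
sub contains_jkl {
    my ($text) = @_;
    return ($text =~ /jkl/) ? 1 : 0;
}

foreach my $text ("abcjkldef", "abcdef") {
    if (contains_jkl($text)) {
        print "$text contains jkl\n";
    }
    if ($text !~ /jkl/) {
        print "$text does not contain jkl\n";
    }
}
```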
Using Special Characters in Patterns
You can use several special characters
in your patterns. The * character matches zero or more of the character
it follows:
/jk*l/
This matches jl, jkl, jkkl, jkkkl, and so
on.
The + character matches one or more of the preceding character:
/jk+l/
This matches jkl, jkkl, jkkkl, and so on.
The ? character matches zero or one copies of the preceding character:
/jk?l/
This matches jl or jkl.
The { and } characters specify the number of occurrences of
a character that constitute a match:
/jk{1,3}l/ # matches jkl, jkkl, or jkkkl
/jk{3}l/ # matches jkkkl
/jk{3,}l/ # matches j, three or more k's, then l
/jk{0,2}l/ # matches jl, jkl, or jkkl
The character . matches any character except the newline character:
/j.l/
This matches any pattern consisting of a j, any character, and an l.
If a set of characters is enclosed in square brackets, any character in the set
is an acceptable match:
/j[kK]l/ # matches jkl or jKl
Consecutive alphanumeric characters in the set can be represented by a dash (-):
/j[k1-3K]l/ # matches jkl, j1l, j2l, j3l or jKl
You can specify that a match must be at the start or end of a line by using ^
or $:
/^jkl/ # matches jkl at start of line
/jkl$/ # matches jkl at end of line
/^jkl$/ # matches line consisting of exactly jkl
You can specify that a match must be either on a word boundary or inside a word
by including \b or \B in the pattern:
/\bjkl/ # matches jkl, but not ijkl
/\Bjkl/ # matches ijkl, but not jkl
Some sets are so common that special characters exist to represent them:
- \d matches any digit and is equivalent to [0-9].
- \D matches any character that is not a digit.
- \w matches any word character (a character that can appear in a variable
name); it is equivalent to [A-Za-z_0-9].
- \W matches any character that is not a word character.
- \s matches any whitespace (any character not visible on the screen);
it is equivalent to [ \r\t\n\f]. (These backslash characters were explained
in the section titled "Using Double- and Single-Quoted Strings," earlier
in this chapter.)
- \S matches any character that is not whitespace.
To match all but a specified set of characters, specify ^ at the start
of your set:
/j[^kK]l/
This matches any string containing j, any character but k or
K, and l.
To specify two or more acceptable patterns for a match, use the | character:
/jkl|pqr/ # matches jkl or pqr
If you are using Perl 5, you can specify positive or negative look-ahead conditions
for a match:
/jkl(?=pqr)/ # match jkl only if it is followed by pqr
/jkl(?!pqr)/ # match jkl if not followed by pqr
To use a special character as an ordinary character, precede it with a backslash
(\):
/j\*l/ # this matches j*l
In patterns, the * and + special characters match as many characters
in a string as possible. For example, consider the following:
$x = "abcde";
$y = $x =~ /a.*/;
The pattern /a.*/ can match a, ab, abc, abcd,
or abcde. abcde is matched, because it is the longest. This becomes
meaningful when patterns are used in substitution.
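To see exactly which part of a string a pattern matched, you can examine the special variable $&, which Perl sets to the matched text after a successful match. The sketch below assumes Perl 5 and made-up variable names; it shows the longest-match behavior and then tries the {1,3} count on a few sample words.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $x = "abcde";

# $& holds the text matched by the most recent successful match.
my $longest  = ($x =~ /a.*/) ? $& : "";   # .* takes as many characters as it can
my $optional = ($x =~ /a.?/) ? $& : "";   # .? takes at most one character

print "$longest $optional\n";   # prints abcde ab

# With ^ and $ the pattern must account for the entire word.
foreach my $word ("jl", "jkl", "jkkkl", "jkkkkl") {
    my $verdict = ($word =~ /^jk{1,3}l$/) ? "matches" : "does not match";
    print "$word $verdict\n";
}
```

Of the four sample words, only jkl and jkkkl match, because the pattern requires at least one and at most three k characters.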
Substituting and Translating Using Patterns
You can use the =~
operator to substitute one string for another:
$val =~ s/abc/def/; # replace abc with def
$val =~ s/a+/xyz/; # replace a, aa, aaa, etc., with xyz
$val =~ s/a/b/g; # replace all a's with b's
Here, the s prefix indicates that the pattern between the first /
and the second is to be replaced by the string between the second / and
the third.
You can also translate characters using the tr prefix:
$val =~ tr/a-z/A-Z/; # translate lower case to upper
Here, any character matched by the first pattern is replaced by the corresponding
character in the second pattern.
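The effect of each form is easiest to see on a concrete string. This sketch assumes Perl 5; the sample strings and variable names are arbitrary.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $first = "banana bread";
$first =~ s/a/o/;          # replaces only the first a

my $every = "banana bread";
$every =~ s/a/o/g;         # the g suffix replaces every a

my $run = "caaat";
$run =~ s/a+/xyz/;         # the whole run of a's is replaced at once

my $upper = "banana bread";
$upper =~ tr/a-z/A-Z/;     # each lowercase letter becomes its uppercase partner

print "$first\n";   # prints bonana bread
print "$every\n";   # prints bonono breod
print "$run\n";     # prints cxyzt
print "$upper\n";   # prints BANANA BREAD
```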
The Order of Operations
Consider the following statement:
$a = 21 * 2 + 3 << 1 << 2 ** 2;
The problem: Which operation should be performed first?
The following sections answer questions of this type.
Precedence
In standard grade-school arithmetic, certain operations are
always performed before others. For example, multiplication is always performed before
addition:
4 + 5 * 3
Because multiplication is performed before addition, it has higher precedence
than addition.
Table 5.2 defines the precedence of the Perl operators described in these sections.
The items at the top of the table have the highest precedence, and the items at the
bottom have the lowest.
Table 5.2. Operator precedence in Perl.
Operator | Description
++, -- | Autoincrement and autodecrement
** | Exponentiation
-, ~, ! | Operators with one operand
=~, !~ | Matching operators
*, /, %, x | Multiplication, division, remainder, and repetition
+, -, . | Addition, subtraction, and concatenation
<<, >> | Shifting operators
-e, -r, etc. | File status operators
<, <=, >, >=, lt, le, gt, ge | Inequality comparison operators
==, !=, <=>, eq, ne, cmp | Equality comparison operators
& | Bitwise AND
|, ^ | Bitwise OR and exclusive OR
&& | Logical AND
|| | Logical OR
.. | List range operator
? and : | Conditional operator
=, +=, -=, *=, etc. | Assignment operators
, | Comma operator
not | Low-precedence logical NOT
and | Low-precedence logical AND
or, xor | Low-precedence logical OR and XOR
For example, consider the following statement:
$x = 11 * 2 + 6 ** 2 << 2;
The operations in this statement are performed in the following order:
1. 6 ** 2, yielding 36
2. 11 * 2, yielding 22
3. 36 + 22, yielding 58
4. 58 << 2, yielding 232
Therefore, 232 is assigned to $x.
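You can check how Perl groups an expression by writing it out and printing the result. The following sketch, assuming Perl 5, uses a few small expressions chosen for this illustration; each comment shows the grouping that the precedence rules imply.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $yes = 1;
my $no  = 0;

my $shift = 1 + 2 << 3;          # + before <<: (1 + 2) << 3
my $power = 2 + 3 * 4 ** 2;      # ** first, then *, then +: 2 + (3 * 16)
my $logic = $yes || $no && $no;  # && before ||: $yes || ($no && $no)

print "$shift $power $logic\n";  # prints 24 50 1
```

If || were performed before && in the last statement, the result would be 0 rather than 1.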
This operator precedence table contains some operators that are defined in later
sections. The .. (list range) operator is defined in the section titled
"Using Lists and Array Variables." The file status operators are described
in the section titled "Reading from and Writing to Files."
Associativity
Consider the following statement:
$x = 2 + 3 - 4;
In this case, it doesn't matter whether the addition (2 + 3) or the subtraction
(3 - 4) is performed first, because the result is the same either way. However,
for some operations, the order of evaluation makes a difference:
$x = 2 ** 3 ** 2;
Is $x assigned the value 64 (8 ** 2) or the value 512 (2
** 9)?
To resolve these problems, Perl associates a specified associativity with each
operator. If an operator is right-associative, the rightmost operator is performed
first when two operators have the same precedence:
$x = 2 ** 3 ** 2; # the same as $x = 2 ** 9, or $x = 512
If an operator is left-associative, the leftmost operator is performed first when
two operators have the same precedence:
$x = 29 % 6 * 2; # the same as $x = 5 * 2, or $x = 10
The following operators in Perl are right-associative:
- The assignment operators (=, +=, and so on)
- The ? and : operator combination
- The ** operator (exponentiation)
- The operators that have only one operand (!, ~, and -)
All other operators are left-associative.
Forcing Precedence Using Parentheses
Perl allows you to force the order
of evaluation of operations in expressions. To do this, use parentheses:
$x = 4 * (5 + 3);
In this statement, 5 is added to 3 and then multiplied by 4,
yielding 32.
You can use as many sets of parentheses as you like:
$x = 4 ** (5 % (8 - 6));
Here, the result is 4:
- 8 - 6 is performed, leaving 4 ** (5 % 2).
- 5 % 2 is performed, leaving 4 ** 1.
- 4 ** 1 is 4.
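With precedence, associativity, and parentheses in hand, the statement that opened this discussion can be worked out. The sketch below assumes Perl 5 and made-up variable names; it writes the statement once as given and once with parentheses that spell out the order Perl uses.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# ** is performed first, then *, then +; the two << operators are
# left-associative, so the leftmost shift is performed before the other.
my $implicit = 21 * 2 + 3 << 1 << 2 ** 2;
my $explicit = (((21 * 2) + 3) << 1) << (2 ** 2);

print "$implicit $explicit\n";   # prints 1440 1440

# ** is right-associative unless parentheses say otherwise.
my $default = 2 ** 3 ** 2;       # 2 ** 9
my $forced  = (2 ** 3) ** 2;     # 8 ** 2

print "$default $forced\n";      # prints 512 64
```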
Using Lists and Array Variables
The Perl programs you have seen so far have used only scalar data and scalar variables.
In other words, they have dealt with only one value at a time.
Perl also allows you to manipulate groups of values, known as lists or arrays.
These lists can be assigned to special variables known as array variables, which
can be processed in a variety of ways.
This section describes lists and array variables, and how to use them. It also
describes how to pass command-line arguments to your program using the special-purpose
array @ARGV.
Introducing Lists
A list is a collection of scalar values enclosed in parentheses. The following
is a simple example of a list:
(1, 5.3, "hello", 2)
This list contains four elements, each of which is a scalar value: the numbers
1 and 5.3, the string "hello", and the number
2. As always in Perl, numbers and character strings are interchangeable:
Each element of a list can be either a number or a string.
A list can contain as many elements as you like (or as many as your machine's
memory can store at one time). To indicate a list with no elements, just specify
the parentheses:
() # this list is empty
Scalar Variables and Lists
Lists can also contain scalar variables:
(17, $var, "a string")
Here, the second element of the list is the scalar variable $var. When
Perl sees a scalar variable in a list, it replaces the scalar variable with its current
value.
A list element can also be an expression:
(17, $var1 + $var2, 26 << 2)
Here, the expression $var1 + $var2 is evaluated to become the second
element, and the expression 26 << 2 is evaluated to become the third
element.
Scalar variables can also be replaced in strings:
(17, "the answer is $var1")
In this case, the value of $var1 is placed into the string.
Using List Ranges
Suppose that you wanted to define a list consisting of the numbers 1
through 10, inclusive. You can do this by typing in each of the numbers
in turn:
(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
However, there is a simpler way to do it: Use the list range operator, which is
.. (two consecutive periods). The following is a list created using the
list range operator:
(1..10)
This tells Perl to define a list whose first value is 1, second value
is 2, and so on up to 10.
The list range operator can be used to define part of a list:
(2, 5..7, 11)
This list consists of five elements: the numbers 2, 5, 6,
7 and 11.
Elements that define the range of a list range operator can be expressions, and
these expressions can contain scalar variables:
($a..$b+5)
This list consists of all values between the current value of $a and
the current value of the expression $b+5.
Storing Lists in Array Variables
Perl allows you to store lists in special variables designed for that purpose.
These variables are called array variables.
The following is an example of a list being assigned to an array variable:
@array = (1, 2, 3);
Here, the list (1, 2, 3) is assigned to the array variable @array.
Note that the name of the array variable starts with the character @.
This allows Perl to distinguish array variables from other kinds of variables, such
as scalar variables, which start with the character $. As with scalar variables,
the second character of the variable name must be a letter, and subsequent characters
of the name can be letters, numbers, or underscores.
When an array variable is first created (seen for the first time), it is assumed
to contain the empty list () unless something is assigned to it.
Because Perl uses @ and $ to distinguish array variables from
scalar variables, the same name can be used in an array variable and in a scalar
variable:
$var = 1;
@var = (11, 27.1, "a string");
Here, the name var is used in both the scalar variable $var
and the array variable @var. These are two completely separate variables.
Assigning to Array Variables
As you have seen, lists can be assigned to array variables with the assignment
operator =:
@x = (11, "my string", 27.44);
You can also assign one array variable to another:
@y = @x;
A scalar value can be assigned to an array variable:
@x = 27.1;
@y = $x;
In this case, the scalar value (or value stored in a scalar variable) is converted
into a list containing one element.
Using Array Variables in Lists
As you have already seen, lists can contain scalar variables:
@x = (1, $y, 3);
Here, the value of the scalar variable $y becomes the second element
of the list assigned to @x.
You can also specify that the value of an array variable is to appear in a list:
@x = (2, 3, 4);
@y = (1, @x, 5);
Here, the list (2, 3, 4) is substituted for @x, and the resulting
list (1, 2, 3, 4, 5) is assigned to @y.
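The list features described so far (scalar variables, list ranges, and array variables inside lists) can be combined freely. The following is a small sketch; the variable names are illustrative, and it relies on one feature not described in this section: an array variable inside a double-quoted string is replaced by its elements, separated by spaces.

```perl
#!/usr/bin/perl
$var = 4;
@x = (2, 3, $var);             # $var is replaced by its value: (2, 3, 4)
@y = (1, @x, 5..7);            # @x and the range 5..7 are expanded in place

$low = 3;
@z = ($low..$low+2, "end");    # the range is 3..5, followed by a string

print("@y\n");                 # prints 1 2 3 4 5 6 7
print("@z\n");                 # prints 3 4 5 end
```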
Assigning to Scalar Variables from Array Variables
Consider the following assignment:
@x = ($a, $b);
Here, the values of the scalar variables $a and $b are used
to form a two-element list that is assigned to the array variable @x.
Perl also allows you to take the current value of an array variable and assign
its components to a group of scalar variables:
($a, $b) = @x;
Here, the first element of the list currently stored in @x is assigned
to $a, and the second element is assigned to $b. Additional elements
in @x, if they exist, are not assigned.
If there are more scalar variables than elements in an array variable, the excess
scalar variables are given the value "" (the null string), which
is equivalent to the numeric value 0:
@x = (1, 2);
($a, $b, $c) = @x; # $a is now 1, $b is now 2, $c is now ""
Retrieving the Length of a List
As you have already seen, when a scalar value is assigned to an array variable, the
value is assumed to be a list containing one element. For example, the following
statements are equivalent:
@x = $y;
@x = ($y);
However, the converse is not true. In the statement $y = @x;, the value
assigned to $y is the number of elements in the list currently stored in
@x:
@x = ("string 1", "string 2", "string 3");
$y = @x; # $y is now 3
To assign the value of the first element of a list to a scalar variable, enclose
the scalar variable in a list:
@x = ("string 1", "string 2", "string 3");
($y) = @x; # $y is now "string 1"
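The difference between the two forms of assignment is easy to miss, so it may help to see them side by side. This sketch uses hypothetical variable names ($count, $first, and $second).

```perl
#!/usr/bin/perl
@x = ("string 1", "string 2", "string 3");

$count = @x;                   # scalar on the left: the number of elements
($first) = @x;                 # list on the left: the first element
($first, $second) = @x;        # the third element is simply not assigned

print("$count\n");             # prints 3
print("$first\n");             # prints string 1
print("$second\n");            # prints string 2
```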
Using Array Slices
Perl allows you to specify what part of an array to use in an expression. The
following example shows you how to do this:
@x = (1, 2, 3);
@y = @x[0,1];
Here, the list (1, 2, 3) is first assigned to the array variable @x.
Then, the array slice @x[0,1] is assigned to @y; in other words,
the first two elements of @x are assigned to @y. (Note that the
first element of the array is specified by 0, not 1.)
You can assign to an array slice as well:
@x[0,1] = (11.5, "hello");
This statement assigns the value 11.5 to the first element of the array
variable @x and assigns the string "hello" to the second.
Array variables automatically grow when necessary, with null strings assigned
to fill any gaps:
@x = (10, 20, 30);
@x[4,5] = (75, 85);
Here, the second assignment increases the size of the array variable @x
from three elements to six, and assigns 75 to the fifth element and 85
to the sixth. The fourth element is set to be the null string.
Using Array Slices with Scalar Variables
An array slice can consist of a single element. In this case, the array slice
is treated as if it were a scalar variable:
@x = (10, 20, 30);
$y = $x[1]; # $y now has the value 20
Note that the array slice is now preceded by the character $, not the
character @. This tells Perl that the array slice is to be treated as a
scalar variable.
Recall that array variables and scalar variables can have the same name:
$x = "Smith";
@x = (47, "hello");
Here, the scalar variable $x and the array variable @x are both
defined, and are completely independent of one another. This can cause problems if
you want to include a scalar variable inside a string:
$y = "Refer to $x[1] for more information.";
In this case, Perl assumes that you want to substitute the value of the array
slice $x[1] into the string. This produces the following:
$y = "Refer to hello for more information.";
To specify the scalar variable and not the array slice, enclose the variable name
in braces:
$y = "Refer to ${x}[1] for more information.";
This tells Perl to replace $x, not $x[1], and produces the following:
$y = "Refer to Smith[1] for more information.";
Using the Array Slice Notation as a Shorthand
So far, we have been using the array slice notation @x[0,1] to refer
to a portion of an array variable. In Perl, an array slice described using this notation
is exactly equivalent to a list of single-element array slices:
@y = @x[0,1];
@y = ($x[0], $x[1]); # these two statements are identical
This allows you to use the array slice notation whenever you want to refer to
more than one element in an array:
@y = @x[4,1,5];
In this statement, the array variable @y is assigned the values of the
fifth, second, and sixth elements of the array variable @x.
@y[0,1,2] = @x[1,1,1];
Here, the second element of @x is copied to the first three elements
of @y.
In Perl, assignments in which the operands overlap are handled without difficulty.
Consider this example:
@x[4,3] = @x[3,4];
Perl performs this assignment by creating a temporary array variable, copying
@x[3,4] to it, and then copying it to @x[4,3]. Thus, this statement
swaps the values in the fourth and fifth elements of @x.
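The slice operations described in the last few sections can be tried together in one short program. This is only a sketch with illustrative variable names; it prints arrays by placing them inside double-quoted strings, where Perl substitutes the elements separated by spaces.

```perl
#!/usr/bin/perl
@x = (10, 20, 30);
@y = @x[0,1];                  # @y is now (10, 20)

@x[4,5] = (75, 85);            # @x grows from three elements to six
$length = @x;                  # number of elements in @x

@x[0,2] = @x[2,0];             # swap the first and third elements
$second = $x[1];               # a single element is written with $

print("@y\n");                 # prints 10 20
print("$length\n");            # prints 6
print("$x[0] $second $x[2]\n");   # prints 30 20 10
print("$x[4] $x[5]\n");        # prints 75 85
```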
Other Array Operations
Perl provides a number of built-in functions that work on lists and array variables.
For example, you can sort array elements in alphabetic order, reverse the elements
of an array, remove the last character from all elements of an array, and merge the
elements of an array into a single string.
Sorting a List or Array Variable
The built-in function sort() sorts the elements of an array in alphabetic
order and returns the sorted list:
@x = ("this", "is", "a", "test");
@x = sort (@x); # @x is now ("a", "is", "test", "this")
Note that the sort is in alphabetic, not numeric, order:
@x = (70, 100, 8);
@x = sort (@x); # @x is now ("100", "70", "8")
The number 100 appears first because the string "100"
is alphabetically ahead of "70" (because "1"
appears before "7").
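When numeric order is what you want, sort() can be given a comparison block ahead of the list. This form is not covered in this section; the sketch below assumes Perl 5 and uses the numeric comparison operator <=> together with $a and $b, the two variables that sort() sets to the pair of elements being compared.

```perl
#!/usr/bin/perl
@x = (70, 100, 8);

@alpha   = sort (@x);                # alphabetic order
@numeric = sort { $a <=> $b } @x;    # numeric order, smallest first

print("@alpha\n");                   # prints 100 70 8
print("@numeric\n");                 # prints 8 70 100
```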
Reversing a List or Array Variable
The function reverse() reverses the order of the elements in a list or
array variable and returns the reversed list:
@x = ("backwards", "is", "array", "this");
@x = reverse(@x); # @x is now ("this", "array", "is", "backwards")
You can sort and reverse the same list:
@x = reverse(sort(@x));
This produces a sort in reverse alphabetical order.
Using chop() on Array Variables
The chop() function can be used on array variables as well as scalar
variables:
$a[0] = <STDIN>;
$a[1] = <STDIN>;
$a[2] = <STDIN>;
chop(@a);
Here, three input lines are read into the array variable @a--one in each
of the first three elements. chop() then removes the last character (in
this case, the terminating newline character) from all three elements.
Creating a Single String from a List
To create a single string from a list or array variable, use the function join():
$x = join(" ", "this", "is", "a", "sentence");
The first element of the list supplied to join() contains the characters
that are to be used to glue the parts of the created string together. In this example,
$x becomes "this is a sentence".
join() can specify other join strings besides " ":
@x = ("words","separated","by");
$y = join("::",@x,"colons");
Here, $y becomes "words::separated::by::colons".
To undo the effects of join(), call the function split():
$y = "words::separated::by::colons";
@x = split(/::/, $y);
The first element of the list supplied to split() is a pattern to be
matched. When the pattern is matched, a new array element is started and the pattern
is thrown away. In this case, the pattern to be matched is ::, which means
that @x becomes ("words", "separated", "by",
"colons").
Note that the syntax for the pattern is the same as that used in the =~
operator; see the section titled "Matching Patterns" for more information
on possible patterns to match.
Example: Sorting Words in a String
The example in LIST 5_2 on this book's CD-ROM uses split(),
join(), and sort() to sort the words in a string.
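For readers without the CD-ROM, the same idea can be sketched in a few lines. This is not the book's listing; it is a minimal illustration that assumes the words are separated by single spaces and uses a hypothetical input string.

```perl
#!/usr/bin/perl
$sentence = "this is a test of sorting";   # hypothetical input string

@words  = split(/ /, $sentence);   # break the string into words
@sorted = sort (@words);           # sort the words alphabetically
$result = join(" ", @sorted);      # glue the sorted words back together

print("$result\n");                # prints a is of sorting test this
```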
Using Command-Line Arguments
The special array variable @ARGV is automatically defined to contain
the strings entered on the command line when a Perl program is invoked. For example,
if the program
#!/usr/bin/perl
print("The first argument is $ARGV[0]\n");
is called printfirstarg, entering the command
printfirstarg 1 2 3
produces the following output:
The first argument is 1
You can use join() to turn @ARGV into a single string:
#!/usr/bin/perl
$commandline = join(" ", @ARGV);
print("The command line arguments: $commandline\n");
If this program is called printallargs, entering
printallargs 1 2 3
produces
The command line arguments: 1 2 3
Note that $ARGV[0], the first element of the @ARGV array variable,
does not contain the name of the program. For example, in the invocation
printallargs 1 2 3
$ARGV[0] is "1", not "printallargs".
This is a difference between Perl and C: in C, argv[0] is "printallargs"
and argv[1] is "1".
Standard Input and Array Variables
Because an array variable can contain as many elements as you like, you can assign
an entire input file to a single array variable:
@infile = <STDIN>;
This works as long as you have enough memory to store the entire file.
Controlling Program Flow
Like all programming languages, Perl allows you to include statements that are
executed only when specified conditions are true; these statements are called conditional
statements.
The following is a simple example of a conditional statement:
if ($x == 14) {
    print("\$x is 14\n");
}
Here, the line if ($x == 14) { tells Perl that the following statements--those
between the { and }--are to be executed only if $x is
equal to 14.
Perl provides a full range of conditional statements; these statements are described
in the following sections.
Conditional Execution: The if Statement
The if conditional statement has the following structure:
if (expr) {
...
}
When Perl sees the if, it evaluates the expression expr to be
either true or false. If the value of the expression is the integer 0, the
null string "", or the string "0", the value
of the expression is false; otherwise, the value of the expression is true.
CAUTION: The only string values that evaluate
to false are "" and "0". Strings such as "00"
and "0.0" return true, not false.
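These rules can be confirmed with a handful of if statements. The sketch below counts how many test values behave as false and how many as true; the counter names are hypothetical, and the ! (logical NOT) operator is used so that a false value makes the test succeed.

```perl
#!/usr/bin/perl
$falsecount = 0;
$truecount = 0;

# The three values that Perl treats as false.
if (!0)     { $falsecount += 1; }
if (!"")    { $falsecount += 1; }
if (!"0")   { $falsecount += 1; }

# Strings other than "" and "0" are true, even if they look like zero.
if ("00")   { $truecount += 1; }
if ("0.0")  { $truecount += 1; }

print("false values: $falsecount, true values: $truecount\n");
# prints false values: 3, true values: 2
```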
Two-Way Branching Using if and else
The else statement can be combined with the if statement to
allow for a choice between two alternatives:
if ($x == 14) {
    print("\$x is 14\n");
} else {
    print("\$x is not 14\n");
}
Here, the expression following the if is evaluated. If it is true, the
statements between if and else are executed. Otherwise, the statements
between else and the final } are executed. In either case, execution
then proceeds to the statement after the final }.
Note that the else statement cannot appear by itself: It must follow
an if statement.
Multiway Branching Using elsif
The elsif statement allows you to write a program that chooses between
more than two alternatives:
if ($x == 14) {
    print("\$x is 14\n");
} elsif ($x == 15) {
    print("\$x is 15\n");
} elsif ($x == 16) {
    print("\$x is 16\n");
} else {
    print("\$x is not 14, 15 or 16\n");
}
Here, the expression $x == 14 is evaluated. If it evaluates to true (if
$x is equal to 14), the first print() statement is executed.
Otherwise, the expression $x == 15 is evaluated. If $x == 15 is
true, the second print() is executed; otherwise, the expression $x ==
16 is evaluated, and so on.
You can have as many elsif statements as you like; however, the first
elsif statement of the group must be preceded by an if statement.
The else statement can be omitted:
if ($x == 14) {
    print("\$x is 14\n");
} elsif ($x == 15) {
    print("\$x is 15\n");
} elsif ($x == 16) {
    print("\$x is 16\n");
} # do nothing if $x is not 14, 15 or 16
If the else statement is included, it must follow the last elsif.
Conditional Branching Using unless
The unless statement is the opposite of the if statement:
unless ($x == 14) {
    print("\$x is not 14\n");
}
Here, the statements between the braces are executed unless the value of the expression
evaluates to true.
You can use elsif and else with unless, if you like;
however, an if-elsif-else structure is usually easier
to follow than an unless-elsif-else one.
Repeating Statements Using while and until
In the previous examples, each statement between braces is executed once, at most.
To indicate that a group of statements between braces is to be executed until a certain
condition is met, use the while statement:
#!/usr/bin/perl
$x = 1;
while ($x <= 5) {
    print("\$x is now $x\n");
    ++$x;
}
Here, the scalar variable $x is first assigned the value 1.
The statements between the braces are then executed until the expression $x <=
5 is false.
When you run the preceding program, you get the following output:
$x is now 1
$x is now 2
$x is now 3
$x is now 4
$x is now 5
As you can see, the statements between the braces have been executed five times.
The until statement is the opposite of while:
#!/usr/bin/perl
$x = 1;
until ($x <= 5) {
    print("\$x is now $x\n");
    ++$x;
}
Here, the statements between the braces are executed until the expression $x
<= 5 is true. In this case, the expression is true the first time it is evaluated,
which means that the print() statement is never executed. To fix this, reverse
the direction of the arithmetic comparison:
#!/usr/bin/perl
$x = 1;
until ($x > 5) {
    print("\$x is now $x\n");
    ++$x;
}
This now produces the same output as the program containing the preceding while
statement.
CAUTION: If you use while, until,
or any other statement that repeats, you must make sure that the statement does not
repeat forever:
$x = 1;
while ($x == 1) {
    print("\$x is still $x\n");
}
Here, $x is always 1, $x == 1 is always true, and the
print() statement is repeated an infinite number of times.
Perl does not check for infinite loops such as the one above. It is your responsibility
to make sure that infinite loops don't happen!
Using Single-Line Conditional Statements
If only one statement is to be executed when a particular condition is true, you
can write your conditional statement using a single-line conditional statement. For
example, instead of writing
if ($x == 14) {
    print("\$x is 14\n");
}
you can use the following single-line conditional statement:
print("\$x is 14\n") if ($x == 14);
In both cases, the print() statement is executed if $x is equal
to 14.
You can also use unless, while, or until in a single-line
conditional statement:
print("\$x is not 14\n") unless ($x == 14);
print("\$x is less than 14\n") while ($x++ < 14);
print("\$x is less than 14\n") until ($x++ > 14);
Note how useful the autoincrement operator ++ is in the last two statements:
It allows you to compare $x and add 1 to it all at once. This ensures
that the single-line conditional statement does not execute forever.
Looping with the for Statement
Most loops--segments of code that are executed more than once--use a counter to
control and eventually terminate the execution of the loop. Here is an example similar
to the ones you've seen so far:
$count = 1; # initialize the counter
while ($count <= 10) { # terminate after ten repetitions
    print("the counter is now $count\n");
    $count += 1; # increment the counter
}
As you can see, the looping process consists of three components:
- The initialization of the counter variable
- A test to determine whether to terminate the loop
- The updating of the counter variable after the execution of the statements in
the loop
Because a loop so often contains these three components, Perl provides a quick
way to do them all at once by using the for statement. The following example
uses the for statement and behaves the same as the example you just saw:
for ($count = 1; $count <= 10; $count += 1) {
    print("the counter is now $count\n");
}
Here the three components of the loop all appear in the same line, separated by
semicolons. Because the components are all together, it is easier to remember to
supply all of them, which makes it more difficult to write code that goes into an
infinite loop.
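To see that the two forms really are interchangeable, the same calculation can be written both ways. This sketch adds up the numbers from 1 to 10 with a while loop and again with a for loop; the variable names $whiletotal and $fortotal are hypothetical.

```perl
#!/usr/bin/perl
# The while version: initialization, test, and update are spread out.
$whiletotal = 0;
$count = 1;
while ($count <= 10) {
    $whiletotal += $count;
    $count += 1;
}

# The for version: all three components appear on one line.
$fortotal = 0;
for ($count = 1; $count <= 10; $count += 1) {
    $fortotal += $count;
}

print("$whiletotal $fortotal\n");   # prints 55 55
```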
Looping Through a List: The foreach Statement
All the examples of loops that you've seen use a scalar variable as the counter.
You can also use a list as a counter by using the foreach statement:
#!/usr/bin/perl
@list = ("This", "is", "a", "list", "of", "words");
print("Here are the words in the list: \n");
foreach $temp (@list) {
    print("$temp ");
}
print("\n");
Here, the loop defined by the foreach statement executes once for each
element in the list @list. The resulting output is
Here are the words in the list:
This is a list of words
The current element of the list being used as the counter is stored in a special
scalar variable, which in this case is $temp. This variable is special because
it is defined only for the statements inside the foreach loop:
#!/usr/bin/perl
$temp = 1;
@list = ("This", "is", "a", "list", "of", "words");
print("Here are the words in the list: \n");
foreach $temp (@list) {
    print("$temp ");
}
print("\n");
print("The value of temp is now $temp\n");
The output from this program is the following:
Here are the words in the list:
This is a list of words
The value of temp is now 1
The original value of $temp is restored after the foreach statement
is finished.
Variables that exist only inside a certain structure, such as $temp in
the foreach statement in the preceding example, are called local variables.
Variables that are defined throughout a Perl program are known as global variables.
Most variables you use in Perl are global variables. To see other examples of local
variables, see the section in this chapter titled "Using Subroutines."
CAUTION: Changing the value of the local
variable inside a foreach statement also changes the value of the corresponding
element of the list:
@list = (1, 2, 3, 4, 5);
foreach $temp (@list) {
    if ($temp == 2) {
        $temp = 20;
    }
}
In this loop, when $temp is equal to 2, $temp is reset
to 20. Therefore, the contents of the array variable @list become
(1, 20, 3, 4, 5).
Exiting a Loop with the last Statement
Normally, you exit a loop by testing the condition at the top of the loop and
then jumping to the statement after it. However, you can also exit a loop in the
middle. To do this, use the last statement.
File LIST 5_5 on this book's CD-ROM totals a set of receipts entered
one at a time; execution is terminated when a null line is entered. If a value entered
is less than zero, the program detects this and exits the loop.
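The following sketch shows the same idea in miniature. It is not the listing from the CD-ROM: to keep it self-contained, it takes its receipts from a hypothetical list instead of reading them from the terminal.

```perl
#!/usr/bin/perl
# Hypothetical receipts; the third one is invalid.
@receipts = (12.50, 3.25, -1, 40);

$total = 0;
foreach $receipt (@receipts) {
    if ($receipt < 0) {
        print("negative receipt found; stopping\n");
        last;                  # leave the loop immediately
    }
    $total += $receipt;
}
print("the total is $total\n");    # prints the total is 15.75
```

Because last jumps out of the loop, the final receipt (40) is never added to the total.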
Using next to Start the Next Iteration of a Loop
In Perl, the last statement terminates the execution of a loop. To terminate
a particular pass through a loop (also known as an iteration of the loop), use the
next statement.
File LIST 5_4 on this book's CD-ROM sums up the numbers from 1 to a user-specified
upper limit, and also produces a separate sum of the numbers divisible by two.
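A stripped-down version of that idea might look like the following. This is a sketch, not the CD-ROM listing: the upper limit is a hypothetical fixed value rather than user input. Note that in a for loop the counter update still runs after next, so this loop terminates normally.

```perl
#!/usr/bin/perl
$limit = 10;                   # hypothetical upper limit
$total = 0;
$eventotal = 0;

for ($count = 1; $count <= $limit; $count++) {
    $total += $count;
    if ($count % 2 == 1) {
        next;                  # odd number: skip the rest of this pass
    }
    $eventotal += $count;      # reached only for even numbers
}

print("$total $eventotal\n");  # prints 55 30
```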
Be careful when you use next in a while or until loop.
The following example goes into an infinite loop:
$count = 0;
while ($count <= 10) {
    if ($count == 5) {
        next;
    }
    $count++;
}
When $count is 5, the program tells Perl to start the next iteration
of the loop. However, the value of $count is not changed, which means that
the expression $count == 5 is still true.
To get rid of this problem, you need to increment $count before using
next, as in:
$count = 0;
while ($count <= 10) {
    if ($count == 5) {
        $count++;
        next;
    }
    $count++;
}
This, by the way, is why many programming purists dislike statements such as next
and last: it's too easy to lose track of where you are and what needs to
be updated.
Perl automatically assumes that variables are initialized to be the null string,
which evaluates to 0 in arithmetic expressions. This means that in code
fragments such as
$count = 0;
while ($count <= 10) {
    ...
    $count++;
}
you don't really need the $count = 0; statement. However, it is a good
idea to explicitly initialize everything, even when you don't need to. This makes
it easier to spot misprints:
$count = $tot = 0;
while ($count <= 10) {
    $total += $count; # misprint: you meant to type "$tot"
    $count += 1;
}
print ("the total is $tot\n");
If you've gotten into the habit of initializing everything, it's easy to spot
that $total is a misprint. If you use variables without initializing them,
you first have to determine whether $total is really a different variable
than $tot. This might be difficult if your program is large and complicated.
Using Labeled Blocks for Multilevel Jumps
In Perl, loops can be inside other loops; such loops are said to be nested.
To get out of an outer loop from within an inner loop, label the outer loop and specify
its label when using last or next:
$total = 0;
$firstcounter = 1;
DONE: while ($firstcounter <= 10) {
    $secondcounter = 1;
    while ($secondcounter <= 10) {
        $total += 1;
        if ($firstcounter == 4 && $secondcounter == 7) {
            last DONE;
        }
        $secondcounter += 1;
    }
    $firstcounter += 1;
}
The statement
last DONE;
tells Perl to jump out of the loop labeled DONE and continue execution
with the first statement after the outer loop. (By the way, this code fragment is
just a rather complicated way of assigning 37 to $total.)
Loop labels must start with a letter and can consist of as many letters, digits,
and underscores as you like. The only restriction is that you can't use a label name
that corresponds to a word that has a special meaning in Perl:
if: while ($x == 0) { # this is an error in Perl
    ...
}
When Perl sees the if, it doesn't know whether you mean the label if
or the start of an if statement.
Words such as if that have special meanings in Perl are known as reserved
words or keywords.
Terminating Execution Using die
As you have seen, the last statement terminates a loop. To terminate
program execution entirely, use the die() function.
To illustrate the use of die(), see File LIST 5_6 on this book's
CD-ROM, a simple program that divides two numbers supplied on a single line. die()
writes its argument to the standard error file, STDERR, and then exits immediately.
In this example, die() is called when there are not exactly two numbers
in the input line or if the second number is zero.
If you like, you can tell die() to print the name of the Perl program
and the line number being executed when the program was terminated. To do this, leave
the closing newline character off the message:
die("This prints the filename and line number");
If the closing newline character is included, the filename and line number are
not included:
die("This does not print the filename and line number\n");
Reading from and Writing to Files
So far, all of the examples have read from the standard input file, STDIN,
and have written to the standard output file, STDOUT, and the standard error
file, STDERR. You can also read from and write to as many other files as
you like.
To access a file on your UNIX file system from within your Perl program, you must
perform the following steps:
1. Your program must open the file. This tells the system that your Perl
program wants to access the file.
2. The program can either read from or write to the file, depending on how
you have opened the file.
3. The program can close the file. This tells the system that your program
no longer needs access to the file.
The following sections describe these operations, tell you how you can read from
files specified in the command line, and describe the built-in file test operations.
Opening a File
To open a file, call the built-in function open():
open(MYFILE, "/u/jqpublic/myfile");
The second argument is the name of the file you want to open. You can supply either
the full UNIX pathname, as in /u/jqpublic/myfile, or just the filename,
as in myfile. If only the filename is supplied, the file is assumed to be
in the current working directory.
The first argument is an example of a file handle. After the file has been opened,
your Perl program accesses the file by referring to this handle. Your file handle
name must start with a letter or underscore, and can then contain as many letters,
underscores, and digits as you like. (You must ensure, however, that your file handle
name is not the same as a reserved word, such as if. See the note in the
section titled "Using Labeled Blocks for Multilevel Jumps" for more information
on reserved words.)
By default, Perl assumes that you want to read any file that you open. To open
a file for writing, put a > (greater than) character in front of your
filename:
open(MYFILE, ">/u/jqpublic/myfile");
When you open a file for writing, any existing contents are destroyed. You cannot
read from and write to the same file at the same time.
To append to an existing file, put two > characters in front of the
filename:
open(MYFILE, ">>/u/jqpublic/myfile");
You still cannot read from a file you are appending to, but the existing contents
are not destroyed.
Checking Whether the Open Succeeded
The open() function returns one of two values:
- open() returns true (a non-zero value) if the open succeeds.
- open() returns false (zero) if an error occurs (for example, the file does
not exist or you don't have permission to access the file).
You can use the return value from open() to test whether the file is
actually available, and call die() if it is not:
unless (open(MYFILE, "/u/jqpublic/myfile")) {
    die("unable to open /u/jqpublic/myfile for reading\n");
}
This ensures that your program does not try to read from a nonexistent file.
You can also use the || (logical OR) operator in place of unless:
open(MYFILE, "/u/jqpublic/myfile") ||
    die("unable to open /u/jqpublic/myfile for reading\n");
This works because the right side of the || operator is executed only
if the left side is false. See the section titled "Performing Comparisons"
for more information on the || operator.
Reading from a File
To read from a file, enclose the file handle in angle brackets:
$line = <MYFILE>;
This statement reads a line of input from the file specified by the file handle
MYFILE and stores the line of input in the scalar variable $line.
As you can see, you read from files in exactly the same way you read from the standard
input file, STDIN.
Writing to a File
To write to a file, specify the file handle when you call the function print():
print MYFILE ("This is a line of text to write \n",
    "This is another line to write\n");
The file handle must appear before the first line of text to be written to the
file.
This method works both when you are writing a new file and when you are appending
to an existing one.
Closing a File
When you are finished reading from or writing to a file, you can tell the system
that you are finished by calling close():
close(MYFILE);
Note that close() is not required: Perl automatically closes the file
when the program terminates or when you open another file using a previously defined
file handle.
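Putting the preceding steps together, a complete write, append, and read cycle might look like the sketch below. The scratch file name is hypothetical. The example also uses two features not covered in this section: the special variable $$ (the process ID, used here only to make the file name unique) and the built-in function unlink(), which deletes a file.

```perl
#!/usr/bin/perl
$filename = "/tmp/perl_example_" . $$ . ".txt";   # hypothetical scratch file

# Open for writing, write two lines, and close.
open(MYFILE, ">$filename") || die("unable to open $filename for writing\n");
print MYFILE ("first line\n", "second line\n");
close(MYFILE);

# Open for appending and add a third line.
open(MYFILE, ">>$filename") || die("unable to open $filename for appending\n");
print MYFILE ("third line\n");
close(MYFILE);

# Open for reading and read every line into an array variable.
open(MYFILE, $filename) || die("unable to open $filename for reading\n");
@lines = <MYFILE>;
close(MYFILE);

$count = @lines;
print("$count lines read; the last is $lines[2]");
# prints 3 lines read; the last is third line

unlink($filename);             # remove the scratch file
```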
Determining the Status of a File
As you have seen, when you open a file for writing, the existing contents of the
file are destroyed. If you want to open the file for writing only if the file does not
already exist, you can first test to see whether the file exists. To do this, use the -e
operator:
if (-e "/u/jqpublic/filename") {
    die ("file /u/jqpublic/filename already exists");
}
open (MYFILE, ">/u/jqpublic/filename");
The -e operator assumes that its operand--a scalar value--is the name
of a file. It checks to see if a file with that name already exists. If the file
exists, the -e operator returns true; otherwise, it returns false.
Similar tests exist to test other file conditions. The most commonly used file
status operators are listed in Table 5.3.
Table 5.3. File status operators.
Operator | File condition
-d | Is this file really a directory?
-e | Does this file exist?
-f | Is this actually a file?
-l | Is this file really a symbolic link?
-o | Is this file owned by the person running the program?
-r | Is this file readable by the person running the program?
-s | Is this a non-empty file?
-w | Is this file writeable by the person running the program?
-x | Is this file executable by the person running the program?
-z | Is this file empty?
-B | Is this a binary file?
-T | Is this a text file?
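The operators in Table 5.3 can be combined to classify a path. The following is a small sketch, not part of the original listing set; the subroutine name describe and the file name sample.txt are invented for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: describe a path using the file status operators.
sub describe {
    my ($path) = @_;
    return "missing"    unless -e $path;
    return "directory"  if -d $path;
    return "empty file" if -z $path;
    return "non-empty file";
}

my $dir  = tempdir(CLEANUP => 1);       # scratch directory, removed at exit
my $file = "$dir/sample.txt";           # made-up file name

print(describe($file), "\n");           # nothing has been created yet

open(MYFILE, ">$file") or die("cannot create $file: $!");
close(MYFILE);
print(describe($file), "\n");           # exists, but has zero length

open(MYFILE, ">>$file") or die("cannot append to $file: $!");
print MYFILE ("some text\n");
close(MYFILE);
print(describe($file), "\n");           # now has contents

print(describe($dir), "\n");            # the directory itself
```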
Reading from a Sequence of Files
Many UNIX commands have the form
command file1 file2 file3 ...
These commands operate on all of the files specified on the command line, starting
with file1 and continuing from there.
You can simulate this behavior in Perl. To do this, use the <>
operator.
File LIST 5_7 on this book's CD-ROM counts all the times the word "the"
appears in a set of files.
Suppose that this example is stored in a file named thecount. If the
command thecount myfile1 myfile2 myfile3 is entered from the command line,
the program starts by reading a line of input from the file myfile1 into
the scalar variable $inputline. This input line is then split into words,
and each word is tested to see if it is "the". After this line is processed,
the program reads another line from myfile1.
When myfile1 is exhausted, the program then begins reading lines from
myfile2, and then from myfile3. When myfile3 is exhausted,
the program prints the total number of occurrences of "the" in the three
files.
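The listing itself is on the CD-ROM and is not reproduced here. A program that behaves as described might look roughly like the following sketch; it is an approximation written for this illustration, not the text of LIST 5_7, and the subroutine name count_the is invented.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count occurrences of the word "the" in the files named in @ARGV.
sub count_the {
    my $thecount = 0;
    while (my $inputline = <>) {            # <> moves from file to file
        foreach my $word (split(/\s+/, $inputline)) {
            $thecount += 1 if $word eq "the";
        }
    }
    return $thecount;
}

# Run only when files are named, so this sketch never waits on the terminal;
# with no file names, <> would read from standard input instead.
if (@ARGV) {
    print("Total number of occurrences of the: ", count_the(), "\n");
}
```

Saved as thecount, this would be run as thecount myfile1 myfile2 myfile3.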
Using Subroutines
Some programs perform the same task repeatedly. If you are writing such a program,
you might get tired of writing the same lines of code over and over. Perl provides
a way around this problem: Frequently used segments of code can be stored in separate
sections, known as subroutines.
The following sections describe how subroutines work, how to pass values to subroutines
and receive values from them, and how to define variables that only exist inside
subroutines.
Defining a Subroutine
A common Perl task is to read a line of input from a file and break it into words.
Here is an example of a subroutine that performs this task. Note that it uses the
<> operator described in the section titled "Reading from a Sequence
of Files."
sub getwords {
$inputline = <>;
@words = split(/\s+/, $inputline);
}
All subroutines follow this simple format: the reserved word sub, the
name of the subroutine (in this case, getwords), a { (open brace)
character, one or more Perl statements (also known as the body of the subroutine),
and a closing } (close brace) character.
The subroutine name must start with a letter or underscore, and can then consist
of any number of letters, digits, and underscores. (As always, you must ensure that
your subroutine name is not a reserved word. See the note in the section titled "Using
Labelled Blocks for multilevel Jumps" for more information on reserved words.)
A subroutine can appear anywhere in a Perl program--even right in the middle,
if you like. However, programs are usually easier to understand if the subroutines
are all placed at the end.
Using a Subroutine
After you have written your subroutine, you can use it by specifying its name.
Here is a simple example that uses the subroutine getwords to count the
number of occurrences of the word "the":
#!/usr/bin/perl
$thecount = 0;
&getwords;
while ($words[0] ne "") { # stop when line is empty
for ($index = 0; $words[$index] ne ""; $index += 1) {
$thecount += 1 if $words[$index] eq "the";
}
&getwords;
}
print ("Total number of occurrences of the: $thecount\n");
The statement &getwords; tells Perl to call the subroutine getwords.
When Perl calls the subroutine getwords, it executes the statements contained
in the subroutine, namely
$inputline = <>;
@words = split(/\s+/, $inputline);
After these statements have been executed, Perl executes the statement immediately
following the &getwords statement.
In Perl 5, if the call to a subroutine appears after its definition, the &
character can be omitted from the call.
Returning a Value from a Subroutine
The getwords subroutine defined previously is useful, but it suffers
from one serious limitation: It assumes that the words from the input line are always
going to be stored in the array variable @words. This can lead to problems:
@words = ("These", "are", "some", "words");
&getwords;
Here, calling getwords destroys the existing contents of @words.
To solve this problem, consider the subroutine getwords you saw earlier:
sub getwords {
$inputline = <>;
@words = split(/\s+/, $inputline);
}
In Perl subroutines, the last value seen by the subroutine becomes the subroutine's
return value. In this example, the last value seen is the list of words assigned
to @words. In the call to getwords, this value can be assigned
to an array variable:
@words2 = &getwords;
Note that this hasn't yet solved the problem, because @words is still
overwritten by the getwords subroutine. However, now you don't need to use
@words in getwords, because you are assigning the list of words
by using the return value. You can now change getwords to use a different
array variable:
sub getwords {
$inputline = <>;
@subwords = split(/\s+/, $inputline);
}
Now, the statements
@words = ("These", "are", "some", "words");
@words2 = &getwords;
work properly: @words is not destroyed when getwords is called.
(For a better solution to this problem, see the following section, "Using Local
Variables.")
Because the return value of a subroutine is the last value seen, the return value
might not always be what you expect.
Consider the following simple program that adds numbers supplied on an input line:
#!/usr/bin/perl
$total = &get_total;
print("The total is $total\n");
sub get_total {
$value = 0;
$inputline = <STDIN>;
@subwords = split(/\s+/, $inputline);
$index = 0;
while ($subwords[$index] ne "") {
$value += $subwords[$index++];
}
}
At first glance, you might think that the return value of the subroutine get_total
is the value stored in $value. However, this is not the last value seen
in the subroutine!
Note that the loop exits when $subwords[$index] is the null string. Because
no statements are processed after the loop exits, the last value seen in the subroutine
is, in fact, the null string. Thus, the null string is the return value of get_total
and is assigned to $total.
To get around this problem, always have the last statement of the subroutine refer
to the value you want to use as the return value:
sub get_total {
$value = 0;
$inputline = <STDIN>;
@subwords = split(/\s+/, $inputline);
$index = 0;
while ($subwords[$index] ne "") {
$value += $subwords[$index++];
}
$value; # $value is now the return value
}
Now, get_total actually returns what you want it to.
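Perl also provides a return statement, which makes the returned value explicit wherever it appears in the subroutine. The sketch below is a variation on get_total. To keep it self-contained, it sums the words of a line passed in as an argument (see "Passing Values to a Subroutine") rather than reading STDIN, and the name sum_line is invented for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Variation on get_total: add up the numbers on one line of text.
sub sum_line {
    my ($inputline) = @_;
    my $value = 0;
    foreach my $word (split(/\s+/, $inputline)) {
        next if $word eq "";        # leading whitespace yields an empty field
        $value += $word;
    }
    return $value;                  # explicit; no reliance on the last value seen
}

my $total = sum_line("11 8 16 4");
print("The total is $total\n");
```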
Using Local Variables
As you saw in the section titled "Returning a Value from a Subroutine,"
defining variables that appear only in a subroutine ensures that the subroutine doesn't
accidentally overwrite anything:
sub getwords {
$inputline = <>;
@subwords = split(/\s+/, $inputline);
}
Note, however, that the variables $inputline and @subwords could
conceivably be added to your program at a later time. Then, a call to getwords
would once again accidentally destroy values that your program needs to keep.
You can ensure that the variables used in a subroutine are known only inside that
subroutine by defining them as local variables. Here is the subroutine getwords
with $inputline and @subwords defined as local variables:
sub getwords {
local($inputline, @subwords);
$inputline = <>;
@subwords = split(/\s+/, $inputline);
}
The local() statement tells Perl that versions of the variables $inputline
and @subwords are to be defined for use inside the subroutine. Once a variable
has been defined with local(), it cannot accidentally destroy values in
your program:
@subwords = ("Some", "more", "words");
@words = &getwords;
Here, @subwords is not destroyed, because the @subwords used
in getwords is known only inside the subroutine.
Note that variables defined using local() can be used in any subroutines
called by this subroutine. If you are using Perl 5, you can use the my()
statement to define variables that are known only to the subroutine in which they
are defined:
my($inputline, @subwords);
The syntax for the my() statement is the same as that of the local()
statement.
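The difference between local() and my() shows up when one subroutine calls another. The following sketch uses made-up subroutine names (show_name, with_local, with_my) and a made-up variable $name to illustrate it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

our $name = "global";           # package variable, visible everywhere

sub show_name {                 # returns whatever $name is current
    return $name;
}

sub with_local {
    local($name) = "local";     # temporary value, also seen by called subroutines
    return show_name();
}

sub with_my {
    my($name) = "my";           # private variable, invisible to show_name
    return show_name();
}

print("with_local sees: ", with_local(), "\n");
print("with_my sees:    ", with_my(), "\n");
print("afterward:       ", show_name(), "\n");
```

Here with_local returns "local" because show_name sees the temporary value, whereas with_my returns "global" because the my() variable exists only inside with_my. After both calls, the original value of $name is intact.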
Passing Values to a Subroutine
You can make your subroutines more flexible by allowing them to accept values.
As an example, here is the getwords subroutine modified to split the
input line using a pattern that is passed to it:
sub getwords {
local($pattern) = @_;
local($inputline, @subwords);
$inputline = <>;
@subwords = split($pattern, $inputline);
}
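The array variable @_ is a special system variable that contains the values
passed to the subroutine. (Its elements are aliases for the caller's arguments, so
assigning to $_[0] changes the caller's variable; copying @_ into local variables
avoids this.) The statement local($pattern) = @_;
creates a local scalar variable named $pattern and assigns the first value
of the array, @_, to it.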
Now, to call getwords, you must supply the pattern you want it to use
when splitting words. Pass the pattern as a string; a bare /\s+/ in the argument
list would be evaluated as a pattern match before getwords is called. To split
on whitespace, as before, call getwords as follows:
@words = getwords('\s+');
If your input line consists of words separated by colons, you can split it using
getwords by calling it as follows:
@words = getwords(':');
If you like, you can break your line into single characters:
@words = getwords('');
For more information on patterns you can use, see the section titled "Matching
Patterns."
The array variable @_ behaves like any other array variable. In particular,
its components can be used as scalar values:
$x = $_[0];
Here, the first element of @_--the first value passed to the subroutine--is
assigned to $x.
Usually, assigning @_ to local variables is the best approach, because
your subroutine becomes easier to understand.
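A subroutine can accept more than one value; each is assigned in order from @_. As a sketch, the hypothetical subroutine splitline below takes both the pattern (as a string) and the line to split, so that it does not need to read any input itself.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split $line using $pattern and return the pieces.
sub splitline {
    my ($pattern, $line) = @_;      # first and second values passed in
    return split($pattern, $line);
}

my @words  = splitline('\s+', "These are some words");
my @fields = splitline(':', "root:x:0:0");
my @chars  = splitline('', "abc");

print(scalar(@words), " words, ",
      scalar(@fields), " fields, ",
      scalar(@chars), " characters\n");
```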
Calling Subroutines from Other Subroutines
You can have a subroutine call another subroutine you have written. For example,
here is a subroutine that counts the number of words in an input line:
sub countline {
local(@words, $count);
$count = 0;
@words = getwords('\s+');
foreach $word (@words) {
$count += 1;
}
$count; # make sure the count is the return value
}
The subroutine countline first calls the subroutine getwords
to split the input line into words. Then it counts the number of words in the array
returned by getwords and returns that value.
After you have written countline, it is easy to write a program called
wordcount that counts the number of words in one or more files:
#!/usr/bin/perl
$totalwordcount = 0;
while (($wordcount = &countline) != 0) {
$totalwordcount += $wordcount;
}
print("The total word count is $totalwordcount\n");
# include the subroutines getwords and countline here
This program reads lines until an empty line--a line with zero words--is read
in. (The program assumes that the files contain no blank lines. You can get around
this problem by having getwords test whether $inputline is empty
before breaking it into words, returning a special "end of file" value
in this case. This value could then be passed from getwords to countline,
and then to the main program.)
Because getwords uses the <> operator to read input, the
files whose words are counted are those listed on the command line:
wordcount file1 file2 file3
This counts the words in the files file1, file2, and file3.
The variable @_ is a local variable whose value is defined only in the
subroutine in which it appears. This allows subroutines to pass values to other subroutines:
Each subroutine has its own copy of @_, and none of the copies can destroy
each other's values.
The BEGIN, END, and AUTOLOAD Subroutines
Perl 5 enables you to define special subroutines that are to be called at certain
times during program execution.
The BEGIN subroutine, if defined, is called when program execution begins:
BEGIN {
print ("This is the start of the program.\n");
}
The END subroutine, if defined, is called when program execution terminates:
END {
print ("This is the last sentence you will read.\n");
}
The AUTOLOAD subroutine is called when your program tries to call a subroutine
that does not exist:
sub AUTOLOAD {
print ("subroutine $AUTOLOAD not found.\n");
print ("arguments passed: @_\n");
}
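The three special subroutines can be combined in one program. In the following sketch, the name no_such_sub is deliberately left undefined so that the call falls through to AUTOLOAD; the messages are invented for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

our $AUTOLOAD;      # Perl stores the name of the missing subroutine here

BEGIN { print("starting\n"); }
END   { print("finished\n"); }

# Called in place of any subroutine that does not exist.
sub AUTOLOAD {
    my @arguments = @_;
    return "subroutine $AUTOLOAD not found (arguments: @arguments)";
}

print(no_such_sub(1, 2, 3), "\n");
```

Note that $AUTOLOAD contains the package-qualified name, so the message here refers to main::no_such_sub.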
Associative Arrays
A common programming task is to keep counts of several things at once. You can,
of course, use scalar variables or array variables to solve this problem, but this
requires a rather messy if-elsif structure:
if ($fruit eq "apple") {
$apple += 1;
} elsif ($fruit eq "banana") {
$banana += 1;
} elsif ($fruit eq "cherry") {
$cherry += 1;
...
This takes up a lot of space and is rather boring to write.
Fortunately, Perl provides an easier way to solve problems like these--associative
arrays. The following sections describe associative arrays and how to manipulate
them.
Defining Associative Arrays
In ordinary arrays, you access an array element by specifying an integer as the
index:
@fruits = (9, 23, 11);
$count = $fruits[0]; # $count is now 9
In associative arrays, you do not have to use numbers such as 0, 1,
and 2 to access array elements. When you define an associative array, you
specify the scalar values you want to use to access the elements of the array. For
example, here is a definition of a simple associative array:
%fruits = ("apple", 9,
"banana", 23,
"cherry", 11);
$count = $fruits{"apple"}; # $count is now 9
Here, the scalar value "apple" accesses the first element of
the array %fruits, "banana" accesses the second element,
and "cherry" accesses the third. You can use any scalar value
you like as an array index, or any scalar value as the value of the array element:
%myarray = ("first index", 0,
98.6, "second value",
76, "last value");
$value = $myarray{98.6}; # $value is now "second value"
Associative arrays eliminate the need for messy if-elsif structures.
To add 1 to an element of the %fruits array, for example, you just
need to do the following:
$fruits{$fruit} += 1;
Better still, if you decide to add other fruits to the list, you do not need to
add more code, because the preceding statement also works on the new elements.
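As a short sketch of this counting idiom, the hypothetical subroutine count_fruits below tallies a list of fruit names. The names and the sample list are invented; keys() and sort() are described in "Listing Array Indexes and Values."

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count how many times each name appears in the list.
sub count_fruits {
    my (@basket) = @_;
    my %fruits;
    foreach my $fruit (@basket) {
        $fruits{$fruit} += 1;       # creates the element on first use
    }
    return %fruits;
}

my %count = count_fruits("apple", "cherry", "apple", "banana", "apple");
foreach my $fruit (sort(keys(%count))) {
    print("$fruit: $count{$fruit}\n");
}
```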
The character % tells Perl that a variable is an associative array. As
with scalar variables and array variables, the remaining characters of the associative
array variable name must consist of a letter followed by zero or more letters, digits,
or underscores.
Accessing Associative Arrays
Because an associative array value is a scalar value, it can be used wherever
a scalar value can be used:
$redfruits = $fruits{"apple"} + $fruits{"cherry"};
print("yes, we have no bananas\n") if ($fruits{"banana"} == 0);
Note that Perl uses braces (the { and } characters) to enclose
the index of an associative array element. This makes it possible for Perl to distinguish
between ordinary array elements and associative array elements.
Copying to and from Associative Arrays
Consider the following assignment, which initializes an associative array:
%fruits = ("apple", 9,
"banana", 23,
"cherry", 11);
The value on the right of this assignment is actually just the ordinary list,
("apple", 9, "banana", 23, "cherry", 11),
grouped into pairs for readability. You can assign any list, including the contents
of an array variable, to an associative array:
@numlist[0,1] = ("one", 1);
@numlist[2,3] = ("two", 2);
%numbers = @numlist;
$first = $numbers{"one"}; # $first is now 1
Whenever a list or an array variable is assigned to an associative array, the
odd-numbered elements (the first, third, fifth, and so on) become the array indexes,
and the even-numbered elements (the second, fourth, sixth, and so on) become the
array values. Perl 5 allows you to use => to separate array elements
to make this assignment easier to see:
%fruits = ("apple" => 9,
"banana" => 23,
"cherry" => 11);
In associative array assignments, => and , are equivalent.
You can also assign an associative array to an array variable:
%numbers = ("one", 1,
"two", 2);
@numlist = %numbers;
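$first = $numlist[3]; # $first is now 1 or 2, depending on the order of the pairs
Here, the array indexes and array values both become elements of the array. Each
index is immediately followed by its value, but the pairs themselves are in no
particular order.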
Adding and Deleting Array Elements
To add a new element to an associative array, just create a new array index and
assign a value to its element. For example, to create a fourth element for the %fruits
array, type the following:
$fruits{"orange"} = 1;
This statement creates a fourth element with index "orange"
and gives it the value 1.
To delete an element, use the delete() function:
delete($fruits{"orange"});
This deletes the element indexed by "orange" from the array
%fruits.
Listing Array Indexes and Values
The keys() function retrieves a list of the array indexes used in an
associative array:
%fruits = ("apple", 9,
"banana", 23,
"cherry", 11);
@fruitindexes = keys(%fruits);
Here, @fruitindexes is assigned the list consisting of the elements "apple",
"banana", and "cherry". Note that this list
is in no particular order. To retrieve the list in alphabetic order, use sort()
on the list:
@fruitindexes = sort(keys(%fruits));
This produces the list ("apple", "banana", "cherry").
To retrieve a list of the values stored in an associative array, use the function
values():
%fruits = ("apple", 9,
"banana", 23,
"cherry", 11);
@fruitvalues = values(%fruits);
@fruitvalues now contains a list consisting of the elements 9,
23, and 11 (again, in no particular order).
Looping with an Associative Array
Perl provides a convenient way to use an associative array in a loop:
%fruits = ("apple", 9,
"banana", 23,
"cherry", 11);
while (($fruitname, $fruitvalue) = each(%fruits)) {
...
}
The each() function returns each element of the array in turn. Each element
is returned as a two-element list (array index and then array value). Again, the
elements are returned in no particular order.
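A complete version of such a loop might look like the following sketch. The subroutine name total_fruit is invented for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %fruits = ("apple", 9,
              "banana", 23,
              "cherry", 11);

# Hypothetical helper: add up all the values in an associative array.
sub total_fruit {
    my (%basket) = @_;
    my $total = 0;
    while (my ($fruitname, $fruitvalue) = each(%basket)) {
        $total += $fruitvalue;
    }
    return $total;
}

# The lines appear in no particular order.
while (my ($fruitname, $fruitvalue) = each(%fruits)) {
    print("$fruitname: $fruitvalue\n");
}
print("total: ", total_fruit(%fruits), "\n");
```

The loop ends when each() has returned every element, because the empty list it then returns makes the assignment false.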
Formatting Your Output
So far, the only output produced has been raw, unformatted output produced using
the print() function. However, you can control how your output appears on
the screen or on the printed page. To do this, define print formats and use the write()
function to print output using these formats.
The following sections describe print formats and how to use them.
Defining a Print Format
Here is an example of a simple print format:
format MYFORMAT =
===================================
Here is the text I want to display.
===================================
.
Here, MYFORMAT is the name of the print format. This name must start
with a letter and can consist of any sequence of letters, digits, or underscores.
The subsequent lines define what is to appear on the screen. Here, the lines to
be displayed are a line of = characters followed by a line of text and ending
with another line of = characters. A line consisting of a period indicates
the end of the print format definition.
Like subroutines, print formats can appear anywhere in a Perl program.
Displaying a Print Format
To print using a print format, use the write() function. For example,
to print the text in MYFORMAT, use
$~ = "MYFORMAT";
write();
This sends
===================================
Here is the text I want to display.
===================================
to the standard output file.
$~ is a special scalar variable used by Perl; it tells Perl which print
format to use.
Displaying Values in a Print Format
To specify a value to be printed in your print format, add a value field to your
print format. Here is an example of a print format that uses value fields:
format VOWELFORMAT =
==========================================================
Number of vowels found in text file:
a: @<<<<< e: @<<<<< i: @<<<<< o: @<<<<< u: @<<<<<
$letter{"a"}, $letter{"e"}, $letter{"i"}, $letter{"o"}, $letter{"u"}
==========================================================
.
The line
a: @<<<<< e: @<<<<< i: @<<<<< o: @<<<<< u: @<<<<<
contains five value fields. Each value field contains special characters that
provide information on how the value is to be displayed. (These special characters
are described in the following section, "Choosing a Value Field Format.")
Any line that contains value fields must be followed by a line listing the scalar
values (or variables containing scalar values) to be displayed in these value fields:
$letter{"a"}, $letter{"e"}, $letter{"i"}, $letter{"o"}, $letter{"u"}
The number of value fields must equal the number of scalar values.
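The following sketch shows a shortened version of this format in a complete program. The subroutine name vowel_report and the sample sentence are invented. To return the formatted text as a string, the sketch writes to an in-memory file opened on a scalar reference, which assumes Perl 5.8 or later; a program that only needs to display the report can simply call write() with STDOUT selected.

```perl
#!/usr/bin/perl
use strict;
use warnings;

our %letter;        # the format below refers to this package variable

format VOWELFORMAT =
a: @<<<<< e: @<<<<< i: @<<<<< o: @<<<<< u: @<<<<<
$letter{"a"}, $letter{"e"}, $letter{"i"}, $letter{"o"}, $letter{"u"}
.

# Hypothetical helper: count the vowels in $text and return the formatted line.
sub vowel_report {
    my ($text) = @_;
    %letter = ("a", 0, "e", 0, "i", 0, "o", 0, "u", 0);
    foreach my $char (split(//, lc($text))) {
        $letter{$char} += 1 if exists($letter{$char});
    }
    my $report = "";
    open(my $out, ">", \$report) or die("cannot open in-memory file: $!");
    my $previous = select($out);    # write() goes to the selected file
    $~ = "VOWELFORMAT";
    write();
    select($previous);              # restore the original output file
    close($out);
    return $report;
}

print(vowel_report("Perl is a simple yet useful language"));
```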
Choosing a Value Field Format
The following value field formats are supported:
@<<<< | Left-justified output: width equals the number of characters supplied.
@>>>> | Right-justified output: width equals the number of characters supplied.
@|||| | Centered output: width equals the number of characters supplied.
@##.## | Fixed-precision numeric: . indicates location of decimal point.
@* | Multiline text.
In all cases, the @ character is included when the number of characters
in the field is counted. For example, the field @>>>> is five
characters wide. Similarly, the field @###.## is seven characters wide:
four before the decimal point, two after the decimal point, and the decimal point
itself.
Writing to Other Output Files
You can also write to other files by using print formats and write().
For example, to write to the file represented by file variable MYFILE using
print format MYFORMAT, use the following statements:
select(MYFILE);
$~ = "MYFORMAT";
write(MYFILE);
The select() statement indicates which file is to be written to, and
the $~ = "MYFORMAT"; statement selects the print format to use.
After an output file has been selected using select(), it stays selected
until another select() is seen. This means that if you select an output
file other than the standard output file, as in select(MYFILE);, output
from write() won't go to the standard output file until Perl sees the statement
select(STDOUT);.
There are two ways of making sure you don't get tripped up by this:
- Always use STDOUT as the default output file. If you change the output
file, change it back when you're done:
select(MYFILE);
$~ = "MYFORMAT";
write(MYFILE);
select(STDOUT);
- Always specify the output file with select() before calling write():
select(STDOUT);
$~ = "MYFORMAT";
write(); # STDOUT is assumed
It doesn't really matter which solution you use, as long as you're consistent.
If you are writing a subroutine that writes to a particular output file, you can
save the current selected output file in a temporary variable and restore it later:
$temp = select(MYFILE); # select the output file
$~ = "MYFORMAT";
write(MYFILE);
select($temp); # restore the original selected output file
This method is also useful if you're in the middle of a large program and you
don't remember which output file is currently selected.
Specifying a Page Header
You can specify a header to print when you start a new page. To do this, define
a print format with the name filename_TOP, where filename
is the name of the file variable corresponding to the file you are writing to. For
example, to define a header for writing to standard output, define a print format
named STDOUT_TOP:
format STDOUT_TOP =
page @<
$%
.
The system variable $% contains the current page number (starting with
1).
Setting the Page Length
If a page header is defined for a particular output file, write() automatically
paginates the output to that file. When the number of lines printed is greater than
the length of a page, it starts a new page.
By default, the page length is 60 lines. To specify a different page length, change
the value stored in the system variable $=:
$= = 66; # set the page length to 66 lines
This assignment must appear before the first write() statement.
Formatting Long Character Strings
A scalar variable containing a long character string can be printed out using
multiple value fields:
format QUOTATION =
Quotation for the day:
-----------------------------
^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
$quotation
^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
$quotation
^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
$quotation
.
Here, the value of $quotation is written on three lines. The @
character in the value fields is replaced by ^; this tells Perl to fill
the lines as full as possible (cutting the string on a space or tab). Any of the
value fields defined in the section titled "Choosing a Value Field Format"
can be used.
CAUTION: The contents of the scalar variable
are destroyed by this write operation. To preserve the contents, make a copy before
calling write().
If the quotation is too short to require all of the lines, the last line or lines
are left blank. To define a line that is used only when necessary, put a ~
character in the first column:
~ ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
To repeat a line as many times as necessary, put two ~ characters at
the front:
~~ ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
References
Perl 5 supports references, which are constructs that allow you to access data
indirectly. These constructs enable you to build complex data structures, including
multidimensional arrays.
The following sections describe how to use references.
CAUTION: If you are using Perl 4, you
will not be able to use pointers and references, because they were added to version
5 of the language.
Understanding References
The scalar variables you have seen so far contain a single integer or string value,
such as 43 or hello. A reference is a scalar variable whose
value is the location, or address, of another Perl variable.
The easiest way to show how references work is using an example:
$myvar = 42;
$myreference = \$myvar;
print ("$$myreference"); # this prints 42
This code example contains three statements. The first statement just assigns
42 to the scalar variable $myvar. In the second statement, \$myvar
means "the address of $myvar," which means that the statement
assigns the address of $myvar to the scalar variable $myreference.
$myreference is now a reference, also sometimes called a pointer.
The third statement shows how to use a reference after you have created one. Here,
$$myreference means "the variable whose address is contained in $myreference."
Because the address of $myvar is contained in $myreference, $$myreference
is equivalent to $myvar. This means that the print statement prints the
value of $myvar, which is 42.
The $$ in this statement is called a dereference, and it can basically
be thought of as the opposite of the \ (backslash) operator.
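A reference can be used to change a variable as well as to read it. In the following sketch, the subroutine name add_one is invented; it receives a reference created with the backslash operator and modifies the variable it points to.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $myvar = 42;
my $myreference = \$myvar;      # $myreference now holds the address of $myvar

$$myreference = 43;             # assigns to $myvar through the reference

# Hypothetical helper: add 1 to the variable whose address is passed in.
sub add_one {
    my ($numref) = @_;
    $$numref += 1;
    return $$numref;
}

add_one($myreference);
print("$myvar\n");              # $myvar itself has changed
```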
References and Arrays
A reference can also store the address of an array. For example, the statement
$arrayref = \@myarray;
assigns the address of @myarray to $arrayref. Given this reference,
the following statements both assign the second element of @myarray to the
variable $second:
$second = $myarray[1];
$second = $$arrayref[1];
As before, $$arrayref refers to the variable whose address is stored
in $arrayref, which in this case is @myarray.
The address of an associative array can be stored in a reference as well:
%fruits = ("apple", 9,
"banana", 23,
"cherry", 11);
$fruitref = \%fruits;
$bananaval = $$fruitref{"banana"}; # this is 23
Here, $$fruitref{"banana"} is equivalent to $fruits{"banana"},
which is 23.
Another way to access an element of an array whose address is stored in a reference
is to use the -> (dereference) operator. The following pairs of statements
are equivalent in Perl:
$second = $$arrayref[1];
$second = $arrayref->[1];
$bananaval = $$fruitref{"banana"};
$bananaval = $fruitref->{"banana"};
The -> operator is useful when creating multidimensional arrays, described
in the following subsection.
Multidimensional Arrays
You can use references to construct multidimensional arrays. The following statements
create a multidimensional array and access it:
$arrayptr = ["abc", "def", [1, 2, 3], [4, 5, 6]];
$def = $arrayptr->[1]; # assigns "def" to $def
$two = $arrayptr->[2][1]; # assigns 2 to $two
The first statement creates a four-element array and assigns its address to $arrayptr.
The third and fourth elements of this array are themselves arrays, each containing
three elements.
$arrayptr->[1] refers to the second element of the array whose address
is stored in $arrayptr. This element is "def". Similarly,
$arrayptr->[2] refers to the third element of the array, which is [1,
2, 3]. The [1] in $arrayptr->[2][1] specifies the second
element of [1, 2, 3], which is 2.
You can access associative arrays in this way as well.
NOTE: Multidimensional arrays can have
as many dimensions as you want.
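As a further sketch, the following program walks a three-by-three array and then looks up a value in an associative array whose elements are themselves arrays. The names $matrix, $stock, and diagonal_sum, and all of the data, are invented for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $matrix = [[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]];

# Hypothetical helper: add the elements on the main diagonal.
sub diagonal_sum {
    my ($m) = @_;
    my $sum = 0;
    # @$m is the array whose address is stored in $m
    for (my $i = 0; $i < scalar(@$m); $i += 1) {
        $sum += $m->[$i][$i];
    }
    return $sum;
}

# An associative array of arrays: each fruit maps to [count, price].
my $stock = { "apple"  => [9, 0.25],
              "banana" => [23, 0.10] };
my $banana_count = $stock->{"banana"}[0];

print("diagonal sum: ", diagonal_sum($matrix), "\n");
print("banana count: $banana_count\n");
```

Braces create a reference to an associative array in the same way that brackets create a reference to an ordinary array.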
References to Subroutines
You can use references to indirectly access subroutines. For example, the following
code creates a reference to a subroutine, and then calls it:
$subreference = sub {
print ("hello, world");
};
&$subreference(); # this prints "hello, world"
Here, &$subreference() calls the subroutine whose address is stored
in $subreference. This subroutine call is treated like any other subroutine
call: The subroutine can be passed parameters and can return a value.
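One use for subroutine references is choosing a subroutine at run time. The sketch below stores two unnamed subroutines in an associative array and calls whichever one is asked for; the names %operation and apply are invented for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Each value is a reference to an unnamed subroutine.
my %operation = (
    "add"      => sub { my ($x, $y) = @_; return $x + $y; },
    "multiply" => sub { my ($x, $y) = @_; return $x * $y; },
);

# Hypothetical helper: look up a subroutine by name and call it.
sub apply {
    my ($name, @arguments) = @_;
    my $subref = $operation{$name};
    return &$subref(@arguments);    # parameters are passed as usual
}

print("3 + 4 = ", apply("add", 3, 4), "\n");
print("3 * 4 = ", apply("multiply", 3, 4), "\n");
```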
References to File Handles
You can use a reference to indirectly refer to a file handle. For example, the
following statement writes a line of output to the standard output file:
$stdout = \*STDOUT;
print $stdout ("hello, world\n");
This makes it possible to, for example, create subroutines that write to a file
whose handle is passed as a parameter.
CAUTION: Don't forget to include the *
after the \ when creating a reference to a file handle. (The *
refers to the internal symbol table in which the file handle is stored.)
You do not need to supply a * when creating a reference to a scalar variable,
an array, or a subroutine.
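A subroutine that writes to whichever file it is given might look like the following sketch. The subroutine name write_greeting is invented; the reference to STDOUT is created with \* as described above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: write a greeting to the file handle passed in.
sub write_greeting {
    my ($handle, $name) = @_;
    print $handle ("hello, $name\n");
}

write_greeting(\*STDOUT, "world");
```

The same subroutine can write to any open file handle, for example write_greeting(\*MYFILE, "world") after MYFILE has been opened for writing.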
Object-Oriented Programming
Perl 5 provides the ability to write programs in an object-oriented fashion. You
can do this by creating packages containing code that performs designated tasks.
These packages can contain private variables and subroutines that are not accessible
from the other parts of your program.
The following sections describe packages and how they can be used to create classes
and objects. These sections also describe how to use packages to create exportable
program modules.
CAUTION: If you are using Perl 4, you
will not be able to use many of the features described here, because they were added
to version 5 of the language.
Packages
In Perl, a package is basically just a separate collection of variables
and subroutines contained in its own name space. To create a package or switch from
one existing package to another, use the package statement:
package pack1;
$myvar = 26;
package pack2;
$myvar = 34;
package pack1;
print ("$myvar\n"); # this prints 26
This code creates two packages, pack1 and pack2, and then switches
from pack2 back to pack1. Each package contains its own version
of the variable $myvar: In package pack1, $myvar is assigned
26, and in package pack2, $myvar is assigned 34.
Because the print statement is inside pack1, it prints 26, which
is the value of the pack1 $myvar variable.
Subroutines can also be defined inside packages. For example, the following creates
a subroutine named mysub inside a package named pack1:
package pack1;
sub mysub {
print ("hello, world!\n");
}
To access a variable or subroutine belonging to one package from inside another
package, specify the package name and two colons:
package pack1;
print ("$pack2::myvar\n");
This print statement prints the value of the version of $myvar
belonging to package pack2, even though the current package is pack1.
NOTE: Perl 4 uses a single quote character
instead of two colons to separate a package name from a variable name:
$pack2'myvar
If no package is specified, by default all variables and subroutines are added
to a package named main. This means that the following statements are equivalent:
$newvar = 14;
$main::newvar = 14;
To switch back to using the default package, just add the line
package main;
to your program at the point at which you want to switch.
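The following sketch pulls these pieces together: a package with its own variable and subroutine, a variable of the same name in main, and qualified names to reach from one package into the other. The package name Counter and its contents are invented for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package Counter;

our $count = 0;             # this is $Counter::count

sub increment {
    $count += 1;
    return $count;
}

package main;

our $count = 100;           # this is $main::count, a separate variable

Counter::increment();       # call a subroutine in another package
Counter::increment();

print("Counter: $Counter::count, main: $count\n");
```

Calling Counter::increment() changes only the variable belonging to Counter; the variable in main keeps its value.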
Creating a Module
You can put a package you create into its own file, called a module. This
makes it possible to use the same package in multiple programs.
The following file, named Hello.pm, creates a module containing a subroutine
that prints hello, world!:
package Hello;
require Exporter;
@ISA = "Exporter";
@EXPORT = ("helloworld");
sub helloworld {
print ("hello, world!\n");
}
1;
The first statement defines the package named Hello. The
require Exporter;
statement includes a predefined Perl module called Exporter.pm; this
module handles the details of module creation for you. The statement
@ISA = "Exporter";
sets the @ISA array, which is a predefined array that specifies a list
of packages to look for subroutines in. The statement
@EXPORT = ("helloworld");
indicates that the helloworld subroutine is to be made accessible to
other Perl programs. If you add other subroutines to your module, add their names
to the list being assigned to @EXPORT.
Note the closing 1; statement in the package. A module file must return a true value
when it is loaded, and this statement supplies that value. Also note that
your package file should have the suffix .pm.
After you have created Hello.pm, you can include it in other programs.
The following program uses the use statement to include Hello.pm
and then calls the subroutine contained in the Hello package:
#!/usr/bin/perl
use Hello;
&Hello::helloworld();
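Because a module normally lives in its own file, it is awkward to try out the export mechanism in one listing. The following sketch works around this by defining a package inside a BEGIN block and marking it as already loaded; the names Greeter and greeting are hypothetical, and the BEGIN block with %INC is used only to keep the example in one file. It shows that a subroutine listed in @EXPORT can be called with or without the package name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Inline stand-in for a separate Greeter.pm file (hypothetical module name).
BEGIN {
    package Greeter;
    require Exporter;
    our @ISA    = ("Exporter");
    our @EXPORT = ("greeting");

    sub greeting {
        return "hello, world!\n";
    }

    # Mark the module as loaded so that "use Greeter" does not search for a file.
    $INC{"Greeter.pm"} = 1;
}

use Greeter;                  # imports greeting into package main

print greeting();             # unqualified call works because of @EXPORT
print Greeter::greeting();    # fully qualified call also works
```

In a real program, the contents of the BEGIN block (without the %INC line) would go in Greeter.pm, followed by the closing 1; statement.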
TIP: Perl 5 users all over the world write
useful modules that are made available to the Perl user community via the Internet.
The CPAN network of archives provides a complete list of these modules. For more
information, access the Web site located at http://www.perl.com/perl/CPAN/README.html.
Creating a Class and Its Objects
One of the fundamental concepts of object-oriented programming is the concept
of a class, which is a template consisting of a collection of data items and subroutines.
After a class is created, you can define variables that refer to this class; these
variables are called objects (or instances of the class).
In Perl, a class is basically just a package containing a special initialization
function, called a constructor, which is called each time an object is created. The
following code is an example of a simple class:
package MyClass;
sub new {
my ($myref) = [];
bless ($myref);
return ($myref);
}
The subroutine named new is the constructor for the class MyClass.
(By convention, Perl constructors are named new, although the language does not
require this name.) This subroutine defines
a local variable named $myref, which, in this case, is a reference to an
empty array. (You can also use a reference to a scalar variable or associative array if you
like.)
The bless function, called within the subroutine, indicates that the
item being referenced by $myref is to be treated as part of the MyClass
package. The reference is then returned.
After you have created a class, it's easy to create an object of this class:
$myobject = new MyClass;
Here, new MyClass calls the subroutine new defined inside the
MyClass package. This subroutine creates a reference to an array of class
MyClass, which is then assigned to $myobject.
NOTE: new, like any other Perl
subroutine, can be passed parameters. These parameters can be used to initialize
each object as it is created.
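As a rough illustration of a constructor that takes parameters, consider the sketch below. The class name ValueList is hypothetical. Unlike the MyClass example, this version reads the class name from its first parameter (Perl passes it automatically when the constructor is called as ValueList->new or new ValueList) and hands it to bless explicitly.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package ValueList;    # hypothetical class that stores its parameters in an array

sub new {
    my ($class, @values) = @_;   # class name first, then the caller's parameters
    my $self = [ @values ];      # reference to a new array holding the parameters
    bless($self, $class);        # mark the array as an object of the class
    return $self;
}

package main;

my $myobject = ValueList->new("a", "b", "c");
print ref($myobject), "\n";        # the class the object was blessed into
print scalar(@$myobject), "\n";    # number of stored elements
print "$myobject->[1]\n";          # the second element
```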
Methods
Most classes have methods defined for them. Methods manipulate an object of the
class for which they are defined.
In Perl, a method is just an ordinary subroutine whose first parameter
is the object being manipulated. For example, the following method assumes that its
object is an array and prints one element of the array:
package MyPackage;
sub printElement {
my ($object) = shift(@_);
my ($index) = @_;
print ("$object->[$index]\n");
}
The first parameter passed to printElement is the object to be manipulated.
(The shift() function removes the first element from an array. Recall that
the @_ array contains the values passed to the subroutine.) The second parameter
specifies the index of the element to be printed.
The following code shows two ways to call this method once it has been created:
$myobject = new MyPackage;
MyPackage::printElement($myobject, 2); # print the third element
$myobject->printElement(2); # this is identical to the above
The second way of calling this method more closely resembles the syntax used in
other object-oriented programming languages.
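The two calling styles can be compared in a complete program. This is a minimal sketch: the constructor, the array contents, and the method name element are assumptions added so that the example runs on its own, and the method returns the element instead of printing it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package MyPackage;

sub new {
    my ($class) = @_;
    my $self = [ "zero", "one", "two" ];   # hypothetical contents
    return bless($self, $class);
}

# Returns (rather than prints) one element so the result is easy to check.
sub element {
    my ($object) = shift(@_);   # first parameter: the object
    my ($index)  = @_;          # second parameter: the index
    return $object->[$index];
}

package main;

my $myobject = MyPackage->new();
my $direct   = MyPackage::element($myobject, 2);   # ordinary subroutine call
my $arrow    = $myobject->element(2);              # method-call syntax
print "$direct\n";
print "$arrow\n";
```

With the arrow syntax, Perl supplies $myobject as the first parameter, so both calls reach element with the same argument list.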
Overrides
As you have seen, when an object is created, it is assumed to be of a particular
class. To use a method from another class on this object, specify the class when
calling the method, as in
$myobject = new MyClass;
MyOtherClass::myMethod($myobject);
This calls the method named myMethod, which belongs to the class MyOtherClass,
and passes it $myobject as its first parameter.
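The sketch below shows this kind of call in a runnable form. The method name describe and the array contents are hypothetical; both classes define a method with the same name so that the difference between the two calls is visible.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package MyClass;

sub new {
    my ($class) = @_;
    return bless([ "alpha", "beta" ], $class);   # hypothetical contents
}

sub describe {
    my ($object) = @_;
    return "MyClass: $object->[0]";
}

package MyOtherClass;

sub describe {
    my ($object) = @_;
    return "MyOtherClass: $object->[0]";
}

package main;

my $myobject = MyClass->new();
my $own   = $myobject->describe();                # uses the object's own class
my $other = MyOtherClass::describe($myobject);    # names the other class explicitly
print "$own\n";
print "$other\n";
```

The object itself is unchanged by the second call; it is still an object of MyClass.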
Inheritance
Perl allows you to define classes that are subclasses of existing classes. These
subclasses inherit the methods of their parent class.
The following code is an example of a module that contains a subclass:
package MySubClass;
require Exporter;
require MyParentClass;
@ISA = ("Exporter", "MyParentClass");
@EXPORT = ("myChildRoutine");
sub myChildRoutine {
my ($object) = shift(@_);
print ("$object->[0]\n");
}
sub new { # the constructor for MySubClass
my ($object) = MyParentClass->new();
$object->[0] = "initial value";
bless($object);
return ($object);
}
1;
This class contains a method, myChildRoutine, which prints the first
element of the array referenced by $object. The constructor for this class
calls the constructor for its parent class, MyParentClass; this constructor
returns a reference, which is then used and later returned by the MySubClass
constructor.
Note that the @ISA array defined at the start of the module includes
the name of the parent class, MyParentClass. This tells Perl to look for
methods in the class named MyParentClass if it can't find them in MySubClass.
Methods in the parent class can be called as if they were defined in the subclass:
use MySubClass;
$myobject = new MySubClass;
$myobject->myParentRoutine("hi there");
This creates an object of class MySubClass. The code then calls myParentRoutine,
which is a method belonging to class MyParentClass.
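To experiment with inheritance without creating separate module files, both classes can be placed in one program. The following is a sketch under that assumption: the body of myParentRoutine is invented for illustration, the methods return strings instead of printing them, and Exporter is left out because nothing is imported from another file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package MyParentClass;

sub new {
    my ($class) = @_;
    return bless([], $class);
}

sub myParentRoutine {                 # hypothetical body
    my ($object, $text) = @_;
    return "parent got: $text";
}

package MySubClass;
our @ISA = ("MyParentClass");         # search MyParentClass for missing methods

sub new {
    my ($class) = @_;
    my $object = MyParentClass->new();    # build the object with the parent constructor
    $object->[0] = "initial value";
    return bless($object, $class);        # re-bless it into the subclass
}

sub myChildRoutine {
    my ($object) = @_;
    return $object->[0];
}

package main;

my $myobject = MySubClass->new();
print $myobject->myChildRoutine(), "\n";            # defined in MySubClass
print $myobject->myParentRoutine("hi there"), "\n"; # found through @ISA
```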
Using Built-In Functions
The examples you have seen so far use some of the many built-in functions provided
with Perl. Table 5.4 provides a more complete list.
For more details on these functions and others, see the online documentation for
Perl.
Table 5.4. Built-in functions.
Function | Description
abs($scalar) | Return absolute value of number
alarm($scalar) | Deliver SIGALRM in $scalar seconds
atan2($v1, $v2) | Return arctangent of $v1/$v2
caller($scalar) | Return context of current subroutine
chdir($scalar) | Change working directory to $scalar
chmod(@array) | Change permissions of file list
chomp($scalar) | Remove last chars if line separator
chop($scalar) | Remove the last character of a string
chown(@array) | Change owner and group of file list
chr($scalar) | Convert number to ASCII equivalent
close(FILE) | Close a file
cos($scalar) | Return cosine of $scalar in radians
crypt($v1, $v2) | Encrypt a string
defined($scalar) | Determine whether $scalar is defined
delete($array{$val}) | Delete value from associative array
die(@array) | Print @array to STDERR and exit
dump($scalar) | Generate UNIX core dump
each(%array) | Iterate through an associative array
eof(FILE) | Check whether FILE is at end of file
eval($scalar) | Treat $scalar as a subprogram
exec(@array) | Send @array to system as command
exists($element) | Does associative array element exist?
exit($scalar) | Exit program with status $scalar
exp($scalar) | Compute e ** $scalar
fileno(FILE) | Return file descriptor for FILE
fork() | Create parent and child processes
getc(FILE) | Get next character from FILE
getlogin() | Get current login from /etc/utmp
gmtime($scalar) | Convert time to GMT array
grep($scalar, @array) | Find $scalar in @array
hex($scalar) | Convert hexadecimal string to number
index($v1, $v2, $v3) | Find $v2 in $v1 after position $v3
int($scalar) | Return integer portion of $scalar
join($scalar, @array) | Join array into single string
keys(%array) | Retrieve indexes of associative array
length($scalar) | Return length of $scalar
lc($scalar) | Convert value to lowercase
lcfirst($scalar) | Convert first character to lowercase
link(FILE1, FILE2) | Hard link FILE1 to FILE2
localtime($scalar) | Convert time to local array
log($scalar) | Get natural logarithm of $scalar
map($scalar, @array) | Use each list element in expression
mkdir(DIR, $scalar) | Create directory
oct($string) | Convert octal string to number
open(FILE, $scalar) | Open file
ord($scalar) | Return ASCII value of character
pack($scalar, @array) | Pack array into binary structure
pipe(FILE1, FILE2) | Open pair of pipes
pop(@array) | Pop last value of array
pos($scalar) | Return location of last pattern match
print(FILE @array) | Print string, list or array
push(@array, @array2) | Push @array2 onto @array
quotemeta($string) | Place backslash before non-word chars
rand($scalar) | Return random value
readlink($scalar) | Return value of symbolic link
require($scalar) | Include library file $scalar
reverse(@list) | Reverse order of @list
rindex($v1, $v2) | Return last occurrence of $v2 in $v1
scalar($val) | Interpret $val as scalar
shift(@array) | Shift off first value of @array
sin($scalar) | Return sine of $scalar in radians
sleep($scalar) | Sleep for $scalar seconds
sort(@array) | Sort @array in alphabetical order
splice(@a1, $v1, $v2, @a2) | Replace elements in array
split($v1, $v2) | Split scalar into array
sprintf($scalar, @array) | Create formatted string
sqrt($expr) | Return square root of $expr
srand($expr) | Set random number seed
stat(FILE) | Retrieve file statistics
substr($v1, $v2) | Retrieve substring
symlink(FILE1, FILE2) | Create symbolic link
system(@array) | Execute system command
time() | Get current time
uc($scalar) | Convert value to uppercase
ucfirst($scalar) | Convert first character to uppercase
undef($scalar) | Mark $scalar as undefined
unlink(@array) | Unlink a list of files
unpack($v1, $v2) | Unpack array from binary structure
unshift(@a1, @a2) | Add @a2 to the front of @a1
utime(@array) | Change date stamp on files
values(%array) | Return values of associative array
vec($v1, $v2, $v3) | Treat string as vector array
wait() | Wait for child process to terminate
wantarray() | Determine whether a list is expected
write(FILE) | Write formatted output
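A few of the functions in Table 5.4 are easier to understand when seen together. The following sketch uses made-up input strings and variable names; it is meant only as a quick illustration, not as a tour of every function in the table.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @words    = split(/,/, "pear,apple,fig");     # break a string into a list
my $sorted   = join(" ", sort(@words));          # alphabetical order, rejoined
my $upper    = uc($words[0]);                    # uppercase copy of the first word
my $from_hex = hex("ff");                        # hexadecimal string to a number
my $from_oct = oct("755");                       # octal string to a number
my $where    = index("hello, world", "world");   # position of a substring
my $piece    = substr("hello, world", 7);        # from position 7 to the end

print "$sorted\n";      # apple fig pear
print "$upper\n";       # PEAR
print "$from_hex\n";    # 255
print "$from_oct\n";    # 493
print "$where\n";       # 7
print "$piece\n";       # world
```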
The $_ Variable
Many built-in functions that accept a scalar argument can have that argument
omitted. In this case, Perl uses $_, which is the default scalar variable.
$_ is also the default variable when a line is read from a file in the condition
of a while loop. So, for example, instead of writing
while ($var = <STDIN>) { chop($var); }
you can write
while (<STDIN>) { chop; }
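The default variable can also be seen at work without reading any input. The sketch below is an illustration under stated assumptions: it loops over a hypothetical list of lines with foreach (which also places each element in $_) and uses chomp and length with their arguments omitted.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @lines = ("first line\n", "second line\n");   # stand-in for lines read from a file
my @lengths;

foreach (@lines) {             # each element is placed in $_ in turn
    chomp;                     # same as chomp($_)
    push(@lengths, length);    # same as length($_)
}

print "@lengths\n";            # 10 11
```

Because foreach makes $_ an alias for each element, the chomp call also removes the newline from the strings stored in @lines.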
Summary
Perl is a programming language that allows you to write programs that manipulate
files, strings, integers, and arrays quickly and easily.
Perl provides features commonly found in high-level languages such as C; these
features include arrays, references, control structures, subroutines, and object-oriented
capabilities.
Perl is easy to use. Character strings and integers are freely interchangeable;
you don't need to convert an integer to a character string or vice versa. You don't
need to know all of Perl to begin writing useful programs in the language; simple
constructs can be used to solve simple problems.
Perl is also a very flexible language, providing a variety of ways to solve programming
problems.
This combination of simplicity, power, and flexibility makes Perl an attractive
choice.
© Copyright, Macmillan Computer Publishing. All rights reserved.