Базы данных • Интернет • Компьютеры • Операционные системы • Программирование • Сети • Связь • Разное

CGI Programming FAQ: Techniques: "How do I..."<br />

Программы •Железо • Драйверы • Хостинг •Энциклопедия рекламы



This section comprises programming hints and tips for a number of popular

tasks. Also included are a number of common questions to which the answer

is "you can't", with the reasons why.

3.1: Can I get information about who is visiting?



*sigh*

Many people keep mailing me questions or suggested hacks to get

visitor information, particularly email addresses.   It seems they

won't take "NO" for an answer.



The bottom line is that whatever information is available to _you_

is _equally_ available to every spammer on the net.   Therefore when

a browser bug _does_ permit personal data to be collected, it gets

reported and fixed very quickly (one short-lived Netscape 2.0.x

release reportedly had such a bug in its Javascript engine).



You can get some limited information from the environment variables

passed to you by the browser.   Relatively few of these are guaranteed

to be available, and some may be misleading.   For particular types

of information, see below.   For full details, see NCSA's reference pages.

[Table of Contents]

3.2: Can I get the email of visitors?



Why do you want to do this?



The best information available is the REMOTE_ADDR and REMOTE_HOST,

which tell you nothing about the user.   Techniques such as "finger@"

are not reliable, are widely disliked, and generally serve only to

introduce long delays in your CGI.   Better - as well as more polite -

just to ask your users to fill in a form.



BTW: the "From:" header line (HTTP_FROM variable) is usually only set

by robots, since human visitors to your webpage will not normally want

their addresses collected without permission, and browsers respect this.

[Table of Contents]

3.3: "But I saw some.kool.site display my email address..."



Some sites will play party tricks, which can get *some users* email

addresses.   Possible tell-tale signs of this are inordinate delays

loading a page (fingering @REMOTE_HOST - doesn't often work but

probably can't be detected from the webpage), or a submit button that

appears to do nothing at all (a mailto: form - works well with some

browsers but trivially detectable).   As a "snoop" party trick that's

fine, but if you find someone abusing these facilities (eg they send

you junkmail), alert their service provider!

[Table of Contents]

3.4: Can I verify the email addresses people enter in my Form?



Unfortunately people will sometimes enter an incorrect or invalid

email address in your Form.   Worse, they may enter a valid but

incorrect email address that will deliver to someone who doesn't

want your mail.



Proposed regexps to match email addresses are sometimes posted.

Most of these will fail against perfectly valid email addresses,

like "S=N.OTHER/OU1=X12345A/RECIPNUM=1/MTA-BASIC@attmail.com"

(which is what your address looks like if you are connected to

the Internet via X400 - and if you think that example is too easy,

check the ones at the end of Eli the Bearded's Email Addressing FAQ).



Probably the most complete parser and checker available for download

is Tom Christiansen's, at

http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/ckaddr.gz

Of course, this still says nothing about deliverability.



A frequently-suggested hack that doesn't work is to use

SMTP EXPN or VRFY commands.   Modern versions of sendmail permit

administrators to disable these commands, and many sites take

advantage of this facility to protect their users' privacy.



Probably the best way to verify an email address is to send mail to

it, asking the user to respond.   Include a clause like "if you have

received this mail in error, please accept our apologies..."

[Table of Contents]

3.5: Subject: How can I get the hostname of the remote user?



You can't. Well, not always.



IF it is available, you'll find it in the REMOTE_HOST environment

variable.  However, this will more often than not contain the numerical

IP address rather than the IP name of the remote host. Remember that

not all IP addresses have a hostname associated with them; this is the

case of most IP addresses assigned to dialup users, for example. Your

web server may also not perform a reverse lookup on incoming

connections, in which case REMOTE_HOST will contain the IP address even

if it has a corresponding IP name. In the second case, you can do a

reverse lookup yourself in your script, but this is expensive and

should probably be avoided unless absolutely necessary.



Even if you do manage to obtain a hostname, you should be aware that it

may not correspond to the hostname the user is accessing your page

from. It may instead be that of an intervening proxy host.



The short answer is therefore that there is no reliable way of finding

out what the remote user's hostname is.

[Table of Contents]

3.6: Can I get browser details and return different pages?



Why do you want to do this?



Well-written HTML will display correctly in any browser, so the correct

answer to this question is to design a template for your output in good

HTML, and make sure your output is correct.



If you insist on a different answer, you can use the HTTP_USER_AGENT

environment variable.  This requires care, and can lead to unexpected

results.   For example, checking for "Mozilla" and serving a frameset

to it ensures that you *also* serve the frameset to early (Non-Frame)

Netscapes, me-too browsers (notably Microsoft[1]) and others who have

chosen to lie to you about their browser.



Note also that not every User Agent is a browser.   Your page may be

read by a user agent you've never heard of, and then displayed by

100 different browsers.   Or retrieved by different browsers from

a cache.   Another reason to write good HTML, and not try to

devise a clever or koool substitute.



[1] At the time of writing, only Netscape 2+ supported frames, and

    some authors considered them koool.  That's changed, but the same

    general principle still holds.

[Table of Contents]

3.7: Can I trace where a user has come from/is going to?



HTTP_REFERER might or might not tell you anything.   By all means

use it to collect partial statistics if you participate in (say)

an advertising banner scheme.   But it is not always set, and may

be meaningless (eg if a user has accessed your page from a bookmark,

and the browser is too dumb to cope with this).



The HTTP protocol forbids relying on Referer information for functionality

in your programs, so don't try it.



You cannot trace outgoing links at all.   If you really must try,

point all the external links to your HTTPD and use its redirection

facility (which gives you generally-reliable logs).   This is much

less inefficient than using a CGI script.



BTW: don't even think about asking Javascript to send you information

on some event: it's a violation of privacy which Netscape fixed as

soon as complaints about its abuse started coming in.   If it works

with *your* browser, you should upgrade!

[Table of Contents]

3.8: Can I launch a long process and return a page before it's finished?



[UNIX]

You have to fork/spawn the long-running process.

The important thing to remember is to close all its file descriptors;

otherwise nothing will be returned to the browser until it's finished.

The standard trick to accomplish this is redirection to/from /dev/null:



        "long_process < /dev/null > /dev/null 2>&1 &"

        print HTML page as usual

[Table of Contents]

3.9: Can I launch a long process which the user interacts with?



This does not fit well with the basic mechanics of the Web, in which

each transaction comprises a single request and response.

If your processing can be done on the Client machine, you can use

a clientside application; for example a Java applet.



For processing on the server, one trick that works well for Clients

running an X server (and far more efficient than a JAVA solution) is:

  if ( fork() ) {

    print HTML page explaining what's going on and advising about xhost

  } else {

    exec ("xterm -display THEIR_DISPLAY -title MY_APP -e MY_PROG ARGS

        < /dev/null > /dev/null 2>&1 &") ;

  }

NOTE: THEIR_DISPLAY is not necessarily the same as REMOTE_HOST or REMOTE_ADDR.

You have to ask users to supply their display (set REMOTE_HOST as default).



A JAVA terminal program will accomplish something similar for the many

users with platforms that support JAVA but not X.

[Table of Contents]

3.10: Can I password-protect my pages?



Yes.   Use your HTTPD's authentication, just as you would a basic HTML page.

Now you'll have the identity of every visitor in REMOTE_USER.

[Table of Contents]

3.11: Can I do HTTP authentication using CGI?



It depends on which version of the question you asked.



Yes, you can use CGI to trigger the browser's standard Username/Password

dialogue.   Send a response code 401, together with a "WWW-authenticate"

header including details of the the authentication scheme and realm:

e.g. (in a non-NPH script)



	Status: 401 Unauthorized to access the document

	WWW-authenticate: Basic realm="foobar"

	Content-type: text/plain



	Unauthorised to access this document



The use you can make of this is server-dependent, and harder,

since most servers expect to deal with authentication before ever

reaching the CGI (eg through .www_acl or .htaccess).

Thus it cannot usefully replace the standard login sequence, although

it can be applied to other situations, such as re-validating a user -

e.g after a certain timeout period or if the same person may need to

login under more than one userid.



What you can never get in CGI is the credentials returned by the user.

The HTTPD takes care of this, and simply sets REMOTE_USER to the

username if the correct password was entered.



For a much longer but outdated discussion of this question,

see my discussion at http://www.webthing.com/tutorials/login.html

[Table of Contents]

3.12: Can I identify users/sessions without password protection?



The most usual (but browser-dependent) way to do this is to set a cookie.

If you do this, you are accepting that not all users will have a 'session'.



An alternative is to pass a session ID in every GET URL, and in hidden

fields of POST requests.   This can be a big overhead unless _every_ page

requires CGI in any case.



Another alternative is the Hyper-G[1] solution of encoding a session-id in

the URLs of pages returned:

	http://hyper-g.server/session_id/real/path/to/page

This has the drawback of making the URLs very confusing, and causes any

bookmarked pages to generate old session_ids.



Note that a session ID based solely on REMOTE_HOST (or REMOTE_ADDR)

will NOT work, as multiple users may access your pages concurrently

from the same machine.



[1] Actually I don't think that's been true of Hyper-G since sometime

in '96.  However, general advances in web server technology, such as

Apache's mod_alias or mod_rewrite, make it straightforward without

the need for CGI.

[Table of Contents]

3.13: Can I redirect users to another page?



For permanent and simple redirection, use the HTTPD configuration file:

it's much more efficient than doing it yourself.   Some servers enable

you to do this using a file in your own directory (eg Apache) whereas

others use a single configuration file (eg CERN).



For more complicated cases (eg process form inputs and conditionally

redirect the user), use the "Location:" response header.

If the redirection is itself a CGI script,  it is easy to URLencode

parameters to it in a GET request, but don't forget to escape the URL!

[Table of Contents]

3.14: Can I run a CGI script without returning a new page to the browser?



Yes, but think carefully first:  How are your readers going to know

that their "submit" has succeeded?   They may hit 'submit' many times!



The correct solution according to the HTTP specification is to

return HTTP status code 204.   As an NPH script, this would be:



	#!/bin/sh

	# do processing (or launch it as background job)

	echo "HTTP/1.0 204 No Change"

	echo



(as non-NPH, you'd simply replace HTTP/1.0 with the Status: CGI header).



Alan J Flavell has pointed out that this will fail with certain

popular browsers, and suggests a workaround to accommodate them:



[ May 1998 update[1]: I'm deleting Alan's suggestion, because the problem

  is mainly of historical interest, and the workaround is no longer

  recommended.  See his page for a a detailed survey and recommendations.

]



His survey is at

http://ppewww.ph.gla.ac.uk/%7Eflavell/status204/results.html



[1] With apologies to Alan for having left it in so long.

[Table of Contents]

3.15: Can I write output to a different Netscape frame?



Yep.   The fact you're using CGI makes no difference: use

"target=" in your links as usual.   Alternatively, the script

can print a "Window-target:" header.   Read Netscape's pages

for detail: these answer all the questions about things like

"getting rid of" or "breaking out of" frames, too.

[Table of Contents]

3.16: Can I write output to several frames at once?



A single CGI script can only ever print to one frame.



However, this limitation may be overcome by using more than one script.

The first script (the URL of the "submit" button) prints a frameset,

typically to a "_parent" or "_top" target.   The sources for one or

more of the frames thus generated may also be CGI scripts, to which

you can easily pass parameters (eg encoded in URLs with method GET).

This hack is definitely not recommended.   If you find yourself wanting

to update several frames from a single user event, it probably means

you should review the design of your application at a higher level.



Warnings:

 1. Don't forget to escape your URLs.

 2. This technique results in your server being hit by multiple 

    concurrent CGI requests.   You'll need LOTS of memory, especially

    if you use a memory-hog like Perl.   It can be a good recipe

    for bringing a server to its knees.



Javascript is often a valid alternative here, but note just how silly

it can (and often does) look in a different browser.

[Table of Contents]

3.17: Can I use a CGI script to generate both text and inline images?



Not directly.   One script generates one response to one request.



If you want to generate a dynamic page including dynamic images

(say, a report including graphs, all of which depend on user input)

then your primary script will print the usual

   <img src="899/[script-to-generate-image]" alt="[what you asked for]">

and, just as in the multiple frames case, you can pass data to the

image-generating program encoded in a GET URL.   Of course, the same

caveats apply: see above.

[Table of Contents]

3.18: How can I use Caches to make CGI scripts faster and more Net-friendly?



This is currently beyond the scope of this FAQ.   However,

there is an excellent introduction to net-friendly webpages, including

CGI pages, at http://vancouver-webpages.com/CacheNow/



A sample cacheing perl/cgi script by Andrew Daviel is available at

http://vancouver-webpages.com/proxy/log-tail.pl

[Table of Contents]

3.19: How can I avoid users hitting "submit" twice?



You can't.   You just have to deal with it when they do.



You can avoid re-processing a submission by embedding a unique ID in your

Form each time it is displayed.   When you process the form, you enter

the ID in a database.  Or, if it's already there, you don't repeat the

processing.



You probably want to expire your database entries after a little time:

an hour should be fine in a typical situation.



If you're already using cookies (e.g. a shoppingcart), an alternative is

to use the cookie as a unique identifier.   This means you also have to

handle the situation where a user deliberately "goes round twice" and

submits the same form with different contents.



If your script may take some time to process, you should also consider

running it as a background job, and returning an immediate

acknowledgement to the user (see above if your "immediate" response

gets delayed until processing is complete in any case).

[Table of Contents]

3.20: How can I stop my CGI script reading and writing files as "nobody"?



CGI scripts are run by the HTTPD, and therefore by the UID of the HTTPD

process, which is (by convention) usually a special user "nobody".



There are two basic ways to run a script under your own userid:

(1) The direct approach: use a setuid program.

(2) The double-server approach: have your CGI script communicate

    with a second process (e.g. a daemon) running under your userid,

    which is responsible for the actual file management.



The direct approach is usually faster, but the client-server architecture

may help with other problems, such as maintaining integrity of a database.



When running a compiled CGI program (e.g. C, C++), you can make it

setuid by simply setting the setuid bit:

e.g. "chmod 4755 myprog.cgi"



For security reasons, this is not possible with scripting languages

(eg Perl, Tcl, shell).   A workaround is to run them from a setuid

program, such as cgiwrap.



In most cases where you'd want to use the client-server approach,

the server is a finished product (such as an SQL server) with its

own CGI interface.

A lightweight alternative to this is Don Libes' "expect" package.



Note that any program running under your userid has access to all your

files, and could do serious damage if hacked.   Take care!

[Table of Contents]

3.21: How can I prevent my CGI results being cached by the browser?



Firstly, we need to debunk a myth.  People asking this question usually

add that they tried "Pragma: no-cache".  Whilst this is not actively

wrong, there is no requirement on browsers to take any notice of it,

and most of them don't.



The "Pragma: no-cache" header (now superseded by HTTP/1.1 Cache-Control)

is a directive to proxies.  The browser sends it with an HTTP request

to indicate that it wants the request to be dealt with by the original

server and will not accept a proxy's cached document (e.g. when you

use a reload button).  The server may send it to tell a proxy not to

cache the document.



Having said all that, a practical hack to get round cacheing is

to use a different URL for your CGI script each time it's called.

This can easily be accomplished by adding a unique identifier such

as current time in the QUERY_STRING or PATH_INFO.  The browser will

see a different URL, but the script can just ignore it.  Note that

this can be very inefficient, and should be avoided where possible.

[Table of Contents]

3.22: How can I control the default filename when downloading a file via CGI?



	(from a newsgroup post by Matthew Healy)



One option, assuming you aren't already using the PATH_INFO

environment variable, is just to call your CGI script with extra

path information.



For example, suppose the URL to your script is actually



http://server.com/scriptname?name1=value1&name2=value2



Instead, try calling it as



http://server.com/scriptname/filename.ext?name1=value1&name2=value2



and note that you need to escape the URL if it's in an HTML page:



http://server.com/scriptname/filename.ext?name1=value1&amp;name2=value2



And probably the browser will assign the name given in the last chunk

as the suggested filename for downloading.



This works because the http server looks for the program file to run,

then passes any extra path to the program as PATH_INFO variable; the

browser cannot tell where the SCRIPT_NAME part ends and the PATH_INFO

part begins.



This can also be very useful if you want one script to generate more

than one filename -- the script can check the PATH_INFO value and

alter its response accordingly...

[Table of Contents]

Назад в раздел

Emanual.ru – это сайт, посвящённый всем значимым событиям в IT-индустрии: новейшие разработки, уникальные методы и горячие новости! Тонны информации, полезной как для обычных пользователей, так и для самых продвинутых программистов! Интересные обсуждения на актуальные темы и огромная аудитория, которая может быть интересна широкому кругу рекламодателей. У нас вы узнаете всё о компьютерах, базах данных, операционных системах, сетях, инфраструктурах, связях и программированию на популярных языках!


Copyright © 2001-2026

Реклама на сайте