This section aims to deal with basic questions, addressing the role and
nature of CGI, and its place in Web programming. Questions/answers which
just don't appear to 'fit' under any other section may also be included
here.
The Common Gateway Interface, or CGI, is a standard for external
gateway programs to interface with information servers such as HTTP servers.
A plain HTML document that the Web daemon retrieves is static,
which means it exists in a constant state: a text file that doesn't change.
A CGI program, on the other hand, is executed in real-time, so that it
can output dynamic information.
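Here is a minimal sketch of what "executed in real-time" means in practice. It is a CGI program in Python, one of many languages that can be used, and it assumes the server is configured to execute it. The server passes request details in environment variables such as REQUEST_METHOD. The program writes a Content-Type header, a blank line, and then a body whose content (here, the current time) changes on every request. The function name cgi_response is illustrative only.

```python
import os
import sys
from datetime import datetime, timezone


def cgi_response(environ):
    """Build a complete CGI response: header lines, a blank line, then the body."""
    # The server hands request details to a CGI program via environment variables.
    method = environ.get("REQUEST_METHOD", "GET")
    # Dynamic content: computed afresh each time the program runs.
    now = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")
    body = f"<html><body><p>{method} request served at {now}</p></body></html>\n"
    # The header block ends with an empty line; Content-Type tells the
    # browser how to interpret what follows.
    return "Content-Type: text/html\n\n" + body


if __name__ == "__main__":
    sys.stdout.write(cgi_response(os.environ))
```

A static page would return the same bytes every time. This program's output differs on each run.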
The distinction is semantic. Traditionally, compiled executables
(binaries) are called programs, and interpreted programs are usually
called scripts. In the context of CGI, the distinction has become
even more blurred than before. The words are often used interchangeably
(including in this document). Current usage favours the word "scripts"
for CGI programs.
There are innumerable caveats to this answer, but basically any
Webpage containing a form will require a CGI script or program
to process the form inputs.
[This answer to a non-question is an attempt to reduce the noise level
of the recurrent "CGI vs JAVA" threads.]
CGI and JAVA are fundamentally different, and for most applications
are NOT interchangeable.
CGI is a protocol for running programs on a WWW server. Whilst JAVA
can also be used for that, and even has a standardised API (the servlet,
which is indeed an alternative to CGI), the major role of JAVA on the
Web is for client-side programming (the applet).
In certain instances the two may be combined in a single application:
for example a JAVA applet to define a region of interest from a
geographical map, together with a CGI script to process a query
for the area defined.
CGI and SSI (Server-Side Includes) are often interchangeable, and the choice
may be no more than a matter of personal preference. Here are a few
guidelines:
1) CGI is a common standard agreed and supported by all major HTTPDs.
SSI is NOT a common standard, but an innovation of NCSA's HTTPD
which has been widely adopted in later servers. CGI has the
greatest portability, if this is an issue.
2) If your requirement is sufficiently simple that it can be done
by SSI without invoking an exec, then SSI will probably be
more efficient. A typical application would be to include
sitewide 'house styles', such as toolbars, netscapeised <body>
tags or embedded CSS stylesheets.
3) For more complex applications - like processing a form -
where you need to exec (run) a program in any case, CGI
is usually the best choice.
4) If your transaction returns a response that is not an HTML page,
SSI is not an option at all.
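Point 4 is where CGI has no SSI counterpart. In this sketch, a CGI program returns a CSV file instead of an HTML page simply by sending a different Content-Type header. The report contents and the filename report.csv are made up for illustration.

```python
import csv
import io
import sys


def csv_response(rows):
    """Return a CGI response whose body is CSV data rather than HTML."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    headers = (
        "Content-Type: text/csv\n"
        # Hypothetical filename: asks the browser to save rather than display.
        'Content-Disposition: attachment; filename="report.csv"\n'
        "\n"  # blank line ends the headers
    )
    return headers + buf.getvalue()


if __name__ == "__main__":
    sys.stdout.write(csv_response([["region", "hits"], ["north", 42], ["south", 17]]))
```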
Many more recent variants on the theme of SSI are now available.
Probably the best-known are PHP, which embeds server-side scripting
in an HTML page that the server processes before sending it, and ASP,
which is Microsoft's version of a similar interface.
APIs are proprietary programming interfaces supported by particular
platforms. By using an API, you lose all portability. If you know
your application will only ever run on one platform (OS and HTTPD),
and it has a suitable API, go ahead and use it. Otherwise stick to CGI.
Too many to enumerate - but I'll try and summarise. Briefly, there
are several decisions you have to make, including:
* Power. Is it up to a complex task?
* Complexity. How much programming manpower is it worth?
* Portability. Might you want to run your program on another system?
So here's an overview of the main options. It's inevitably subjective,
but may be helpful to someone:
Basic SSI:             Simple interface for basic dynamic content.
                       Non-standard - read your server docs.
Enhanced SSI[1]:       Suitable for more complex tasks within
                       an HTML page.
CGI:                   The standardised, portable general-purpose API,
                       not limited to working with HTML pages.
Enhanced CGI-like[2]:  Typically gain efficiency but lose portability
                       compared to standard CGI.
Servlets:              An alternative API for JAVA that overcomes
                       the limitation of JAVA not supporting
                       environment variables.
Server API:            Generally the most powerful and most complex option.
[1] For example, PHP, ASP.
[2] For example, CGI adapted to mod_perl or fastcgi.
If you're already a programmer, CGI is extremely straightforward, and just
three resources should get you up to speed in the time it takes to read them:
1) Installation notes for your HTTPD. Is it configured to run CGI
scripts, and if so how does it identify that a URL should be executed?
(Check your manuals, READMEs, ISP webpages/FAQs, and if you still can't
find it, ask your server administrator.)
2) The CGI specification at NCSA tells you all you need to know
to get your programs running as CGI applications. http://hoohoo.ncsa.uiuc.edu/cgi/interface.html
3) WWW Security FAQ. This is not required to 'get it working', but
is essential reading if you want to KEEP it working! http://www.w3.org/Security/Faq/www-security-faq.html
If you're NOT already a programmer, you'll have to learn. If you would
find it hard to write, say, a 'grep' or 'cat' utility to run from the
command line, then you will probably have a hard time with CGI. Make
sure your programs work from the command line BEFORE trying them with CGI,
so that at least one possible source of errors has been dealt with.
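A CGI program only reads environment variables (and, for POST, standard input) and writes to standard output. That means you can imitate the server from the command line. In a Unix shell you set variables such as REQUEST_METHOD and QUERY_STRING before the command. The Python sketch below does the same thing with subprocess. The helper name run_cgi and the sample script are assumptions for illustration. Point run_cgi at your own script instead; as written it launches a Python script, so a compiled program would be called directly.

```python
import os
import subprocess
import sys
import tempfile

# A tiny stand-in CGI script (hypothetical) that echoes its query string.
SAMPLE_SCRIPT = """import os
print("Content-Type: text/plain")
print()
print("QUERY_STRING=" + os.environ.get("QUERY_STRING", ""))
"""


def run_cgi(script_path, query=""):
    """Run a Python CGI script the way a server would for a GET request."""
    env = dict(os.environ, REQUEST_METHOD="GET", QUERY_STRING=query)
    result = subprocess.run(
        [sys.executable, script_path],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout  # headers, blank line, body


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "echo.py")
        with open(path, "w") as f:
            f.write(SAMPLE_SCRIPT)
        print(run_cgi(path, "name=world"))
```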
Yes. Period.
There is a lot you can do to minimise these. The most important thing
to do is read and understand Lincoln Stein's excellent WWW security
FAQ, at http://www.w3.org/Security/Faq/www-security-faq.html
No, but it helps. The Web, along with the Internet itself, C, Perl,
and almost every other Good Thing in the last 20 years of computing,
originated in Unix. At the time of writing, this is still the
most mature and best-supported platform for Web applications.
No - you can use any programming language you please. Perl is simply
today's most popular choice for CGI applications. Some other widely used
languages are C, C++, Tcl, BASIC and - for simple tasks -
even shell scripts.
Reasons for choosing Perl include its powerful text manipulation
capabilities (in particular the 'regular' expression) and the fantastic
WWW support modules available.
It isn't really that important. Use what you're comfortable with,
or what you're constrained (e.g. by your manager) to use.
If you're just dabbling with programming, Perl is a good choice, simply
because of the wealth of ready-to-run Perl/CGI resources available.
If you're serious about programming, you should be at home in a
range of languages. C, the industry standard, is a must (at least to
the level of comfortably reading other people's code). You'll
certainly want at least one scripting language such as Perl, Python
or Tcl. C++ is also a good idea.
In response to a Usenet newbie question:
> I am seriously wanting to learn some CGI programming languages
J.M. Ivler wrote some eloquent words of wisdom:
> If you want to learn a programming language, learn a programming language.
> If you want to learn how to do CGI programming, learn a programming
> language first.
>
> My book is one of the few that tackles two languages at the same time.
> Why? because it's not about languages (which are just syntax for logic).
> CGI programming is about programming, and how to leverage the experience
> for the person coming to the site, or maintaining the site, or in some way
> meeting some requirements. Language is just a tool to do so.
These types of filenames are commonly used conventions - no more.
It is up to the server administrator whether or not CGI scripts are
enabled, and (if so) what conventions tell the server to run or
to print them.
If you are running your own server, read the manual.
If you're using an ISP's or other rented webspace, check their webpages for
information or FAQs. As a last resort, ask the server administrator.
The CGI overhead is a consequence of HTTP being a stateless protocol.
This means that a CGI process must be initialised for every "hit"
from a browser.
In the first instance, this usually means the server forking a
new process. This in itself is a modest overhead, but it can
become important on a heavily-used server if the number of
processes grows to problem levels.
In the second place, the CGI program must initialise. In the
case of a compiled language such as C or C++ this is negligible,
but there is a small penalty to pay for scripting languages such as Perl.
Thirdly, CGI is often used as 'glue' to a backend program, such as
a database, which may take some considerable time to initialise.
This represents a major overhead, which must be avoided in any
serious application. The most usual solution is for the backend
program to run as a separate server doing most of the work, while
the actual CGI simply carries messages.
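Here is a minimal sketch of that split, using Python's socketserver module. The backend starts once and keeps its expensive state; a small dict stands in for a real database here. The per-request CGI side just forwards the query over a socket and relays the reply. The one-line text protocol, the class names and the localhost address are assumptions for illustration, not a standard.

```python
import socket
import socketserver
import threading


class BackendHandler(socketserver.StreamRequestHandler):
    """Serves one query per connection: read a line, write a line back."""

    def handle(self):
        query = self.rfile.readline().decode("utf-8").strip()
        answer = self.server.table.get(query, "not found")
        self.wfile.write((answer + "\n").encode("utf-8"))


class Backend(socketserver.ThreadingTCPServer):
    """Long-running backend: expensive start-up happens once, not per hit."""

    allow_reuse_address = True

    def __init__(self, address):
        # Stand-in for slow initialisation such as opening a database.
        self.table = {"apple": "fruit", "carrot": "vegetable"}
        super().__init__(address, BackendHandler)


def ask_backend(host, port, query):
    """What the thin CGI program does: forward the query, return the reply."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall((query + "\n").encode("utf-8"))
        with sock.makefile("r", encoding="utf-8") as reply:
            return reply.readline().strip()


if __name__ == "__main__":
    server = Backend(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    host, port = server.server_address
    threading.Thread(target=server.serve_forever, daemon=True).start()
    print(ask_backend(host, port, "apple"))  # each call stands for one CGI hit
    print(ask_backend(host, port, "banana"))
    server.shutdown()
    server.server_close()
```

In a real deployment the backend would run as its own daemon. Only ask_backend, or its equivalent in whatever language, would live in the CGI program.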
Fourthly, some CGI scripts are just plain inefficient, and may
take hundreds of times the resources they need. Programs using
system() or `backtick` notation often fall into this category.
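To illustrate the cost, the sketch below performs the same trivial job twice. First it launches a separate process, the Python equivalent of system() or backticks. Then it does the job in-process. The function names are hypothetical, and the timings printed will vary by system. The point is only that process creation dominates for small tasks.

```python
import subprocess
import sys
import time


def upper_via_subprocess(text):
    """Upper-case text by launching a whole new process (like system()/backticks)."""
    result = subprocess.run(
        [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read().upper())"],
        input=text, capture_output=True, text=True, check=True,
    )
    return result.stdout


def upper_in_process(text):
    """The same job done directly, with no extra process."""
    return text.upper()


if __name__ == "__main__":
    for func in (upper_via_subprocess, upper_in_process):
        start = time.perf_counter()
        for _ in range(20):
            func("hello, world")
        elapsed = time.perf_counter() - start
        print(f"{func.__name__}: {elapsed:.4f} s for 20 calls")
```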
Note that there are ways to reduce or eliminate all these overheads,
but these tend to be system- or server-specific. The best-supported
server is probably Apache, as commercial server-vendors may prefer to
push their proprietary solutions in preference to CGI.
Unix systems are designed for multiple users, and include provision
for protecting your work from unauthorised access by other users
of the system. The file permissions determine who is permitted
to do what with your programs, data, and directories. The command
that sets file permissions is chmod.
Web servers typically run as user "nobody". That means that, setting
aside serious bugs (such as those in certain versions of the Frontpage
extensions), your files are absolutely secure from damage through the
webserver. It also means that you may have to make explicit changes to
enable the server to access them in a CGI context.
There are two ways to run CGI:
- by default they run as the webserver user (nobody)
For most purposes this is safest, as your programs and data
are protected by the operating system from unauthorised access
through possible bugs in your CGI. However, when the CGI has
to write to a file, that file must be writable by every web
user on the system, and is therefore completely unprotected.
- setuid, they run under your own userid.
This means that files written by your CGI can be secure.
On the other hand, any bugs in your CGI could now compromise
*all* your programs and data on the server.
As an elementary security precaution, scripts (e.g. Perl) are
prevented from running setuid by most OSs. The "cgiwrap"
program offers a workaround for this.
A third way you should *never* permit CGI to be run is:
- as root or setuid root, they can run as any user.
This is extremely dangerous, as any bugs could compromise the
entire server, including every user's files. Fortunately only
the system administrator can install setuid root programs. If
you are *at all* concerned about security, make sure that no such
programs (in particular Frontpage extensions) are installed,
regardless of whether you use them yourself.
For a proper overview, "man chmod". Some modes that may be useful
in a typical CGI context are:
* CGI programs, 0755
* data files to be readable by CGI, 0644
* directories for data used by CGI, 0755
* data files to be writable by CGI, 0666 (data has absolutely no security)
* directories for data used by CGI with write access, 0777 (no security)
* CGI programs to run setuid, 4755
* data files for setuid CGI programs, 0600 or 0644
* directories for data used by setuid CGI programs, 0700 or 0755
* For a typical backend server process, 4750
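The same modes can be applied and checked from a program. This Python sketch assumes a Unix system. It uses os.chmod, the library equivalent of the chmod command, and stat.filemode to show how each octal mode appears in an 'ls -l' listing. The labels and the file name hello.cgi are illustrative only.

```python
import os
import stat
import tempfile

# A few of the modes listed above, with illustrative labels.
CGI_MODES = {
    "CGI program": 0o755,
    "data readable by CGI": 0o644,
    "data writable by CGI (no security)": 0o666,
    "CGI program to run setuid": 0o4755,
    "data for setuid CGI program": 0o600,
}


def symbolic(mode, is_dir=False):
    """Show a numeric mode the way 'ls -l' displays it."""
    kind = stat.S_IFDIR if is_dir else stat.S_IFREG
    return stat.filemode(kind | mode)


def apply_mode(path, mode):
    """Equivalent of 'chmod <mode> <path>'; returns the permission bits now set."""
    os.chmod(path, mode)
    return stat.S_IMODE(os.stat(path).st_mode)


if __name__ == "__main__":
    for label, mode in CGI_MODES.items():
        print(f"{mode:04o}  {symbolic(mode)}  {label}")
    with tempfile.TemporaryDirectory() as tmp:
        script = os.path.join(tmp, "hello.cgi")
        with open(script, "w"):
            pass
        print(f"hello.cgi is now {apply_mode(script, 0o755):04o}")
```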
Finally, if this answer tells you anything you didn't already know,
don't even think about trying to set up a secure server!
> CGIWrap is a gateway program that allows general users to use CGI scripts
> and HTML forms without compromising the security of the http server.
> Scripts are run with the permissions of the user who owns the script. In
> addition, several security checks are performed on the script, which will not
> be executed if any checks fail.
>
> CGIWrap is used via a URL in an HTML document. As distributed, cgiwrap
> is configured to run user scripts which are located in the
> ~/public_html/cgi-bin/ directory.
The normal format for data in HTTP requests is URLencoded. All form data
is encoded in a single string of the form
param1=value1&param2=value2&...&paramn=valuen
Many non-alphanumeric characters are "escaped" in the encoding:
the character whose hexadecimal code is "XY" is represented by
the character string "%XY", and a space is usually sent as "+".
Decoding this string is a fundamental function of every CGI library.
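Here is a sketch of that decoding using only Python's standard library. For GET the string comes from the QUERY_STRING environment variable. For POST it is read from standard input, CONTENT_LENGTH bytes of it. urllib.parse.parse_qs then splits and unescapes it. The helper names read_form_data and decode_form are hypothetical.

```python
import io
import os
import sys
from urllib.parse import parse_qs


def read_form_data(environ, stdin):
    """Return the raw URL-encoded form data for a GET or POST request."""
    if environ.get("REQUEST_METHOD", "GET").upper() == "POST":
        # For POST the data arrives on standard input; CONTENT_LENGTH says
        # how much to read. URL-encoded data is plain ASCII, so characters
        # and bytes coincide here.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        return stdin.read(length)
    # For GET the data is in the QUERY_STRING environment variable.
    return environ.get("QUERY_STRING", "")


def decode_form(data):
    """Decode 'a=1&b=2' into {'a': ['1'], 'b': ['2']}."""
    # parse_qs splits on '&' and '=', turns '+' into a space and decodes
    # %XY escapes (as UTF-8 by default). Values are lists because the same
    # name may appear more than once, e.g. from a multiple-choice <select>.
    return parse_qs(data, keep_blank_values=True)


if __name__ == "__main__":
    if "REQUEST_METHOD" in os.environ:
        print(decode_form(read_form_data(os.environ, sys.stdin)))
    else:
        # Offline demo with a simulated POST body.
        body = "name=J%C3%BCrgen+Smith"
        fake_env = {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": str(len(body))}
        print(decode_form(read_form_data(fake_env, io.StringIO(body))))
```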
Another format is "multipart/form-data", also known as "file upload".
You will get this from the HTML markup
<form method="POST" enctype="multipart/form-data">
(but note you must accept URLencoded input in any case, since not all
browsers support multipart forms).
Most(?) CGI libraries will handle this transparently.
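If your library does not handle it, Python's standard email package can parse a multipart/form-data body, because that format uses the same MIME multipart structure. This is a sketch under two assumptions. First, the whole body has already been read from standard input (CONTENT_LENGTH bytes). Second, the Content-Type value, including its boundary parameter, is taken from the CONTENT_TYPE environment variable. parse_multipart is a hypothetical helper name, and the sample body is made up.

```python
from email import message_from_bytes
from email.policy import default

# A made-up request body with one ordinary field and one uploaded file.
SAMPLE_BODY = (
    b"--XYZ\r\n"
    b'Content-Disposition: form-data; name="comment"\r\n\r\n'
    b"hello\r\n"
    b"--XYZ\r\n"
    b'Content-Disposition: form-data; name="upload"; filename="notes.txt"\r\n'
    b"Content-Type: text/plain\r\n\r\n"
    b"line one\r\n"
    b"--XYZ--\r\n"
)


def parse_multipart(content_type, body):
    """Split a multipart/form-data body into {field name: (filename, data)}."""
    # Prepend the Content-Type header (it carries the boundary string) so
    # the email parser sees a complete MIME message.
    head = f"Content-Type: {content_type}\r\n\r\n".encode("ascii")
    message = message_from_bytes(head + body, policy=default)
    fields = {}
    for part in message.iter_parts():
        # Each part names its form field in a Content-Disposition header.
        name = part.get_param("name", header="content-disposition")
        # filename is None for ordinary fields; data is raw bytes.
        fields[name] = (part.get_filename(), part.get_payload(decode=True))
    return fields


if __name__ == "__main__":
    print(parse_multipart("multipart/form-data; boundary=XYZ", SAMPLE_BODY))
```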