Teach Yourself SQL in 21 Days, Second Edition
- Day 1 -
Introduction to SQL
A Brief History of SQL
The history of SQL begins in an IBM laboratory in San Jose, California, where
SQL was developed in the late 1970s. The initials stand for Structured Query Language,
and the language itself is often referred to as "sequel." It was originally
developed for IBM's DB2 product (a relational database management system, or RDBMS,
that can still be bought today for various platforms and environments). In fact,
SQL makes an RDBMS possible. SQL is a nonprocedural language, in contrast to the
procedural or third-generation languages (3GLs) such as COBOL and C that had been
created up to that time.
NOTE: Nonprocedural means what
rather than how. For example, SQL describes what data to retrieve, delete,
or insert, rather than how to perform the operation.
The characteristic that differentiates a DBMS from an RDBMS is that the RDBMS
provides a set-oriented database language. For most RDBMSs, this set-oriented database
language is SQL. Set oriented means that SQL processes sets of data in groups.
Two standards organizations, the American National Standards Institute (ANSI)
and the International Standards Organization (ISO), currently promote SQL standards
to industry. The ANSI-92 standard is the standard for the SQL used throughout this
book. Although these standard-making bodies prepare standards for database system
designers to follow, all database products differ from the ANSI standard to some
degree. In addition, most systems provide some proprietary extensions to SQL that
extend the language into a true procedural language. We have used various RDBMSs
to prepare the examples in this book to give you an idea of what to expect from the
common database systems. (We discuss procedural SQL--known as PL/SQL--on Day 18,
"PL/SQL: An Introduction," and Transact-SQL on Day 19, "Transact-SQL:
An Introduction.")
A Brief History of Databases
A little background on the evolution of databases and database theory will help
you understand the workings of SQL. Database systems store information in every conceivable
business environment. From large tracking databases such as airline reservation systems
to a child's baseball card collection, database systems store and distribute the
data that we depend on. Until the last few years, large database systems could be
run only on large mainframe computers. These machines have traditionally been expensive
to design, purchase, and maintain. However, today's generation of powerful, inexpensive
workstation computers enables programmers to design software that maintains and distributes
data quickly and inexpensively.
Dr. Codd's 12 Rules for a Relational Database Model
The most popular data storage model is the relational database, which grew from
the seminal paper "A Relational Model of Data for Large Shared Data Banks,"
written by Dr. E. F. Codd in 1970. SQL evolved to service the concepts of the relational
database model. Dr. Codd defined 13 rules, oddly enough referred to as Codd's 12
Rules, for the relational model:
- 0. A relational DBMS must be able to manage databases entirely through
its relational capabilities.
1. Information rule-- All information in a relational database (including
table and column names) is represented explicitly as values in tables.
2. Guaranteed access--Every value in a relational database is guaranteed to
be accessible by using a combination of the table name, primary key value, and column
name.
3. Systematic null value support--The DBMS provides systematic support for
the treatment of null values (unknown or inapplicable data), distinct from default
values, and independent of any domain.
4. Active, online relational catalog--The description of the database and
its contents is represented at the logical level as tables and can therefore be queried
using the database language.
5. Comprehensive data sublanguage--At least one supported language must have
a well-defined syntax and be comprehensive. It must support data definition, manipulation,
integrity rules, authorization, and transactions.
6. View updating rule--All views that are theoretically updatable can be updated
through the system.
7. Set-level insertion, update, and deletion--The DBMS supports not only set-level
retrievals but also set-level inserts, updates, and deletes.
8. Physical data independence--Application programs and ad hoc programs are
logically unaffected when physical access methods or storage structures are altered.
9. Logical data independence--Application programs and ad hoc programs are
logically unaffected, to the extent possible, when changes are made to the table
structures.
10. Integrity independence--The database language must be capable of defining
integrity rules. They must be stored in the online catalog, and they cannot be bypassed.
11. Distribution independence--Application programs and ad hoc requests are
logically unaffected when data is first distributed or when it is redistributed.
12. Nonsubversion--It must not be possible to bypass the integrity rules defined
through the database language by using lower-level languages.
Most databases have had a "parent/child" relationship; that is, a parent
node would contain file pointers to its children. (See Figure 1.1.)
Figure 1.1.
Codd's relational database management system.
This method has several advantages and many disadvantages. In its favor is the
fact that the physical structure of data on a disk becomes unimportant. The programmer
simply stores pointers to the next location, so data can be accessed in this manner.
Also, data can be added and deleted easily. However, different groups of information
could not be easily joined to form new information. The format of the data on the
disk could not be arbitrarily changed after the database was created. Doing so would
require the creation of a new database structure.
Codd's idea for an RDBMS uses the mathematical concepts of relational algebra
to break down data into sets and related common subsets.
Because information can naturally be grouped into distinct sets, Dr. Codd organized
his database system around this concept. Under the relational model, data is separated
into sets that resemble a table structure. This table structure consists of individual
data elements called columns or fields. A single set of a group of fields is known
as a record or row. For instance, to create a relational database consisting of employee
data, you might start with a table called EMPLOYEE that contains the following
pieces of information: Name, Age, and Occupation. These
three pieces of data make up the fields in the EMPLOYEE table, shown in
Table 1.1.
Table 1.1. The EMPLOYEE table.
Name |
Age |
Occupation |
Will Williams |
25 |
Electrical engineer |
Dave Davidson |
34 |
Museum curator |
Jan Janis |
42 |
Chef |
Bill Jackson |
19 |
Student |
Don DeMarco |
32 |
Game programmer |
Becky Boudreaux |
25 |
Model |
The six rows are the records in the EMPLOYEE table. To retrieve a specific
record from this table, for example, Dave Davidson, a user would instruct the database
management system to retrieve the records where the NAME field was equal
to Dave Davidson. If the DBMS had been instructed to retrieve all the fields in the
record, the employee's name, age, and occupation would be returned to the user. SQL
is the language that tells the database to retrieve this data. A sample SQL statement
that makes this query is
SELECT *
FROM EMPLOYEE
Remember that the exact syntax is not important at this point. We cover this topic
in much greater detail beginning tomorrow.
Because the various data items can be grouped according to obvious relationships
(such as the relationship of Employee Name to Employee Age), the
relational database model gives the database designer a great deal of flexibility
to describe the relationships between the data elements. Through the mathematical
concepts of join and union, relational databases can quickly retrieve pieces of data
from different sets (tables) and return them to the user or program as one "joined"
collection of data. (See Figure 1.2.) The join feature enables the designer to store
sets of information in separate tables to reduce repetition.
Figure 1.2.
The join feature.
Figure 1.3 shows a union. The union would return only data common to both sources.
Figure 1.3.
The union feature.
Here's a simple example that shows how data can be logically divided between two
tables. Table 1.2 is called RESPONSIBILITIES and contains two fields: NAME
and DUTIES.
Table 1.2. The RESPONSIBILITIES table.
Name |
Duties |
Becky Boudreaux |
Smile |
Becky Boudreaux |
Walk |
Bill Jackson |
Study |
Bill Jackson |
Interview for jobs |
It would be improper to duplicate the employee's AGE and OCCUPATION
fields for each record. Over time, unnecessary duplication of data would waste a
great deal of hard disk space and increase access time for the RDBMS. However, if
NAME and DUTIES were stored in a separate table named RESPONSIBILITIES,
the user could join the RESPONSIBILITIES and EMPLOYEE tables on
the NAME field. Instructing the RDBMS to retrieve all fields from the RESPONSIBILITIES
and EMPLOYEE tables where the NAME field equals Becky Boudreaux
would return Table 1.3.
Table 1.3. Return values from retrieval where NAME
equals Becky Boudreaux.
Name |
Age |
Occupation |
Duties |
Becky Boudreaux |
25 |
Model |
Smile |
Becky Boudreaux |
25 |
Model |
Walk |
More detailed examples of joins begin on Day 6, "Joining Tables."
Designing the Database Structure
The most important decision for a database designer, after the hardware platform
and the RDBMS have been chosen, is the structure of the tables. Decisions made at
this stage of the design can affect performance and programming later during the
development process. The process of separating data into distinct, unique sets is
called normalization.
Today's Database Landscape
Computing technology has made a permanent change in the ways businesses work around
the world. Information that was at one time stored in warehouses full of filing cabinets
can now be accessed instantaneously at the click of a mouse button. Orders placed
by customers in foreign countries can now be instantly processed on the floor of
a manufacturing facility. Although 20 years ago much of this information had been
transported onto corporate mainframe databases, offices still operated in a batch-processing
environment. If a query needed to be performed, someone notified the management information
systems (MIS) department; the requested data was delivered as soon as possible (though
often not soon enough).
In addition to the development of the relational database model, two technologies
led to the rapid growth of what are now called client/server database systems. The
first important technology was the personal computer. Inexpensive, easy-to-use applications
such as Lotus 1-2-3 and Word Perfect enabled employees (and home computer users)
to create documents and manage data quickly and accurately. Users became accustomed
to continually upgrading systems because the rate of change was so rapid, even as
the price of the more advanced systems continued to fall.
The second important technology was the local area network (LAN) and its integration
into offices across the world. Although users were accustomed to terminal connections
to a corporate mainframe, now word processing files could be stored locally within
an office and accessed from any computer attached to the network. After the Apple
Macintosh introduced a friendly graphical user interface, computers were not only
inexpensive and powerful but also easy to use. In addition, they could be accessed
from remote sites, and large amounts of data could be off-loaded to departmental
data servers.
During this time of rapid change and advancement, a new type of system appeared.
Called client/server development because processing is split between client
computers and a database server, this new breed of application was a radical change
from mainframe-based application programming. Among the many advantages of this type
of architecture are
- Reduced maintenance costs
- Reduced network load (processing occurs on database server or client computer)
- Multiple operating systems that can interoperate as long as they share a common
network protocol
- Improved data integrity owing to centralized data location
In Implementing Client/Server Computing, Bernard H. Boar defines client/server
computing as follows:
- Client/server computing is a processing model in which a single application
is partitioned between multiple processors (front-end and back-end) and the processors
cooperate (transparent to the end user) to complete the processing as a single unified
task. Implementing Client/Server Computing A client/server bond product ties the
processors together to provide a single system image (illusion). Shareable resources
are positioned as requestor clients that access authorized services. The architecture
is endlessly recursive; in turn, servers can become clients and request services
of other servers on the network, and so on and so on.
This type of application development requires an entirely new set of programming
skills. User interface programming is now written for graphical user interfaces,
whether it be MS Windows, IBM OS/2, Apple Macintosh, or the UNIX X-Window system.
Using SQL and a network connection, the application can interface to a database residing
on a remote server. The increased power of personal computer hardware enables critical
database information to be stored on a relatively inexpensive standalone server.
In addition, this server can be replaced later with little or no change to the client
applications.
A Cross-Product Language
You can apply the basic concepts introduced in this book in many environments--for
example, Microsoft Access running on a single-user Windows application or SQL Server
running with 100 user connections. One of SQL's greatest benefits is that it is truly
a cross-platform language and a cross-product language. Because it is also what programmers
refer to as a high-level or fourth-generation language (4GL), a large amount of work
can be donehigher-level language 4GL (fourth-generation) language fourth-generation
(4GL) language in fewer lines of code.
Early Implementations
Oracle Corporation released the first commercial RDBMS that used SQL. Although
the original versions were developed for VAX/VMS systems, Oracle was one of the first
vendors to release a DOS version of its RDBMS. (Oracle is now available on more than
70 platforms.) In the mid-1980s Sybase released its RDBMS, SQL Server. With client
libraries for database access, support for stored procedures (discussed on Day 14,
"Dynamic Uses of SQL"), and interoperability with various networks, SQL
Server became a successful product, particularly in client/server environments. One
of the strongest points for both of theseSQL Server powerful database systems is
their scalability across platforms. C language code (combined with SQL) written for
Oracle on a PC is virtually identical to its counterpart written for an Oracle database
running on a VAX system.
SQL and Client/Server Application Development
The common thread that runs throughout client/server application development is
the use client/server computing of SQL and relational databases. Also, using this
database technology in a single-user business application positions the application
for future growth.
An Overview of SQL
SQL is the de facto standard language used to manipulate and retrieve data from
these relational databases. SQL enables a programmer or database administrator to
do the following:
- Modify a database's structure
- Change system security settings
- Add user permissions on databases or tables
- Query a database for information
- Update the contents of a database
NOTE: The term SQL can be confusing. The S,
for Structured, and the L, for Language, are straightforward enough, but the
Q is a little misleading. Q, of course, stands for "Query,"
which--if taken literally--would restrict you to asking the database questions. But
SQL does much more than ask questions. With SQL you can also create tables, add data,
delete data, splice data together, trigger actions based on changes to the database,
and store your queries within your program or database.
Unfortunately, there is no good substitute for Query. Obviously, Structured
Add Modify Delete Join Store Trigger and Query Language (SAMDJSTQL) is a bit cumbersome.
In the interest of harmony, we will stay with SQL. However, you now know that its
function is bigger than its name.
The most commonly used statement in SQL is the SELECT statement (see
Day 2, "Introduction to the Query: The SELECT Statement"), which
retrieves data from the database and returns the data to the user. The EMPLOYEE
table example illustrates a typical example of a SELECT statement situation.
In addition to the SELECT statement, SQL provides statements for creating
new databases, tables, fields, and indexes, as well as statements for inserting and
deleting records. ANSI SQL also recommends a core group of data manipulation functions.
As you will find out, many database systems also have tools for ensuring data integrity
and enforcing security (see Day 11, "Controlling Transactions") that enable
programmers to stop the execution of a group of commands if a certain condition occurs.
Popular SQL Implementations
This section introduces some of the more popular implementations of SQL, each
of which has its own strengths and weaknesses. Where some implementations of SQL
have been developed for PC use and easy user interactivity, others have been developed
to accommodate very large databases (VLDB). This sections introduces selected key
features of some implementations.
NOTE: In addition to serving as an SQL
reference, this book also contains many practical software development examples.
SQL is useful only when it solves your real-world problems, which occur inside your
code.
Microsoft Access
We use Microsoft Access, a PC-based DBMS, to illustrate some of the examples in
this text. Access is very easy to use. You can use GUI tools or manually enter your
SQL statements.
Personal Oracle7
We use Personal Oracle7, which represents the larger corporate database world,
to demonstrate command-line SQL and database management techniques. (These techniques
are important because the days of the standalone machine are drawing to an end, as
are the days when knowing one database or one operating system was enough.) In command-line
RÊl, simple stand+[cedilla]one SQL statements are entered into Oracle's
SQL*Plus tool. This tool then returns data to the screen for the user to see, or
it performs the appropriate action on the database.
Most examples are directed toward the beginning programmer or first-time user
of SQL. We begin with the simplest of SQL statements and advance to the topics of
transaction management and stored procedure programming. The Oracle RDBMS is distributed
with a full complement of development tools. It includes a C++ and Visual Basic language
library (Oracle Objects for OLE) that can link an application to a Personal Oracle
database. It also comes with graphical tools for database, user, and object administration,
as well as the SQL*Loader utility, which is used to import and export data to and
from Oracle.
NOTE: Personal Oracle7 is a scaled-down
version of the full-blown Oracle7 server product. Personal Oracle7 allows only single-user
connections (as the name implies). However, the SQL syntax used on this product is
identical to that used on the larger, more expensive versions of Oracle. In addition,
the tools used in Personal Oracle7 have much in common with the Oracle7 product.
We chose the Personal Oracle7 RDBMS for several reasons:
- It includes nearly all the tools needed to demonstrate the topics discussed in
this book.
- It is available on virtually every platform in use today and is one of the most
popular RDBMS products worldwide.
- A 90-day trial copy can be downloaded from Oracle Corporation's World Wide Web
server (http://www.oracle.com).
Figure 1.4 shows SQL*Plus from this suite of tools.
Figure 1.4.
Oracle's SQL*Plus.
TIP: Keep in mind that nearly all the
SQL code given in this book is portable to other database management systems. In
cases where syntax differs greatly among different vendors' products, examples are
given to illustrate these differences.
Microsoft Query
Microsoft Query (see Figure 1.5) is a useful query tool that comes packaged with
Microsoft's Windows development tools, Visual C++, and Visual Basic. It uses the
ODBC standard to communicate with underlying databases. Microsoft Query passes SQL
statements to a driver, which processes the statements before passing them to a database
system.
Figure 1.5.
Microsoft Query.
Open Database Connectivity (ODBC)
ODBC is a functional library designed to provide a common Application Programming
Interface (API) to underlying database systems. It communicates with the database
through a library driver, just as Windows communicates with a printer via a printer
driver. Depending on the database being used, a networking driver may be required
to connect to a remote database. The architecture of ODBC is illustrated in Figure
1.6.
Figure 1.6.
ODBC structure.
The unique feature of ODBC (as compared to the Oracle or Sybase libraries) is
that none of its functions are database-vendor specific. For instance, you can use
the same code to perform queries against a Microsoft Access table or an Informix
database with little or no modification. Once again, it should be noted that most
vendors add some proprietary extensions to the SQL standard, such as Microsoft's
and Sybase's Transact-SQL and Oracle's PL/SQL.
You should always consult the documentation before beginning to work with a new
data source. ODBC has developed into a standard adopted into many products, including
Visual Basic, Visual C++, FoxPro, Borland Delphi, and PowerBuilder. As always, application
developers need to weigh the benefit of using the emerging ODBC standard, which enables
you to design code without regard for a specific database, versus the speed gained
by using a database specific function library. In other words, using ODBC will be
more portable but slower than using the Oracle7 or Sybase libraries.
SQL in Application Programming
SQL was originally made an ANSI standard in 1986. The ANSI 1989 standard (often
called SQL-89) defines three types of interfacing to SQL within an application program:
- Module Language-- Uses procedures within programs. These procedures can be called
by the application program and can return values to the program via parameter passing.
- Embedded SQL--Uses SQL statements embedded with actual program code. This method
often requires the use of a precompiler to process the SQL statements. The standard
defines statements for Pascal, FORTRAN, COBOL, and PL/1.
- Direct Invocation--Left up to the implementor.
Before the concept of dynamic SQL evolved, embedded SQL was the most popular way
to use SQL within a program. Embedded SQL, which is still used, uses static
SQL--meaning that the SQL statement is compiled into the application and cannot be
changed at runtime. The principle is much the same as a compiler versus an interpreter.
The performance for this type of SQL is good; however, it is not flexible--and cannot
always meet the needs of today's changing business environments. Dynamic SQL is discussed
shortly.
The ANSI 1992 standard (SQL-92) extended the language and became an international
standard. It defines three levels of SQL compliance: entry, intermediate, and full.
The new features introduced include the following:
- Connections to databases
- Scrollable cursors
- Dynamic SQL
- Outer joins
This book covers not only all these extensions but also some proprietary extensions
used by RDBMS vendors. Dynamic SQL allows you to prepare the SQL statement at runtime.
Although the performance for this type of SQL is not as good as that of embedded
SQL, it provides the application developer (and user) with a great degree of flexibility.
A call-level interface, such as ODBC or Sybase's DB-Library, is an example of dynamic
SQL.
Call-level interfaces should not be a new concept to application programmers.
When using ODBC, for instance, you simply fill a variable with your SQL statement
and call the function to send the SQL statement to the database. Errors or results
can be returned to the program through the use of other function calls designed for
those purposes. Results are returned through a process known as the binding
of variables.
Summary
Day 1 covers some of the history and structure behind SQL. Because SQL and relational
databases are so closely linked, Day 1 also covers (albeit briefly) the history and
function of relational databases. Tomorrow is devoted to the most important component
of SQL: the query.
Q&A
- Q Why should I be concerned about SQL?
A Until recently, if you weren't working on a large database system, you probably
had only a passing knowledge of SQL. With the advent of client/server development
tools (such as Visual Basic, Visual C++, ODBC, Borland's Delphi, and Powersoft's
PowerBuilder) and the movement of several large databases (Oracle and Sybase) to
the PC platform, most business applications being developed today require a working
knowledge of SQL.
Q Why do I need to know anything about relational database theory to use SQL?
A SQL was developed to service relational databases. Without a minimal understanding
of relational database theory, you will not be able to use SQL effectively except
in the most trivial cases.
Q All the new GUI tools enable me to click a button to write SQL. Why should I
spend time learning to write SQL manually?
A GUI tools have their place, and manually writing SQL has its place. Manually
written SQL is generally more efficient than GUI-written SQL. Also, a GUI SQL statement
is not as easy to read as a manually written SQL statement. Finally, knowing what
is going on behind the scenes when you use GUI tools will help you get the most out
of them.
Q So, if SQL is standardized, should I be able to program with SQL on any databases?
A No, you will be able to program with SQL only on RDBMS databases that support
SQL, such as MS-Access, Oracle, Sybase, and Informix. Although each vendor's implementation
will differ slightly from the others, you should be able to use SQL with very few
adjustments.
Workshop
The Workshop provides quiz questions to help solidify your understanding of the
material covered, as well as exercises to provide you with experience in using what
you have learned. Try to answer the quiz and exercise questions before checking the
answers in Appendix F, "Answers to Quizzes and Exercises."
Quiz
- 1. What makes SQL a nonprocedural language?
2. How can you tell whether a database is truly relational?
3. What can you do with SQL?
4. Name the process that separates data into distinct, unique sets.
Exercise
- Determine whether the database you use at work or at home is truly relational.
© Copyright, Macmillan Computer Publishing. All
rights reserved.
|