Teach Yourself SQL in 21 Days, Second Edition
- Day 5 -
Clauses in SQL
Objectives
Today's topic is clauses--not the kind that distribute presents during the holidays,
but the ones you use with a SELECT statement. By the end of the day you
will understand and be able to use the following clauses:
- WHERE
- STARTING WITH
- ORDER BY
- GROUP BY
- HAVING
To get a feel for where these functions fit in, examine the general syntax for
a SELECT statement:
SYNTAX:
SELECT [DISTINCT | ALL] { *
| { [schema.]{table | view | snapshot}.*
| expr } [ [AS] c_alias ]
[, { [schema.]{table | view | snapshot}.*
| expr } [ [AS] c_alias ] ] ... }
FROM [schema.]{table | view | snapshot}[@dblink] [t_alias]
[, [schema.]{table | view | snapshot}[@dblink] [t_alias] ] ...
[WHERE condition ]
[GROUP BY expr [, expr] ... [HAVING condition] ]
[{UNION | UNION ALL | INTERSECT | MINUS} SELECT command ]
[ORDER BY {expr|position} [ASC | DESC]
[, {expr|position} [ASC | DESC]] ...]
NOTE: In my experience with SQL, the ANSI
standard is really more of an ANSI "suggestion." The preceding syntax will
generally work with any SQL engine, but you may find some slight variations.
NOTE: You haven't yet had to deal with
a complicated syntax diagram. Because many people find syntax diagrams more puzzling
than illuminating when learning something new, this book has used simple examples
to illustrate particular points. However, we are now at the point where a syntax
diagram can help tie the familiar concepts to today's new material.
Don't worry about the exact syntax--it varies slightly from implementation to
implementation anyway. Instead, focus on the relationships. At the top of this statement
is SELECT, which you have used many times in the last few days. SELECT
is followed by FROM, which should appear with every SELECT statement
you typed. (You learn a new use for FROM tomorrow.) WHERE, GROUP
BY, HAVING, and ORDER BY all follow. (The other clauses in
the diagram--UNION, UNION ALL, INTERSECT, and MINUS--were
covered in Day 3, "Expressions, Conditions, and Operators.") Each clause
plays an important part in selecting and manipulating data.
NOTE: We have used two implementations
of SQL to prepare today's examples. One implementation has an SQL> prompt and
line numbers (Personal Oracle7), and the other (Borland's ISQL) does not. You will
also notice that the output displays vary slightly, depending on the implementation.
The WHERE Clause
Using just SELECT and FROM, you are limited to returning every
row in a table. For example, using these two key words on the CHECKS table,
you get all seven rows:
INPUT:
SQL> SELECT *
2 FROM CHECKS;
OUTPUT:
CHECK# PAYEE AMOUNT REMARKS
-------- -------------------- -------- ------------------
1 Ma Bell 150 Have sons next time
2 Reading R.R. 245.34 Train to Chicago
3 Ma Bell 200.32 Cellular Phone
4 Local Utilities 98 Gas
5 Joes Stale $ Dent 150 Groceries
16 Cash 25 Wild Night Out
17 Joans Gas 25.1 Gas
7 rows selected.
With WHERE in your vocabulary, you can be more selective. To find all
the checks you wrote with a value of more than 100 dollars, write this:
INPUT:
SQL> SELECT *
2 FROM CHECKS
3 WHERE AMOUNT > 100;
The WHERE clause returns the four instances in the table that meet the
required condition:
OUTPUT:
CHECK# PAYEE AMOUNT REMARKS
-------- -------------------- -------- ------------------
1 Ma Bell 150 Have sons next time
2 Reading R.R. 245.34 Train to Chicago
3 Ma Bell 200.32 Cellular Phone
5 Joes Stale $ Dent 150 Groceries
WHERE can also solve other popular puzzles. Given the following table
of names and locations, you can ask that popular question, Where's Waldo?
INPUT:
SQL> SELECT *
2 FROM PUZZLE;
OUTPUT:
NAME LOCATION
-------------- --------------
TYLER BACKYARD
MAJOR KITCHEN
SPEEDY LIVING ROOM
WALDO GARAGE
LADDIE UTILITY CLOSET
ARNOLD TV ROOM
6 rows selected.
INPUT:
SQL> SELECT LOCATION AS "WHERE'S WALDO?"
2 FROM PUZZLE
3 WHERE NAME = 'WALDO';
OUTPUT:
WHERE'S WALDO?
--------------
GARAGE
Sorry, we couldn't resist. We promise no more corny queries. (We're saving those
for that SQL bathroom humor book everyone's been wanting.) Nevertheless, this query
shows that the column used in the condition of the WHERE statement does
not have to be mentioned in the SELECT clause. In this example you selected
the location column but used WHERE on the name, which is perfectly legal.
Also notice the AS on the SELECT line. AS is an optional
assignment operator, assigning the alias WHERE'S WALDO? to LOCATION.
You might never see the AS again, because it involves extra typing. In most
implementations of SQL you can type
INPUT:
SQL> SELECT LOCATION "WHERE'S WALDO?"
2 FROM PUZZLE
3 WHERE NAME ='WALDO';
and get the same result as the previous query without using AS:
OUTPUT:
WHERE'S WALDO?
--------------
GARAGE
After SELECT and FROM, WHERE is the third most frequently
used SQL term.
The STARTING WITH Clause
STARTING WITH is an addition to the WHERE clause that works
exactly like LIKE(<exp>%). Compare the results of the following query:
INPUT:
SELECT PAYEE, AMOUNT, REMARKS
FROM CHECKS
WHERE PAYEE LIKE('Ca%');
OUTPUT:
PAYEE AMOUNT REMARKS
==================== =============== ==============
Cash 25 Wild Night Out
Cash 60 Trip to Boston
Cash 34 Trip to Dayton
with the results from this query:
INPUT:
SELECT PAYEE, AMOUNT, REMARKS
FROM CHECKS
WHERE PAYEE STARTING WITH('Ca');
OUTPUT:
PAYEE AMOUNT REMARKS
==================== =============== ==============
Cash 25 Wild Night Out
Cash 60 Trip to Boston
Cash 34 Trip to Dayton
The results are identical. You can even use them together, as shown here:
INPUT:
SELECT PAYEE, AMOUNT, REMARKS
FROM CHECKS
WHERE PAYEE STARTING WITH('Ca')
OR
REMARKS LIKE 'G%';
OUTPUT:
PAYEE AMOUNT REMARKS
==================== =============== ===============
Local Utilities 98 Gas
Joes Stale $ Dent 150 Groceries
Cash 25 Wild Night Out
Joans Gas 25.1 Gas
Cash 60 Trip to Boston
Cash 34 Trip to Dayton
Joans Gas 15.75 Gas
WARNING: STARTING WITH is a common
feature of many implementations of SQL. Check your implementation before you grow
fond of it.
Order from Chaos: The ORDER BY Clause
From time to time you will want to present the results of your query in some kind
of order. As you know, however, SELECT FROM gives you a listing, and unless
you have defined a primary key (see Day 10, "Creating Views and Indexes"),
your query comes out in the order the rows were entered. Consider a beefed-up CHECKS
table:
INPUT:
SQL> SELECT * FROM CHECKS;
OUTPUT:
CHECK# PAYEE AMOUNT REMARKS
-------- -------------------- -------- ------------------
1 Ma Bell 150 Have sons next time
2 Reading R.R. 245.34 Train to Chicago
3 Ma Bell 200.32 Cellular Phone
4 Local Utilities 98 Gas
5 Joes Stale $ Dent 150 Groceries
16 Cash 25 Wild Night Out
17 Joans Gas 25.1 Gas
9 Abes Cleaners 24.35 X-Tra Starch
20 Abes Cleaners 10.5 All Dry Clean
8 Cash 60 Trip to Boston
21 Cash 34 Trip to Dayton
11 rows selected.
ANALYSIS:
You're going to have to trust me on this one, but the order of the output is exactly
the same order as the order in which the data was entered. After you read Day 8,
"Manipulating Data," and know how to use INSERT to create tables,
you can test how data is ordered by default on your own.
The ORDER BY clause gives you a way of ordering your results. For example,
to order the preceding listing by check number, you would use the following ORDER
BY clause:
INPUT:
SQL> SELECT *
2 FROM CHECKS
3 ORDER BY CHECK#;
OUTPUT:
CHECK# PAYEE AMOUNT REMARKS
-------- -------------------- -------- ------------------
1 Ma Bell 150 Have sons next time
2 Reading R.R. 245.34 Train to Chicago
3 Ma Bell 200.32 Cellular Phone
4 Local Utilities 98 Gas
5 Joes Stale $ Dent 150 Groceries
8 Cash 60 Trip to Boston
9 Abes Cleaners 24.35 X-Tra Starch
16 Cash 25 Wild Night Out
17 Joans Gas 25.1 Gas
20 Abes Cleaners 10.5 All Dry Clean
21 Cash 34 Trip to Dayton
11 rows selected.
Now the data is ordered the way you want it, not the way in which it was entered.
As the following example shows, ORDER requires BY; BY
is not optional.
INPUT/OUTPUT:
SQL> SELECT * FROM CHECKS ORDER CHECK#;
SELECT * FROM CHECKS ORDER CHECK#
*
ERROR at line 1:
ORA-00924: missing BY keyword
What if you want to list the data in reverse order, with the highest number or
letter first? You're in luck! The following query generates a list of PAYEEs
that stars at the end of the alphabet:
INPUT/OUTPUT:
SQL> SELECT *
2 FROM CHECKS
3 ORDER BY PAYEE DESC;
CHECK# PAYEE AMOUNT REMARKS
-------- -------------------- -------- ------------------
2 Reading R.R. 245.34 Train to Chicago
1 Ma Bell 150 Have sons next time
3 Ma Bell 200.32 Cellular Phone
4 Local Utilities 98 Gas
5 Joes Stale $ Dent 150 Groceries
17 Joans Gas 25.1 Gas
16 Cash 25 Wild Night Out
8 Cash 60 Trip to Boston
21 Cash 34 Trip to Dayton
9 Abes Cleaners 24.35 X-Tra Starch
20 Abes Cleaners 10.5 All Dry Clean
11 rows selected.
ANALYSIS:
The DESC at the end of the ORDER BY clause orders the list in
descending order instead of the default (ascending) order. The rarely used, optional
keyword ASC appears in the following statement:
INPUT:
SQL> SELECT PAYEE, AMOUNT
2 FROM CHECKS
3 ORDER BY CHECK# ASC;
OUTPUT:
PAYEE AMOUNT
-------------------- ---------
Ma Bell 150
Reading R.R. 245.34
Ma Bell 200.32
Local Utilities 98
Joes Stale $ Dent 150
Cash 60
Abes Cleaners 24.35
Cash 25
Joans Gas 25.1
Abes Cleaners 10.5
Cash 34
11 rows selected.
ANALYSIS:
The ordering in this list is identical to the ordering of the list at the beginning
of the section (without ASC) because ASC is the default. This query
also shows that the expression used after the ORDER BY clause does not have
to be in the SELECT statement. Although you selected only PAYEE
and AMOUNT, you were still able to order the list by CHECK#.
You can also use ORDER BY on more than one field. To order CHECKS
by PAYEE and REMARKS, you would query as follows:
INPUT:
SQL> SELECT *
2 FROM CHECKS
3 ORDER BY PAYEE, REMARKS;
OUTPUT:
CHECK# PAYEE AMOUNT REMARKS
-------- -------------------- -------- ------------------
20 Abes Cleaners 10.5 All Dry Clean
9 Abes Cleaners 24.35 X-Tra Starch
8 Cash 60 Trip to Boston
21 Cash 34 Trip to Dayton
16 Cash 25 Wild Night Out
17 Joans Gas 25.1 Gas
5 Joes Stale $ Dent 150 Groceries
4 Local Utilities 98 Gas
3 Ma Bell 200.32 Cellular Phone
1 Ma Bell 150 Have sons next time
2 Reading R.R. 245.34 Train to Chicago
ANALYSIS:
Notice the entries for Cash in the PAYEE column. In the previous ORDER
BY, the CHECK#s were in the order 16, 21, 8. Adding the field
REMARKS to the ORDER BY clause puts the entries in alphabetical
order according to REMARKS. Does the order of multiple columns in the ORDER
BY clause make a difference? Try the same query again but reverse PAYEE
and REMARKS:
INPUT:
SQL> SELECT *
2 FROM CHECKS
3 ORDER BY REMARKS, PAYEE;
OUTPUT:
CHECK# PAYEE AMOUNT REMARKS
-------- -------------------- -------- --------------------
20 Abes Cleaners 10.5 All Dry Clean
3 Ma Bell 200.32 Cellular Phone
17 Joans Gas 25.1 Gas
4 Local Utilities 98 Gas
5 Joes Stale $ Dent 150 Groceries
1 Ma Bell 150 Have sons next time
2 Reading R.R. 245.34 Train to Chicago
8 Cash 60 Trip to Boston
21 Cash 34 Trip to Dayton
16 Cash 25 Wild Night Out
9 Abes Cleaners 24.35 X-Tra Starch
11 rows selected.
ANALYSIS:
As you probably guessed, the results are completely different. Here's how to list
one column in alphabetical order and list the second column in reverse alphabetical
order:
INPUT/OUTPUT:
SQL> SELECT *
2 FROM CHECKS
3 ORDER BY PAYEE ASC, REMARKS DESC;
CHECK# PAYEE AMOUNT REMARKS
-------- -------------------- -------- ------------------
9 Abes Cleaners 24.35 X-Tra Starch
20 Abes Cleaners 10.5 All Dry Clean
16 Cash 25 Wild Night Out
21 Cash 34 Trip to Dayton
8 Cash 60 Trip to Boston
17 Joans Gas 25.1 Gas
5 Joes Stale $ Dent 150 Groceries
4 Local Utilities 98 Gas
1 Ma Bell 150 Have sons next time
3 Ma Bell 200.32 Cellular Phone
2 Reading R.R. 245.34 Train to Chicago
11 rows selected.
ANALYSIS:
In this example PAYEE is sorted alphabetically, and REMARKS
appears in descending order. Note how the remarks in the three checks with a PAYEE
of Cash are sorted.
TIP: If you know that a column you want
to order your results by is the first column in a table, then you can type ORDER
BY 1 in place of spelling out the column name. See the following example.
INPUT/OUTPUT:
SQL> SELECT *
2 FROM CHECKS
3 ORDER BY 1;
CHECK# PAYEE AMOUNT REMARKS
-------- -------------------- -------- ------------------
1 Ma Bell 150 Have sons next time
2 Reading R.R. 245.34 Train to Chicago
3 Ma Bell 200.32 Cellular Phone
4 Local Utilities 98 Gas
5 Joes Stale $ Dent 150 Groceries
8 Cash 60 Trip to Boston
9 Abes Cleaners 24.35 X-Tra Starch
16 Cash 25 Wild Night Out
17 Joans Gas 25.1 Gas
20 Abes Cleaners 10.5 All Dry Clean
21 Cash 34 Trip to Dayton
11 rows selected.
ANALYSIS:
This result is identical to the result produced by the SELECT statement
that you used earlier today:
SELECT * FROM CHECKS ORDER BY CHECK#;
The GROUP BY Clause
On Day 3 you learned how to use aggregate functions (COUNT, SUM,
AVG, MIN, and MAX). If you wanted to find the total amount
of money spent from the slightly changed CHECKS table, you would type:
INPUT:
SELECT *
FROM CHECKS;
Here's the modified table:
OUTPUT:
CHECKNUM PAYEE AMOUNT REMARKS
======== =========== =============== ======================
1 Ma Bell 150 Have sons next time
2 Reading R.R. 245.34 Train to Chicago
3 Ma Bell 200.33 Cellular Phone
4 Local Utilities 98 Gas
5 Joes Stale $ Dent 150 Groceries
16 Cash 25 Wild Night Out
17 Joans Gas 25.1 Gas
9 Abes Cleaners 24.35 X-Tra Starch
20 Abes Cleaners 10.5 All Dry Clean
8 Cash 60 Trip to Boston
21 Cash 34 Trip to Dayton
30 Local Utilities 87.5 Water
31 Local Utilities 34 Sewer
25 Joans Gas 15.75 Gas
Then you would type:
INPUT/OUTPUT:
SELECT SUM(AMOUNT)
FROM CHECKS;
SUM
===============
1159.87
ANALYSIS:
This statement returns the sum of the column AMOUNT. What if you wanted
to find out how much you have spent on each PAYEE? SQL helps you with the
GROUP BY clause. To find out whom you have paid and how much, you would
query like this:
INPUT/OUTPUT:
SELECT PAYEE, SUM(AMOUNT)
FROM CHECKS
GROUP BY PAYEE;
PAYEE SUM
==================== ===============
Abes Cleaners 34.849998
Cash 119
Joans Gas 40.849998
Joes Stale $ Dent 150
Local Utilities 219.5
Ma Bell 350.33002
Reading R.R. 245.34
ANALYSIS:
The SELECT clause has a normal column selection, PAYEE, followed
by the aggregate function SUM(AMOUNT). If you had tried this query with
only the FROM CHECKS that follows, here's what you would see:
INPUT/OUTPUT:
SELECT PAYEE, SUM(AMOUNT)
FROM CHECKS;
Dynamic SQL Error
-SQL error code = -104
-invalid column reference
ANALYSIS:
SQL is complaining about the combination of the normal column and the aggregate
function. This condition requires the GROUP BY clause. GROUP BY
runs the aggregate function described in the SELECT statement for each grouping
of the column that follows the GROUP BY clause. The table CHECKS
returned 14 rows when queried with SELECT * FROM CHECKS. The query on the
same table, SELECT PAYEE, SUM(AMOUNT) FROM CHECKS GROUP BY PAYEE, took the
14 rows in the table and made seven groupings, returning the SUM of each
grouping.
Suppose you wanted to know how much you gave to whom with how many checks. Can
you use more than one aggregate function?
INPUT/OUTPUT:
SELECT PAYEE, SUM(AMOUNT), COUNT(PAYEE)
FROM CHECKS
GROUP BY PAYEE;
PAYEE SUM COUNT
==================== =============== ===========
Abes Cleaners 34.849998 2
Cash 119 3
Joans Gas 40.849998 2
Joes Stale $ Dent 150 1
Local Utilities 219.5 3
Ma Bell 350.33002 2
Reading R.R. 245.34 1
ANALYSIS:
This SQL is becoming increasingly useful! In the preceding example, you were able
to perform group functions on unique groups using the GROUP BY clause. Also
notice that the results were ordered by payee. GROUP BY also acts like the
ORDER BY clause. What would happen if you tried to group by more than one
column? Try this:
INPUT/OUTPUT:
SELECT PAYEE, SUM(AMOUNT), COUNT(PAYEE)
FROM CHECKS
GROUP BY PAYEE, REMARKS;
PAYEE SUM COUNT
==================== =============== ===========
Abes Cleaners 10.5 1
Abes Cleaners 24.35 1
Cash 60 1
Cash 34 1
Cash 25 1
Joans Gas 40.849998 2
Joes Stale $ Dent 150 1
Local Utilities 98 1
Local Utilities 34 1
Local Utilities 87.5 1
Ma Bell 200.33 1
Ma Bell 150 1
Reading R.R. 245.34 1
ANALYSIS:
The output has gone from 7 groupings of 14 rows to 13 groupings. What is different
about the one grouping with more than one check associated with it? Look at the entries
for Joans Gas:
INPUT/OUTPUT:
SELECT PAYEE, REMARKS
FROM CHECKS
WHERE PAYEE = 'Joans Gas';
PAYEE REMARKS
==================== ====================
Joans Gas Gas
Joans Gas Gas
ANALYSIS:
You see that the combination of PAYEE and REMARKS creates identical
entities, which SQL groups together into one line with the GROUP BY clause.
The other rows produce unique combinations of PAYEE and REMARKS
and are assigned their own unique groupings.
The next example finds the largest and smallest amounts, grouped by REMARKS:
INPUT/OUTPUT:
SELECT MIN(AMOUNT), MAX(AMOUNT)
FROM CHECKS
GROUP BY REMARKS;
MIN MAX
=============== ===============
245.34 245.34
10.5 10.5
200.33 200.33
15.75 98
150 150
150 150
34 34
60 60
34 34
87.5 87.5
25 25
24.35 24.35
Here's what will happen if you try to include in the select statement
a column that has several different values within the group formed by GROUP BY:
INPUT/OUTPUT:
SELECT PAYEE, MAX(AMOUNT), MIN(AMOUNT)
FROM CHECKS
GROUP BY REMARKS;
Dynamic SQL Error
-SQL error code = -104
-invalid column reference
ANALYSIS:
This query tries to group CHECKS by REMARK. When the query finds
two records with the same REMARK but different PAYEEs, such as
the rows that have GAS as a REMARK but have PAYEEs of
LOCAL UTILITIES and JOANS GAS, it throws an error.
The rule is, Don't use the SELECT statement on columns that have multiple
values for the GROUP BY clause column. The reverse is not true. You can
use GROUP BY on columns not mentioned in the SELECT statement.
For example:
INPUT/OUTPUT:
SELECT PAYEE, COUNT(AMOUNT)
FROM CHECKS
GROUP BY PAYEE, AMOUNT;
PAYEE COUNT
==================== ===========
Abes Cleaners 1
Abes Cleaners 1
Cash 1
Cash 1
Cash 1
Joans Gas 1
Joans Gas 1
Joes Stale $ Dent 1
Local Utilities 1
Local Utilities 1
Local Utilities 1
Ma Bell 1
Ma Bell 1
Reading R.R. 1
ANALYSIS:
This silly query shows how many checks you had written for identical amounts to
the same PAYEE. Its real purpose is to show that you can use AMOUNT
in the GROUP BY clause, even though it is not mentioned in the SELECT
clause. Try moving AMOUNT out of the GROUP BY clause and into the
SELECT clause, like this:
INPUT/OUTPUT:
SELECT PAYEE, AMOUNT, COUNT(AMOUNT)
FROM CHECKS
GROUP BY PAYEE;
Dynamic SQL Error
-SQL error code = -104
-invalid column reference
ANALYSIS:
SQL cannot run the query, which makes sense if you play the part of SQL for a
moment. Say you had to group the following lines:
INPUT/OUTPUT:
SELECT PAYEE, AMOUNT, REMARKS
FROM CHECKS
WHERE PAYEE ='Cash';
PAYEE AMOUNT REMARKS
==================== =============== ===============
Cash 25 Wild Night Out
Cash 60 Trip to Boston
Cash 34 Trip to Dayton
If the user asked you to output all three columns and group by PAYEE
only, where would you put the unique remarks? Remember you have only one row per
group when you use GROUP BY. SQL can't do two things at once, so it complains:
Error #31: Can't do two things at once.
The HAVING Clause
How can you qualify the data used in your GROUP BY clause? Use the table
ORGCHART and try this:
INPUT:
SELECT * FROM ORGCHART;
OUTPUT:
NAME TEAM SALARY SICKLEAVE ANNUALLEAVE
=============== ======== =========== =========== ===========
ADAMS RESEARCH 34000.00 34 12
WILKES MARKETING 31000.00 40 9
STOKES MARKETING 36000.00 20 19
MEZA COLLECTIONS 40000.00 30 27
MERRICK RESEARCH 45000.00 20 17
RICHARDSON MARKETING 42000.00 25 18
FURY COLLECTIONS 35000.00 22 14
PRECOURT PR 37500.00 24 24
If you wanted to group the output into divisions and show the average salary in
each division, you would type:
INPUT/OUTPUT:
SELECT TEAM, AVG(SALARY)
FROM ORGCHART
GROUP BY TEAM;
TEAM AVG
=============== ===========
COLLECTIONS 37500.00
MARKETING 36333.33
PR 37500.00
RESEARCH 39500.00
The following statement qualifies this query to return only those departments
with average salaries under 38000:
INPUT/OUTPUT:
SELECT TEAM, AVG(SALARY)
FROM ORGCHART
WHERE AVG(SALARY) < 38000
GROUP BY TEAM;
Dynamic SQL Error
-SQL error code = -104
-Invalid aggregate reference
ANALYSIS:
This error occurred because WHERE does not work with aggregate functions.
To make this query work, you need something new: the HAVING clause. If you
type the following query, you get what you ask for:
INPUT/OUTPUT:
SELECT TEAM, AVG(SALARY)
FROM ORGCHART
GROUP BY TEAM
HAVING AVG(SALARY) < 38000;
TEAM AVG
=============== ===========
COLLECTIONS 37500.00
MARKETING 36333.33
PR 37500.00
ANALYSIS:
HAVING enables you to use aggregate functions in a comparison statement,
providing for aggregate functions what WHERE provides for individual rows.
Does HAVING work with nonaggregate expressions? Try this:
INPUT/OUTPUT:
SELECT TEAM, AVG(SALARY)
FROM ORGCHART
GROUP BY TEAM
HAVING SALARY < 38000;
TEAM AVG
=============== ===========
PR 37500.00
ANALYSIS:
Why is this result different from the last query? The HAVING AVG(SALARY) <
38000 clause evaluated each grouping and returned only those with an average
salary of under 38000, just what you expected. HAVING SALARY < 38000,
on the other hand, had a different outcome. Take on the role of the SQL engine again.
If the user asks you to evaluate and return groups of divisions where SALARY
< 38000, you would examine each group and reject those where an individual
SALARY is greater than 38000. In each division except PR, you would
find at least one salary greater than 38000:
INPUT/OUTPUT:
SELECT NAME, TEAM, SALARY
FROM ORGCHART
ORDER BY TEAM;
NAME TEAM SALARY
=============== =============== ===========
FURY COLLECTIONS 35000.00
MEZA COLLECTIONS 40000.00
WILKES MARKETING 31000.00
STOKES MARKETING 36000.00
RICHARDSON MARKETING 42000.00
PRECOURT PR 37500.00
ADAMS RESEARCH 34000.00
MERRICK RESEARCH 45000.00
ANALYSIS:
Therefore, you would reject all other groups except PR. What you really asked
was Select all groups where no individual makes more than 38000. Don't you
just hate it when the computer does exactly what you tell it to?
WARNING: Some implementations of SQL return
an error if you use anything other than an aggregate function in a HAVING
clause. Don't bet the farm on using the previous example until you check the implementation
of the particular SQL you use.
Can you use more than one condition in your HAVING clause? Try this:
INPUT:
SELECT TEAM, AVG(SICKLEAVE),AVG(ANNUALLEAVE)
FROM ORGCHART
GROUP BY TEAM
HAVING AVG(SICKLEAVE)>25 AND
AVG(ANNUALLEAVE)<20;
ANALYSIS:
The following table is grouped by TEAM. It shows all the teams with SICKLEAVE
averages above 25 days and ANNUALLEAVE averages below 20
days.
OUTPUT:
TEAM AVG AVG
=============== =========== ===========
MARKETING 28 15
RESEARCH 27 15
You can also use an aggregate function in the HAVING clause that was
not in the SELECT statement. For example:
INPUT/OUTPUT:
SELECT TEAM, AVG(SICKLEAVE),AVG(ANNUALLEAVE)
FROM ORGCHART
GROUP BY TEAM
HAVING COUNT(TEAM) > 1;
TEAM AVG AVG
=============== =========== ===========
COLLECTIONS 26 21
MARKETING 28 15
RESEARCH 27 15
ANALYSIS:
This query returns the number of TEAMs with more than one member. COUNT(TEAM)
is not used in the SELECT statement but still functions as expected in the
HAVING clause.
The other logical operators all work well within the HAVING clause. Consider
this:
INPUT/OUTPUT:
SELECT TEAM,MIN(SALARY),MAX(SALARY)
FROM ORGCHART
GROUP BY TEAM
HAVING AVG(SALARY) > 37000
OR
MIN(SALARY) > 32000;
TEAM MIN MAX
=============== =========== ===========
COLLECTIONS 35000.00 40000.00
PR 37500.00 37500.00
RESEARCH 34000.00 45000.00
The operator IN also works in a HAVING clause, as demonstrated
here:
INPUT/OUTPUT:
SELECT TEAM,AVG(SALARY)
FROM ORGCHART
GROUP BY TEAM
HAVING TEAM IN ('PR','RESEARCH');
TEAM AVG
=============== ===========
PR 37500.00
RESEARCH 39500.00
Combining Clauses
Nothing exists in a vacuum, so this section takes you through some composite examples
that demonstrate how combinations of clauses perform together.
Example 5.1
Find all the checks written for Cash and Gas in the CHECKS table and
order them by REMARKS.
INPUT:
SELECT PAYEE, REMARKS
FROM CHECKS
WHERE PAYEE = 'Cash'
OR REMARKS LIKE'Ga%'
ORDER BY REMARKS;
OUTPUT:
PAYEE REMARKS
==================== ====================
Joans Gas Gas
Joans Gas Gas
Local Utilities Gas
Cash Trip to Boston
Cash Trip to Dayton
Cash Wild Night Out
ANALYSIS:
Note the use of LIKE to find the REMARKS that started with Ga.
With the use of OR, data was returned if the WHERE clause met either
one of the two conditions.
What if you asked for the same information and group it by PAYEE? The
query would look something like this:
INPUT:
SELECT PAYEE, REMARKS
FROM CHECKS
WHERE PAYEE = 'Cash'
OR REMARKS LIKE'Ga%'
GROUP BY PAYEE
ORDER BY REMARKS;
ANALYSIS:
This query would not work because the SQL engine would not know what to do with
the remarks. Remember that whatever columns you put in the SELECT clause
must also be in the GROUP BY clause--unless you don't specify any columns
in the SELECT clause.
Example 5.2
Using the table ORGCHART, find the salary of everyone with less than
25 days of sick leave. Order the results by NAME.
INPUT:
SELECT NAME, SALARY
FROM ORGCHART
WHERE SICKLEAVE < 25
ORDER BY NAME;
OUTPUT:
NAME SALARY
=============== ===========
FURY 35000.00
MERRICK 45000.00
PRECOURT 37500.00
STOKES 36000.00
ANALYSIS:
This query is straightforward and enables you to use your new-found skills with
WHERE and ORDER BY.
Example 5.3
Again, using ORGCHART, display TEAM, AVG(SALARY), AVG(SICKLEAVE),
and AVG(ANNUALLEAVE) on each team:
INPUT:
SELECT TEAM,
AVG(SALARY),
AVG(SICKLEAVE),
AVG(ANNUALLEAVE)
FROM ORGCHART
GROUP BY TEAM;
OUTPUT:
TEAM AVG AVG AVG
=============== =========== =========== ===========
COLLECTIONS 37500.00 26 21
MARKETING 36333.33 28 15
PR 37500.00 24 24
RESEARCH 39500.00 26 15
An interesting variation on this query follows. See if you can figure out what
happened:
INPUT/OUTPUT:
SELECT TEAM,
AVG(SALARY),
AVG(SICKLEAVE),
AVG(ANNUALLEAVE)
FROM ORGCHART
GROUP BY TEAM
ORDER BY NAME;
TEAM AVG AVG AVG
=============== =========== =========== ===========
RESEARCH 39500.00 27 15
COLLECTIONS 37500.00 26 21
PR 37500.00 24 24
MARKETING 36333.33 28 15
A simpler query using ORDER BY might offer a clue:
INPUT/OUTPUT:
SELECT NAME, TEAM
FROM ORGCHART
ORDER BY NAME, TEAM;
NAME TEAM
=============== ===========
ADAMS RESEARCH
FURY COLLECTIONS
MERRICK RESEARCH
MEZA COLLECTIONS
PRECOURT PR
RICHARDSON MARKETING
STOKES MARKETING
WILKES MARKETING
ANALYSIS:
When the SQL engine got around to ordering the results of the query, it used the
NAME column (remember, it is perfectly legal to use a column not specified
in the SELECT statement), ignored duplicate TEAM entries, and came
up with the order RESEARCH, COLLECTIONS, PR, and MARKETING.
Including TEAM in the ORDER BY clause is unnecessary, because you
have unique values in the NAME column. You can get the same result by typing
this statement:
INPUT/OUTPUT:
SELECT NAME, TEAM
FROM ORGCHART
ORDER BY NAME;
NAME TEAM
=============== ============
ADAMS RESEARCH
FURY COLLECTIONS
MERRICK RESEARCH
MEZA COLLECTIONS
PRECOURT PR
RICHARDSON MARKETING
STOKES MARKETING
WILKES MARKETING
While you are looking at variations, don't forget you can also reverse the order:
INPUT/OUTPUT:
SELECT NAME, TEAM
FROM ORGCHART
ORDER BY NAME DESC;
NAME TEAM
=============== ============
WILKES MARKETING
STOKES MARKETING
RICHARDSON MARKETING
PRECOURT PR
MEZA COLLECTIONS
MERRICK RESEARCH
FURY COLLECTIONS
ADAMS RESEARCH
Example 5.4: The Big Finale
Is it possible to use everything you have learned in one query? It is, but the
results will be convoluted because in many ways you are working with apples and oranges--or
aggregates and nonaggregates. For example, WHERE and ORDER BY are
usually found in queries that act on single rows, such as this:
INPUT/OUTPUT:
SELECT *
FROM ORGCHART
ORDER BY NAME DESC;
NAME TEAM SALARY SICKLEAVE ANNUALLEAVE
=============== ======== =========== =========== ===========
WILKES MARKETING 31000.00 40 9
STOKES MARKETING 36000.00 20 19
RICHARDSON MARKETING 42000.00 25 18
PRECOURT PR 37500.00 24 24
MEZA COLLECTIONS 40000.00 30 27
MERRICK RESEARCH 45000.00 20 17
FURY COLLECTIONS 35000.00 22 14
ADAMS RESEARCH 34000.00 34 12
GROUP BY and HAVING are normally seen in the company of aggregates:
INPUT/OUTPUT:
SELECT PAYEE,
SUM(AMOUNT) TOTAL,
COUNT(PAYEE) NUMBER_WRITTEN
FROM CHECKS
GROUP BY PAYEE
HAVING SUM(AMOUNT) > 50;
PAYEE TOTAL NUMBER_WRITTEN
==================== =============== ==============
Cash 119 3
Joes Stale $ Dent 150 1
Local Utilities 219.5 3
Ma Bell 350.33002 2
Reading R.R. 245.34 1
You have seen that combining these two groups of clauses can have unexpected results,
including the following:
INPUT:
SELECT PAYEE,
SUM(AMOUNT) TOTAL,
COUNT(PAYEE) NUMBER_WRITTEN
FROM CHECKS
WHERE AMOUNT >= 100
GROUP BY PAYEE
HAVING SUM(AMOUNT) > 50;
OUTPUT:
PAYEE TOTAL NUMBER_WRITTEN
==================== =============== ==============
Joes Stale $ Dent 150 1
Ma Bell 350.33002 2
Reading R.R. 245.34 1
Compare these two result sets and examine the raw data:
INPUT/OUTPUT:
SELECT PAYEE, AMOUNT
FROM CHECKS
ORDER BY PAYEE;
PAYEE AMOUNT
==================== ===============
Abes Cleaners 10.5
Abes Cleaners 24.35
Cash 25
Cash 34
Cash 60
Joans Gas 15.75
Joans Gas 25.1
Joes Stale $ Dent 150
Local Utilities 34
Local Utilities 87.5
Local Utilities 98
Ma Bell 150
Ma Bell 200.33
Reading R.R. 245.34
ANALYSIS:
You see how the WHERE clause filtered out all the checks less than 100
dollars before the GROUP BY was performed on the query. We are not trying
to tell you not to mix these groups--you may have a requirement that this sort of
construction will meet. However, you should not casually mix aggregate and nonaggregate
functions. The previous examples have been tables with only a handful of rows. (Otherwise,
you would need a cart to carry this book.) In the real world you will be working
with thousands and thousands (or billions and billions) of rows, and the subtle changes
caused by mixing these clauses might not be so apparent.
Summary
Today you learned all the clauses you need to exploit the power of a SELECT
statement. Remember to be careful what you ask for because you just might get it.
Your basic SQL education is complete. You already know enough to work effectively
with single tables. Tomorrow (Day 6, "Joining Tables") you will have the
opportunity to work with multiple tables.
Q&A
- Q I thought we covered some of these functions earlier this week? If so, why
are we covering them again?
A We did indeed cover WHERE on Day 3. You needed a knowledge of
WHERE to understand how certain operators worked. WHERE appears
again today because it is a clause, and today's topic is clauses.
Workshop
The Workshop provides quiz questions to help solidify your understanding of the
material covered, as well as exercises to provide you with experience in using what
you have learned. Try to answer the quiz and exercise questions before checking the
answers in Appendix F, "Answers to Quizzes and Exercises."
Quiz
- 1. Which clause works just like LIKE(<exp>%)?
2. What is the function of the GROUP BY clause, and what other
clause does it act like?
3. Will this SELECT work?
INPUT:
SQL> SELECT NAME, AVG(SALARY), DEPARTMENT
FROM PAY_TBL
WHERE DEPARTMENT = 'ACCOUNTING'
ORDER BY NAME
GROUP BY DEPARTMENT, SALARY;
- 4. When using the HAVING clause, do you always have to use a
GROUP BY also?
5. Can you use ORDER BY on a column that is not one of the columns
in the SELECT statement?
Exercises
- 1. Using the ORGCHART table from the preceding examples, find
out how many people on each team have 30 or more days of sick leave.
2. Using the CHECKS table, write a SELECT that will return
the following:
OUTPUT:
CHECK#_____PAYEE_______AMOUNT
1 MA BELL 150
© Copyright, Macmillan Computer Publishing. All
rights reserved.
|