Structured Query Language (SQL) was introduced by IBM as the language to interface with its prototype relational database management system, System R. The first commercially available SQL relational database management system was introduced in 1979 by
Oracle Corporation. Today, SQL has become an industry standard, and Oracle Corporation clearly leads the world in relational database management system technology.
Because SQL is a non-procedural language, sets of records can be manipulated instead of one record at a time. The syntax is free-flowing, enabling you to concentrate on the data presentation. Oracle has two optimizers (cost- and rule-based) that will
parse the syntax and format it into an efficient statement before the database engine receives it for processing. The database administrator (DBA) determines which optimizer is in effect for each database instance.
The American National Standards Institute (ANSI) has declared SQL as the standard language for relational database management systems. Most companies that produce relational database management systems support SQL and tend to comply with the ANSI SQL89
standard.
This chapter does not give a lesson on data modeling and creating a proper database schema. In order to write proper SQL statements, familiarity with database objects (tables, views, constraints) are essential.
One general rule to follow when you are writing SQL statements is that data types cannot be mixed. Conversion utilities are available to convert from one type to another. These conversion functions are covered later in this chapter.
The NUMBER data type is used to store zero, negative, positive, fixed, and floating point numbers with up to 38 digits of precision. Numbers range between 1.0x10 -130 and 1.0x10 126.
Numbers can be defined in one of three ways:
NUMBER(p,s)
where p is the precision up to 38 digits and s is the scale (number of digits to the right of the decimal point). The scale can range between -84 to 127.
NUMBER (p)
This is a fixed-point number with a scale of zero and a precision of p.
NUMBER
This is a floating-point number with a precision of 38.
The following table shows how Oracle stores different scales and precisions:
Actual Data
|
Defined as
|
Stored as
|
123456.789
|
NUMBER(6,2)
|
123456.79
|
123456.789
|
NUMBER(6)
|
123457
|
123456.789
|
NUMBER(6,-2)
|
123400
|
123456.789
|
NUMBER
|
123456.789 |
Instead of storing date and time information in a character or numeric format, IBM created a separate data type. For each DATE data type, the following information is stored:
Century
Year
Month
Day
Hour
Minute
Second
You can easily retrieve the current date and time by using the function SYSDATE.
Date arithmetic is possible using number constants or other dates. Only addition and subtraction are supported. For example, SYSDATE + 7 will return one week from today.
Every database system has a default date format that is defined by the initialization parameter NLS_DATE_FORMAT. This parameter is usually set to DD-MON-YY, where DD is the day of the month (the first day of the month is 01), MON is the abbreviated
month name, and YY is a two-digit year designation.
If you do not specify a time, the default time is 12:00:00 a.m. If only the time component is captured, the default date will be the first day of the current month.
There are four character types available:
- The CHAR data type is used where fixed-length fields are necessary. Any length up to 255 characters can be specified. The default length is 1. When data is entered, any space left over will be filled with blanks. All alpha-numeric characters are
allowed.
- The VARCHAR2 is used for variable-length fields. A length component must be supplied when you use this data type. The maximum length is 2000 characters. All alpha-numeric characters are allowed.
- The LONG data type is used to store large amounts of variable-length text. Any length up to 2 GB can be specified. Be aware that there are some restrictions to using this data type, such as:
Only one column per table can be defined as LONG.
A LONG column cannot be indexed.
A LONG column cannot be passed as an argument to a procedure.
A function cannot be used to return a LONG column.
A LONG column cannot be used in where, order by, group by, or connect by clauses.
- 4. The VARCHAR data type is synonymous with VARCHAR2. Oracle Corporation is reserving this for future use. Do not use this data type.
Two data types, RAW and LONGRAW, are available for storing binary type data such as digitized sound and images. These data types take on similar characteristics as the VARCHAR2 and LONG data types already mentioned.
Use the RAW data type to store binary data up to 2000 characters and use the LONGRAW data type to store binary data up to 2 GB.
Oracle only stores and retrieves binary data; no string manipulations are allowed. Data is retrieved as hexadecimal character values.
Every row in the database has an address. You can retrieve this address by using the ROWID function. The format of the ROWID is as follows:
BLOCK.ROW.FILE
BLOCK is the data block of the data FILE containing the ROW. The data is in hexadecimal format and has the data type ROWID.
MLSLABEL is a data type used to store the binary format of a label used on a secure operating system.
The CREATE statement opens the world to the user. Whether a simple temporary table is to be created or a complex database schema, you will repeatedly use the CREATE statement. Only a few of the more common CREATE statements are covered here.
Every database designer will have to create a table sometime. The CREATE TABLE system privilege is needed to execute this command. The DBA is responsible for administering these privileges. The syntax to create a table is
CREATE TABLE schema.TABLE (COLUMN DATATYPE default expression column constraint) table constraint
PCTFREE x PCTUSED x INITRANS x MAXTRANS x TABLESPACE name STORAGE clause CLUSTER cluster clause
ENABLE clause DISABLE clause AS subquery
In this syntax, SCHEMA is an optional parameter to identify which database schema to place this table in. The default is your own.
TABLE is mandatory and is the name of your table.
COLUMN DATATYPE are required to identify each column in the table. Separate the columns with commas. There is a maximum of 254 columns per table.
The DEFAULT expression is optional and is used to assign a default value to a column when a subsequent insert statement fails to assign a value.
COLUMN CONSTRAINT is optional. It is used to define an integrity constraint such as not null.
TABLE CONSTRAINT is optional and is used to define an integrity constraint as part of the table, such as the primary key.
PCTFREE is optional but has a default of 10. This indicates that 10 percent for each data block will be reserved for future updates to the table's rows. Integers from 1 to 99 are allowed.
PCTUSED is optional but has a default of 40. This indicates the minimum percentage of space used that Oracle maintains before a data block becomes a candidate for row insertion. Integers from 1 to 99 are allowed. The sum of PCTFREE and PCTUSED must be
less than 100.
INITRANS is optional but has a default of 1. Integers from 1 to 255 are allowed. It is recommended that you leave this alone. This is an allocation of the number of transaction entries assigned within the data block for the table.
MAXTRANS is optional but has a default that is a function of the data block size. This is used to identify the maximum number of concurrent transactions that can update a data block for your table. It is recommended that this parameter not be changed.
TABLESPACE is optional but has a default value as the tablespace name of the owner of the schema. A different tablespace name than the default can be used. Tablespace names are usually application-dependent. The DBA will be able to give proper
recommendations.
STORAGE is optional and has default characteristics defined by the DBA.
CLUSTER is optional and specifies that a table is to be part of a cluster. You must identify the columns from the table that need to be clustered. Typically, the cluster columns are columns that comprise the primary key.
ENABLE is optional and turns on an integrity constraint.
DISABLE is optional and turns off an integrity constraint.
AS SUBQUERY is optional and inserts the rows returned by the subquery into the table upon creation.
Once the table is created, you can use the ALTER TABLE command to make alterations to the table. To modify an integrity constraint, DROP the constraint first, and then re-create it.
Let's look at two examples on creating tables:
CREATE TABLE ADDRESSES (ADRS_ID NUMBER(6),
ACTIVE_DATE DATE,
BOX_NUMBER NUMBER(6),
ADDRS_1 VARCHAR2(40),
ADDRS_2 VARCHAR2(40),
CITY VARCHAR2(40),
STATE VARCHAR2(2),
ZIP VARCHAR2(10));
This is the simplest form of a table create using all of the default capabilities. The second example follows:
CREATE TABLE ADDRESSES (ADRS_ID NUMBER(6) CONSTRAINT PK_ADRS PRIMARY KEY,
ACTIVE_DATE DATE DEFAULT SYSDATE,
BOX_NUMBER NUMBER(6) DEFAULT NULL,
ADDRS_1 VARCHAR2(40) NOT NULL,
ADDRS_2 VARCHAR2(40) DEFAULT NULL,
CITY VARCHAR2(40) DEFAULT NULL,
STATE VARCHAR2(2) DEFAULT 'NY',
ZIP VARCHAR2(10))
PCTFREE 5
PCTUSED 65
TABLESPACE adrs_data
STORAGE (INITIAL 5140
NEXT 5140
MINEXTENTS 1
MAXEXTENTS 10
PCTINCREASE 10);
In this example, data constraints are being utilized and certain storage parameters will be in effect. Using PCTFREE and PCTUSED is a good idea if your data is relatively static.
Indexes are used to increase performance of the database. An index is created on one or more columns of a table or cluster. Multiple indexes per table are allowed. The CREATE INDEX system privilege is needed to execute this command. The DBA is
responsible for administering these privileges. The syntax to create an index is
CREATE INDEX schema.index ON schema.table (COLUMN ASC/DESC)
CLUSTER schema.cluster INITRANS x MAXTRANS x TABLESPACE name STORAGE clause PCTFREE x NOSORT
In this syntax, SCHEMA is an optional parameter to identify which database schema to place this table in. The default is your own.
INDEX is mandatory and is the name of the index.
ON is a mandatory reserved word.
TABLE is a mandatory table name upon which the index will be built.
COLUMN is the column name to be indexed. If there is more than one column, make sure they are in order of priority.
ASC/DESC are optional parameters. Indexes are built in ascending order by default. Use DESC for descending order.
CLUSTER is needed only if this index is for a cluster.
INITRANS is optional but has a default of 1. Integers from 1 to 255 are allowed. It is recommended that this parameter not be changed. This is an allocation of the number of transaction entries assigned within the data block for the index.
MAXTRANS is optional but has a default that is a function of the data block size. It is used to identify the maximum number of concurrent transactions that can update a data block for the index. It is recommended that this parameter not be changed.
TABLESPACE is optional but has a default value as the tablespace name of the owner of the schema. A different tablespace name than the default might be needed. The DBA will be able to give some recommendations.
STORAGE is optional and has default characteristics defined by the DBA.
PCTFREE is optional but has a default of 10. This indicates that 10 percent for each data block will be reserved for future updates to the index. Integers from 1 to 99 are allowed.
NOSORT is an optional parameter that will save time when creating the index if the table data is already stored in ascending order. This cannot be used if a clustered index is being created.
Using the addresses table defined from the create table example, two indexes will be created in the next example.
CREATE INDEX x_adrs_id ON ADDRESSES (ADRS_ID);
This will create an index on the adrs_id column only.
CREATE INDEX x_city_state ON ADDRESSES (CITY,STATE)
TABLESPACE application_indexes;
This index has two columns; CITY is the primary column. In order for queries to use an index, the column names must be part of the select statement. If a select statement included STATE but not CITY, the index would not be used. However, if the select
statement contained a reference to CITY but not STATE, part of the index would be used because CITY is the first column of the index.
Sequences are a great way to have the database automatically generate unique integer primary keys. The CREATE SEQUENCE system privilege is needed to execute this command. The DBA is responsible for administering these privileges. The syntax to create a
sequence is
CREATE SEQUENCE schema.name
INCREMENT BY x
START WITH x
MAXVALUE x NOMAXVALUE
MINVALUE x NOMINVALUE
CYCLE NOCYCLE
CACHE x NOCACHE
ORDER NOORDER
In this syntax, SCHEMA is an optional parameter that identifies which database schema to place this sequence in. The default is your own.
NAME is mandatory because it is the name of the sequence.
INCREMENT BY is optional. The default is one. Zero is not allowed. If a negative integer is specified, the sequence will descend in order. A positive integer will make the sequence ascend (the default).
START WITH is an optional integer that enables the sequence to begin anywhere.
MAXVALUE is an optional integer that places a limit on the sequence.
NOMAXVALUE is optional. It causes the maximum ascending limit to be 10 27 and -1 for descending sequences. This is the default.
MINVALUE is an optional integer that determines the minimum a sequence can be.
NOMINVALUE is optional. It causes the minimum ascending limit to be 1 and -(10 26) for descending sequences. This is the default.
CYCLE is an option that enables the sequence to continue even when the maximum has been reached. If the maximum is reached, the next sequence that will be generated is whatever the minimum value is.
NOCYCLE is an option that does not enable the sequence to generate values beyond the defined maximum or minimum. This is the default.
CACHE is an option that enables sequence numbers to be preallocated that will be stored in memory for faster access. The minimum value is 2.
NOCACHE is an option that will not enable the preallocation of sequence numbers.
ORDER is an option that ensures the sequence numbers are generated in order of request.
NOORDER is an option that does not ensure that sequence numbers are generated in the order they are requested.
If you want to create a sequence for your adrs_id column in the ADDRESSES table, it could look like the following example:
CREATE SEQUENCE adrs_seq
INCREMENT BY 5
START WITH 100;
To generate a new sequence number, use the pseudocolumn NEXTVAL. This needs to be preceded with your sequence name. For example, adrs_seq.nextval would return 100 for the first access and 105 for the second. If determining the current sequence number is
necessary, use CURRVAL. Therefore, adrs_seq.currval will return the current value of the sequence.
The purpose of this chapter is not to elaborate on every SQL statement. The ones given have been covered to give an overview of the more common create statements. Listed next is an alphabetical list of all objects that can be created with the CREATE
statement.
CREATE xxx, where xxx is one of the following:
CLUSTER
CONTROLFILE
DATABASE
DATABASE LINK
DATAFILE
FUNCTION
INDEX
PACKAGE BODY
PACKAGE
PROCEDURE
PROFILE
ROLE
ROLLBACK SEGMENT
SCHEMA
SEQUENCE
SNAPSHOT
SNAPSHOT LOG
SYNONYM
TABLE
TABLESPACE
TRIGGER
USER
VIEW
To retrieve data from the database, use the SELECT statement. Once again, proper privileges are required and are maintained by the DBA. The SELECT statement has the following format:
SELECT column(s)
FROM tables(s)
WHERE conditions are met
GROUP BY selected columns
ORDER BY column(s);
Every SQL statement ends with a semicolon (;). When you are writing scripts (disk files) that will be executed, you can also use a slash () to terminate the SQL statement.
When SELECT column(s) is used, it is assumed that all of the columns fitting the WHERE clause will be retrieved. It is sometimes necessary to only retrieve columns that are distinct from one another. To do this, use the reserved word DISTINCT before the
column descriptions. In the following example, a SELECT statement is used to retrieve all of the cities and states from the addresses table (defined previously).
SELECT city, state
FROM addresses;
When this code run, every city and state will be retrieved from the table. If 30 people lived in Rochester, NY, the data would be displayed 30 times. To see only one occurrence for each city and state use the DISTINCT qualifier, as shown in the
following example:
SELECT DISTINCT city, state
FROM addresses;
This will cause only one row to be retrieved for entries with Rochester, NY.
The FROM clause is a listing of all tables needed for the query. You can use table aliases to help simplify queries, as shown in the following example:
SELECT adrs.city, adrs.state
FROM addresses adrs;
In this example, the alias adrs has been given to the table addresses. The alias will be used to differentiate columns with the same name from different tables.
The WHERE clause is used to list the criteria necessary to restrict the output from the query or to join tables in the FROM clause. See the following example.
SELECT DISTINCT city, state
FROM addresses
WHERE state in ('CA','NY','CT')
AND city is NOT NULL;
This example will retrieve cities and states that are in the states of California, New York, and Connecticut. The check for NOT NULL cities will not bring data back if the city field was not filled in.
The GROUP BY clause tells Oracle how to group the records together when certain functions are used.
SELECT dept_no, SUM(emp_salary)
FROM emp
GROUP BY dept_no;
The GROUP BY example will list all department numbers once with the summation of the employee salaries for that particular department.
Functions are an intrinsic part of any SQL statement. Table 4.1 shows an alphabetical list of SQL functions.
Table 4.1. SQL functions.
Name
|
Type
|
Syntax
|
Returns
|
ABS
|
Number
|
ABS(n)
|
Absolute value of n.
|
ADD_MONTHS
|
Date
|
ADD_MONTHS(a,b)
|
Date a plus b months.
|
ASCII
|
Character
|
ASCII(c)
|
Decimal representation of c.
|
AVG
|
Group
|
AVG(DISTINCT|ALL n)
|
Average value of n. ALL is default.
|
CEIL
|
Number
|
CEIL(n)
|
Smallest integer equal to or greater than n.
|
CHARTOROWID
|
Conversion
|
CHARTOROWID(c)
|
Converts character to rowid data type.
|
CHR
|
Character
|
CHR(n)
|
Character having binary equivalent to n.
|
CONCAT
|
Character
|
CONCAT(1,2)
|
Character 1 concatenated with character 2.
|
CONVERT
|
Conversion
|
CONVERT(a,
|
Converts character
|
|
|
dest_c [,source_c])
|
string a from one character set to another. The source source_c to the destination character set dest_c.
|
COS
|
Number
|
COS(n)
|
Cosine of n.
|
COSH
|
Number
|
COSH(n)
|
Hyperbolic cosine of n.
|
COUNT
|
Group
|
COUNT(DISTINCT|ALL e)
|
Number of rows in a query. ALL is default. e can be represented as * to indicate all columns.
|
EXP
|
Number
|
EXP(n)
|
e raised to the nth power.
|
FLOOR
|
Number
|
FLOOR(n)
|
Largest integer equal to or less than n.
|
GREATEST
|
Other
|
GREATEST(e [,e]...)
|
The greatest of the list of expressions e.
|
HEXTORAW
|
Conversion
|
HEXTORAW(c)
|
Converts hexadecimal character c to raw.
|
INITCAP
|
Character
|
INITCAP(c)
|
c with the first letter of each word in uppercase.
|
INSTR
|
Character
|
INSTR
|
Searches 1 with
|
|
|
(1, 2 [, n [, m]])
|
nth character for mth occurrence of 2 and returns the position of the occurrence.
|
INSTRB
|
Character
|
INSTRB(1,2[,n[,m]])
|
Same as INSTR except numeric parameters are in terms of bytes.
|
LAST_DAY
|
Dae
|
LAST_DAY(a)
|
Last day of the month (date) containing a.
|
LEAST
|
Other
|
LEAST(e [,e]...)
|
The least of the list of expressions e.
|
LENGTH
|
Character
|
LENGTH(c)
|
Number of characters in c. If c is a fixed-length data type (char), all trailing blanks are included.
|
LENGTHB
|
Character
|
LENGTHB(c)
|
Same as LENGTH except in bytes.
|
LN
|
Number
|
LN(n)
|
Natural logarithm if n, where n > 0.
|
LOG
|
Number
|
LOG(b,n)
|
Logarithm, base b, of n.
|
LOWER
|
Character
|
LOWER(c)
|
c with all letters in lowercase.
|
LPAD
|
Character
|
LPAD(1,n [,2])
|
Character 1 left padded to length of n. If character 2 is not omitted, use as a pattern instead of blanks.
|
LTRIM
|
Character
|
LTRIM(c [,set])
|
Removed characters from the left of c. If set s defined, remove initial characters up to the first character not in set.
|
MAX
|
Other
|
MAX(DISTINCT|ALL e)
|
Maximum of expression e. ALL is default.
|
MIN
|
Other
|
MIN(DISTINCT|ALL e)
|
Minimum of expression e. ALL is default.
|
MOD
|
Number
|
MOD(r,n)
|
Remainder of r divided by n.
|
MONTHS_BETWEEN
|
Date
|
MONTHS_BETWEEN(a,b)
|
Number of days between dates a and b.
|
NEW_TIME
|
Date
|
NEW_TIME(a, z1, z2)
|
Date and time in time zone z2 when date and time in time zone z1 are a).
|
NEXT_DAY
|
Date
|
NEXT_DAY(a, c)
|
Date of first weekday identified by c that is later than date a.
|
NLSSORT
|
Character
|
NLSSORT((c [,parm])
|
String of bytes to sort c.
|
NLS_INITCAP
|
Character
|
NLS_INITCAP
|
c with the first
|
|
|
(c [,parm])
|
letter of each word in uppercase. parm has the form of NLS_SORT = s where s is a linguistic sort or binary.
|
NLS_LOWER
|
Character
|
NLS_LOWER(c [,parm])
|
c with all letters lowercase. See parm above.
|
NLS_UPPER
|
Character
|
NLS_UPPER(c [,parm])
|
c with all letters uppercase. See parm above.
|
NVL
|
Other
|
NVL(e1, e2)
|
If e1 is null, returns e2. If e1 is not null, returns e1.
|
POWER
|
Number
|
POWER(m,n)
|
m raised to the nth power.
|
RAWTOHEX
|
Conversion
|
RAWTOHEX(raw)
|
Converts raw value to its hexadecimal equivalent.
|
REPLACE
|
Character
|
REPLACE(c, s1 [, r2])
|
Replace each occurrence of string s1 in c with r2. If r2 is omitted then all occurrences of s1 are removed.
|
ROUND
|
Date
|
ROUND(n [,f])
|
Date rounded to format model f. If f is omitted, n will be rounded to nearest day.
|
ROUND
|
Number
|
ROUND(n[,m])
|
n rounded to m places right of decimal point. If m is omitted, to 0 places.
|
ROWIDTOCHAR
|
Conversion
|
ROWIDTOCHAR(rowid)
|
Converts rowid to varchar2 format with length of 18.
|
RPAD
|
Character
|
RPAD(1, n [, 2])
|
1 right-padded to length of n with 2.
|
RTRIM
|
Character
|
RTRIM(c [, s])
|
c with characters removed after last character not in set s. If s is omitted, set defaulted to ''.
|
SIGN
|
Number
|
SIGN(n)
|
-1 if n < 0, 0 if n = 0, 1 if n > 0.
|
SIN
|
Number
|
SIN(n)
|
Sine of n.
|
SINH
|
Number
|
SINH(n)
|
Hyperbolic sine of n.
|
SOUNDEX
|
Character
|
SOUNDEX(c)
|
A string with phonetic representation of c.
|
SUBSTR
|
Character
|
SUBSTR(c, m [,n])
|
A portion of c beginning at character number m for n characters. If m is negative, Oracle counts backward from the end of c. If n is omitted, all characters are returned to the end of c.
|
SUBSTRB
|
Character
|
SUBSTRB(c, m [,n])
|
The same as SUBSTR except m and n are number of bytes.
|
SQRT
|
Number
|
SQRT(n)
|
Square root of n.
|
STDDEV
|
Group
|
STDDEV(DISTINCT|ALL n)
|
Standard deviation of number n.
|
SUM
|
Group
|
SUM(DISTINCT|ALL n)
|
Sum of numbers n.
|
SYSDATE
|
Date
|
SYSDATE
|
Current date and time.
|
TAN
|
Number
|
TAN(n)
|
Tangent of n.
|
TANH
|
Number
|
TANH(n)
|
Hyperbolic tangent of n.
|
TO_CHAR
|
Conversion
|
TO_CHAR
|
Converts d date to
|
|
|
(d [,f [,parm])
|
varchar2 data type with format f and nls_date_language of parm.
|
TO_CHAR
|
Conversion
|
TO_CHAR
|
Converts n number
|
|
|
(n [,f [,parm])
|
data type to a varchar2 equivalent and number format element parm.
|
TO_DATE
|
Conversion
|
TO_DATE
|
Converts varchar2
|
|
|
(c [, f [, parm])
|
data type c to date data type with format f and nls date format element parm.
|
TO_MULTI_BYTE
|
Conversion
|
TO_MULTI_BYTE(c)
|
Converts c to their corresponding multibyte equivalent.
|
TO_NUMBER
|
Conversion
|
TO_NUMBER
|
Converts character
|
|
|
(c [,f [, parm]])
|
c to a number using format f and nls number format element parm.
|
TO_SINGLE_BYTE
|
Conversion
|
TO_SINGLE_BYTE(c)
|
Converts multibyte character c to its single byte equivalent.
|
TRANSLATE
|
Character
|
TRANSLATE(c, f, t)
|
c with each occurrences in f with each corresponding character in t.
|
TRUNC
|
Date
|
TRUNC(c [,f])
|
c with time portion truncated to format f.
|
TRUNC
|
Number
|
TRUNC(n[,m])
|
n truncated to m decimal places. If m is omitted, to 0 places.
|
UID
|
Other
|
UID
|
An integer that uniquely identifies the user.
|
USER
|
Other
|
USER
|
Current user as a varchar2.
|
UPPER
|
Character
|
UPPER(c)
|
c with all letters in uppercase.
|
VARIANCE
|
Group
|
VARIANCE
|
Variance of number
|
|
|
(DISTINCT|ALL n)
|
n.
|
VSIZE
|
Other
|
VSIZE(e)
|
Number of bytes from the internal representation of e. |
Now look at some examples using functions.
SELECT SUBSTR(addrs_1,1,30),
city, state, zip
FROM addresses
WHERE addrs_1 is not null
AND UPPER(city) = 'ROCHESTER'
AND TO_NUMBER(SUBSTR(zip,1,5)) > 14525
AND NVL(active_date,SYSDATE) > TO_DATE('01-JAN-90');
Notice the use of the UPPER function. When Oracle performs character string comparisons, the case (upper- and lower-) of the strings in question have to match exactly. Therefore, 'Rochester' does not equal 'ROCHESTER'. The UPPER function will ensure
that the column city will be converted to uppercase prior to the comparison of the literal 'ROCHESTER'.
The SUBSTR function is also used to retrieve the characters 1 through 30 of column addrs_1. All remaining characters beyond 30 will not be seen. This function is also used in the WHERE clause to retrieve the first five characters of the zip column
before converting it to a numerical value. The comparison is made after the conversion has taken place.
If the column active_date contains any nulls, they will be included in the data set because of the NVL function. If active_date is null, the current date will be returned before the comparison is made to the constant '01-JAN-90'. The constant
'01-JAN-90' is converted to a date data type to ensure format compatibility. For a complete list of all date formats, see "Elements of SQL" in Oracle's SQL Language Reference Manual.
SELECT dept_no,
SUM(emp_salary),
AVG(emp_salary)
FROM emp
WHERE dept_no = dept_no
GROUP BY dept_no;
This example shows the use of the SUM and AVG functions. The retrieved data will show the summation of employee salaries and the average salary by department. Notice that the GROUP BY clause has to be used in this query.
To ensure that your data contains all of the required columns and restrictions, you must be familiar with the database schema. If a schema diagram is not available, there are numerous ways to find out what tables or views might be needed for writing
queries. One way is to look at some of the data dictionary tables.
To view all of the data dictionary table names, issue the following SELECT statement:
SELECT table_name
FROM dictionary
ORDER BY table_name;
Some of the tables of interest should be all_tables, all_columns, all_views and all_constraints.
To view the column names of these tables, issue 'DESC table_name'. DESC stands for DESCribe and 'table_name' is the name of the table in question, such as 'all_tables'. Therefore, 'DESC all_tables' will return all of the columns and their data types for
the table 'all_tables'.
With the help of the data dictionary tables, it is possible to determine what tables, views, and constraints are in effect for the application in question.
Tables are physically joined in the FROM clause of your query. They are logically joined in the WHERE clause. Table columns that appear in the WHERE clause must have the table name listed in the FROM clause. The WHERE clause is where the tables relate
one to another.
The way in which the WHERE clause is constructed greatly affects the performance of the query. A two-table join does not necessarily perform better than a 10-table join.
If there are a lot of queries that have a large number of tables joined together (more than seven tables, for example), you might need to consider denormalizing certain data elements to reduce the number of table joins. This type of denormalization
might be required when user productivity or system performance has significantly decreased.
Table 4.2 shows three tables that you will be working with for the examples.
Table. 4.2. Table descriptions.
Table Name
|
Column Name
|
Data Type
|
emp
|
emp_id
|
number(6)
|
emp
|
adrs_id
|
number(6)
|
emp
|
first_name
|
varchar2(40)
|
emp
|
last_name
|
varchar2(40)
|
emp
|
dept_no
|
number(3)
|
emp
|
hire_date
|
date
|
emp
|
job_title
|
varchar2(40)
|
emp
|
salary
|
number(6)
|
emp
|
manager_id
|
number(6)
|
|
|
|
dept
|
dept_no
|
number(3)
|
dept
|
name
|
varchar(40)
|
dept
|
adrs_id
|
number(6)
|
|
|
|
addresses
|
adrs_id
|
number(6)
|
addresses
|
active_date
|
date
|
addresses
|
box_number
|
number(6)
|
addresses
|
adrs_1
|
varchar2(40)
|
addresses
|
adrs_2
|
varchar2(40)
|
addresses
|
city
|
varchar2(40)
|
addresses
|
state
|
varchar2(2)
|
addresses
|
zip
|
varchar2(10) |
In the following example, a query is written that will list all departments with their corresponding employees and the city in which the department resides.
SELECT d.name,
e.last_name,
e.first_name,
a.city
FROM emp e,
dept d,
addresses a
WHERE d.dept_no = e.dept_no
AND a.adrs_id = d.adrs_id
ORDER BY d.name,e.last_name,e.first_name;
If the employee city needed to be retrieved as well, the query could be written like the following:
SELECT d.name,
a.city dept_city,
e.last_ name,
e.first_name,
z.city emp_city
FROM emp e,
dept d,
addresses a,
addresses z
WHERE d.dept_no = e.dept_no
AND a.adrs_id = d.adrs_id
AND z.adrs_id = e.adrs_id
ORDER BY d.name,e.last_name,e.first_name;
In this example the addresses table was joined twice, enabling the city column to be retrieved for both the department and employee. In order to clarify the output, aliases were assigned to the different city columns in the SELECT portion of the query.
The following example adds the employee manager's name to the query.
SELECT d.name,
a.city dept_city,
e.last_name,
e.first_name,
z.city emp_city,
m.first_name || m.last_name manager
FROM emp e,
dept d,
addresses a,
addresses z,
emp m
WHERE d.dept_no = e.dept_no
AND a.adrs_id = d.adrs_id
AND z.adrs_id = e.adrs_id
AND m.emp_id = e.manager_id
ORDER BY d.name,e.last_name,e.first_name;
The output from this query will cause the manager (alias) column to appear as one column even though it is made from two columns. The symbol (||) is used to concatenate columns together.
A Cartesian join happens when the WHERE clause is not properly constructed. A record is returned for every occurrence in table Z and table X. See the following example:
SELECT X.name,
Z.last_name,
Z.first_name
FROM emp Z,
dept X
ORDER BY X.name, Z.last_name;
If the emp table had 10 employees and the department table had three departments, this query would return 30 rows. For each department name, all employees would be listed because the tables are not joined properly (not at all in this example). If the
join condition WHERE X.dept_no = Z.dept_no existed, only 10 rows would be retrieved.
When the columns of a table are outer joined, this tells the database to retrieve rows even if data is not found. The plus symbol (+) is used to denote an outer join condition, as shown in the following example:
SELECT d.name,
a.city,
e.last_name,
e.first_name
FROM emp e,
dept d,
addresses a
WHERE d.dept_no(+) = e.dept_no
AND a.adrs_id = d.adrs_id
ORDER BY d.name,e.last_name,e.first_name;
If the president of the company was never assigned a department, his name would never be retrieved in previous examples because his department number would be null. The outer join would cause all rows to be retrieved even if there is not a match for
dept_no.
Outer joins are effective but will make the query perform slower. You might need to rewrite the query if you need to improve performance.
Subqueries, or nested queries, are used to bring back a set of rows to be used by the parent query. Depending on how the subquery is written, it can be executed once for the parent query or it can be executed once for each row returned by the parent
query. If the subquery is executed for each row of the parent, this is called a correlated subquery.
A correlated subquery can be easily identified if it contains any references to the parent subquery columns in its WHERE clause. Columns from the subquery cannot be referenced anywhere else in the parent query. The following example demonstrates a
non-correlated subquery.
SELECT e.first_name,
e.last_name,
e.job_title
FROM emp e
WHERE e.dept_no in (SELECT dept_no
FROM dept
WHERE name = 'ADMIN');
In this example, all employee names and job titles will be retrieved for the department 'ADMIN'. Notice the use of the operator in when referring to the subquery. The in operator is used when one or more rows might be returned by a subquery. If the
equal operator (=) is used, it is assumed that only one row will be returned. If the equal operator (=) is used and more than one row is returned, Oracle will return an error.
This statement could have been written by directly joining the dept table with the emp table in the main or parent query. Subqueries are sometimes used for performance gain. If the parent query contains a lot of tables, it might be advantageous to break
up the WHERE clause into subqueries.
SELECT d.name,
e.first_name,
e.last_name,
e.job_title
FROM emp e,
dept d
WHERE e.dept_no = d.dept_no
AND d.adrs_id = (SELECT adrs_id
FROM ADDRESSES
WHERE adrs_id = d.adrs_id)
ORDER BY d.name, e.job_title, e.last_name;
In this example, all employees with their corresponding departments will be retrieved only for departments that have a valid adrs_id in the addresses table. This is a correlated subquery because the subquery references a column in the parent query.
SELECT d.name,
e.first_name,
e.last_name,
e.job_title
FROM emp e,
dept d
WHERE e.dept_no = d.dept_no
AND not exists (SELECT 'X'
FROM ADDRESSES
WHERE city in ('ROCHESTER','NEW YORK')
AND adrs_id = d.adrs_id)
ORDER BY d.name, e.job_title, e.last_name;
This example will return all departments and employees except where departments are located in 'ROCHESTER' and 'NEW YORK'. SELECT 'X' will return a true or false type answer that will be evaluated by the not exists operator. Any constant could be used
here; 'X' is only one example.
One of the most powerful and overlooked SQL statements is the DECODE statement. The DECODE statement has the following syntax:
DECODE(val, exp1, exp2, exp3, exp4, ..., def);
DECODE will first evaluate the value or expression val and then compare expression exp1 to val. If val equals exp1, expression exp2 will be returned. If val does not equal exp1, expression exp3 will be evaluated and returns
expression exp4 if val equals exp3. This process continues until all expressions have been evaluated. If there are no matches, the default def will be returned.
SELECT e.first_name,
e.last_name,
e.job_title,
DECODE(e.job_title, 'President', '******', e.salary)
FROM emp e
WHERE e.emp_id in (SELECT NVL(z.manager_id, e.emp_id)
FROM emp z);
In this example, all manager names will be retrieved with their salaries. When the row identifying the president is displayed, show '******' instead of his salary. Also notice the NVL function used to evaluate a null manager ID. Only the president will
have a null manager ID, which would not have been retrieved without the NVL.
Also notice that DECODE is evaluating job_title and returning salary, which would normally be a data type mismatch since the job title and salary columns are different data types but is okay here.
SELECT e.first_name,
e.last_name,
e.job_title,
e.salary
FROM emp e
WHERE DECODE(USER,'PRES',e.emp_id,
UPPER(e.last_name),e.emp_id, 0) = e.emp_id ;
In this example, if the user is the president, all employees will be returned with their corresponding salary. For all other users, only one row will be retrieved, enabling the user to see his or her own salary only.
SELECT e.first_name,
e.last_name,
e.job_title,
DECODE(USER,'ADMIN',DECODE(e.job_title, 'PRESEDENT', '*****', e.salary),
'PRES', e.salary, '******')
FROM emp e
WHERE e.emp_id in (SELECT NVL(z.manager_id, e.emp_id)
FROM emp z);
In this example, the DECODE statement is nested with another DECODE statement. If the Oracle user is 'ADMIN', show the salaries except for the president's salary. If the Oracle user is 'PRES', show all salaries and if the user is anybody else, return
'******'.
Another place the DECODE statement can be used is in the ORDER BY clause. The next example will sort the output in such a way that the president is the first row returned followed by the departments 'SALES', 'ADMIN', and then 'IS' with their
corresponding employees.
SELECT d.name,
e.job_title,
e.first_name,
e.last_name
FROM emp e,
dept d
WHERE d.dept_no = e.dept_no
ORDER BY DECODE(e.job_title,'PRESIDENT', 0,
DECODE(d.name,'SALES', 1,
'ADMIN', 2, 3)), e.last_name;
This example does not ORDER BY e.job_title but uses this column to search for the title 'PRESIDENT' and returns a 0. For all other rows, another DECODE is used to evaluate the department name and returning numbers 1, 2, or 3 depending upon what the
department name is. After the DECODEs are finished, the data is further sorted by employee last name e.last_name.
The INSERT statement is used to put new rows into the database. This can be done one row at a time using the VALUES expression or a whole set of records at a time using a subquery. The following is the syntax for an INSERT statement:
INSERT INTO schema.table column(s) VALUES subquery
where schema is an optional parameter to identify which database schema to use for the insert. The default is your own.
table is mandatory and is the name of the table.
column is a list of columns that will receive the inserted values.
VALUES is used when one row of data will be inserted. Values are represented as constants.
subquery is used when the VALUES option is not used. The columns in the subquery must match the sequence and data types of the columns in the insert list.
INSERT INTO dept (dept_no,
name,
adrs_id)
VALUES (dept_seq.NEXTVAL,
'CUSTOMER SERVICE',
adrs_seq.NEXTVAL);
This example inserts one row into the table dept. Sequences dept_seq and adrs_seq are used to retrieve the next numeric values for dept_no and adrs_id.
If multiple rows need to be inserted, the INSERT EXAMPLE 1 statement would have to executed for each individual row. If a subquery can be used, multiple rows would be inserted for each row returned by the subquery.
INSERT INTO emp (emp_id,
first_name,
last_name,
dept_no,
hire_date,
job_title,
salary,
manager_id)
SELECT emp_seq.NEXTVAL,
new.first_name,
new.last_name,
30,
SYSDATE,
'CUSTOMER REPRESENTATIVE',
new.salary,
220
FROM candidates new
WHERE new.accept = 'YES'
AND new.dept_no = 30;
This example will insert all rows form the candidates table that have been assigned to department number 30 and have been accepted. Because the department number and manager ID are known, they are used as constants in the subquery. The UPDATE statement
is used to change existing rows in the database. The syntax for the UPDATE statement is
UPDATE schema.table SET column(s) = expr sub query WHERE condition
where
schema is an optional parameter to identify which database schema to use for the update. The default is your own.
table is mandatory and is the name of the table.
SET is a mandatory reserved word.
column is a list of columns that will receive the updated values.
expr is the new value to be assigned.
sub query is a select statement that will retrieve the new data values.
WHERE is optional and is used to restrict which rows are to be updated.
UPDATE emp
SET dept_no = 30
WHERE last_name = 'DOE'
AND first_name = 'JOHN';
This example will transfer an employee named JOHN DOE to department 30. If there is more than one JOHN DOE, further restrictions will have to be made in the WHERE clause.
UPDATE emp
SET salary = salary + (salary * .05);
This update example will give everyone in table emp a 5 percent increase in salary.
UPDATE emp a
SET a.salary = (SELECT a.salary
+ (a.salary * DECODE(d.name, 'SALES', .1,
'ADMIN', .07,
.06))
FROM dept d
WHERE d.dept_no = a.dept_no)
WHERE a.dept_no = (SELECT dept_no
FROM dept y, addresses z
WHERE y.adrs_id = z.adrs_id
AND z.city = 'ROCHESTER');
This example will give raises to employees located in Rochester. The amount of the raise is handled by the DECODE statement evaluating the department name. Employees in the Sales department will receive a 10 percent raise, the Admin department receives
a seven percent raise, and everyone else receives a 6 percent raise.
The DELETE statement is used to remove database rows. The syntax for DELETE is
DELETE FROM schema.table WHERE condition
where
SCHEMA is an optional parameter to identify which database schema to use for the delete. The default is your own.
TABLE is mandatory and is the name of the table.
WHERE restricts the delete operation.
DELETE FROM addresses
WHERE adrs_id = (SELECT e.adrs_id
FROM emp e
WHERE e.last_name = 'DOE'
AND e.first_name = 'JOHN');
DELETE FROM emp e
WHERE e.last_name = 'DOE'
AND e.first_name = 'JOHN';
If employee John Doe left the company, you probably would want to delete him from the database. One way to accomplish this is to delete the row containing his name from the addresses table and the emp table. In order to find John Doe in the addresses
table, you must perform a subquery using the emp table. Therefore, the entry in the emp table has to be the last row to be deleted, or else there would be an orphan row in the addresses table.
DELETE FROM dept
WHERE adrs_id is null;
In this example, all rows in the dept table will be deleted if the corresponding adrs_id is null.
Deletes are permanent! Once the commit has taken place, it is impossible to get the row(s) back apart from issuing an INSERT statement. There is not an undo command available.
Any time a SQL statement is constructed with multiple tables, a parent/child relationship is usually in effect.
The user must be familiar with the database schema in use and the corresponding constraints in order to properly join tables. Writing poorly constructed SELECT statements will not harm the database but might decrease the system performance and possibly
give a false relationship representation to the users. If there are poorly constructed INSERT, UPDATE, or DELETE statements, the effect could be disastrous.
Before you see any examples, certain assumptions need to be made.
- An employee cannot be entered without a department number. This indicates that the emp table is a child of the dept table.
- Addresses do not have to be entered when creating a new employee or department. Therefore, the addresses table is optional and is a child of the emp table and a child of the dept table.
If these constraints are enforced in the database, protection would be provided when a parent row is deleted but does not delete the corresponding children.
SELECT d.name dept_name,
d.dept_no dept_number,
e.first_name || e.last_name emp_name,
e.job_title title,
e.hire_date start_date
FROM dept d,
emp e
WHERE d.dept_no = e.dept_no
ORDER BY d.name, e.last_name;
In this example, all the department names and numbers will be displayed (the parent) with all of the corresponding employees (the children) in the departments.
SELECT d.name dept_name,
d.dept_no dept_number,
e.first_name || e.last_name emp_name,
e.job_title title,
e.hire_date start_date,
DECODE(a.box_number, NULL, a.adrs_1, a.box_number) address,
DECODE(a.adrs_2, NULL, NULL, a.adrs_2) address_2,
a.city || ', '||a.state ||' '||a.zip city_stat_zip
FROM dept d,
emp e,
addresses a
WHERE d.dept_no = e.dept_no
AND e.adrs_id = a.adrs_id (+)
ORDER BY d.name, e.last_name;
This example shows the addition of the optional child table, called addresses. An outer join, (+) is used so that the employee row will still be retrieved even if there is no address information available yet. The DECODEs will retrieve the box number or
adrs 1 depending upon the existence of box number.
When you write INSERTs, UPDATEs, or DELETEs, be careful and make sure the proper relationships exist within the subqueries. If you must manipulate each row from the query, you must use a cursor that is part of the PL/SQL language.
The following is a summary of some of the information already given in this chapter, along with some new ideas.
- When you are comparing date data types, it might be wise to truncate the dates (TRUNC hire_date) to ensure that the time component does not cause erroneous results. If the application enables the insertion of time into the date data types, then the
inserted times will also be taken into consideration when manipulating dates.
- When writing SQL statements, null values will not be considered by the database unless they are explicitly called for.
SELECT e.first_name || e.last_name emp_name,
z.first_name || z.last_name manager
FROM emp e,
emp z
WHERE z.emp_id = e.manager_id;
In this example, all rows for the employee table will be retrieved except for one, the row containing the president. This is because the manager_id for the president is null.
The following example shows how to retrieve the president along with the other employees.
SELECT e.first_name || e.last_name emp_name,
z.first_name || z.last_name manager
FROM emp e,
emp z
WHERE z.emp_id = NVL(e.manager_id, e.emp_id);
This code checks for a null manager ID. If the manager ID is null, the database will return the employee ID, which will validate properly and return a row for the president.
This chapter covered the major components of the SQL language, placing emphasis on functionality that is frequently used or misunderstood (such as with the DECODE statement). The reader should have sufficient information to venture out and begin writing
good SQL code.
|