Teach Yourself SQL in 21 Days, Second Edition
- 15 -
Streamlining SQL Statements for Improved Performance
Streamlining SQL statements is as much a part of application performance as database
design and tuning. No matter how fine-tuned the database or how sound the database
structure, you will not receive timely query results that are acceptable to you or,
even worse, to the customer, if you don't follow some basic guidelines. Trust us,
if the customer is not satisfied, you can bet your boss won't be satisfied either.
Objectives
You already know about the major components of the relational database language
of SQL and how to communicate with the database; now it's time to apply your knowledge
to real-life performance concerns. The objective of Day 15 is to recommend methods
for improving the performance of, or streamlining, an SQL statement. By the end of
today, you should
- Understand the concept of streamlining your SQL code
- Understand the differences between batch loads and transactional processing and
their effects on database performance
- Be able to manipulate the conditions in your query to expedite data retrieval
- Be familiar with some underlying elements that affect the tuning of the entire
database
Here's an analogy to help you understand the phrase streamline an SQL statement:
The objective of competitive swimmers is to complete an event in as little time as
possible without being disqualified. The swimmers must have an acceptable technique,
be able to torpedo themselves through the water, and use all their physical resources
as effectively as possible. With each stroke and breath they take, competitive swimmers
remain streamlined and move through the water with very little resistance.
Look at your SQL query the same way. You should always know exactly what you want
to accomplish and then strive to follow the path of least resistance. The more time
you spend planning, the less time you'll have to spend revising later. Your goal
should always be to retrieve accurate data and to do so in as little time as possible.
An end user waiting on a slow query is like a hungry diner impatiently awaiting a
tardy meal. Although you can write most queries in several ways, the arrangement
of the components within the query is what makes the difference between seconds,
minutes, and sometimes hours of execution time. Streamlining SQL
is the process of finding the optimal arrangement of the elements within your query.
In addition to streamlining your SQL statement, you should also consider several
other factors when trying to improve general database performance, for example, concurrent
user transactions that occur within a database, indexing of tables, and deep-down
database tuning.
NOTE: Today's examples use Personal Oracle7
and tools that are available with the Oracle7.3 relational database management system.
The concepts discussed today are not restricted to Oracle; they may be applied to
other relational database management systems.
Make Your SQL Statements Readable
Even though readability doesn't affect the actual performance of SQL statements,
good programming practice calls for readable code. Readability is especially important
if you have multiple conditions in the WHERE clause. Anyone reading the
clause should be able to determine whether the tables are being joined properly and
should be able to understand the order of the conditions.
Try to read this statement:
SQL> SELECT EMPLOYEE_TBL.EMPLOYEE_ID, EMPLOYEE_TBL.NAME,EMPLOYEE_PAY_TBL.SALARY,EMPLOYEE_PAY_TBL.HIRE_DATE
2 FROM EMPLOYEE_TBL, EMPLOYEE_PAY_TBL
3 WHERE EMPLOYEE_TBL.EMPLOYEE_ID = EMPLOYEE_PAY_TBL.EMPLOYEE_ID AND
4 EMPLOYEE_PAY_TBL.SALARY > 30000 OR (EMPLOYEE_PAY_TBL.SALARY BETWEEN 25000
5 AND 30000 AND EMPLOYEE_PAY_TBL.HIRE_DATE < SYSDATE - 365);
Here's the same query reformatted to enhance readability:
SQL> SELECT E.EMPLOYEE_ID, E.NAME, P.SALARY, P.HIRE_DATE
2 FROM EMPLOYEE_TBL E,
3 EMPLOYEE_PAY_TBL P
4 WHERE E.EMPLOYEE_ID = P.EMPLOYEE_ID
5 AND P.SALARY > 30000
6 OR (P.SALARY BETWEEN 25000 AND 30000
7 AND P.HIRE_DATE < SYSDATE - 365);
NOTE: Notice the use of table aliases
in the preceding query. EMPLOYEE_TBL in line 2 has been assigned the alias
E, and EMPLOYEE_PAY_TBL in line 3 has been assigned the alias P.
You can see that in lines 4, 5, 6, and 7, the E and P stand for
the full table names. Aliases require much less typing than spelling out the full
table name, and even more important, queries that use aliases are more organized
and easier to read than queries that are cluttered with unnecessarily long full table
names.
The two queries are identical, but the second one is obviously much easier to
read. It is very structured; that is, the logical components of the query
have been separated by carriage returns and consistent spacing. You can quickly see
what is being selected (the SELECT clause), what tables are being accessed
(the FROM clause), and what conditions need to be met (the WHERE
clause).
The Full-Table Scan
A full-table scan occurs when the database server reads every record in a table
in order to execute an SQL statement. Full-table scans are most often an issue with
queries (SELECT statements), but they can also come into play with updates
and deletes. A full-table scan occurs when the columns in the WHERE clause
do not have an index associated with them. A full-table scan is like reading a book
from cover to cover, trying to find a keyword; most often, you would rather use the
book's index.
You can avoid a full-table scan by creating an index on columns that are used
as conditions in the WHERE clause of an SQL statement. Indexes provide a
direct path to the data the same way an index in a book refers the reader to a page
number. Adding an index speeds up data access.
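For example, assuming the EMPLOYEE_PAY_TBL from the earlier query (the index name
here is arbitrary), an index on the SALARY column would give conditions such
as SALARY > 30000 a direct path to the matching rows:
INPUT:
SQL> CREATE INDEX EMP_PAY_SALARY_IDX
2 ON EMPLOYEE_PAY_TBL (SALARY);
OUTPUT:
Index created.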
Although programmers usually frown upon full-table scans, they are sometimes appropriate.
For example:
- You are selecting most of the rows from a table.
- You are updating every row in a table.
- The tables are small.
In the first two cases an index would be inefficient because the database server
would have to refer to the index, read the table, refer to the index again, read
the table again, and so on. On the other hand, indexes are most efficient when the
data you are accessing is a small percentage, usually no more than 10 to 15 percent,
of the total data contained within the table.
In addition, indexes are best used on large tables. You should always consider
table size when you are designing tables and indexes. Properly indexing tables requires
familiarity with the data and with the columns that will be referenced most often,
and it may require experimentation to see which indexes work best.
NOTE: When speaking of a "large table,"
large is a relative term. A table that is extremely large to one individual
may be minute to another. The size of a table is relative to the size of other tables
in the database, to the disk space available, to the number of disks available, and
to simple common sense. Obviously, a 2GB table is large, whereas a 16KB table is small.
In a database environment where the average table size is 100MB, a 500MB table may
be considered massive.
Adding a New Index
You will often find situations in which an SQL statement is running for an unreasonable
amount of time, although the performance of other statements seems to be acceptable;
for example, when conditions for data retrieval change or when table structures change.
We have also seen this type of slowdown when a new screen or window has been added
to a front-end application. One of the first things to do when you begin to troubleshoot
is to find out whether the target table has an index. In most of the cases we have
seen, the target table has an index, but one of the new conditions in the WHERE
clause may lack an index. Looking at the WHERE clause of the SQL statement,
we have asked, "Should we add another index?" The answer may be yes if:
- The most restrictive condition(s) returns less than 10 percent of the rows in
a table.
- The most restrictive condition(s) will be used often in an SQL statement.
- Condition(s) on columns with an index will return unique values.
- Columns are often referenced in the ORDER BY and GROUP BY clauses.
Composite indexes may also be used. A composite index is an index on two
or more columns in a table. These indexes can be more efficient than single-column
indexes if the indexed columns are often used together as conditions in the WHERE
clause of an SQL statement. If the indexed columns are used separately as well as
together, especially in other queries, single-column indexes may be more appropriate.
Use your judgment and run tests on your data to see which type of index best suits
your database.
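For example, if two columns are almost always referenced together in the WHERE
clause (the FACT_TABLE columns shown here are the ones used later today; the index
name is arbitrary), a composite index might be created as follows:
INPUT:
SQL> CREATE INDEX FACT_STATUS_ACT_IDX
2 ON FACT_TABLE (STATUS_CD, ACTIVITY_CD);
OUTPUT:
Index created.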
Arrangement of Elements in a Query
The best arrangement of elements within your query, particularly in the WHERE
clause, really depends on the order of the processing steps in a specific implementation.
The arrangement of conditions depends on the columns that are indexed, as well as
on which condition will retrieve the fewest records.
You do not have to use a column that is indexed in the WHERE clause,
but it is obviously more beneficial to do so. Try to narrow down the results of the
SQL statement by using an index that returns the fewest number of rows. The condition
that returns the fewest records in a table is said to be the most restrictive
condition. As a general statement, you should place the most restrictive conditions
last in the WHERE clause. (Oracle's query optimizer reads a WHERE
clause from the bottom up, so in a sense, you would be placing the most restrictive
condition first.)
When the optimizer reads the most restrictive condition first, it is able to narrow
down the first set of results before proceeding to the next condition. The next condition,
instead of looking at the whole table, should look at the subset that was selected
by the most selective condition. Ultimately, data is retrieved faster. The most selective
condition may be unclear in complex queries with multiple conditions, subqueries,
calculations, and several combinations of the AND, OR, and LIKE operators.
TIP: Always check your database documentation
to see how SQL statements are processed in your implementation.
The following test is one of many we have run to measure the difference of elapsed
time between two uniquely arranged queries with the same content. These examples
use the Oracle7.3 relational database management system. Remember, the optimizer in this
implementation reads the WHERE clause from the bottom up.
Before creating the SELECT statement, we selected distinct row counts
on each condition that we planned to use. Here are the values selected for each condition:
Condition                    Distinct Values
---------------------------  ---------------
calc_ytd = '-2109490.8'      13,000+
dt_stmp = '01-SEP-96'        15
output_cd = '001'            13
activity_cd = 'IN'           10
status_cd = 'A'              4
function_cd = '060'          6
NOTE: The most restrictive condition is
also the condition with the most distinct values.
The next example places the most restrictive conditions first in the WHERE
clause:
INPUT:
SQL> SET TIMING ON
2 SELECT COUNT(*)
3 FROM FACT_TABLE
4 WHERE CALC_YTD = '-2109490.8'
5 AND DT_STMP = '01-SEP-96'
6 AND OUTPUT_CD = '001'
7 AND ACTIVITY_CD = 'IN'
8 AND STATUS_CD = 'A'
9 AND FUNCTION_CD = '060';
OUTPUT:
COUNT(*)
--------
8
1 row selected.
Elapsed: 00:00:15.37
This example places the most restrictive conditions last in the WHERE
clause:
INPUT/OUTPUT:
SQL> SET TIMING ON
2 SELECT COUNT(*)
3 FROM FACT_TABLE
4 WHERE FUNCTION_CD = '060'
5 AND STATUS_CD = 'A'
6 AND ACTIVITY_CD = 'IN'
7 AND OUTPUT_CD = '001'
8 AND DT_STMP = '01-SEP-96'
9 AND CALC_YTD = '-2109490.8';
COUNT(*)
--------
8
1 row selected.
Elapsed: 00:00:01.80
ANALYSIS:
Notice the difference in elapsed time. Simply by reordering the conditions
according to the given table statistics, the second query ran almost 14 seconds faster
than the first one. Imagine the difference on a poorly structured query that runs
for three hours!
Procedures
For queries that are executed on a regular basis, try to use procedures. A procedure
is a potentially large group of SQL statements. (Refer to Day 13, "Advanced
SQL Topics.")
Procedures are compiled and stored by the database engine. Unlike an individual SQL
statement, a procedure does not have to be parsed and optimized by the database engine
each time it is executed. Procedures, as opposed to numerous individual queries, may
be easier for the user to maintain and more efficient for the database.
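As a minimal sketch in Oracle's PL/SQL (the procedure name and logic are hypothetical),
a routinely executed update can be packaged like this:
INPUT:
SQL> CREATE OR REPLACE PROCEDURE RAISE_SALARIES (PCT IN NUMBER) AS
2 BEGIN
3 UPDATE EMPLOYEE_PAY_TBL
4 SET SALARY = SALARY * (1 + PCT / 100);
5 END;
6 /
OUTPUT:
Procedure created.
The procedure can then be run with a single call, such as EXECUTE RAISE_SALARIES(5),
instead of submitting the full UPDATE statement each time.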
Avoiding OR
Avoid using the logical operator OR in a query if possible. OR
inevitably slows down nearly any query against a table of substantial size. We find
that IN is generally much quicker than OR. This advice certainly
doesn't agree with documentation stating that optimizers convert IN arguments
to OR conditions. Nevertheless, here is an example of a query using multiple
ORs:
INPUT:
SQL> SELECT *
2 FROM FACT_TABLE
3 WHERE STATUS_CD = 'A'
4 OR STATUS_CD = 'B'
5 OR STATUS_CD = 'C'
6 OR STATUS_CD = 'D'
7 OR STATUS_CD = 'E'
8 OR STATUS_CD = 'F'
9 ORDER BY STATUS_CD;
Here is the same query using IN:
INPUT:
SQL> SELECT *
2 FROM FACT_TABLE
3 WHERE STATUS_CD IN ('A','B','C','D','E','F')
4 ORDER BY STATUS_CD;
ANALYSIS:
Try testing something similar for yourself. Although books are excellent sources
for standards and direction, you will find it is often useful to come to your own
conclusions on certain things, such as performance.
Here is another example using SUBSTR and IN. Notice that the
first query combines LIKE with OR.
INPUT:
SQL> SELECT *
2 FROM FACT_TABLE
3 WHERE PROD_CD LIKE 'AB%'
4 OR PROD_CD LIKE 'AC%'
5 OR PROD_CD LIKE 'BB%'
6 OR PROD_CD LIKE 'BC%'
7 OR PROD_CD LIKE 'CC%'
8 ORDER BY PROD_CD;
SQL> SELECT *
2 FROM FACT_TABLE
3 WHERE SUBSTR(PROD_CD,1,2) IN ('AB','AC','BB','BC','CC')
4 ORDER BY PROD_CD;
ANALYSIS:
The second example not only avoids the OR but also eliminates the combination
of the OR and LIKE operators. You may want to try this example
to see what the real-time performance difference is for your data.
OLAP Versus OLTP
When tuning a database, you must first determine what the database is being used
for. An online analytical processing (OLAP) database is a system whose function is
to provide query capabilities to the end user for statistical and general informational
purposes. The data retrieved in this type of environment is often used for statistical
reports that aid in the corporate decision-making process. These types of systems
are also referred to as decision support systems (DSS). An online transactional processing
(OLTP) database is a system whose main function is to provide an environment for
end-user input and may also involve queries against day-to-day information. OLTP
systems are used to manipulate information within the database on a daily basis.
Data warehouses and DSSs get their data from online transactional databases and sometimes
from other OLAP systems.
OLTP Tuning
A transactional database is a delicate system that is heavily accessed in the
form of transactions and queries against day-to-day information. However, an OLTP
does not usually require a vast sort area, at least not to the extent to which it
is required in an OLAP environment. Most OLTP transactions are quick and do not involve
much sorting.
One of the biggest issues in a transactional database is rollback segments. The
number and size of rollback segments depend heavily on how many users are concurrently
accessing the database, as well as on the amount of work in each transaction. The best
approach is to have several rollback segments in a transactional environment.
Another concern in a transactional environment is the integrity of the transaction
logs, which are written to after each transaction. These logs exist for the sole
purpose of recovery. Therefore, each SQL implementation needs a way to back up the
logs for use in a "point in time recovery." SQL Server uses dump devices;
Oracle uses a database mode known as ARCHIVELOG mode. Transaction logs also involve
a performance consideration because backing up logs requires additional overhead.
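In Oracle, for example, a DBA enables log archiving with a command along these lines
(the database must be mounted, but not open, when the command is issued):
SQL> ALTER DATABASE ARCHIVELOG;
Database altered.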
OLAP Tuning
Tuning OLAP systems, such as a data warehouse or decision support system, is much
different from tuning a transaction database. Normally, more space is needed for
sorting.
Because the purpose of this type of system is to retrieve useful decision-making
data, you can expect many complex queries, which normally involve grouping and sorting
of data. Compared to a transactional database, OLAP systems typically take more space
for the sort area but less space for the rollback area.
Most transactions in an OLAP system take place as part of a batch process. Instead
of having several rollback areas for user input, you may resort to one large rollback
area for the loads, which can be taken offline during daily activity to reduce overhead.
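In Oracle, for example, a dedicated load segment (the segment name here is hypothetical)
can be brought online for the nightly load and taken offline again afterward:
SQL> ALTER ROLLBACK SEGMENT RBS_BATCH ONLINE;
SQL> ALTER ROLLBACK SEGMENT RBS_BATCH OFFLINE;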
Batch Loads Versus Transactional Processing
A major factor in the performance of a database and SQL statements is the type
of processing that takes place within a database. One type of processing is OLTP,
discussed earlier today. When we talk about transactional processing, we are going
to refer to two types: user input and batch loads.
Regular user input usually consists of SQL statements such as INSERT,
UPDATE, and DELETE. These types of transactions are often performed
by the end user, or the customer. End users are normally using a front-end application
such as PowerBuilder to interface with the database, and therefore they seldom issue
visible SQL statements. Nevertheless, SQL is still at work; the code has simply been
generated for the user by the front-end application.
Your main focus when optimizing the performance of a database should be the end-user
transactions. After all, "no customer" equates to "no database,"
which in turn means that you are out of a job. Always try to keep your customers
happy, even though their expectations of system/database performance may sometimes
be unreasonable. One consideration with end-user input is the number of concurrent
users. The more concurrent database users you have, the greater the possibilities
of performance degradation.
What is a batch load? A batch load performs heaps of transactions against
the database at once. For example, suppose you are archiving last year's data into
a massive history table. You may need to insert thousands, or even millions, of rows
of data into your history table. You probably wouldn't want to do this task manually,
so you are likely to create a batch job or script to automate the process. (Numerous
techniques are available for loading data in a batch.) Batch loads are notorious
for taxing system and database resources. These database resources may include table
access, system catalog access, the database rollback segment, and sort area space;
system resources may include available CPU and shared memory. Many other factors
are involved, depending on your operating system and database server.
Both end-user transactions and batch loads are necessary for most databases to
be successful, but your system could experience serious performance problems if these
two types of processing lock horns. Therefore, you should know the difference between
them and keep them segregated as much as possible. For example, you would not want
to load massive amounts of data into the database when user activity is high. The
database response may already be slow because of the number of concurrent users.
Always try to run batch loads when user activity is at a minimum. Many shops reserve
times in the evenings or early morning to load data in batch to avoid interfering
with daily processing.
You should always plan the timing for massive batch loads, being careful to avoid
scheduling them when the database is expected to be available for normal use. Figure
15.1 depicts heavy batch updates running concurrently with several user processes,
all contending for system resources.
Figure 15.1.
System resource contention.
As you can see, many processes are contending for system resources. The heavy
batch updates that are being done throw a monkey wrench into the equation. Instead
of the system resources being dispersed somewhat evenly among the users, the batch
updates appear to be hogging them. This situation is just the beginning of resource
contention. As the batch transactions proceed, the user processes may eventually
be forced out of the picture. This condition is not a good way of doing business.
Even if the system has only one user, that user could still experience significant
contention.
Another problem with batch processes is that the process may hold locks on a table
that a user is trying to access. If there is a lock on a table, the user will be
refused access until the lock is freed by the batch process, which could be hours.
Batch processes should take place when system resources are at their best if possible.
Don't make the users' transactions compete with batch. Nobody wins that game.
Optimizing Data Loads by Dropping Indexes
One way to expedite batch updates is by dropping indexes. Imagine the history
table with many thousands of rows. That history table is also likely to have one
or more indexes. When you think of an index, you normally think of faster table access,
but in the case of batch loads, you can benefit by dropping the index(es).
When you load data into a table with an index, you can usually expect a great
deal of index use, especially if you are updating a high percentage of rows in the
table. Look at it this way. If you are studying a book and highlighting key points
for future reference, you may find it quicker to browse through the book from beginning
to end rather than using the index to locate your key points. (Using the index would
be efficient if you were highlighting only a small portion of the book.)
To maximize the efficiency of batch loads/updates that affect a high percentage
of rows in a table, you can temporarily take the index out of the picture with three
basic steps (a sketch in Oracle syntax follows the list):
- 1. Drop the appropriate index(es).
2. Load/update the table's data.
3. Rebuild the table's index.
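In Oracle syntax, the sequence might look like this sketch (the table, index, and
column names are hypothetical):
INPUT:
SQL> DROP INDEX HISTORY_IDX;
SQL> INSERT INTO HISTORY_TBL
2 SELECT * FROM CURRENT_YEAR_TBL;
SQL> COMMIT;
SQL> CREATE INDEX HISTORY_IDX
2 ON HISTORY_TBL (RECORD_ID);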
A Frequent COMMIT Keeps the DBA Away
When performing batch transactions, you must know how often to perform a "commit."
As you learned on Day 11, "Controlling Transactions," a COMMIT
statement finalizes a transaction. A COMMIT saves a transaction, writing
any changes to the applicable table(s). Behind the scenes, however, much more is
going on. Some areas in the database are reserved to store completed transactions
before the changes are actually written to the target table. Oracle calls these areas
rollback segments. When you issue a COMMIT statement, transactions
associated with your SQL session in the rollback segment are updated in the target
table. After the update takes place, the contents of the rollback segment are removed.
A ROLLBACK command, on the other hand, clears the contents of the rollback
segment without updating the target table.
As you can guess, if you never issue a COMMIT or ROLLBACK command,
transactions keep building within the rollback segments. Subsequently, if the data
you are loading is greater in size than the available space in the rollback segments,
the database will essentially come to a halt and ban further transactional activity.
Not issuing COMMIT commands is a common programming pitfall; regular COMMITs
help to ensure stable performance of the entire database system.
The management of rollback segments is a complex and vital database administrator
(DBA) responsibility because transactions dynamically affect the rollback segments,
and in turn, affect the overall performance of the database as well as individual
SQL statements. So when you are loading large amounts of data, be sure to issue the
COMMIT command on a regular basis. Check with your DBA for advice on how
often to commit during batch transactions. (See Figure 15.2.)
Figure 15.2.
The rollback area.
As you can see in Figure 15.2, when a user performs a transaction, the changes
are retained in the rollback area.
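A common pattern for batch loads, sketched here in PL/SQL with hypothetical table
names and an arbitrary batch size, is to commit after every fixed number of rows so
that uncommitted work never outgrows the rollback segments:
INPUT:
SQL> DECLARE
2 CTR NUMBER := 0;
3 BEGIN
4 FOR REC IN (SELECT RECORD_ID, RECORD_DATA FROM SOURCE_TBL) LOOP
5 INSERT INTO HISTORY_TBL (RECORD_ID, RECORD_DATA)
6 VALUES (REC.RECORD_ID, REC.RECORD_DATA);
7 CTR := CTR + 1;
8 IF MOD(CTR, 1000) = 0 THEN -- commit every 1,000 rows
9 COMMIT;
10 END IF;
11 END LOOP;
12 COMMIT; -- final commit for the last partial batch
13 END;
14 /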
Rebuilding Tables and Indexes in a Dynamic Environment
The term dynamic database environment refers to a large database that is
in a constant state of change. The changes that we are referring to are frequent
batch updates and continual daily transactional processing. Dynamic databases usually
entail heavy OLTP systems, but can also refer to DSSs or data warehouses, depending
upon the volume and frequency of data loads.
The result of constant high-volume changes to a database is growth, which in turn
yields fragmentation. Fragmentation can easily get out of hand if growth is not managed
properly. Oracle allocates an initial extent to a table when it is created. When
loaded data fills the table's initial extent, a next extent, whose size is also
specified when the table is created, is allocated.
Sizing tables and indexes is essentially a DBA function and can drastically affect
SQL statement performance. The first step in growth management is to be proactive.
Allow room for tables to grow from day one, within reason. Also plan to defragment
the database on a regular basis, even if doing so means developing a weekly routine.
Here are the basic conceptual steps involved in defragmenting tables and indexes
in a relational database management system:
- 1. Get a good backup of the table(s) and/or index(es).
2. Drop the table(s) and/or index(es).
3. Rebuild the table(s) and/or index(es) with new space allocation.
4. Restore the data into the newly built table(s).
5. Re-create the index(es) if necessary.
6. Reestablish user/role permissions on the table if necessary.
7. Save the backup of your table until you are absolutely sure that the new
table was built successfully. If you choose to discard the backup of the original
table, you should first make a backup of the new table after the data has been fully
restored.
WARNING: Never get rid of the backup of
your table until you are sure that the new table was built successfully.
The following example demonstrates this rebuild process, using a mailing list table
in an Oracle database environment.
INPUT:
CREATE TABLE MAILING_TBL_BKUP AS
SELECT * FROM MAILING_TBL;
OUTPUT:
Table Created.
INPUT/OUTPUT:
drop table mailing_tbl;
Table Dropped.
CREATE TABLE MAILING_TBL
(
INDIVIDUAL_ID    VARCHAR2(12) NOT NULL,
INDIVIDUAL_NAME  VARCHAR2(30) NOT NULL,
ADDRESS          VARCHAR2(40) NOT NULL,
CITY             VARCHAR2(25) NOT NULL,
STATE            VARCHAR2(2)  NOT NULL,
ZIP_CODE         VARCHAR2(9)  NOT NULL
)
TABLESPACE TABLESPACE_NAME
STORAGE ( INITIAL NEW_SIZE,
NEXT NEW_SIZE );
Table created.
INSERT INTO MAILING_TBL
select * from mailing_tbl_bkup;
93,451 rows inserted.
CREATE INDEX MAILING_IDX ON MAILING_TBL
(
INDIVIDUAL_ID
)
TABLESPACE TABLESPACE_NAME
STORAGE ( INITIAL NEW_SIZE,
NEXT NEW_SIZE );
Index Created.
grant select on mailing_tbl to public;
Grant Succeeded.
drop table mailing_tbl_bkup;
Table Dropped.
ANALYSIS:
Rebuilding tables and indexes that have grown enables you to optimize storage,
which improves overall performance. Remember to drop the backup table only after
you have verified that the new table has been created successfully. Also keep in
mind that you can achieve the same results with other methods. Check the options
that are available to you in your database documentation.
Tuning the Database
Tuning a database is the process of fine-tuning the database server's performance.
As a newcomer to SQL, you probably will not be exposed to database tuning unless
you are a new DBA or a DBA moving into a relational database environment. Whether
you will be managing a database or using SQL in applications or programming, you
will benefit by knowing something about the database-tuning process. The key to the
success of any database is for all parties to work together. Some general tips for
tuning a database follow.
- Minimize the overall size required for the database.
- It's good to allow room for growth when designing a database, but don't go overboard.
Don't tie up resources that you may need to accommodate database growth.
- Experiment with the user process's time-slice variable.
- This variable controls the amount of time the database server's scheduler allocates
to each user's process.
- Optimize the network packet size used by applications.
- The larger the amount of data sent over the network, the larger the network packet
size should be. Consult your database and network documentation for more details.
- Store transaction logs on separate hard disks.
- For each transaction that takes place, the server must write the changes to the
transaction logs. If you store these log files on the same disk as you store data,
you could create a performance bottleneck. (See Figure 15.3.)
- Stripe extremely large tables across multiple disks.
- If concurrent users are accessing a large table that is spread over multiple
disks, there is much less chance of having to wait for system resources. (See Figure
15.3.)
- Store database sort area, system catalog area, and rollback areas on separate
hard disks.
- These are all areas in the database that most users access frequently. By spreading
these areas over multiple disk drives, you are maximizing the use of system resources.
(See Figure 15.3.)
- Add CPUs.
- This system administrator function can drastically improve database performance.
Adding CPUs can speed up data processing for obvious reasons. If you have multiple
CPUs on a machine, then you may be able to implement parallel processing strategies.
See your database documentation for more information on parallel processing, if it
is available with your implementation.
- Add memory.
- Generally, the more the better.
- Store tables and indexes on separate hard disks.
- You should store indexes and their related tables on separate disk drives whenever
possible. This arrangement enables the table to be read at the same time the
index is being referenced on another disk. The capability to store objects on multiple
disks may depend on how many disks are connected to a controller. (See Figure 15.3.)
Figure 15.3 shows a simple example of how you might segregate the major areas
of your database.
Figure 15.3.
Using available disks to enhance performance.
The scenario in Figure 15.3 uses four devices: disk01 through disk04. The objective
when spreading your heavy database areas and objects is to keep areas of high use
away from one another.
- Disk01--The system catalog stores information about tables, indexes, users,
statistics, database files, sizing, growth information, and other pertinent data
that is often accessed by a high percentage of transactions.
- Disk02--Transaction logs are updated every time a change is made to a table (insert,
update, or delete). Transaction logs are a major factor in an online transactional
database. They are not of great concern in a read-only environment, such as a data
warehouse or DSS.
- Disk03--Rollback segments are also significant in a transactional environment.
However, if there is little transactional activity (insert, update, delete), rollback
segments will not be heavily used.
- Disk04--The database's sort area, on the other hand, is used as a temporary
area for SQL statement processing when sorting data, as in a GROUP BY or
ORDER BY clause. Sort areas are typically an issue in a data warehouse or
DSS. However, the use of sort areas should also be considered in a transactional
environment.
TIP: Also note how the application tables
and indexes have been placed on each disk. Tables and indexes should be spread as
much as possible.
Notice that in Figure 15.3 the tables and indexes are stored on different devices.
You can also see how a "Big Table" or index may be striped across
two or more devices. This technique splits the table into smaller segments that can
be accessed simultaneously. Striping a table or index across multiple devices is
a way to control fragmentation. In this scenario, tables may be read while their
corresponding indexes are being referenced, which increases the speed of overall
data access.
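In Oracle, for example, striping can be approximated at the tablespace level by
building the large table's tablespace from datafiles on different disks (the
tablespace name, file names, and sizes here are purely illustrative):
SQL> CREATE TABLESPACE BIG_TBL_SPACE
2 DATAFILE '/disk03/big_tbl01.dbf' SIZE 500M,
3 '/disk04/big_tbl02.dbf' SIZE 500M;
Tablespace created.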
The layout in Figure 15.3 is really quite simple. Depending on the function, size, and system-related
issues of your database, you may find a similar method for optimizing system resources
that works better. In a perfect world where money is no obstacle, the best configuration
is to have a separate disk for each major database entity, including large tables
and indexes.
NOTE: The DBA and system administrator
should work together to balance database space allocation and optimize the memory
that is available on the server.
Tuning a database very much depends on the specific database system you are using.
Obviously, tuning a database entails much more than just preparing queries and letting
them fly. On the other hand, you won't get much reward for tuning a database when
the application SQL is not fine-tuned itself. Professionals who tune databases for
a living often specialize in one database product and learn as much as they possibly
can about its features and idiosyncrasies. Although database tuning is often looked
upon as a painful task, it can provide very lucrative employment for the people who
truly understand it.
Performance Obstacles
We have already mentioned some of the countless possible pitfalls that can hinder
the general performance of a database. These are typically general bottlenecks that
involve system-level maintenance, database maintenance, and management of SQL statement
processing.
This section summarizes the most common obstacles in system performance and database
response time.
- Not making use of available devices on the server--A company purchases multiple
disk drives for a reason. If you do not use them accordingly by spreading apart the
vital database components, you are limiting the performance capabilities. Maximizing
the use of system resources is just as important as maximizing the use of the database
server capabilities.
- Not performing frequent COMMITs--Failing to use periodic COMMITs
or ROLLBACKs during heavy batch loads will ultimately result in database
bottlenecks.
- Allowing batch loads to interfere with daily processing--Running batch loads
during times when the database is expected to be available will cause problems for
everybody. The batch process will be in a perpetual battle with end users for system
resources.
- Being careless when creating SQL statements--Carelessly creating complex SQL
statements will more than likely contribute to substandard response time.
TIP: You can use various methods to optimize
the structure of an SQL statement, depending upon the steps taken by the database
server during SQL statement processing.
- Running batch loads with table indexes--You could end up with a batch load that
runs all day and all night, as opposed to a batch load that finishes within a few
hours. Indexes slow down batch loads that are accessing a high percentage of the
rows in a table.
- Having too many concurrent users for allocated memory--As the number of concurrent
database and system users grows, you may need to allocate more memory for the shared
process. See your system administrator.
- Creating indexes on columns with few unique values--Indexing on a column such
as GENDER, which has only two unique values, is not very efficient. Instead,
try to index columns that will return a low percentage of rows in a query.
- Creating indexes on small tables--By the time the index is referenced and the
data read, a full-table scan could have been accomplished.
- Not managing system resources efficiently--Poor management of system resources
can result from wasted space during database initialization, table creation, uncontrolled
fragmentation, and irregular system/database maintenance.
- Not sizing tables and indexes properly--Poor estimates for tables and indexes
that grow tremendously in a large database environment can lead to serious fragmentation
problems, which if not tended to, will snowball into more serious problems.
Built-In Tuning Tools
Check with your DBA or database vendor to determine what tools are available to
you for performance measuring and tuning. You can use performance-tuning tools to
identify deficiencies in the data access path; in addition, these tools can sometimes
suggest changes to improve the performance of a particular SQL statement.
Oracle has two popular tools for managing SQL statement performance. These tools
are explain plan and tkprof. The explain plan
tool identifies the access path that will be taken when the SQL statement is executed.
tkprof measures the performance by time elapsed during each phase of SQL
statement processing. Oracle Corporation also provides other tools that help with
SQL statement and database analysis, but the two mentioned here are the most popular.
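For example, assuming the PLAN_TABLE has been created (Oracle supplies the utlxplan.sql
script for this), explain plan captures the access path of a query without
executing it:
INPUT:
SQL> EXPLAIN PLAN
2 SET STATEMENT_ID = 'TEST1' FOR
3 SELECT * FROM FACT_TABLE WHERE STATUS_CD = 'A';
OUTPUT:
Explained.
The access path can then be read back by querying PLAN_TABLE, for example by selecting
the OPERATION, OPTIONS, and OBJECT_NAME columns for STATEMENT_ID 'TEST1'.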
If you want to simply measure the elapsed time of a query in Oracle, you can use
the SQL*Plus command SET TIMING ON.
SET TIMING ON and other SET commands are covered in more depth
on Day 20, "SQL*Plus."
Sybase's SQL Server has diagnostic tools for SQL statements. These options are
in the form of SET commands that you can add to your SQL statements. (These
commands are similar to Oracle's SET commands). Some common commands are
SET SHOWPLAN ON, SET STATISTICS IO ON, and SET STATISTICS TIME
ON. These SET commands display output concerning the steps performed
in a query, the number of reads and writes required to perform the query, and general
statement-parsing information. SQL Server SET commands are covered on Day
19, "Transact-SQL: An Introduction."
Summary
Two major elements of streamlining, or tuning, directly affect the performance
of SQL statements: application tuning and database tuning. Each has its own role,
but one cannot be optimally tuned without the other. The first step toward success
is for the technical team and system engineers to work together to balance resources
and take full advantage of the database features that aid in improving performance.
Many of these features are built into the database software provided by the vendor.
Application developers must know the data. The key to an optimal database design
is thorough knowledge of the application's data. Developers and production programmers
must know when to use indexes, when to add another index, and when to allow batch
jobs to run. Always plan batch loads and keep batch processing separate from daily
transactional processing.
Databases can be tuned to improve the performance of individual applications that
access them. Database administrators must be concerned with the daily operation and
performance of the database. In addition to the meticulous tuning that occurs behind
the scenes, the DBA can usually offer creative suggestions for accessing data more
efficiently, such as manipulating indexes or reconstructing an SQL statement. The
DBA should also be familiar with the tools that are readily available with the database
software to measure performance and provide suggestions for statement tweaking.
Q&A
- Q If I streamline my SQL statement, how much of a gain in performance should
I expect?
A Performance gain depends on the size of your tables, whether or not columns
in the table are indexed, and other related factors. In a very large database, a
complex query that runs for hours can sometimes be cut to minutes. In the case of
transactional processing, streamlining an SQL statement can save important seconds
for the end user.
Q How do I coordinate my batch loads or updates?
A Check with the database administrator and, of course, with management when
scheduling a batch load or update. If you are a system engineer, you probably will
not know everything that is going on within the database.
Q How often should I commit my batch transactions?
A Check with the DBA for advice. The DBA will need to know approximately how
much data you are inserting, updating, or deleting. The frequency of COMMIT
statements should also take into account other batch loads occurring simultaneously
with other database activities.
Q Should I stripe all of my tables?
A Striping offers performance benefits only for large tables and/or for tables
that are heavily accessed on a regular basis.
Workshop
The Workshop provides quiz questions to help solidify your understanding of the
material covered, as well as exercises to provide you with experience in using what
you have learned. Try to answer the quiz and exercise questions before checking the
answers in Appendix F, "Answers to Quizzes and Exercises."
Quiz
- 1. What does streamline an SQL statement mean?
2. Should tables and their corresponding indexes reside on the same disk?
3. Why is the arrangement of conditions in an SQL statement important?
4. What happens during a full-table scan?
5. How can you avoid a full-table scan?
6. What are some common hindrances of general performance?
Exercises
- 1. Make the following SQL statement more readable.
SELECT EMPLOYEE.LAST_NAME, EMPLOYEE.FIRST_NAME, EMPLOYEE.MIDDLE_NAME,
EMPLOYEE.ADDRESS, EMPLOYEE.PHONE_NUMBER, PAYROLL.SALARY, PAYROLL.POSITION,
EMPLOYEE.SSN, PAYROLL.START_DATE FROM EMPLOYEE, PAYROLL WHERE
EMPLOYEE.SSN = PAYROLL.SSN AND EMPLOYEE.LAST_NAME LIKE 'S%' AND
PAYROLL.SALARY > 20000;
- 2. Rearrange the conditions in the following query to optimize data retrieval
time. Use the following statistics (on the tables in their entirety) to determine
the order of the conditions:
593 individuals have the last name SMITH.
712 individuals live in INDIANAPOLIS.
3,492 individuals are MALE.
1,233 individuals earn a salary >= 30,000.
5,009 individuals are single.
- Individual_id is the primary key for both tables.
SELECT M.INDIVIDUAL_NAME, M.ADDRESS, M.CITY, M.STATE, M.ZIP_CODE,
S.SEX, S.MARITAL_STATUS, S.SALARY
FROM MAILING_TBL M,
INDIVIDUAL_STAT_TBL S
WHERE M.NAME LIKE 'SMITH%'
AND M.CITY = 'INDIANAPOLIS'
AND S.SEX = 'MALE'
AND S.SALARY >= 30000
AND S.MARITAL_STATUS = 'S'
AND M.INDIVIDUAL_ID = S.INDIVIDUAL_ID;