Teach Yourself Oracle 8 In 21 Days
- Day 20 -
Effectively Tuning and Optimizing the Database
Performance can be one of the most fun or most frustrating aspects of administering
a database. It is almost always extremely challenging, and forces you to use all
your experience to isolate the bottleneck and determine a response. I find performance
tuning and system optimization to be extremely rewarding and challenging; I enjoy
it, and I hope you do too.
I hate to refer to performance bottlenecks as bugs or problems because
those terms suggest something is broken. Sometimes bottlenecks occur because you
have reached the capacity of your system, and no tuning or changes can alter that
fact; other solutions must be found. The capacity planner must try to anticipate
both when the system will reach capacity and what new resources are needed before
this happens.
NOTE: Oracle instance and OS tuning should
be performed in conjunction with application tuning. Because much of the instance
tuning is based on the data-access patterns and memory usage of the application,
changes to the application might result in the opportunity to retune the instance
for greater optimization. A perfectly optimized Oracle instance cannot make up for
a poorly tuned application. Unfortunately, this book does not cover application tuning
in detail, but several good books on the subject are available. In fact, I have written
just such a book: Oracle Performance Tuning and Optimization (Sams Publishing).
Performance tuning and capacity planning are closely related. They rely on and
play off of each other. The capacity-planning stage assumes that you have a well-tuned
system, and bases most of its recommendations on this. The system-tuning stage assumes
you have a system that can support the number of users and the number of tasks being
performed. The capacity-planning and system-performance engineers often work together
to determine where the bottleneck is and how to resolve it.
What Is a Bottleneck?
New Term: The term bottleneck
refers to the component or components that limit the performance of the system.
Some bottlenecks result from inherent limitations of your system, but others can
be fixed via parameter tuning. Many times, additional components are needed to fix
bottlenecks. System tuning typically involves finding system bottlenecks and eliminating
them. Examples of bottlenecks that are typically found in an Oracle system include
- Disk drive bottleneck--A disk drive can handle only so many I/Os per second before
latencies increase to unacceptable levels. This capacity is fixed, so if your CPU
and memory system request I/Os more quickly than the I/O subsystem can deliver them,
a disk bottleneck will occur.
- Memory bottleneck--If you have inadequate memory, you won't have a good cache-hit
rate. A high cache-hit rate indicates that most of the data you need is in the Oracle
buffer cache. This reduces the number of needed I/Os.
- CPU bottleneck--In the case of a CPU bottleneck, you are running your system
as quickly as the CPUs can run.
- Network bottleneck--A network bottleneck occurs when there is so much traffic on
the network that data cannot be transferred any faster. This is especially important
during backup and recovery.
- Bus bottleneck--A bus bottleneck occurs when your hardware cannot transfer data
any faster across the CPU-to-memory or I/O bus. This is a hardware limitation.
- Other bottlenecks--Numerous other bottlenecks, either hardware, software, or
a combination of the two, can occur.
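The cache-hit rate mentioned above is derived from Oracle's logical- and physical-read counters. A minimal sketch of the arithmetic, assuming you have already retrieved the counter values (the sample numbers here are illustrative, taken from the UTLESTAT output shown later today):

```python
def buffer_cache_hit_ratio(logical_reads: int, physical_reads: int) -> float:
    """Fraction of block requests satisfied from the Oracle buffer cache.

    logical_reads: total block requests ('session logical reads')
    physical_reads: requests that had to go to disk ('physical reads')
    """
    if logical_reads == 0:
        return 1.0
    return (logical_reads - physical_reads) / logical_reads

# Illustrative values: 4698 logical reads, 1369 physical reads
ratio = buffer_cache_hit_ratio(4698, 1369)
print(f"cache-hit ratio: {ratio:.2%}")  # roughly 71% -- low; more buffers may help
```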
Some bottlenecks can be fixed easily, and others are difficult to overcome. Later
today you will see how to determine the location of a bottleneck and receive some
hints on how to fix bottlenecks.
NOTE: Every system is different. There
is no magic diagnostic tool that can tell you how to tune your system. It is an acquired
skill that you will develop with practice.
Performance Tuning Methodology
Determining what, if anything, is a bottleneck in your system can be difficult.
I have found that following a tuning methodology or plan can help you to reach the
correct conclusion in the shortest amount of time. By tuning methodology,
I mean a way of looking at performance issues and determining the root cause of those
issues. Here is the performance tuning and optimization methodology I use:
- 1. Analyze the system--Determine whether a performance issue actually
exists. You might determine this by running transactions and checking response times
or by monitoring the system statistics. You should look at OS stats as well as Oracle
stats.
2. Determine the cause--Here you form a hypothesis about where the bottleneck
is by looking at the analysis from step 1.
3. Develop a solution or a test--If you are certain you have determined
where the bottleneck is, develop a solution for it. If not, develop a test to verify
your hypothesis. When you develop the test, you should know what the results of the
test will tell you.
4. Run the test or implement the solution--If you have hit upon a solution,
implement it. If not, run your test under carefully controlled conditions. This test
will be used to determine whether you are right or wrong.
5. Analyze the results--After the test has been run, you must analyze the
result. Always view your results with skepticism. If something is too good to be
believed, perhaps you should not believe it.
NOTE: You should know what you expect
from the test you have developed. Many times a test result can provide good data
if the result is positive, but no useful information if the result is negative. For
example, adding another disk drive will tell you that you had a disk bottleneck if
this addition increases performance, but does not tell you what the problem is if
performance does not increase. These tests are still valuable, even though only one
result gives you good data.
Although you don't need to use my system, it helps if you develop some system
for troubleshooting.
Analyze the System
Analyzing the system gives you a good place to start. The first step in analyzing
a system is to determine whether you have a problem. You should periodically monitor
the following:
- Response times--Do users complain that the system seems slower? Periodically
ask your user community how the performance of the system seems to them; sometimes
end users are reluctant to tell you about prospective problems until they reach a
critical state. It is a good idea to have a test query that you can run to compare
current response times with previous response times. If your test query takes significantly
longer to run than previously, you are seeing some kind of system slowdown.
- Throughput--Look at the system as a whole by examining the system performance
monitor (perfmon for NT, sar for UNIX). Compare current and past results. Is your
CPU utilization significantly higher than in previous samples? What about disk usage?
You will look at these in more detail later today.
- Your intuition--Sometimes the system just feels wrong. If you think something
isn't quite right, check it out. You know your system better than anyone.
After you analyze the system, you might determine that there are no performance
issues, that everything is running fine. Great! But if you do come across performance
problems, you can start determining the cause and possible solutions.
Determine the Cause
The next stage is to form a hypothesis of what you think the problem is. This
should be based on analysis of the system and what you know of the system's design
and capacity. For example, if you have a system with five 4.3GB SCSI disk drives
for datafiles, the system should be able to support up to 350 I/Os per second before
you see large disk latencies. If you are driving those disk drives at 450 I/Os per
second, you could hypothesize that you are having a disk I/O problem. This hypothesis
should be specific enough to help lead you to a solution to the bottleneck.
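The arithmetic behind such a hypothesis is simple enough to script. A hedged sketch of the capacity check just described (the 70-random-I/Os-per-drive figure is a rule of thumb, not a measured value for your hardware):

```python
def disk_io_utilization(num_drives: int, observed_iops: float,
                        max_iops_per_drive: float = 70.0) -> float:
    """Return the fraction of the disk subsystem's I/O capacity in use.

    A value above 1.0 means the observed rate exceeds what the drives
    can deliver without large latencies.
    """
    capacity = num_drives * max_iops_per_drive
    return observed_iops / capacity

# Five drives rated ~70 random I/Os/sec each give 350 I/Os/sec total;
# driving them at 450 I/Os/sec supports a disk-bottleneck hypothesis.
utilization = disk_io_utilization(5, 450)
print(f"utilization: {utilization:.2f}")  # above 1.0 -> likely disk bottleneck
```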
Develop a Solution or a Test
Develop a solution to the bottleneck or some sort of test to confirm your hypothesis.
Whether you need to develop a solution or a test depends on how certain you are about
having discovered the bottleneck and how expensive the solution is. If you are certain
you have a disk bottleneck and have more disk drives available, it is not necessary
to test your theory. But if you are not certain or think that other issues might
be involved, it is a good idea to develop a test. Be sure your test will give you
meaningful information.
WARNING: Remember the difference between
cause and effect. You might see something that looks like a disk bottleneck
when in fact a lack of memory is causing unnecessary I/O.
Anticipate Your Results
The test you devise to confirm or rule out a certain bottleneck should lead to
expected results. If you are unsure what positive and negative results would indicate,
the test is not worth running. I often run tests where a positive result is meaningful
but a negative result tells me nothing. This test is still worth running because that positive
result might either solve the problem or rule out a cause. Some examples of the kinds
of testing you can do are shown later today.
Run the Test or Implement the Solution
Either run a test that will give you valuable information about the bottleneck
or try to fix the problem. Carefully monitor the system in both cases. Even if it
seems that the performance of the system has immediately and dramatically improved,
you should still monitor the system and collect some data. If you run into a performance
bottleneck in the future, this data can be valuable.
Analyze the Results
Monitor the system after a change has been made or a test has been run to determine
whether the bottleneck has been eliminated. In the case of a capacity issue, the
bottleneck might appear to have been eliminated, but the fix might be only temporary.
Carefully analyze your system so you can anticipate when you will run out of capacity
again.
You will sometimes find that the results of your test prove that your hypothesis
was incorrect. If this is the case, return to step 2 to determine the cause of the
bottleneck.
Question Your Results
Sometimes testing does not afford a valid result. This typically occurs when you
make a change that you expect to either increase or decrease performance, and neither
occurs. This indicates that one of two things has occurred:
- The bottleneck is somewhere else--If you have a bus bandwidth problem and add
more memory, there will be no improvement because you are limited by the speed of
the bus.
- The test was flawed--This could occur for any number of reasons. Perhaps you
modified the wrong parameter file, or perhaps the change requires a system reboot.
If your results seem flawed, be skeptical. Check them. It's okay if you don't
believe the results and want to try another test. Only rarely does a test turn
out to be flawed or do other problems occur.
Operating System Overview
If you understand the basic concepts of how the operating system works, you will
have a better feel for how to monitor and tune the operating system.
Windows NT
Microsoft Windows NT, a relatively new operating system, has quickly gained popularity.
Windows NT has file and print services similar to those offered by NetWare, but Windows
NT is used primarily as an application server or client operating system. Windows
NT comes in two varieties:
- NT Workstation--NT Workstation is designed for the client users and does not
contain many of the management pieces that come standard with NT Server.
- NT Server--NT Server includes the management tools necessary for maintaining
a server.
Windows NT, a multiprocessor operating system, can take advantage of scalable
performance increases via the addition of CPUs. Windows NT is also a server operating
system; there is no facility within Windows NT to provide login functionality from
terminals. From the beginning, Windows NT was designed to support clients over network
connections. The Windows NT operating system provides functionality such as 16-bit
Windows application support and a GUI, but because of this functionality, some overhead
is associated with Windows NT.
Architectural Overview of Windows NT
New Term: Windows NT is based on a
microkernel architecture. In a microkernel architecture, the core kernel
is small because most of the OS functionality is removed from the kernel. Because
of this, the system is modular; large pieces of the OS can be easily replaced. Although
a microkernel does not provide additional performance, it does provide a flexibility
that OS and device-driver developers can exploit. Because of the flexible subsystem
architecture, OS components such as file systems, hardware support, and memory
subsystems can easily be replaced.
In Windows NT, hardware support is provided through the Hardware Abstraction Layer
(HAL). The HAL is employed by hardware vendors to support their systems' base architectures
and to provide device-driver support. Because of the microkernel and HAL, Windows
NT can support different architectures such as Intel, DEC Alpha, MIPS, PowerPC, and
so on. The microkernel is the common code base in which each of these architectures
is rooted.
Most 16-bit applications written for Windows can run on non-Intel architectures
through the use of a compatibility mode driver. Applications written specifically
for Windows NT can be ported to non-Intel platforms with minimal effort. However,
applications that have not been written to take advantage of a particular architecture
might not run optimally. Oracle is currently supported only on the Intel architecture.
As with all Oracle ports, Oracle has been optimized for Windows NT and specifically
optimized for Windows NT on Intel platforms.
The Windows NT architecture provides for the use of threads, sometimes
known as lightweight processes. Through the use of threads instead of processes,
much of the overhead associated with process switching is reduced. Threads are automatically
created when the Oracle instance is started (see Table 20.1).
Table 20.1. Oracle service threads.
Thread number    Oracle process
0, 1             Oracle Service
2                PMON
3                DBWR
4                LGWR
5                SMON
6                RECO
A shadow thread is created on behalf of each user accessing the Oracle database.
Remember, the shadow processes communicate with the user and interact with Oracle
to carry out the user's requests. For example, if the user process requests a piece
of data not already in the SGA, the shadow process is responsible for reading the
data blocks from the datafiles into the SGA. Under Windows NT, these processes are
invoked as kernel threads. Each of these threads has its own thread number.
UNIX
UNIX, born in 1969, predates NetWare, Windows NT, and OS/2 by many years. UNIX
is different from NetWare, Windows NT, and OS/2 in that it was not designed as a
client/server operating system. That is not to say that UNIX is not now used as a
network server, but that network service was not the original intent of the operating
system.
Even though UNIX is fairly standard, a divergence in UNIX has occurred over the
years. Because almost every major computer company develops and sells a version of
UNIX, there are differences in the UNIX offerings from the different vendors.
The PC server market boasts three main UNIX versions: SCO UNIX, SCO UnixWare,
and Solaris from SunSoft. IBM calls its UNIX offering AIX. Hewlett-Packard has a
UNIX operating system called HP-UX; AT&T Global Information Solutions simply
calls its product UNIX System V. SunSoft produces versions of UNIX for both Intel
and Sun SPARC processors; its products are called Solaris and Solaris X86. Currently,
most vendors (with the exception of SCO) base their versions of UNIX on one common
core: UNIX System V Release 4, which is sometimes referred to as UNIX SVR4.
In many cases, applications are binary compatible between operating systems
on the same platform, but it is not uncommon for the vendor to add special enhancements
in the operating system for performance. Oracle always takes advantage of OS-specific
features, even though it would be simpler to ship just one binary.
Architectural Overview of UNIX
UNIX consists of a core piece called the kernel surrounded by applications and
tools. The kernel contains all hardware support, device drivers, scheduling routines,
and the network stack. Unlike the microkernel architecture used in Windows NT, the
UNIX kernel contains all the core operating system functionality.
UNIX is a virtual-memory operating system but is very configurable. In the UNIX
operating system, not all memory is allocated and used in the same manner. Shared
memory used for the Oracle SGA is treated differently from normal memory. Shared
memory is allocated at boot time, is not available to general user processes, and
must be allocated through shared memory system calls in the operating system.
The fact that shared memory is treated differently allows certain functions to
occur. Some varieties of UNIX allow you to allocate shared memory using a 4MB memory
page size. This arrangement cuts down on page entries that must be maintained by
the operating system and guarantees larger chunks of contiguous memory. Shared memory
is also unique in that it is nonpageable. This means that if you have enough shared
memory to create the SGA at instance startup, you don't ever have to worry about
the SGA being paged out.
Other features that have been added over the years to the UNIX operating system
to improve database performance include realtime processing, high-speed semaphores,
and asynchronous I/O. Not all these features are in every implementation of UNIX.
Check with your OS vendor to see what is available for your environment.
Finding a Performance Bottleneck
Finding a performance bottleneck can be difficult. There is no proven formula
or list of things to look for that immediately tells you what the problem is. Finding
a performance bottleneck is an investigative task, and the more experience you get,
the better at it you will be.
Knowing Your System's Limitations
To determine whether a system component is causing a performance bottleneck,
ascertain the limitations of that component when possible. Doing so allows you to
determine whether you are exceeding that limitation. Unfortunately, these limitations
are difficult to gauge in most cases, but it is possible to characterize the component
in some cases. The limitations of the I/O subsystem and the networking components,
for example, can be ascertained.
The I/O Subsystem
Some pieces of the I/O subsystem, such as the disk drive, have been well tested
and documented over the last few years; limitations of the controller and bus are
not as well known. The disk drive is a physical piece of hardware that usually comes
with a specification. This specification tells you certain things about the disk
drive, such as
- Average seek time--This is the average time it takes for the disk arm to move
from one location to another. This relates to the number of random I/Os per second
that you can perform.
- Average track-to-track seek time--This is the time it takes to move to the next
adjacent track on the disk drive. This relates to the number of sequential I/Os per
second that you can perform.
- Transfer time--This is the time it takes to move the data from the disk to the
system bus.
- Rotations per second--This is how quickly the disk spins. This relates to how
quickly you can rotate to the next bit of data on the disk.
A typical disk drive might have an average seek time of 9 milliseconds; for random
I/Os, it takes 9 milliseconds to move to the next data. Add 1 millisecond for data
transfer to make 10 milliseconds per random I/O. Divide 1 second by 10 milliseconds
to see how many times per second you can perform a random seek:
Seeks = 1 Sec / 0.01 Sec/Seek = 100 Seeks
On Day 4, "Properly Sizing Your Database and Planning for Growth," you
saw the "knee of the curve" theory. According to that theory, you don't
want to exceed 60-70% of your available capacity; otherwise, response times increase. This
is why you should not exceed 60-70 seeks per second for random I/O.
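The seek-rate arithmetic above can be sketched in a few lines (the 1-millisecond transfer time and the 70% knee-of-the-curve threshold are the figures used in the text, not universal constants):

```python
def random_iops(avg_seek_ms: float, transfer_ms: float = 1.0) -> float:
    """Random I/Os per second a drive can sustain: 1 second divided
    by the time per I/O (average seek plus data transfer)."""
    return 1000.0 / (avg_seek_ms + transfer_ms)

peak = random_iops(9.0)       # 9 ms seek + 1 ms transfer = 10 ms per I/O
recommended = 0.7 * peak      # stay below the knee of the curve
print(peak, recommended)      # 100 I/Os/sec peak; ~70 recommended
```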
The Network Subsystem
The network subsystem is another area where you can compare your throughput against
a known maximum. Ethernet networks are available in a number of different throughput
levels. This is usually indicated in the name of the network hardware. A few of the
network hardware specifications include
- 10BaseT--This is 10Mb/second Ethernet. With this hardware you can have a peak
throughput of 10Mb, or approximately 1.25MB/second.
- 100BaseT--This is 100Mb/second Ethernet. With this hardware you can have a
peak throughput of 100Mb, or approximately 12.5MB/second.
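Converting a nominal link speed in megabits into usable megabytes per second is a one-liner worth keeping straight, since network hardware is rated in bits but transfer sizes are discussed in bytes:

```python
def link_mb_per_sec(megabits_per_sec: float) -> float:
    """Peak throughput in megabytes/sec for a link rated in
    megabits/sec (8 bits per byte)."""
    return megabits_per_sec / 8.0

print(link_mb_per_sec(10))    # 1.25  -- 10BaseT
print(link_mb_per_sec(100))   # 12.5  -- 100BaseT
```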
Monitor your system to determine whether you are close to meeting or exceeding
the available throughput. Remember that smaller packets are not as efficient, so
if you send many small requests (common in database applications), you might not
realize the full potential of the network.
A good way to determine how much network bandwidth you are using is by employing
a network monitor. Network monitors can be hardware or software devices. Most operating
systems do not provide any way to monitor the network from your system because your
network card normally passes only packets addressed to that machine. Other packets
are ignored.
Memory
Memory problems sometimes manifest themselves as other problems. For example,
a lack of memory might cause excessive I/O, which will appear as an I/O problem.
Of prime importance is the fact that the system is not paging or swapping. Paging
and swapping occur when no physical memory is available for code processing. The
operating system pages, or moves data out of memory, onto a paging area on disk. This operation
can be quite expensive. If you are seeing significant paging, add more memory to
the system or reduce the Oracle DB block buffer cache to free more memory.
WARNING: If your system is paging or swapping,
you have a severe performance problem. No amount of tuning will make up for the performance
degradation caused by this paging or swapping. If you are seeing excessive paging,
get more memory. If you are at the limit of memory that your system can support,
start looking for a new system.
Monitoring the System
After you know your limitations, you can determine whether you are exceeding them
by monitoring the system. There are several ways to monitor the system; one of the
best ways under Windows NT is with the Performance Monitor, or perfmon. On
a UNIX system, you can monitor many objects with the sar utility. Most operating
systems also offer a graphical monitoring tool. These tools are good, but are specific
to the brand of hardware you purchase. Look in your OS documentation for instructions
on how to use these tools.
Oracle provides a set of useful scripts for monitoring the performance of the
RDBMS. These scripts are called UTLBSTAT (begin) and UTLESTAT (end).
UTLBSTAT should be run first, followed shortly thereafter by UTLESTAT.
UTLESTAT displays a large number of statistic and counter values based on
the time since UTLBSTAT was run.
Using perfmon
Select Programs | Administrative Tools (Common) | Performance Monitor to invoke
perfmon. The Performance Monitor appears in chart mode, which is fine for small numbers
of items (such as a trend over an extended period of time). If you are looking at
a large amount of data, use report mode; in this mode, values are displayed onscreen
as numerical values. Invoke report mode by selecting it via the View drop-down menu
or by clicking the Report button.
WARNING: perfmon adds some CPU overhead
to the system. If you extend the report interval, perfmon will run less often, thus
reducing CPU overhead. Increasing the report interval also increases the accuracy
of the report. I recommend setting the report interval to 15 or 30 seconds.
Looking for I/O Bottlenecks with perfmon
To use perfmon to view events in the system, you must select counters. Each of
these counters represents a different event that is monitored. To view counter information,
do the following:
- 1. Click the + button to add a new counter.
2. From the Add to Report screen, choose the PhysicalDisk object from the
Object drop-down menu, as shown in Figure 20.1.
Figure 20.1.
Choose the PhysicalDisk object.
- 3. You are presented with a number of objects from which you can select.
You can monitor many useful objects, but my opinion is that the following items are
most important in monitoring the disk I/O subsystem:
- Disk Reads/Sec--The number of reads per second to that disk or volume.
- Disk Writes/Sec--The number of writes per second to that disk or volume.
- Disk Transfers/Sec--The total number of reads and writes per second to that disk
or volume.
- Avg. Disk Sec/Read--The average time it takes for the read operation to occur.
- Avg. Disk Sec/Write--The average time it takes for the write operation to occur.
- Avg. Disk Sec/Transfer--The average time it takes for the read and write operations
to occur.
- Avg. Disk Queue Length--The average number of I/Os in the disk I/O subsystem.
NOTE: A hardware disk array will appear
to the NT system as a single disk. Divide the number of I/Os per second by the number
of disk drives to get the I/Os per second per disk drive. This is not 100% accurate
because the I/O balance may not be perfect, but it is the best that you can do.
Of particular note should be the number of I/Os per second per disk drive and
the disk latency. A typical disk drive or disk array should complete I/Os in 20-30
milliseconds (0.020-0.030 seconds). If any volume is much higher than that (more
than 0.060 seconds), you are probably seeing some sort of disk bottleneck. Consider
adding more disk drives.
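The note above amounts to two quick checks, sketched here (an even I/O split across the array's drives is assumed, and the 0.060-second latency threshold is the guideline from the text):

```python
def per_drive_iops(array_iops: float, num_drives: int) -> float:
    """Approximate I/Os per second per drive, assuming the hardware
    array balances I/O evenly across its members."""
    return array_iops / num_drives

def looks_like_disk_bottleneck(avg_sec_per_io: float,
                               threshold: float = 0.060) -> bool:
    """True if the average I/O latency exceeds the ~60 ms guideline."""
    return avg_sec_per_io > threshold

print(per_drive_iops(560, 8))             # 70.0 I/Os/sec per drive
print(looks_like_disk_bottleneck(0.085))  # True -- consider more drives
```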
If you are running on UNIX, you can get this information using sar or other monitoring
tools.
Looking for Memory Bottlenecks with perfmon
In Windows NT, the Pages/Sec counter under the Memory objects can be used to indicate
that the system is paging. If you see significant activity with this counter, you
are using too much memory. If you cannot add more memory, you should reduce the memory
used by Oracle and other processes or reduce the number of users on the system.
Using UTLBSTAT and UTLESTAT
Within the directory ORANT\RDBMS80\ADMIN are a number of scripts used
internally within Oracle. You ran CATPROC.SQL and CATALOG.SQL from
this directory when you created a database. UTLBSTAT and UTLESTAT
are also run from here.
UTLBSTAT sets up some internal tables and takes a snapshot of some of
the internal Oracle counters. After you run under load for a time, UTLESTAT
takes another snapshot and compares the results. UTLESTAT displays a large
number of statistic and counter values based on the time since UTLBSTAT
was run.
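The snapshot comparison works by subtracting the begin counters from the end counters and normalizing the deltas. A simplified sketch of that bookkeeping (the real scripts do this in SQL; the counter names and values here are only illustrative):

```python
def stat_report(begin: dict, end: dict, elapsed_sec: float,
                transactions: int, logons: int) -> dict:
    """Compute deltas between two counter snapshots and normalize them
    per transaction, per logon, and per second, as UTLESTAT's
    statistics report does."""
    report = {}
    for name, end_val in end.items():
        total = end_val - begin.get(name, 0)
        report[name] = {
            "total": total,
            "per_transact": total / transactions,
            "per_logon": total / logons,
            "per_second": total / elapsed_sec,
        }
    return report

# Illustrative counters only
begin = {"physical reads": 1000}
end = {"physical reads": 2369}
r = stat_report(begin, end, elapsed_sec=210.0, transactions=1, logons=4)
print(r["physical reads"]["per_logon"])   # 342.25
```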
NOTE: The UTLBSTAT.SQL and UTLESTAT.SQL
scripts each have a CONNECT INTERNAL string at the top. Most likely, this
will not work under NT. Simply comment out this line and run the script from the
INTERNAL user account. Place the word Rem at the beginning of those
lines.
The UTLBSTAT and UTLESTAT scripts can be run from the SVRMGR30
utility. After you connect as INTERNAL or SYS, you can run the
UTLBSTAT script with the following syntax:
@D:\ORANT\RDBMS80\ADMIN\UTLBSTAT;
After you run for a while under load, you can finish monitoring Oracle and view
the results by running the UTLESTAT script with the following syntax:
@D:\ORANT\RDBMS80\ADMIN\UTLESTAT;
NOTE: The usefulness of the data retrieved
from UTLBSTAT and UTLESTAT might depend on the length of time for
which the data is collected. When looking at specific events, five minutes might
be enough. When looking at long-term performance, you might need to run for hours
or more. If you are interested in a specific event such as a long-running query,
the runtime of the event will be sufficient.
The result of UTLESTAT is, in its entirety, quite long. Following is
the resulting data from UTLESTAT only; the echo of the SQL statements has
been removed:
OUTPUT:
LIBRARY GETS GETHITRATI PINS PINHITRATI RELOADS INVALIDATI
------------ ---------- ---------- ---------- ---------- ---------- ----------
BODY 0 1 0 1 0 0
CLUSTER 97 1 50 1 0 0
INDEX 0 1 0 1 0 0
OBJECT 0 1 0 1 0 0
PIPE 0 1 0 1 0 0
SQL AREA 120 .808 364 .874 0 0
TABLE/PROCED 235 .923 265 .879 0 0
TRIGGER 0 1 0 1 0 0
Statistic Total Per Transact Per Logon Per Second
---------------------------- ----------- ------------ ----------- -----------
DBWR buffers scanned 601 601 150.25 2.86
DBWR free buffers found 581 581 145.25 2.77
DBWR lru scans 35 35 8.75 .17
DBWR make free requests 35 35 8.75 .17
DBWR summed scan depth 601 601 150.25 2.86
DBWR timeouts 68 68 17 .32
SQL*Net roundtrips to/from 129 129 32.25 .61
background timeouts 210 210 52.5 1
buffer is not pinned count 2175 2175 543.75 10.36
buffer is pinned count 6343 6343 1585.75 30.2
bytes received via SQL*Net 11212 11212 2803 53.39
bytes sent via SQL*Net to c 8270 8270 2067.5 39.38
calls to get snapshot scn: 145 145 36.25 .69
calls to kcmgas 1 1 .25 0
calls to kcmgcs 10 10 2.5 .05
calls to kcmgrs 181 181 45.25 .86
cleanouts only - consistent 5 5 1.25 .02
cluster key scan block gets 801 801 200.25 3.81
cluster key scans 506 506 126.5 2.41
commit cleanouts 16 16 4 .08
commit cleanouts successful 16 16 4 .08
consistent gets 4557 4557 1139.25 21.7
cursor authentications 6 6 1.5 .03
db block changes 104 104 26 .5
db block gets 141 141 35.25 .67
enqueue releases 26 26 6.5 .12
enqueue requests 20 20 5 .1
execute count 144 144 36 .69
free buffer inspected 7 7 1.75 .03
free buffer requested 1383 1383 345.75 6.59
immediate (CR) block cleano 5 5 1.25 .02
logons cumulative 4 4 1 .02
messages received 38 38 9.5 .18
messages sent 39 39 9.75 .19
no work - consistent read g 4063 4063 1015.75 19.35
opened cursors cumulative 109 109 27.25 .52
parse count (hard) 23 23 5.75 .11
parse count (total) 120 120 30 .57
physical reads 1369 1369 342.25 6.52
physical writes 45 45 11.25 .21
recursive calls 2030 2030 507.5 9.67
redo blocks written 49 49 12.25 .23
redo entries 67 67 16.75 .32
redo size 25360 25360 6340 120.76
redo small copies 67 67 16.75 .32
redo synch writes 1 1 .25 0
redo wastage 644 644 161 3.07
redo writes 3 3 .75 .01
session logical reads 4698 4698 1174.5 22.37
session pga memory 645168 645168 161292 3072.23
session pga memory max 645168 645168 161292 3072.23
session uga memory 13772 13772 3443 65.58
session uga memory max 144296 144296 36074 687.12
sorts (memory) 85 85 21.25 .4
sorts (rows) 783 783 195.75 3.73
table fetch by rowid 2148 2148 537 10.23
table fetch continued row 2 2 .5 .01
table scan blocks gotten 1085 1085 271.25 5.17
table scan rows gotten 456 456 114 2.17
table scans (long tables) 2 2 .5 .01
table scans (short tables) 17 17 4.25 .08
total file opens 5 5 1.25 .02
user calls 139 139 34.75 .66
user commits 1 1 .25 0
write requests 6 6 1.5 .03
Average Write Queue Length
--------------------------
0
Event Name Count Total Time Avg Time
--------------------------------- ------------ ------------- -------------
SQL*Net message from client 150 0 0
SQL*Net message to client 150 0 0
control file sequential read 68 0 0
db file scattered read 137 0 0
db file sequential read 413 0 0
file identify 3 0 0
file open 9 0 0
log file sync 1 0 0
Event Name Count Total Time Avg Time
--------------------------------- ------------ ------------- -------------
control file parallel write 2 0 0
control file sequential read 2 0 0
db file parallel write 6 0 0
db file scattered read 1 0 0
db file sequential read 5 0 0
db file single write 2 0 0
log file parallel write 3 0 0
log file sync 1 0 0
pmon timer 70 0 0
rdbms ipc message 250 0 0
smon timer 1 0 0
LATCH_NAME GETS MISSES HIT_RATIO SLEEPS SLEEPS/MISS
------------------- ---------- ----------- ----------- ----------- -----------
Active checkpoint 69 0 1 0 0
Checkpoint queue l 1488 0 1 0 0
Token Manager 22 0 1 0 0
cache buffer handl 43 0 1 0 0
cache buffers chai 10389 0 1 0 0
cache buffers lru 1451 0 1 0 0
dml lock allocatio 11 0 1 0 0
enqueue hash chain 24 0 1 0 0
enqueues 62 0 1 0 0
ktm global data 1 0 1 0 0
library cache 3099 0 1 0 0
library cache load 64 0 1 0 0
list of block allo 1 0 1 0 0
messages 536 0 1 0 0
modify parameter v 4 0 1 0 0
multiblock read ob 288 0 1 0 0
ncodef allocation 3 0 1 0 0
process allocation 1 0 1 0 0
redo allocation 115 0 1 0 0
row cache objects 1318 0 1 0 0
session allocation 56 0 1 0 0
session idle bit 288 0 1 0 0
session switching 3 0 1 0 0
shared pool 1063 0 1 0 0
sort extent pool 1 0 1 0 0
system commit numb 193 0 1 0 0
transaction alloca 7 0 1 0 0
undo global data 101 0 1 0 0
LATCH_NAME NOWAIT_GETS NOWAIT_MISSES NOWAIT_HIT_RATIO
------------------- --------------- ---------------- ----------------
Token Manager 3 0 1
cache buffers chai 8643 0 1
cache buffers lru 1465 0 1
library cache 3 0 1
multiblock read ob 1 0 1
process allocation 1 0 1
row cache objects 3 0 1
vecio buf des 3 0 1
CLASS COUNT TIME
------------------ ---------------- ----------------
0 rows selected.
UNDO_SEGMENT TRANS_TBL_GETS TRANS_TBL_WAITS
UNDO_BYTES_WRITTEN SEGMENT_SIZE_BYTES XACTS
SHRINKS WRAPS
------------------ ------------------ ------------------
------------------ ------------------ ------------------
- ------------------ ------------------
0 8
0 2718 1226752
0 0 0
1 row selected.
NAME VALUE
-------------------------------------- --------------------------------------
background_dump_dest %RDBMS80%\trace
compatible 8.0.0.0.0
control_files D:\DATABASE\ctl1dogs.ora
db_block_size 2048
db_files 1024
db_name dogs
dml_locks 100
log_buffer 2048, 8192
log_checkpoint_interval 10000
max_dump_file_size 10240
processes 20, 50
remote_login_passwordfile SHARED
sequence_cache_hash_buckets 10
shared_pool_size 3500000
snapshot_refresh_processes 1
user_dump_dest %RDBMS80%\trace
NAME GET_REQS GET_MISS SCAN_REQ SCAN_MIS MOD_REQS COUNT CUR_USAG
--------------- -------- -------- -------- -------- -------- -------- --------
dc_free_extents 9 9 0 0 0 27 10
dc_rollback_seg 2 0 0 0 0 7 2
dc_users 28 0 0 0 0 14 1
dc_objects 169 16 0 0 0 144 138
dc_synonyms 2 1 0 0 0 12 1
dc_usernames 128 0 0 0 0 20 2
dc_object_ids 108 5 0 0 0 104 97
TABLE_SPACE
READS BLKS_READ READ_TIME
WRITES BLKS_WRT WRITE_TIME MEGABYTES
------------------------------------------------------------------------------
---------- ---------- ---------- ----
---- ---------- ---------- ----------
DOGS
0 0 0
0 0 0 10
SYSTEM
555 1378 0
53 53 0 26
TABLE_SPACE FILE_NAME
READS BLKS_READ READ_TIME
WRITES BLKS_WRT WRITE_TIME MEGABYTES
------------------------------ ------------------------------------------------
" WIDTH="14" HEIGHT="9" ALIGN="BOTTOM"
BORDER="0">; ---------- ---------- ---------- ----
---- ---------- ---------- ----------
DOGS D:\DATABASE\DOGS1.DBF
0 0 0
0 0 0 10
SYSTEM D:\DATABASE\SYSDOGS.ORA
555 1378 0
53 53 0 26
START_TIME END_TIME
------------------------ ------------------------
15-jun-97 16:39:39 15-jun-97 16:43:09
BANNER
----------------------------------------------------------------
Oracle8 Server Release 8.0.2.0.2 - Beta
PL/SQL Release 3.0.2.0.2 - Beta
CORE Version 4.0.2.0.1 - Production
TNS for 32-bit Windows: Version 3.0.2.0.0 - Beta
NLSRTL Version 3.3.0.0.1 - Beta
ANALYSIS: As you can see, a wealth of data collected within Oracle can be
retrieved via the UTLBSTAT and UTLESTAT SQL scripts. This data
is retrieved mainly from the Oracle V$ views. These views are the window
into the Oracle internal performance tables.
NOTE: When using a script such as UTLESTAT
where the output data is useful, employ the spool option under SVRMGR30.
This option copies the onscreen data into a file. If you specify a filename, that
name will be used. Use the following syntax:
SVRMGR> SPOOL stats;
System Tuning
I like to divide tuning into the following components:
- Software tuning--This involves both the OS and the RDBMS.
- Hardware tuning--This includes proper system design and disk layout.
- Application tuning--This involves properly designing the application and database
to take advantage of performance features and planning.
Oracle Tuning
This section focuses on things that can be tuned in the Oracle RDBMS. A complete
list of Oracle tuning parameters is given in Appendix B, "Oracle Tuning Parameters."
The SGA
The System Global Area (SGA) contains the shared pool, the redo log buffer, and
the database block buffers.
The Shared Pool
The shared pool contains the library cache, the data-dictionary cache, and the
shared session area (with the multithreaded server).
The Library Cache
The library cache contains the shared SQL and PL/SQL areas. You can improve performance
by increasing the cache-hit rate in the library cache and by speeding access to the
library cache by holding infrequently used SQL statements in cache longer.
The V$LIBRARYCACHE table contains statistics about how well you are using
the library cache. The important columns to view in this table are PINS
and RELOADS:
- PINS--The number of times the item in the library cache was executed.
- RELOADS--The number of times the library cache missed and the library
object was reloaded.
A small number of reloads relative to the number of executions indicates a high
cache-hit rate.
The Data-Dictionary Cache
The data-dictionary cache contains a set of tables and views that Oracle uses
as a reference to the database. Here Oracle stores information about the logical
and physical structure of the database.
To check the efficiency of the data-dictionary cache, check the cache-hit rate.
Statistics for the data-dictionary cache are stored in the dynamic performance table
V$ROWCACHE (the data-dictionary cache is sometimes known as the row cache).
The important columns to view in this table are GETS and GETMISSES:
- GETS--The total number of requests for the particular item.
- GETMISSES--The total number of requests resulting in cache misses.
To determine your library cache-hit ratio, use the following SQL statement, or
look at PINHITRATIO in the output of UTLESTAT:
SELECT SUM(reloads) "Cache Misses",
SUM(pins) "Executions",
100 * (SUM(reloads) / SUM(pins) ) "Cache Miss Percent"
FROM V$librarycache;
This will show you the cache-miss percent:
Cache Hit % = 100% - Cache Miss %
Some cache misses are expected, especially during startup, when the cache
has not yet been populated.
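A similar check can be made for the data-dictionary cache itself, using the GETS and GETMISSES columns described earlier. The following is a sketch only; verify the V$ROWCACHE column names against your release:

```sql
SELECT SUM(getmisses) "Cache Misses",
       SUM(gets) "Requests",
       100 * (SUM(getmisses) / SUM(gets)) "Cache Miss Percent"
  FROM V$ROWCACHE;
```

As with the library cache, a low miss percentage indicates that the data-dictionary cache is adequately sized.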
Shared Session Information
In a multithreaded server configuration, the session information is also stored
in the shared pool. This information includes the private SQL areas as well as sort
areas. Ensure that you do not run out of memory for this shared session information.
To determine whether you need to increase space for these shared sessions, you
can extract the sum of memory allocated for all sessions and the maximum amount of
memory allocated for sessions from the dynamic performance table V$SESSTAT.
If the maximum amount of memory used is high, it might be necessary to increase the
size of the shared pool. Because the shared pool is used for other functions as well,
such as the library cache and the data-dictionary cache, it is a good idea to increase
the size of the shared pool to accommodate this additional memory usage. If you have
enough memory in your system, increase the shared pool by the maximum amount of memory
used by the shared server processes; if you have a limited amount of memory, use
the sum of memory allocated to sessions that was obtained when an average number
of users was connected and running as a basis for the amount of memory by which to
increase the shared pool.
Database Block Buffer Cache
The most important Oracle cache in the system is probably the buffer cache. The
buffer cache makes up the majority of the Oracle SGA and is used for every query
and update in the system. The statistics for the buffer cache are kept in the dynamic
performance table V$SYSSTAT and are included in the output from UTLESTAT.
The important columns to view in this table are listed here:
- PHYSICAL READS--This indicates the total number of requests that result
in a disk access. This is a cache miss.
- DB BLOCK GETS--This indicates the number of requests for blocks in current
mode. Buffers are typically retrieved in current mode for INSERT, UPDATE,
and DELETE statements.
- CONSISTENT GETS--This indicates the number of requests for blocks in
consistent mode. Buffers are typically retrieved in consistent mode for queries.
The sum of the values in DB BLOCK GETS and CONSISTENT GETS represents
the total number of requests for data. The cache-hit ratio is determined using this
formula:
Cache-hit ratio = 1 - (PHYSICAL READS / (DB BLOCK GETS + CONSISTENT
GETS))
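The formula can be computed directly from V$SYSSTAT with a query along these lines (a sketch; the statistic names shown are as they appear in the NAME column):

```sql
SELECT 1 - (phys.value / (db.value + cons.value)) "Cache-Hit Ratio"
  FROM V$SYSSTAT phys, V$SYSSTAT db, V$SYSSTAT cons
 WHERE phys.name = 'physical reads'
   AND db.name   = 'db block gets'
   AND cons.name = 'consistent gets';
```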
The block buffers are the most important area of the SGA and must be tuned because
of the large effect they have on the system and the number of resources they consume.
Performance Enhancements
I prefer to separate the performance-enhancement options from the general tuning
of Oracle. Performance enhancements tend to be things that might or might not help
your configuration and application; in fact, they might hurt. On the other hand,
tuning parameters always help, based on the correct interpretation of Oracle statistics.
The following sections review a few of the enhancements you have seen throughout
the book.
Block Size
Depending on your configuration and data-access patterns, you might be able to
benefit from using a larger block size. With a larger block size, you get the benefit
of less wasted space and more efficient I/O in certain conditions. Here are a few
guidelines that might help you decide whether changing the size of DB_BLOCK_SIZE
can help you:
- OLTP systems benefit from smaller blocks--If your application is OLTP in nature,
you will not benefit from larger blocks. OLTP data typically fits well in the default
block size; larger blocks unnecessarily eject blocks from the SGA.
- DSS systems benefit from larger blocks--In the DSS system in which table scans
are common, retrieving more data at a time results in a performance increase.
- Larger databases benefit from larger blocks--Larger databases see a space benefit
from less wastage per block.
- Databases with large rows benefit from larger blocks--If your rows are extremely
large (as is sometimes the case with images or text) and don't fit in the default
block, you will see a definite benefit from a larger block size.
Because unnecessarily changing the block size increases I/O overhead, this change
does carry some risk. Change the block size with caution.
Clusters
New Term: A cluster, sometimes
called an index cluster, is an optional method of storing tables in an Oracle
database. Within a cluster, multiple related tables are stored together to improve
access time to the related items. Clusters are useful in cases where related data
is often accessed together. The existence of a cluster is transparent to users and
applications; the cluster affects only how data is stored.
A cluster can be useful for tables in which data is primarily accessed together
in a join. In such situations, the reduced I/O needed to bring the additional data
into the SGA and the fact that the data is already cached can be a big advantage.
However, for situations in which the tables have a large number of INSERT
statements or the data is not frequently accessed together, a cluster is not useful
and should not be used. Do not cluster tables if full-table scans are often performed
on only one of the tables in the cluster. The additional space required by the cluster
and the additional I/O reduces performance.
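As a sketch of the syntax (the emp and dept names are hypothetical, and the storage value is illustrative), the cluster is created first, the tables are placed in it, and a cluster index is built before any rows are inserted:

```sql
CREATE CLUSTER emp_dept (deptno NUMBER(2))
  SIZE 512;

CREATE TABLE dept (
  deptno NUMBER(2),
  dname  VARCHAR2(14))
  CLUSTER emp_dept (deptno);

CREATE TABLE emp (
  empno  NUMBER(4),
  ename  VARCHAR2(10),
  deptno NUMBER(2))
  CLUSTER emp_dept (deptno);

-- An index cluster requires a cluster index before data can be inserted.
CREATE INDEX emp_dept_idx ON CLUSTER emp_dept;
```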
Reduction of Fragmentation
New Term: Fragmentation occurs
when pieces of the database are no longer contiguous. Fragmentation can consist of
disk fragmentation or tablespace fragmentation. Both of these types
of fragmentation usually affect performance. Disk fragmentation usually causes multiple
I/Os to occur when one I/O would have been sufficient (for example, with chained
or migrated rows). Disk fragmentation can occur when the extents that comprise the
database segments are noncontiguous; this is sometimes caused by excessive dynamic
growth.
Tablespace fragmentation is caused by the dropping and creating of segments. This
can produce large free areas between segments, which results in the inefficient use
of space and excessive disk seeks over the empty areas. Tablespace fragmentation
can also prevent Oracle from taking advantage of multiblock reads.
One way to eliminate fragmentation is to export the table or tablespace data,
remove and re-create the table or tablespace, and import the data. By eliminating
fragmentation, you can reduce excessive I/Os and CPU usage, streamlining data access.
Any overhead and unnecessary I/Os you can reduce will improve system performance.
Hash Clusters
New Term: A hash cluster is
similar to a cluster except that it uses a hash function rather than an index to
reference the cluster key. A hash cluster stores the data based on the result of
a hash function. The hash function is a numeric function that determines the data
block in the cluster based on the value of the cluster key. To achieve good performance
from a hash cluster, you must meet the following criteria:
- The cluster key value must be unique.
- The majority of queries must be equality queries on the cluster key.
- The size of the table must be static (little growth occurs).
- The value of the cluster key must not change.
If you can take advantage of hashing by meeting these strict criteria, you will
see good performance. Hashing is extremely efficient under the right conditions;
however, having a hash cluster under the wrong conditions can degrade performance.
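The syntax is similar to that of an index cluster, except that the HASHKEYS clause tells Oracle to use a hash function rather than a cluster index. The names and sizes below are hypothetical:

```sql
CREATE CLUSTER order_cluster (order_id NUMBER)
  SIZE 512
  HASHKEYS 10000;

CREATE TABLE orders (
  order_id   NUMBER,
  order_date DATE)
  CLUSTER order_cluster (order_id);
```

Because the hash function maps each cluster key directly to a data block, no cluster index is created or maintained.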
Indexes
New Term: An index, like the
index in this book, is an optional structure designed to help you achieve faster
access to your data. When optimally configured and used, indexes can significantly
reduce I/O to the datafiles and greatly improve performance. You must first decide
whether an index is appropriate for the data and access patterns in your particular
system. Having decided to use an index, you must decide which columns to index. Indexing
appropriately can greatly improve performance by reducing I/Os and speeding access
times.
Careful planning and periodic testing with SQL Trace can lead to the effective
use of indexes, with optimal performance being the outcome. Even though indexes have
been built on tables in your database, they might not necessarily be used. If the
SQL statement is not correctly formed, the index might be bypassed and a full-table
scan might be the result. This is why the application developer and the system designer
must communicate constantly throughout the development and deployment cycle. The
application should be tested and the SQL statements analyzed to ensure that indexes
are being used where intended. Indexes are covered in detail on Day 13, "Using
Indexes and Sequences."
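As a brief illustration of how a poorly formed statement can bypass an index (the table and column names here are hypothetical):

```sql
CREATE INDEX emp_lastname_idx ON emp (last_name);

-- This predicate can use the index:
SELECT * FROM emp WHERE last_name = 'SMITH';

-- Applying a function to the indexed column prevents the index from
-- being used, resulting in a full-table scan:
SELECT * FROM emp WHERE UPPER(last_name) = 'SMITH';
```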
Multiblock Reads
When performing table scans, Oracle can read more than one block at a time, thus
speeding I/Os. Consequently, a larger chunk of data can be read from the disk, thus
eliminating some disk seeks. The reduction of disk seeks and the reading of larger
blocks reduces both I/O and CPU overhead.
The amount of data read in a multiblock read is specified by the Oracle initialization
parameter DB_FILE_MULTIBLOCK_READ_COUNT. The value for this parameter should
always be set high because there is rarely any disadvantage in doing so. The size
of the individual I/O requests depends on DB_FILE_MULTIBLOCK_READ_COUNT
and DB_BLOCK_SIZE. A good value for multiblock reads is 64KB.
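For example, with the default 2KB block size, a read count of 32 yields 64KB per I/O. A sketch of the relevant initialization parameters (values illustrative):

```
db_block_size                 = 2048
db_file_multiblock_read_count = 32    # 32 x 2KB = 64KB per multiblock read
```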
Multiblock Writes
Multiblock writes are similar to multiblock reads and have many of the same requirements.
Multiblock writes are available through the direct path loader as well as through
sorts and index creations. As with multiblock reads, the multiblock writes reduce
I/O and CPU overhead by writing multiple database blocks in one larger I/O operation.
The amount of data written in a multiblock write is specified by the Oracle initialization
parameter DB_FILE_MULTIBLOCK_WRITE_COUNT. The size of the individual I/O
requests depends on both DB_FILE_MULTIBLOCK_WRITE_COUNT and DB_BLOCK_SIZE.
As with multiblock reads, a good value is 64KB.
The Oracle Parallel Query Option
The Oracle Parallel Query option makes it possible for some Oracle functions to
be processed by multiple server processes. The functions affected are queries, index
creation, data loading, and recovery. For each of these functions, the general principle
is the same: Keep processing while Oracle waits for I/O.
For most queries, the time spent waiting for the data to be retrieved from disk
usually overshadows the amount of time actually spent processing the results. With
the Parallel Query option, you can compensate for this wasted time by using several
server processes to execute the query. While one process is waiting for I/Os to complete,
other processes can execute. If you are running on a Symmetric Multiprocessor (SMP)
computer, a cluster, or a Massively Parallel Processing (MPP) machine, you can take
maximum advantage of the Parallel Query option. The amount of parallelism can be
tuned with several of the Oracle initialization parameters:
- PARALLEL_DEFAULT_MAX_SCANS--This parameter specifies the maximum number
of query servers to be used by default for a query. This value is used only if no value
is specified in a PARALLEL hint or in the PARALLEL definition clause.
This parameter limits the number of query servers used by default when the value
of PARALLEL_DEFAULT_SCANSIZE is used by the query coordinator.
- PARALLEL_DEFAULT_SCANSIZE--This parameter specifies the number of query
servers to be used for a particular table. The size of the table divided by PARALLEL_DEFAULT_SCANSIZE
determines the number of query servers, up to PARALLEL_DEFAULT_MAX_SCANS.
- PARALLEL_MAX_SERVERS--This parameter specifies the maximum number of
query servers or parallel recovery processes available for this instance.
- RECOVERY_PARALLELISM--This parameter specifies the number of processes
to be used for instance or media recovery. A large value can greatly reduce instance
recovery time. A value of 0 or 1 indicates that parallel recovery will not be performed
and that recovery will be serial. A good value for this parameter is in the range
of the number of disks you have (up to 50).
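The degree of parallelism can also be requested for an individual query with the PARALLEL hint; the following is a sketch (the sales table and the degree of 4 are hypothetical):

```sql
SELECT /*+ FULL(s) PARALLEL(s, 4) */ COUNT(*)
  FROM sales s;
```

The FULL hint forces the table scan that parallel query servers divide among themselves.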
I am a big fan of the Parallel Query option. I have seen great improvements from
the use of parallel queries as well as dramatic reductions in recovery time when
the parallel recovery feature is used. You saw this in yesterday's lesson, "Advanced
Oracle Options."
The Oracle Parallel Server Option
The Oracle Parallel Server option is one of the most innovative and impressive
options available from Oracle. With the Parallel Server option, you can cluster several
computers using a shared-disk subsystem and have multiple Oracle instances access
the same database. If your application is suitable, you can see good scalability
from adding computers.
The Oracle Parallel Server option uses a sophisticated locking mechanism in conjunction
with a shared-disk subsystem to allow multiple instances to access the same data.
If you have an application that can take advantage of the Oracle parallel server
architecture, you should see some good performance improvements.
The two areas that can most influence the performance of your parallel server
system are data partitioning and PCM lock management. Both of these can make a huge
difference in the performance of your system.
- Partitioning--By properly partitioning your data to reduce lock traffic and contention
for blocks between servers, you can enhance performance. Try to balance your users
so that the users accessing the same tables are on the same machine; doing so can
reduce contention for locks.
- PCM locks--By carefully managing the number of locks on each table, you can enhance
performance. Tables with a lot of traffic between nodes should have more locks than
tables with less contention. By balancing the number of locks, you can reduce overhead.
TIP: By taking advantage of read-only
tablespaces where applicable, you can reduce the number of PCM locks in your system.
Because read-only tablespaces do not allow updates, no locking is necessary.
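Making a tablespace read-only is a single statement (the tablespace name is hypothetical):

```sql
ALTER TABLESPACE history_data READ ONLY;

-- The tablespace can be returned to normal operation later:
ALTER TABLESPACE history_data READ WRITE;
```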
Spin Counts
Multiprocessor environments might benefit from tuning of the parameter SPIN_COUNT.
Under normal circumstances, if a latch is not available, the process sleeps and then
wakes up to try the latch again. If you are on a multiprocessor system, it is likely
that the process holding the latch is currently processing on another CPU and will
be finished in a short time. If you set SPIN_COUNT to a value greater than
0, the process spins while counting down from SPIN_COUNT to 0. If the latch
is still not available, the process goes to sleep.
Setting SPIN_COUNT can hurt performance if you're not careful. This parameter
should be set only for multiprocessor computers and should be monitored for effectiveness.
A good value to try is 2,000. The value of SPIN_COUNT specifies how many
times the process will spin before putting itself to sleep. Because the speed of
processors varies, the time it takes to spin also varies; however, the speed of the
process holding the desired resource will also vary with the speed of the processor.
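If you decide to experiment on a multiprocessor system, the parameter is set in the initialization file; the value below is the starting point suggested above:

```
spin_count = 2000    # multiprocessor systems only; monitor for effectiveness
```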
OS Tuning
OS tuning is specific and depends on the OS you are running. I will present some
of the basics on tuning Windows NT and UNIX here.
NT Tuning Considerations
With Windows NT, probably the most important tuning consideration is ensuring
that sufficient physical memory is available to run Oracle and the required number
of server processes. Windows NT uses virtual memory, which means that Oracle and
user processes can allocate an almost unlimited amount of memory through paging.
If you are not careful, you might overconfigure the amount of virtual memory that
you are using and exceed the amount of physical memory in the system. If this occurs,
the system will begin paging and performance will be severely degraded.
In addition to the concern about memory, other areas of importance in tuning the
Windows NT system include the network and I/O subsystems and the reduction in OS
overhead.
Memory
Windows NT uses only virtual memory. Because of this, all memory in the system
is treated equally, limited only by the size of the paging file; you must be wary
of user processes consuming large amounts of memory and causing the SGA to be paged
out.
In Windows NT, there are no limitations on memory used for user processes, disk caching,
print caching, and so on; thus, it is best to dedicate the Windows NT server to either
file and print services or to application services.
NOTE: The amount of virtual memory with
Windows NT is limited by the size of the paging file. If the paging file is too small,
you will get warning messages and be forced to resize it and reboot.
The size of one process's virtual address space is
limited to 2GB under Windows NT 4.0 and earlier, and 3GB with the Windows NT Enterprise
Edition.
If you use asynchronous I/O (AIO) and Oracle can lock down memory for AIO, the
database block buffers are not swappable. Monitor the system on a regular basis to
ensure that no paging is occurring in the system.
Reduce Unnecessary Memory Usage
You can take some measures to reduce the amount of memory used by Windows NT.
Invoke the Control Panel's Network Settings screen and choose the Maximize Throughput
for Network Applications option. This optimizes the server memory for network applications,
reducing some of the file-system caching and overhead in memory management. Also
remove any network protocols not needed by the system to cut down on system overhead
and memory usage via the Control Panel.
Also use the Control Panel to turn off any services you are not using. Doing so
reduces memory usage and CPU overhead. By reducing all the unnecessary services,
you can increase the performance of the system.
SGA Tuning
To maximize performance, allocate as much memory as possible to the SGA. Use the
techniques discussed on Days 9, "Managing Data," and 10, "Administering
User Accounts," to determine whether the memory can best be used for the shared
pool or for database block buffers. Because the SGA resides in virtual memory, it
is important that you not allocate so much memory for the SGA that the system pages.
The overhead of paging and swapping overshadows any benefit you might gain from a
larger SGA. Remember to save memory for the user processes. You should frequently
monitor your system to ensure that no paging occurs at any time.
User Capacity
You can easily determine the amount of memory necessary for your application on
a per-user basis. Start Oracle and note the amount of available memory by using the
Windows NT Performance Monitor. Monitor the Available Bytes option under Memory in
the Performance Monitor. After users begin accessing the application in a typical
manner, record the amount of memory again. Take the difference and divide this result
by the number of users accessing the application. Multiply this per-user memory usage
value by the maximum number of users who might be connected to the application to
determine the amount of memory you must reserve for user connections. Be sure to
add a little extra memory just in case. Be careful that a user's PGA does not consume
too much system memory.
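The arithmetic can be done anywhere, even from SQL against the DUAL table. In this sketch, the measurements (180MB free before users connect, 140MB free with 20 typical users connected, and a maximum of 100 users) are entirely hypothetical:

```sql
SELECT (180000000 - 140000000) / 20          "Bytes Per User",
       ((180000000 - 140000000) / 20) * 100  "Reserve For 100 Users"
  FROM DUAL;
```

With these numbers, each user consumes about 2MB, so roughly 200MB should be reserved for the full user load.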
To increase the number of Oracle connections, you might also have to adjust the
Oracle initialization parameter PROCESSES. This parameter should reflect
the maximum number of user connections you expect to have plus the Oracle background
processes. You should also include some extra processes for administrative tasks.
Network
Minimal network tuning is necessary with Windows NT. However, you can enhance
performance by prioritizing the network bindings. Do so with the Control Panel's
Network Configuration utility. If you put the network protocol you use most frequently
first on the list, that protocol gets highest priority. Place other network protocols
in the list in the order in which you use them. Removing any protocols you do not
use can also boost performance by reducing memory consumption and CPU overhead. Also
be sure that the system is configured as a server in the Network Configuration
screen.
I/O Subsystem
It is important that performance not be bound by physical I/O rates. If you ensure
that the system disks are not driven past their limits, you can avoid this. This
is covered in detail on Days 14, "Using Oracle Clusters, Stored Procedures,
and Database Links," and 15, "Managing Job Queues and Using Oracle Auditing."
With Windows NT, asynchronous I/O (AIO) is always enabled. There is no need to
adjust any tuning parameters to ensure that AIO is enabled.
The default block size for Oracle on Windows NT is 2,048. This might be sufficient.
If you have a large database or if the data access is primarily sequential, you might
want to increase DB_BLOCK_SIZE to 4,096. Although the data blocks take up
more space in the SGA, the number of I/Os performed is significantly reduced. It
doesn't take much more overhead and time to retrieve 4KB of data from the disks than
it does to get 2KB of data.
If data access is primarily sequential, you might benefit by setting the DB_BLOCK_SIZE
parameter to 8,192. Because sequential access to the database will read the next
block anyway, larger block sizes will read that data into the SGA before you need
it. If your data access is random, you will waste space in the SGA. The value you
choose for the block size affects performance either for better or worse. If you
are unsure, leave the parameter at the default of 2,048.
When creating Windows NT file systems, you can choose among several file systems,
such as FAT and NTFS. Although each performs well in certain situations, I recommend
using NTFS. NTFS provides the best level of overall performance and is the file
system that Microsoft endorses.
UNIX Tuning Considerations
As with Windows NT, the most important UNIX tuning consideration is ensuring
that sufficient memory is available to run Oracle and the required number of server
processes. Because UNIX is a virtual memory operating system, you can always start
more server processes; but if you are not careful, you might begin paging.
Many varieties of UNIX have made available extended features such as the Post-Wait
Semaphore and asynchronous I/O to enhance Oracle performance. Other areas of importance
in tuning the UNIX system include the network and I/O subsystem.
Memory
Memory is important in the UNIX operating system. As you have seen in earlier
lessons, the way to increase performance is to maximize the use of the SGA to avoid
costly disk I/Os. In UNIX as in all operating systems, it is important to avoid paging.
Because the SGA is in shared memory in UNIX, the SGA is guaranteed not to page--but
the server processes can page if too much memory is being used.
It is important to allocate enough shared memory to accommodate the SGA. If you
do not have enough memory to create the SGA, the instance does not start. If this
happens, you must either configure the operating system to allow more shared memory
or reduce the size of the SGA. If you are using enhanced 4MB shared memory pages,
you can allocate the amount of shared memory only in 4MB units.
Be careful not to allocate so much shared memory for the SGA that user and server
processes page. Be sure to periodically monitor the system to ensure that no paging
is occurring. To maximize memory used by the SGA, reduce all unnecessary memory used
by the operating system and limit the amount of memory that can be allocated by users.
Reduce Unnecessary Memory Usage
One of the best ways to free memory for Oracle is to siphon it from the file-system
buffers. These buffers are used by the OS to cache data. A significant
performance increase can occur when files that have been cached are accessed. But
Oracle uses a feature called synchronous I/O to ensure that writes to the
disk are not returned until the data has actually been written. Because Oracle must
guarantee that I/Os have been written to the disk to ensure data integrity, OS disk
write caching cannot be performed.
By default, the number of file-system buffers is determined by the amount of memory
in the system. Because Oracle bypasses the disk cache on writes and uses the SGA
for reads, you don't really need a large number of file-system buffers. By reducing
this number, you might see a slight performance decrease with OS operations, but
any additional memory allocated to the SGA increases Oracle performance. The parameters
used to tune the file-system buffers are fairly vendor dependent; this should be documented
in your OS manuals.
If you reduce the unnecessary memory used for disk caching, more memory can be
allocated to the SGA. Don't reduce the disk cache so far that it is difficult to
run OS commands and access Oracle parameter files. Do not set the disk cache buffers
to fewer than 600; this value represents the number of 512-byte blocks.
SGA Tuning
To maximize performance, allocate as much memory as possible to the SGA. Use the
techniques discussed earlier today to determine whether memory is best used for the
shared pool or for database block buffers.
In UNIX, the shared memory area used by Oracle for the SGA is usually contiguous.
However, if you have multiple instances of Oracle that have started and stopped several
times, the shared memory area might no longer be contiguous. If you use 4MB pages,
you are guaranteed to have at least 4MB of contiguous memory.
The amount of memory allocated for shared memory is the product of two OS-tunable
parameters: SHMMAX and SHMSEG. SHMMAX specifies the maximum
size of a shared memory segment; SHMSEG specifies the maximum number of
shared memory segments available in the system. Applications are responsible for
allocating only the amount of shared memory they need and so do not waste space.
In SCO UNIX and UnixWare, the parameters SHMMAX and SHMSEG are
located in the UNIX parameter file /etc/conf/cf.d/stune. In Solaris, the
shared memory parameters are set in the file /etc/system. Your OS administrator's
manual should have more information about setting system parameters.
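Because one large segment is preferable to several small ones, it is worth checking that your planned SGA fits within a single SHMMAX segment before starting the instance. The sketch below uses hypothetical values (a 200MB SGA and a 256MB SHMMAX); substitute the figures from your own init.ora and kernel parameter file.

```shell
#!/bin/sh
# Hypothetical sizes, for illustration only; read the real values from
# your init.ora and from stune (SCO/UnixWare) or /etc/system (Solaris).
SGA_BYTES=209715200        # 200MB SGA (200 * 1024 * 1024)
SHMMAX=268435456           # 256MB kernel limit (256 * 1024 * 1024)

if [ "$SGA_BYTES" -le "$SHMMAX" ]; then
    echo "SGA fits in one shared memory segment"
else
    echo "SGA will be split across segments; consider raising SHMMAX"
fi
```

On a running system, `ipcs -m` lists the allocated shared memory segments, so you can verify after startup that the SGA actually came up as a single segment.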
NOTE: It is much more efficient for Oracle
to have one large shared memory segment than several smaller ones. Therefore, SHMMAX
should have a value larger than the size of the SGA.
Most varieties of UNIX that run on the Intel platform support 4MB pages.
By referencing your OS vendor-specific documentation, you can determine how to tune
this.
The use of 4MB pages can improve performance dramatically, especially if you are
running on an Intel Pentium processor.
NOTE: In UNIX implementations that use
4MB pages, there is usually a threshold at which 4MB pages are used. This threshold
is approximately 3MB. If you allocate a 2.5MB shared memory segment, you do not get
a 4MB page.
The amount of memory allocated to a single user must also be tuned in the UNIX
operating system. Because Oracle is treated the same as any other user, you must
allocate enough memory for Oracle to use for the SGA and the server processes. By
referencing your OS-specific documentation, you can see how to configure this.
Ensure that sufficient memory is available for Oracle to allocate the
SGA. Remember to save memory for the user processes as well. You should frequently
monitor your system to ensure that no paging is occurring at any time.
User Capacity
You can easily determine the amount of memory necessary for your application on
a per-user basis. Start Oracle and note the amount of available memory
with the UNIX utility sar -r. The output from sar -r consists of freemem (free memory
pages) and freeswp (free swap pages). The value given in the freemem column is the
number of 4KB pages available. When users begin accessing the application in a typical
manner, record the amount of memory again. Take the difference and divide this result
by the number of users accessing the application. This value is the per-user memory
usage. Multiply this value by the maximum number of users who might connect to the
application to determine the amount of memory you must reserve for user connections.
Be sure to leave a little extra memory just in case.
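The arithmetic above can be sketched as a short script. The sar -r readings here are hypothetical, and the script assumes a platform where freemem is reported in 4KB pages, as described above; plug in your own measurements.

```shell
#!/bin/sh
# Hypothetical sar -r readings, for illustration only.
FREEMEM_BEFORE=50000      # free 4KB pages after starting Oracle, no users
FREEMEM_AFTER=40000       # free 4KB pages with 50 users connected
USERS=50                  # users connected when the second reading was taken
MAX_USERS=200             # maximum users you expect to support
PAGE_KB=4                 # page size in KB on this platform

# Per-user memory in KB: (pages consumed / users) * page size
PAGES_USED=`expr $FREEMEM_BEFORE - $FREEMEM_AFTER`
PER_USER_KB=`expr $PAGES_USED \* $PAGE_KB / $USERS`
RESERVE_KB=`expr $PER_USER_KB \* $MAX_USERS`

echo "Per-user memory: ${PER_USER_KB}KB"
echo "Reserve for $MAX_USERS users: ${RESERVE_KB}KB"
```

With these sample numbers, each user consumes 800KB, so supporting 200 users means reserving roughly 160,000KB beyond what the SGA and OS need, plus the "little extra" cushion mentioned above.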
The size of a user's PGA is not bound by any initialization parameters. Because
of this, be careful that a user's PGA does not consume too much system memory.
The UNIX operating system parameters MAXUP and NPROC must also
be set to allow a sufficient number of users to connect. Remember that when users
connect to Oracle, an Oracle shadow process is created under the Oracle user ID.
Therefore, you must increase not only the number of processes systemwide, but also
the per-user process limits.
The per-user process limit is set with the OS parameter MAXUP. The maximum
number of processes systemwide is set by the OS parameter NPROC. Both values
are in the stune file. NPROC should be at least 50 greater than
MAXUP to account for OS processes.
To increase the number of Oracle connections, you might also have to adjust the
Oracle initialization parameter PROCESSES. The PROCESSES parameter
should reflect the maximum number of user connections you expect to have plus the
Oracle background processes. You should also include some extra processes for administrative
tasks.
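A sizing sketch for PROCESSES follows, using hypothetical counts; the number of background processes varies with your configuration (DBWR, LGWR, PMON, SMON, CKPT, ARCH, and so on), so count your own.

```shell
#!/bin/sh
# Hypothetical sizing for the PROCESSES initialization parameter.
MAX_USER_CONN=200      # expected maximum user connections
BACKGROUND=10          # assumed count of Oracle background processes
ADMIN_SPARE=5          # headroom for administrative connections

PROCESSES=`expr $MAX_USER_CONN + $BACKGROUND + $ADMIN_SPARE`
echo "PROCESSES = $PROCESSES"
```

Remember that the OS limits must accommodate the same total: MAXUP must allow this many processes under the Oracle user ID, and NPROC should be at least 50 greater than MAXUP.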
Network
With UNIX, network tuning is usually minimal. Typically, none is required,
and any tuning that is required is vendor specific. Refer
to your OS documentation for any network tuning that might be needed.
I/O Subsystem
As with all other operating systems described today, it is important to ensure
that performance is not bound by physical I/O rates. Be sure that random I/Os do
not exceed the physical limitations of the disk drives. Refer to Days 14 and 15 for
details.
With UNIX, you have the choice of using the UNIX file system for your data storage
or the RAW device interface. This choice is not always an easy one. The RAW device
interface is more difficult to manage but provides a higher level of performance.
File-system files are much easier to use but have more overhead associated with them.
File System
Using the UNIX file system is easier than using RAW devices. With file-system
files, Oracle simply creates the datafiles itself. However, when using the file system, Oracle
must contend with the UNIX disk-caching system and use synchronous writes to ensure
that the write request does not return to the DBWR or LGWR before it has actually
written the data to disk. With the UNIX file system, there is also the overhead of
the data being read into the UNIX disk cache and then being copied to the SGA. This
arrangement causes additional overhead on reads. Finally, when you use the file system,
you are not guaranteed to have contiguous blocks on the disk--in fact, you are almost
guaranteed not to have contiguous blocks.
RAW Device Interface
The RAW device interface allows for the least amount of overhead you can achieve
with UNIX I/Os. When using the RAW device interface, UNIX simply assigns a section
of the disk to each RAW device. This portion of the disk is contiguous; accesses
to it bypass all disk caching and file-system overhead.
RAW devices are not as easy to manage because each device is considered one big
chunk of data for the operating system. Backup and recovery must be handled slightly
differently because file copies do not work, and the size of the raw device cannot
be changed after it is created. Backup operations must be done via the UNIX DD
command or by a third-party backup utility that supports raw devices.
RAW devices give greater performance with less overhead and are fully supported
by Oracle. Whether you use RAW devices is a decision you must make based on ease
of use and increased performance.
Asynchronous I/O
With UNIX, AIO is not always enabled. It is necessary to enable AIO in both the
OS and in Oracle. By using AIO, the DBWR can manage many I/Os at once, eliminating
the need for multiple DBWR processes. List I/O allows the DBWR to pass to the OS
a list of AIO commands, reducing the number of calls it must make.
NOTE: If asynchronous I/O is not available
to you, you can compensate by adding DBWR processes. You should have one or two DBWR
processes per data disk. Use the parameter DB_WRITERS to increase the number
of DBWR processes.
For some UNIX implementations, the following Oracle initialization parameters
for asynchronous I/O should also be set:
- USE_ASYNC_IO--This parameter tells Oracle that the DBWR should use asynchronous
I/O. Set it to TRUE.
- LGWR_USE_ASYNC_IO--This parameter tells Oracle that the LGWR should
use asynchronous I/O. Set it to TRUE.
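Together, the settings look like the init.ora fragment below. These parameter names are platform specific, so verify that your port of Oracle supports them before setting them.

```
# init.ora fragment (platform specific; confirm these parameters
# exist in your port of Oracle before using them)
USE_ASYNC_IO = TRUE
LGWR_USE_ASYNC_IO = TRUE
```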
You should always use asynchronous I/O (if possible). When you use asynchronous
I/O, you can keep the number of DBWR processes to one and therefore reduce process
overhead.
Hardware Tuning
The I/O system should be designed and implemented with the following goals in
mind:
- Isolate sequential I/O--By isolating sequential I/O so that it is purely sequential
to the disk, you can greatly enhance throughput. Any random I/O to these disks degrades
performance. Writes to the redo log are sequential.
- Spread out random I/O--Random I/O performance can be increased by adding more
disk drives to the system and spreading out the I/Os among the disks. I/Os to the
datafiles are typically random (especially in OLTP systems).
If you follow these guidelines and plan your system so that your disk drives can
support the amount of disk I/O demanded of them, I/O should not be a problem.
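Sizing the number of drives for random I/O is simple division. The per-disk rate below is an assumption for illustration; substitute the rated random I/O capacity of your own drives.

```shell
#!/bin/sh
# Hypothetical sizing sketch: drives needed to sustain a random I/O load.
REQUIRED_IOPS=500       # random I/Os per second the datafiles must sustain
PER_DISK_IOPS=60        # assumed random I/Os per second per drive

# Round up: (required + per_disk - 1) / per_disk
DISKS=`expr \( $REQUIRED_IOPS + $PER_DISK_IOPS - 1 \) / $PER_DISK_IOPS`
echo "Minimum drives for datafiles: $DISKS"
```

Spreading the datafiles across at least this many drives keeps each disk below its random I/O limit; the sequential redo log traffic stays on its own isolated drives.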
Application Tuning
One of the most common areas in which performance can be enhanced is the application
itself. SQL statements can often be redesigned to dramatically improve performance.
I recently worked on a project in which an engineer redesigned a set of queries to
make more effective use of indexes; queries that had taken more than four hours to
run finished in less than 10 minutes after the changes.
Here are a few things that can be done to improve the application itself. My other
book, Oracle Performance Tuning and Optimization (Sams Publishing), covers
these topics in great detail.
- Use EXPLAIN PLAN--By using EXPLAIN PLAN, you can analyze the
execution plan the optimizer has chosen for the SQL statement.
- Use SQL*Trace--By using SQL*Trace, you can analyze the execution of the SQL statement
and determine any potential bottlenecks.
- Tune SQL statements--You can improve performance by configuring SQL statements
to take advantage of such things as indexes, clusters, and hash clusters.
- Use the Oracle optimizer--Take advantage of the optimizer to improve performance.
- Use procedures and packages--Use procedures, functions, and packages to improve
performance.
- Provide for data integrity--Consider the importance of data integrity and how
to optimally provide for it.
- Use hints--Use hints to take advantage of information you know about the data
and application.
If you tune the SQL statements and application to take advantage of indexes, clusters,
and other features, you should see significant improvement in performance.
Summary
Today you looked at various ways of determining whether some sort of performance
bottleneck exists, and if one does, how to isolate that bottleneck and provide solutions.
The lesson began by defining the term bottleneck. You were also presented
with my performance tuning methodology. If you follow these steps (or steps of your
own), you should be able to find the bottleneck or at least find a way to work around
it.
Today's lesson also provided a brief overview of the operation and function
of your OS, presented a way of looking at the system in terms of performance,
and showed how to monitor the system. Much can be learned from the NT
Performance Monitor and sar, and much can be learned from Oracle itself using UTLBSTAT
and UTLESTAT. With these tools, you can narrow the focus of your performance
investigations and determine potential changes to your system.
What's Next?
Tomorrow's lesson is the last of the 21 days of instruction on the ins and outs
of Oracle8. This book finishes up by teaching you the specifics of the Oracle8 networking
system. You will learn how to configure SQL*Net and use the Oracle Name Server. Also,
you will learn how to configure and use the Oracle Web server. Other miscellaneous
topics will also be covered in the final lesson of the book.
Q&A
- Q What is a bottleneck?
A The term bottleneck refers to a component in the system that is
slowing down the whole system.
Q What type of operating systems are NT and UNIX?
A Windows NT and UNIX are virtual memory operating systems. This means
that the memory used by a single process can be larger than the physical memory in
the system. When more memory is needed than is available, some of the memory is paged
out.
Q What sort of things are important when troubleshooting
a performance bottleneck?
A There are many important things to keep in mind, but I feel that of primary
importance is that you follow a methodology. By using some sort of method of troubleshooting, you can develop tests that can help you resolve
the issues involved.
Q Should you always believe your results?
A No. Be skeptical. If something doesn't seem right, try another test.
Workshop
The workshop provides quiz questions to help you solidify your understanding of
the material covered and exercises to provide you with experience in using what you've
learned. The answers to the quiz questions can be found in Appendix A, "Answers."
Quiz
- 1. Name some areas that could be potential bottlenecks.
2. What is the best tool to use to look at the Oracle shared pool cache-hit
ratio?
3. How many random I/Os per second per disk drive are recommended before
latencies start to increase?
4. How many sequential I/Os per second per disk drive are recommended before
latencies start to increase?
5. What is disk latency?
6. How do you calculate the buffer cache-hit ratio?
7. How do you know whether the shared pool is large enough?
8. What two factors make an index effective?
9. Which is run first: UTLBSTAT.SQL or UTLESTAT.SQL?
10. Why might you use a large block size?
Exercises
- 1. Turn on diskperf and reboot your system. Use the Performance Monitor
to look at physical disk statistics.
2. Use UTLBSTAT.SQL and UTLESTAT.SQL and analyze the
results.
3. Check out your system's block buffer cache-hit ratio.
4. Determine the cache-hit ratio on your data-dictionary cache and your
library cache.
© Copyright, Macmillan Computer Publishing. All
rights reserved.