Teach Yourself Oracle 8 In 21 Days
- Day 17 -
Recovering the Database
Yesterday you learned how to back up your database; today you will learn how to
recover your database in the event of an emergency. Several types of failures can
necessitate a recovery:
- An instance failure necessitates an instance recovery. As long as no permanent
hardware failures have occurred, the instance recovery will be automatic and complete.
Enabling parallel recovery improves performance.
- A hardware failure that results in the loss of a datafile necessitates a media
recovery. This involves both recovering the lost data and performing an instance
recovery (if necessary). This process will be explained in the section titled "Media
Recovery."
- A user error that results in the loss of data might necessitate a point-in-time
recovery, which allows you to recover up to a certain point before the failure occurred.
This can be very useful, but very problematic. The point-in-time recovery is detailed
in the section titled "The Point-in-Time Recovery."
No matter what type of recovery is required, the important task of recovery should
be performed as quickly as possible. If a failure requiring recovery occurs, many
users will have little to do until the recovery is complete. But even though the
recovery operation must be completed quickly, you should not rush. Any mistakes during
the recovery will only delay the resumption of normal operations.
Recovery Versus Restoration
Today you will learn about recovery and restoration operations. These two concepts
might sometimes be thought of as the same, but they are actually quite different:
- Recovery is the act of bringing the database back to where it was an instant
before the failure, and refers to the process whereby Oracle rolls forward committed
transactions and rolls back noncommitted transactions. Recovery is automatic.
- Restoration is the act of replacing a datafile with a backup copy. You
can restore a database only if it is not running in ARCHIVELOG mode and
has overwritten redo log files. Restoration is a manual operation requiring operator
intervention (either by hand or via the graphical administration tools).
Recovering Your Database
Despite how reliable hardware has become, there are still occasions when the system
might fail (perhaps because of a component failure or a power failure). If the failure
causes no data loss, Oracle can recover itself; this is referred to as an instance
recovery. If data has been lost, data must be restored from a previous backup
and recovered; this is referred to as media recovery. If some event, such
as an accidental deletion of a table, has occurred, a point-in-time recovery
must be performed to avoid recovering the table-drop statement and repeating the
mistake.
Instance Recovery
New Term: When an instance failure
occurs, the Oracle recovery process can completely recover the database upon instance
startup. All transactions that were committed at the time of the failure will be
recovered, or rolled forward, and all transactions that were in process (also
known as in-flight transactions) will be rolled back.
NOTE: Instance recovery can be quite time
consuming. How much time instance recovery takes depends on the number of dirty buffers
in the SGA. The number of dirty buffers depends on how much time has passed since
the last checkpoint and the number of data modifications.
The instance-recovery process is automatic. When the instance is started, the
startup process examines the datafiles and redo log files to determine whether the
instance was properly shut down. At this point, the redo log is read and the affected
transactions are rolled forward or back. If the checkpoint launched by the last log
switch was completed, the transactions in fewer than one log file will require recovery.
TIP: If you tune the checkpoint interval
and recovery parallelism, you can shorten the recovery time.
Users cannot access the database during the recovery process; only after the instance
recovery is complete can users access the database. For this reason, recovery time
should be kept to a minimum. Using the parallel-recovery feature of the Parallel
Query option can help you reduce the time it takes for instance recovery to be completed.
As you will see on Day 19, "Advanced Oracle Options," the number of processes
or threads that perform the instance recovery can be tuned. If you tune the number
of recovery processes to run best with your system, you can optimize the recovery
interval.
Media Recovery
In the event of a media failure (if, for example, a datafile is damaged), data
recovery is necessary. If a datafile is damaged, you must restore it from backup
before recovery can occur. The Oracle recovery process then applies archive log files
and redo log files to restore the damaged datafile to its prior-to-failure state.
This process might require the use of all archive log files created since the backup
to recover the restored datafile, which can be quite time consuming.
TIP: If you schedule frequent backups,
you can shorten restoration and recovery time. Recovery time depends both on how
much time has passed and how much data has been modified since the last backup.
By placing the latest backup files as well as any archive log files created since
the last backup online, you can shorten the time it takes to start the recovery process.
If you have a plan and have everything ready to go, things will run more smoothly.
In order to assist the recovery process, Oracle8 provides several different methods
for database recovery:
- Backup Manager
- OS facilities
- The Import utility
- The NT Recovery utility
Each of these utilities can perform an effective backup; personal preference and
your system needs dictate which one you use.
Backup Manager
As with the database-backup procedure, you can perform a recovery using Enterprise
Manager's Backup Manager utility. To do so, follow these steps:
- 1. After you invoke Backup Manager, you can start the recovery process
by selecting Recover | Restore Wizard.
2. The Restore wizard leads you through a series of screens that are similar
to those of the Backup wizard. The first screen, shown in Figure 17.1, allows you
to choose what type of restore you want to perform. Note that I have selected the
Tablespaces radio button. After you decide what type of restore you want to perform,
click Next.
Figure 17.1.
Screen one of the Restore wizard.
NOTE: The available options in this case
are Tablespaces and Datafiles. Because the instance is up and running in this example,
the database recovery option is not available. If the instance was down, Database
would be an option.
- 3. Because I chose the Tablespaces option, the Tablespaces screen (shown
in Figure 17.2) appears. From here you select what tablespaces you want restored;
you can choose as many or as few tablespaces as you want. Note that I have chosen
to restore the DOGS tablespace. After you select a tablespace, click Next.
Figure 17.2.
The Tablespaces screen of the Restore wizard.
- 4. After you select the tablespace(s) you want recovered, the Restore
wizard determines which datafiles will be restored. The Rename screen, shown in Figure
17.3, allows you to change the name of the datafile to be used in the restore. This
is valuable because the volume that contains that datafile might not be available
due to a hardware failure. If you want, you can change the name of the datafile.
When you finish, click Next.
Figure 17.3.
The Rename screen of the Restore wizard.
- 5. The Channels screen, shown in Figure 17.4, allows you to select the
channel(s) to be used during the recovery process. Depending on how many backup files
and datafiles you are working with, a larger number of channels can improve performance
by adding some parallelism. It might occasionally be necessary to change the channel
from the one used in the backup operation. After you determine what channel to use,
click Next.
Figure 17.4.
The Channels screen of the Restore wizard.
- 6. Finally, the Restore wizard presents the Summary screen, shown in Figure
17.5. If all is correct, start the restoration process by clicking OK.
Figure 17.5.
The Summary screen of the Restore wizard.
If you want to change the type of restore operation from an online to an offline
restore, you can change the state of the instance from the main screen of the Backup
Manager utility. To shut down the database, simply select the Shutdown radio button
and click Apply (see Figure 17.6).
After you select the Shutdown option, you are asked what type of shutdown you
want to perform. Your options are
- Normal--If you select the Normal option, the shutdown process waits for all users
to disconnect and then continues. During this time, no new connections are allowed.
- Immediate--If you select the Immediate option, all idle connections are disconnected,
current transactions are rolled back, and the instance is shut down.
- Abort--If you select the Abort option, the Oracle instance is immediately terminated.
A shutdown with Abort causes media recovery to be necessary.
Figure 17.6.
Change the state of the database from the main screen of Backup Manager.
- Transactional--If you select the Transactional option, currently running transactions
are allowed to complete. As soon as a transaction has finished or aborted, the connection
is terminated. During this time, no new connections are allowed.
I recommend that you always use the Immediate or Normal option. The Abort option
should be used only in the case of an emergency.
As you have seen, it is easy to back up and restore a database using Backup Manager.
Nonetheless, it is still important to test the recovery process periodically to make
sure it is working correctly.
OS Facilities
The method for restoring a datafile using OS facilities is as simple as the method
used for creating the backup using OS facilities:
- 1. With the tablespace offline, copy the datafile from the backup to the
original datafile.
2. After you bring the tablespace back online (via the ALTER TABLESPACE
command or the restarting of the Oracle instance), Oracle will realize that the datafile
has been replaced and prompt you to apply the archived log files.
Tablespace and datafile recovery can be performed only if the RDBMS is running
in ARCHIVELOG mode. Otherwise, you must restore the entire database from
the last full, offline backup. No roll forward or rollback occurs.
The Import Utility
You can use the Import utility to restore the database if and only if the Export
utility was used to create the backup. For more information about the Import and
Export utilities, see Day 9, "Managing Data."
The NT Recovery Utility
NT Recovery Manager is very similar to NT Backup Manager. NT Recovery Manager,
provided with your Oracle for Windows NT software, is not part of Enterprise Manager.
Because of this, NT Recovery Manager can recover only a local database--it cannot
provide recovery over a network. To recover a local database with NT Recovery Manager,
perform these steps:
- 1. As with NT Backup Manager, the first step in invoking NT Recovery Manager
is to supply the password for the internal account.
2. After you supply the password for the internal account and connect with
the instance, you will see NT Recovery Manager's first screen. This screen, shown
in Figure 17.7, lists the following recovery options:
- Automatic recovery
- Restore from full database backup
- Restore datafile, then do recovery (supply the name of the datafile)
- Restore control file, then do recovery
Figure 17.7.
Oracle Recovery Manager offers a variety of recovery options.
- 3. The screen you see after selecting an option and clicking Recover depends
on what option you select. In this case, I have selected the Restore datafile option
and specified the filename DATABASEDOGSDBF. Consequently, I am presented
with the Data File Recovery screen shown in Figure 17.8.
4. After you specify which datafiles to recover, NT Recovery Manager automatically
recovers them for you. If instance recovery is necessary, it is performed as well.
Figure 17.8.
The Data File Recovery screen of NT Recovery Manager.
The Log File Recovery
You cannot restore from the loss of a log file. If a log file is damaged and the
system is still functional, you can drop the log file, re-create it, and immediately
perform a full backup. The backup is necessary because you will not be able to use
the archive and redo log files for any recovery on this database. That is why I always
recommend that redo log files reside on fault-tolerant disk volumes.
NOTE: I recommend using RAID mirroring
on the redo log files to protect them in the event of a disk failure.
The Point-in-Time Recovery
The point-in-time recovery allows you to recover a database back to a specified
point in time, allowing you to recover from user errors or software problems that
caused a failure to occur. However, because of the potential side effects, the point-in-time
recovery, a somewhat complex operation, should be performed only on rare occasions.
When you recover back to a point in time, the entire database will be recovered to
that point. If a table is deleted by accident and the point-in-time recovery is being
used to recover to a point just before that deletion, you must keep in mind that
all other work to other tables that occurred after that point will be lost.
WARNING: The point-in-time recovery can
be dangerous to use, because you are restoring to a point in time in the past. Always
perform a full backup of your current database before attempting the point-in-time
recovery. A point-in-time recovery recovers the entire database to that point in
time. There is no way to recover a single tablespace or table, so all work done after
that point to other tables will be lost.
NOTE: I have mentioned the point-in-time
recovery only so that you will know it exists. If you plan to use the point-in-time
recovery, you should study the Oracle documentation and carefully plan a restore
before you attempt it.
The point-in-time recovery can be performed via Recovery Manager or via the RECOVER
administrative SQL command. In either case, the syntax for performing the point-in-time
recovery is entered through the command-line interface; there is no GUI option for
the point-in-time recovery.
What method you use to restore your data depends on what type of error necessitated
the recovery. In the event of a media failure, you must restore the datafile, then
perform the media recovery with the point-in-time recovery option. In the event of
a user error, follow these steps:
- 1. Create a temporary copy of the database.
2. Restore this temporary database from a previous backup.
3. Specify a point in time immediately before the failure, then perform
the point-in-time recovery on this temporary database.
4. Export the table that has been damaged, which now is in a state just
before the damage.
5. Import that undamaged table to the real database.
This task can be time consuming and dangerous. If you confuse the temporary and
real copies of the database, you might destroy valuable data.
Point-in-Time Recovery Using Recovery Manager
Recovery Manager can be used to perform a point-in-time recovery via its command-line
syntax. Because of the complexity of the command-line interface to Recovery Manager
and the rarity of its use, I will not go into great detail on how to use it here.
You can find very complete documentation in the Oracle Backup and Recovery manual.
When using the command-line interface to Recovery Manager, you must specify this
additional parameter:
SET UNTIL
This parameter takes the following qualifiers:
TIME date
LOG SEQ number THREAD number
SCN scn
With this syntax, you can specify the exact time you want the recovery to stop
by specifying the time, the log sequence, or the system change number. By using the
SET UNTIL parameter when performing a recovery via Recovery Manager, you
restore until the specified time.
Point-in-Time Recovery Using the RECOVER Command
When you perform a recovery with the RECOVER command, you can restore
to any point in time by using the RECOVER UNTIL syntax. The RECOVER
command has a number of options, but the important ones for the point-in-time recovery
are
- RECOVER UNTIL CANCEL
- RECOVER UNTIL TIMEdate
- RECOVER UNTIL CHANGEscn
Like Recovery Manager, the RECOVER command can be used to recover a database,
a tablespace, or a datafile, and can include a parallel clause. If you increase the
degree of parallelism, you might see a performance benefit.
Developing a Recovery Strategy
It is important that you plan your recovery strategy so that in the event of a
system failure you will be ready. Several scenarios must be planned for, depending
on the type of failure.
Instance Failure: Planning for Instance Recovery
Except for requiring you to set the degree of parallelism, the instance-recovery
process is automatic; it requires no operator intervention. Therefore, your planning
should concentrate on the parallel recovery option. Evaluate your system to determine
the degree of parallelism for the recovery process. This is set as the Oracle initialization
parameter RECOVERY_PARALLELISM.
The number of disk drives and CPUs typically determines the degree of parallelism
in your system. My rule of thumb is to use one process (thread) for every two disk
drives, up to 20. Try starting with a number like this, and adjust it to suit your
system. Because every system is different, it is difficult to make across-the-board
recommendations. If you set this number too high, you might cause additional overhead.
Hardware Failure: Planning for Media Recovery
When you plan for media recovery, you must consider not only the steps required
to recover the damaged media, but the restoration of data as well. Some of the key
items necessary for quick media recovery include
- Hardware availability--Most system vendors offer hot-swappable disk drives. A
failed disk drive need only be replaced with the same type disk in order to start
automatic recovery. For this reason, you should keep spares available.
- Backup files--Backup files should be kept until subsequent backups have been
completed and verified. If you make these files available on the system or on a network
drive, you can perform quick media recovery.
- Archived log files--Any archived log files created since the last backup should
be kept available on the system itself or on a network server until the next backup
occurs.
If downtime is not an option, consider using a fault-tolerant disk subsystem.
You can avoid considerable downtime by protecting your disk drives with a RAID disk
array.
TIP: The component in your system that
is most likely to fail is probably a disk drive. The more disk drives you have in
your system, the more likely one will fail. By protecting your disk drives with hardware
or software RAID, you can avoid considerable downtime.
Planning for Operator Error Recovery
Recovering from operator error is similar to recovering from media failure, except
with operator error recovery you might need a large amount of disk space to temporarily
recover your database. Fortunately, users can typically continue working on the permanent
database unless their duties require them to access the table(s) that was deleted
or corrupted. To recover, you need the following components:
- Backup files
- Archived log files
- Large temporary space
Summary
Today you learned how to recover from various types of system failures. You learned
how instance, media, and point-in-time recoveries work. You also learned how to recover
from an operator error. You learned the importance of planning ahead to reduce downtime;
in most businesses, downtime can be very expensive.
What's Next?
On Day 18, "Administering Oracle Replication," you will learn how to
use Oracle replication to increase performance and quicken recovery. You will also
learn about other Oracle options for quick recovery, including the standby database
option. These features do not provide fault tolerance, but do help you to come back
online quickly.
Q&A
- Q What is the difference between restoring and recovering?
A Restoring involves copying a backup file to replace a damaged datafile,
whereas recovery involves Oracle using the transaction log to roll forward transactions.
Q What is the difference between instance recovery and media recovery?
A With instance recovery, Oracle automatically rolls forward committed
transactions to bring the database up to date; media recovery requires a damaged
datafile to be replaced with a backup copy before commencing.
Q What is an in-flight transaction?
A This term is sometimes used to describe a transaction that was in the
process of execution when a failure occurred.
Q What Oracle utility is needed to restore from an export file?
A An export file can only be restored with the Import utility.
Workshop
The workshop provides quiz questions to help you solidify your understanding of
the material covered and exercises to provide you with experience in using what you've
learned. See Appendix A, "Answers," for the answers to quiz questions.
Quiz
- 1. What types of recovery operations are there?
2. What transactions are recovered from an instance failure?
3. What transactions can be recovered from media failure?
4. What is ARCHIVELOG mode?
5. What is a dirty buffer?
6. What is a checkpoint?
7. What files are necessary for instance recovery?
8. What files are necessary for media recovery?
9. What is a point-in-time recovery?
10. What is a point-in-time recovery used for?
Exercises
- 1. Using Backup Manager, restore the tablespace you backed up yesterday.
2. Using NT Recovery Manager, restore the tablespace you backed up yesterday.
3. On a small test system, use the SHUTDOWN ABORT command to abort
an Oracle instance. (Do not do this on a production system or one that others are
using.)
4. Restart the instance and observe the recovery process; try this again
with a larger degree of parallelism. (NOTE: You
must perform some transactions for recovery to have activity.)
© Copyright, Macmillan Computer Publishing. All
rights reserved.
|