Oracle Database Blog: Experiments & Learnings

October 24, 2012

Tape Usage monitoring using Netbackup goodies and Oracle Tools

Filed under: Oracle RDBMS — Tags: , , — Saurabh Manroy @ 10:43 pm

Quite recently I got interested in one of the unattended (not so frequent) issue.  Every month, one or the other database backup reported problems because of tapes getting full in their respective tape pools. So, for me the task was to find out:

  1. All the  tape media in the Volume pools which had critically low free space.
  2. Then check if that tape gets full, how would it impact subsequent backup job.
  3. Based on the impact, if needed, raise an alarm.

Getting the Tape capacity

We are using Netbackup(tm) software to backup data to Virtual tape library. From Netbackup server console, you can see is how many KBs have been written to the tape but it doesn’t tell the total capacity of tape. As we are using LTO3 standard to format LUNs to be used as virtual tapes, the native (uncompressed) capacity of a tape drive is 400GB. I confirmed this figure by checking all tapes that had status FULL.

Impact and Alarms

For each allocated channel in the RMAN run block, MML tries to write to a different tape media in the volume pool.  So, for parallelism to be effective, # of tape drives available to be written to should be equal to the number of channels allocated for backup. If this is not the case, RMAN would allocate n channels but effectively less than n channels will be at work at any given time.

Based on this, Alarms should be raised:

Condition 1:

-> If the number of  (active + available) tapes are less than number of allocated RMAN channels for backup. For example: If 3 RMAN channels are allocated for backup, just 2 tapes are not sufficient for effective parallelism. In such case, an alarm email can be sent to the storage/DBA group with following content.

POOL_NAME   Active Tapes in Pool
----------- --------------------
 ora-pool            2

Condition 2:

->If a tape is currently 90% full, would filling this tape to 100% capacity lead to situation mentioned above. Example:

POOL_NAME    MEDIA_ID % Full   Active Tapes in Pool
----------- --------- -------  --------------------
ora-pool     090040     90.68       3

In the above output, if 0900400 tape media gets full, this volume pool will have only 2 tapes to write to. If RMAN is allocating 3 channels to backup, backups would get affected from channel queuing.

Solution

When you install Netbackup master server software,  a successful install would create  ‘goodies’  folder under ‘bin’ directory. This directory contains many useful scripts that can be used for monitoring media server. One script that  proved to be of use is : available_media.  For the script to work , it expects Netbackup Master server and media server software on same host. The script can be run only by root user. Sample output of this script is below:

# ./available_media | more
media   media   robot   robot   robot   side/   ret    size     status
ID     type    type      #     slot    face    level  KBytes
----------------------------------------------------------------------------
Backup pool1

07002G  HCART3   TLD      0       2      -       1     66099194 ACTIVE
07002H  HCART3   TLD      0       3      -       -     -        AVAILABLE

NetBackup pool

07002F  HCART3   TLD      0       1      -       -     -        DBBACKUP

Now, given the sample output above, you can either write a shell/perl program to manipulate and raise alarms (email saurabhmanroy(at)gmail.com for the shell program) or choose a complex method like below:

  • Modify Available_media script to provide input for SQL Loader
  • Pre-create table in the database.
  • Use SQL LOADER to load Tape data into a database table
  • Query relevant information from database and then truncate the table for future use.

Output of modified available_media_ora script looks like this:

ora-pool,11002O,419249568,FULL
ora-pool,11002N,95774112,ACTIVE
ora-pool,070033,Unknown,AVAILABLE

This can be loaded to a table in a database ( for example: RMAN catalog db or Grid/cloud control repository db). A sample controlfile for sql loader will be:

LOAD DATA
INFILE '/u01/oracle/tape_check/load.txt'
INTO TABLE Tape_repository
FIELDS TERMINATED BY ','
(POOL_NAME,
Media_id,
Size_KBytes,
status)

Finally, queries to get the data of interest:

-- 1st Query
select pool_name,count(*) "Total Active Tapes"
 from tape_repository
 where Status <> 'FULL'
 and pool_name like '%ora'
 group by pool_name having count(*) < #rman_channels - 1;

-- 2nd Query

with tab as
(select *
 from tape_repository
 where POOL_NAME like '%ora%'
 and status <> 'FULL'
 and SIZE_KBYTES/1024/1024 > (400*0.9))
select a.pool_name,a.media_id,round((a.size_kbytes/1024/1024)/400*100,2) "% Full",count(b.media_id) "Active Tapes in Pool"
 from tab a , tape_repository b
 where a.pool_name=b.pool_name
 and b.status <> 'FULL'
 group by a.pool_name,a.media_id,a.size_kbytes,a.status
 having count(b.media_id) < #rman_channels;

-- replace #rman_channels by appropriate number

Create a free website or blog at WordPress.com.