
MAXNUMMP : rman failures and relation to tsm mount points





Well, I’d say that seems to be what is causing your intermittent failures
then.  Unfortunately, there is no "magic bullet" approach to fix this
situation — it requires cooperation of all the admins involved (TSM, DBA,
Unix, applications), and the TSM admin has the responsibility to educate
all parties about the interactions.  For example, the DBAs must be made
aware that if they set their RMAN channel parallelism too high (higher
than MAXNUMMP), the extra channels will not be able to acquire a mount
point, and the RMAN jobs will fail.
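To make that coupling concrete, here is a sketch of the two settings that have to agree.  The node name is made up for illustration, and the fragments are annotated for readability rather than being a ready-to-run macro:

```
# TSM server side (dsmadmc) -- cap this node at two tape mount points:
UPDATE NODE ORA_NODE MAXNUMMP=2

# RMAN side -- keep channel parallelism at or below that value:
CONFIGURE DEVICE TYPE 'SBT_TAPE' PARALLELISM 2;
```

If RMAN allocates a third channel against this node, that channel has no mount point to use, and the job fails.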
Do NOT set MAXNUMMP higher than the number of installed drives; that will
almost guarantee failures.  If you set it equal to the number of installed
drives, then all of those drives must be available for that node when it
wants them, or there will be failures.  It requires coordinated
scheduling.  The approach I would take is to set MAXNUMMP only as high as
that client needs to get its backup done in the time allotted.  If a
particular node must back up 100 GB in ten minutes (an absurd example), it
will need several drives; but if it has four hours to complete its backup,
then one drive is plenty.
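That sizing rule can be sketched as a little arithmetic.  The 40 MB/s streaming rate below is an assumed figure for illustration, not a measured one; plug in your own drive and data numbers:

```shell
# Hypothetical sizing sketch: minimum drives (and hence MAXNUMMP)
# needed to move a given amount of data inside the backup window.
# The 40 MB/s per-drive rate is an assumption, not a measurement.
data_mb=102400      # 100 GB to back up
window_s=600        # 10-minute window
drive_mbps=40       # assumed per-drive streaming rate

per_drive_mb=$(( window_s * drive_mbps ))                 # MB one drive can move
drives=$(( (data_mb + per_drive_mb - 1) / per_drive_mb )) # ceiling division
echo "minimum drives: $drives"                            # prints: minimum drives: 5
```

Rerun it with a four-hour window (window_s=14400) and the answer drops to one drive, which is the point: the window, not the data size alone, drives the MAXNUMMP setting.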
Another approach, if you have it available, is to use an external
scheduler (such as Control-M) rather than the TSM scheduler.  Most
enterprise-class schedulers can manage the drives as a resource pool, and
will only start a backup that needs four drives if four drives are
actually available.  This is a labor-intensive approach (initially), and
it is still not foolproof.
The approach we use is that ALL backups go to a disk pool initially;
nothing goes directly to tape.  Disk pools do not use the MAXNUMMP value,
so you can run as many channels and sessions as your hardware, OS, and
TSM server can handle.  This also eliminates (or at least postpones) the
shoe-shining problem with streaming tape drives such as LTO and DLT.
However, it does introduce another problem, at least for TDP for Oracle
clients: if the disk pool fills up, TDPO will not fail over to the next
pool in the hierarchy the way the BA client does; it will simply fail.
In practice this means keeping the migration thresholds low enough on
those disk pools that there is always enough free space for TDPO.  Again,
this requires careful analysis and cooperation between the TSM admin and
the DBA team.
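As a sketch of that last point, the migration thresholds on the disk pool can be kept low so TDPO sessions always find free space.  The pool names and threshold values here are hypothetical; tune them to your own fill rates:

```
/* Hypothetical names and values: migrate early and deep so that
   ORA_DISKPOOL never fills while TDPO sessions are writing */
UPDATE STGPOOL ORA_DISKPOOL HIGHMIG=50 LOWMIG=20 NEXTSTGPOOL=ORA_TAPEPOOL
```

With HIGHMIG=50, migration to the next pool starts when the disk pool is half full, which leaves generous headroom for incoming TDPO data.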
More discussion can be found here:
http://www.adsm.org/lists/html/ADSM-L/2006-09/msg00226.html

