Enterprise Messaging Notes | Blog Authors: Dan Lynch
Progress: 8.5.x TxL Bug and Symantec NBU Bug
Dan Lynch
Comments (1)
Through the extraordinary, timely and authoritative helpfulness of Steve Watts at IBM engineering, and some additional help from Symantec, the issues
Comment 1, Dan Lynch: Below are the technical details shared by Liji Kuruvilla at Veritas, who worked with Steve Watts on the specifics of this issue:
"I shall try to be as informative as possible re: the fix we just
supplied. Should you wish to discuss this over a 10-15 min
con-call, I would be happy to attend as well.
Domino recovery consists of two phases:
1. The first, where Domino sequentially PLAYs the TXNs FORWARD one
at a time
2. The second, where Domino UNDOes any transactions that were
incomplete or aborted
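
As a rough illustration of the two phases, here is a minimal Python
sketch of a generic replay-then-undo recovery. It is not Domino's
actual implementation; the LogRecord type, the recover function and
the committed set are hypothetical names introduced only for this
example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    """Hypothetical log record, not Domino's real format."""
    txn_id: int   # transaction this record belongs to
    key: str      # item that was changed
    old: object   # value before the change (used by UNDO)
    new: object   # value after the change (used when playing forward)


def recover(db, records, committed):
    """Toy two-phase recovery over an in-memory dict."""
    # Phase 1: play every record forward, one at a time, in log order.
    for rec in records:
        db[rec.key] = rec.new
    # Phase 2: walk the log backwards and undo every record that
    # belongs to a transaction that never completed.
    for rec in reversed(records):
        if rec.txn_id not in committed:
            db[rec.key] = rec.old
    return db


records = [
    LogRecord(1, "a", 0, 1),
    LogRecord(2, "b", 0, 5),
    LogRecord(1, "a", 1, 2),
    LogRecord(2, "a", 2, 9),   # transaction 2 never completes
]
print(recover({"a": 0, "b": 0}, records, committed={1}))
# prints {'a': 2, 'b': 0}
```

The point of the sketch is that Phase 2 needs to revisit log records
that Phase 1 has already consumed, which is where the older TXN logs
come back into play.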
There were two issues discovered:
A. Lotus Restore is requesting the same TXN logs multiple times to
complete a DB recovery
B. TXNs more than 64 log extents older were being requested by
Domino during the UNDO phase to UNDO aborted transactions
Phase 1 is fine here; there are no problems to report.
Phase 2 is where both issues A and B were discovered.
I would defer to IBM re: what is the normal behavior for a DB
recovery. To paraphrase my understanding, IBM does both Phase 1 and
2 for performing a recovery.
During Phase 1, Domino receives the TXNs from NBU in sequentially
increasing order. Once we give Domino a TXN log from our cache, NBU
does not hold on to the TXN log in its cache. Once Domino replays
the TXN log, it purges the TXN log from its log folder.
During Phase 2, Domino starts UNDOing transactions that were NOT
complete. This step happens in a random order. However, since the
TXNs were already purged by Domino from its cache in Phase 1, NBU
has to intervene and retrieve the TXNs again from tape. The
pre-fetch feature in our code pre-emptively fetches all the TXNs
from tape up to the chosen Point in Time for the restore operation.
This may be as few as one TXN log; in really large environments,
NBU Support has seen Domino request up to 891 older TXNs to complete
a recovery.
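
The fetch pattern described above can be modelled in a few lines of
Python. This is a simplified sketch, assuming one tape fetch per log
and a cache that drops a log as soon as it is handed over; the
NoRetainSource class and its members are hypothetical and do not
correspond to any NBU interface.

```python
class NoRetainSource:
    """Hypothetical model of a log source that does not retain logs."""

    def __init__(self, point_in_time_log):
        # Pre-fetch: stage every log up to the chosen point in time once.
        self.cache = set(range(1, point_in_time_log + 1))
        self.tape_fetches = len(self.cache)

    def request(self, log_number):
        if log_number in self.cache:
            # Served from cache, then dropped after the handoff.
            self.cache.remove(log_number)
        else:
            # Already handed over earlier, so it must come from tape
            # again, and it is still not retained afterwards.
            self.tape_fetches += 1
        return log_number


source = NoRetainSource(point_in_time_log=10)
for n in range(1, 11):      # Phase 1: sequential replay
    source.request(n)
for n in [7, 3, 7, 9]:      # Phase 2: UNDO requests in arbitrary order
    source.request(n)
print(source.tape_fetches)  # prints 14
```

In this model every Phase 2 request costs a tape fetch, including the
second request for log 7.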
IBM Engineering has confirmed a bug and indicated that in the ideal
case, only 64 older TXNs should be requested by the Domino Server
for recovery purposes. The fix for this is available from IBM under
SPR SWAS7X5675. This resolves Issue B.
As part of this UNDO operation, it has been noticed that Domino
Server may request the same TXN three to four times or more. Each
time it requests the TXN from NBU, it purges it after processing,
then requests it again from NBU and repeats this cycle. This
results in a lot of unnecessary restores of the same TXN from NBU
Tape/Disk Storage Units. Symantec has brought this issue to the
attention of IBM Engineering. IBM Engineering has invited an
Enhancement request to be opened to fix this problem in a future
release or a future patch release of Domino.
Until this enhancement is coded into Domino and becomes generally
available, NBU customers using Lotus DBs remain subject to Domino
requesting the same TXNs over and over again during a Restore
operation.
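
One way to gauge how often the same TXN is re-requested is to count
duplicates in a list of requested log numbers. The Python sketch
below assumes such a list has already been extracted from the restore
logs by some other means; the function names and the sample numbers
are hypothetical.

```python
from collections import Counter


def repeated_requests(requested_logs):
    """Return {log_number: request_count} for logs requested more than once."""
    counts = Counter(requested_logs)
    return {log: n for log, n in counts.items() if n > 1}


def redundant_restores(requested_logs):
    """Count the requests that repeat an earlier request for the same log."""
    counts = Counter(requested_logs)
    return sum(n - 1 for n in counts.values())


trace = [5, 3, 5, 8, 5, 3]          # made-up sample, not real data
print(repeated_requests(trace))     # prints {5: 3, 3: 2}
print(redundant_restores(trace))    # prints 3
```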
In the meantime, NBU Support has requested Symantec Engineering to
provide an enhancement binary that alters NBU's caching behavior.
This caching behavior change will ensure that the last 64 TXNs being
requested by Domino Server would stay in NBU’s cache until we get
an acknowledgement from Domino Server that the DB recovery
operation is complete. This would alleviate the pain-point of
having to re-request the same data from tape repeatedly until
Domino deems the recovery complete. This enhancement is what has
been provided to Sherwin Williams via Symantec Engineering E-Track
Service Request # 1876784 for Case 320-222-736.
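
The caching change described above resembles a bounded
"most recently requested" cache that is only emptied once recovery
is reported complete. The following Python sketch illustrates that
idea under stated assumptions: RetainingLogCache, fetch_from_tape and
recovery_complete are hypothetical names, and the eviction order
(least recently requested first) is an assumption rather than a
description of the actual binary.

```python
from collections import OrderedDict


class RetainingLogCache:
    """Keep the most recently requested logs until recovery completes."""

    def __init__(self, fetch_from_tape, capacity=64):
        self._fetch = fetch_from_tape   # callable: log_number -> data
        self._capacity = capacity       # None means keep every log
        self._logs = OrderedDict()
        self.tape_fetches = 0

    def request(self, log_number):
        if log_number in self._logs:
            # Cache hit: mark as most recently requested, no tape access.
            self._logs.move_to_end(log_number)
            return self._logs[log_number]
        data = self._fetch(log_number)
        self.tape_fetches += 1
        self._logs[log_number] = data
        if self._capacity is not None and len(self._logs) > self._capacity:
            self._logs.popitem(last=False)  # evict least recently requested
        return data

    def recovery_complete(self):
        # Acknowledgement that recovery is done: nothing needs retaining.
        self._logs.clear()


cache = RetainingLogCache(lambda n: f"log-{n}", capacity=2)
for n in [1, 2, 1, 3, 2]:
    cache.request(n)
print(cache.tape_fetches)  # prints 4
```

With a capacity of 2, the second request for log 1 is served from the
cache, while the second request for log 2 goes back to tape because
log 2 was evicted when log 3 arrived.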
In its current form, EEB 1876784 ensures that ALL transaction
logs requested by Domino are retained in the cache. We are
currently on the verge of a vendor meeting between our respective
Engineering teams so that we can get confirmation re: leaving only
the last 64 TXNs in our cache to ensure that the DB Recovery
operation is not slowed down by requests for the same data from
tape. Depending on the outcome of that meeting, we may have an
updated binary for your environment.
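
The trade-off between retaining ALL logs and retaining only the last
64 can be explored with a small simulation. This Python sketch is
illustrative only: the simulate function, the request sequence and
the least-recently-requested eviction rule are assumptions, and the
numbers it produces say nothing about real environments.

```python
from collections import OrderedDict


def simulate(requests, capacity=None):
    """Return (tape_fetches, peak_cached_logs) for a request sequence.

    capacity=None retains every log; an integer keeps only that many
    most recently requested logs.
    """
    cache = OrderedDict()
    fetches = 0
    peak = 0
    for log in requests:
        if log in cache:
            cache.move_to_end(log)          # hit: no tape access
        else:
            fetches += 1                    # miss: fetch from tape
            cache[log] = True
            if capacity is not None and len(cache) > capacity:
                cache.popitem(last=False)   # evict least recently requested
        peak = max(peak, len(cache))
    return fetches, peak


# 200 logs replayed in order, then a few UNDO requests (made-up values).
requests = list(range(1, 201)) + [200, 150, 200, 10]
print(simulate(requests, capacity=None))  # prints (200, 200)
print(simulate(requests, capacity=64))    # prints (201, 64)
```

In this toy run, retaining everything avoids the one extra tape fetch
for log 10 at the cost of holding all 200 logs, while the 64-log
limit caps the cache size and pays for it only when a request reaches
further back than the retained window.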
Hope the explanation above helps. I would welcome any
corrections/updates/comments from IBM Engineering (Steve Watts),
who is also included on this email thread.
Thank you.