Enterprise Messaging Notes

Blog Authors:  Dan Lynch  


Progress: 8.5.x TxL Bug and Symantec NBU Bug

Dan Lynch  |     |  Comments (1)
Through the extraordinary, timely and authoritative help of Steve Watts at IBM engineering, and some additional help from Symantec, the unexpected restore behaviors we saw here using archive-style transaction logging (TxL) on 8.5 with Veritas NetBackup (so we can utilize DAOS at some point) are much better understood, and have turned out to be issues on both sides of the vendor equation. Steve indicated that the issue had occurred for at least one other IBM customer, and that a fix can be had by asking for a hot fix (SPR SWAS7X5675) or by waiting until the fix is rolled into 8.5.2. Symantec provided an updated binary which "ensures that the Domino transaction logs are left behind in NetBackup's temporary cache until the entire restore attempt is complete.  This would ensure that TXNs are not being pulled from tape over and over again".

I can't say enough good things about Steve Watts's monitoring of the community posts here and on the notes.net forums, and his willingness to help out, respond to posts and coordinate getting the right nuggets to the right people. This is the second time in the past couple of months that he has gone above and beyond for us and provided true value beyond the normal support realm.

The net is that we can resume testing towards DAOS, etc., when time allows.

Thanks Steve!!


Comments

1 Dan Lynch: Below are the technical details shared by Liji Kuruvilla at Veritas, who worked with Steve Watts on the specifics of this issue:

"I shall try to be as informative as possible re: the fix we just supplied. Should you wish to discuss this over a 10-15 min con-call, I would be happy to attend as well.

Domino recovery consists of two phases:

1. First, Domino sequentially PLAYs the TXNs FORWARD, one at a time.

2. Second, Domino UNDOes any transactions that were incomplete or aborted.


There were two issues discovered:

A. Lotus Restore is requesting the same TXN logs multiple times to complete a DB recovery

B. TXNs more than 64 log extents older were being requested by Domino during the UNDO phase to UNDO aborted transactions

Phase 1 is fine here; there are no problems to report.

Phase 2 is where both issues A and B were discovered.

I would defer to IBM re: what is the normal behavior for a DB recovery. To paraphrase my understanding, IBM does both Phase 1 and 2 for performing a recovery.

During Phase 1, Domino receives the TXNs from NBU in sequentially increasing order. Once we give Domino a TXN log from our cache, NBU does not hold on to that TXN log in its cache. Once Domino replays the TXN log, it purges the TXN log from its log folder.

During Phase 2, Domino starts UNDOing transactions that were NOT complete. This step happens in a random order. However, since the TXNs were already purged by Domino from its cache in Phase 1, NBU has to intervene and retrieve the TXNs again from tape. The pre-fetch feature in our code pre-emptively fetches all the TXNs from tape up to the chosen Point in Time for the restore operation. This may be as few as one TXN log; in really large environments, NBU support has seen Domino request up to 891 older TXNs to complete a recovery.
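The interaction between the two phases and the original cache behavior can be sketched as a toy model. This is only an illustration of the description above, not NetBackup or Domino code: the class `HandoffCache`, the function `recover`, the extent numbers and the UNDO request order are all hypothetical.

```python
from collections import Counter


class HandoffCache:
    """Toy model of the original behavior: a log extent is not kept
    in the cache once it has been handed to the server."""

    def __init__(self):
        # Number of times each extent had to be fetched from tape.
        self.tape_fetches = Counter()

    def request(self, extent):
        # Nothing is retained after hand-off, so every request is a tape fetch.
        self.tape_fetches[extent] += 1
        return f"extent-{extent}"


def recover(cache, last_extent, undo_requests):
    """Run a simplified two-phase recovery against the cache."""
    # Phase 1: replay forward, one extent at a time, in increasing order.
    for extent in range(1, last_extent + 1):
        cache.request(extent)
    # Phase 2: UNDO incomplete transactions; extents are requested in no
    # particular order, and the same extent may be requested repeatedly.
    for extent in undo_requests:
        cache.request(extent)


cache = HandoffCache()
recover(cache, last_extent=10, undo_requests=[7, 3, 7, 9, 3, 7])

# Extents that were pulled from tape more than once.
repeat_fetches = {e: n for e, n in cache.tape_fetches.items() if n > 1}
print(repeat_fetches)
```

In this made-up trace, 10 extents are replayed but 16 tape fetches are needed, because every Phase 2 request goes back to tape.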

IBM Engineering has confirmed a bug and indicated that in the ideal case, only 64 older TXNs should be requested by the Domino Server for recovery purposes. The fix for this is available from IBM as SPR SWAS7X5675. This resolves Issue B.
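As a rough illustration of Issue B, the check below flags UNDO-phase requests that reach back further than 64 extents. It assumes, purely for the sake of the example, that "older" is measured from the newest extent replayed in Phase 1; the function name `out_of_window` and the sample extent numbers are hypothetical.

```python
MAX_UNDO_LOOKBACK = 64  # the limit IBM Engineering described as the ideal case


def out_of_window(newest_extent, undo_requests, lookback=MAX_UNDO_LOOKBACK):
    """Return the requested extents that lie more than `lookback`
    extents behind `newest_extent` (an assumed reference point)."""
    return [e for e in undo_requests if newest_extent - e > lookback]


# Hypothetical trace: the newest replayed extent is 1000.
flagged = out_of_window(1000, [990, 936, 935, 109])
print(flagged)
```

Here extents 990 and 936 fall inside the 64-extent window, while 935 (65 extents back) and 109 (891 extents back) are flagged.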

As part of this UNDO operation, it has been noticed that Domino Server could request the same TXN more than three or four times. Each time it requests the TXN from NBU, it purges it after processing, then re-requests it from NBU and repeats this cycle. This results in a lot of unnecessary restores of the same TXN from NBU Tape/Disk Storage Units. Symantec has brought this issue to the attention of IBM Engineering. IBM Engineering has invited an Enhancement to be opened for fixing this problem in a future release of Domino OR a future patch release for Domino. Until this enhancement is coded into Domino and becomes generally available, NBU customers using Lotus DBs are still subject to the TXNs being requested by Domino over and over again at the time of a Restore operation.

In the meantime, NBU Support has requested that Symantec Engineering provide an enhancement binary to alter its caching behavior. This caching behavior change will ensure that the last 64 TXNs requested by Domino Server stay in NBU’s cache until we get an acknowledgement from Domino Server that the DB recovery operation is complete. This would alleviate the pain-point of having to re-request the same data from tape repeatedly until Domino deems the recovery complete. This enhancement is what has been provided to Sherwin Williams via Symantec Engineering E-Track Service Request # 1876784 for Case 320-222-736.
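One way to picture the changed caching behavior is a small cache that keeps the most recently requested extents and is emptied only when recovery is acknowledged as complete. This is a sketch under assumptions, not the actual enhancement binary: the class `RetainingCache`, its methods, and the least-recently-requested eviction rule are one hypothetical reading of "the last 64 TXNs".

```python
from collections import OrderedDict


class RetainingCache:
    """Toy model: keep the most recently requested extents (64 by default)
    until the server acknowledges that recovery is complete."""

    def __init__(self, capacity=64):
        self.capacity = capacity
        self.entries = OrderedDict()  # extent -> cached log data
        self.tape_fetches = 0

    def request(self, extent):
        if extent in self.entries:
            # Cache hit: no tape access, just mark as recently requested.
            self.entries.move_to_end(extent)
        else:
            # Cache miss: fetch from tape and retain the extent.
            self.tape_fetches += 1
            self.entries[extent] = f"extent-{extent}"
            if len(self.entries) > self.capacity:
                # Drop the least recently requested extent.
                self.entries.popitem(last=False)
        return self.entries[extent]

    def recovery_complete(self):
        # Acknowledgement received: the cached logs are no longer needed.
        self.entries.clear()


cache = RetainingCache()
for extent in range(1, 101):            # Phase 1: forward replay of 100 extents
    cache.request(extent)
for extent in [95, 60, 95, 99, 60, 95]:  # Phase 2: repeated UNDO requests
    cache.request(extent)
fetches_after_undo = cache.tape_fetches
cache.recovery_complete()
print(fetches_after_undo)
```

In this made-up trace the repeated Phase 2 requests are all served from the cache, so the fetch count stays at the 100 fetches needed for the forward replay. A request for an extent older than the retained window would still go to tape.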

In its current form, the EEB 1876784 ensures that ALL transaction logs requested by Domino are left behind in the cache. We are currently on the verge of a vendor meeting between our respective Engineering teams so that we can get confirmation re: leaving only the last 64 TXNs in our cache to ensure that the DB Recovery operation is not slowed down by requests for the same data from tape. Depending on the outcome of that meeting, we may have an updated binary for your environment.

Hope the explanation above helps. I would welcome any corrections/updates/comments from IBM Engineering (Steve Watts) also included on this email thread.



Thank you."





