8.5.1 FAIL. Your code may just break.
8.5.1 is great. Really great. Except when it decides to break your code. Especially when that code involves NotesView.GetDocumentByKey, NotesView.GetAllDocumentsByKey, NotesView.GetEntryByKey or NotesView.GetAllEntriesByKey. If you do Notes/Domino dev, you've probably written something like this before:

    Dim session As New NotesSession
    Dim db As NotesDatabase
    Dim v As NotesView
    Dim d As NotesDocument
    Set db = session.CurrentDatabase
    Set v = db.GetView("myView")
    ' In 7.0.4, 8.5 FP1 and 8.5.1 this lookup can fail with error 4678
    Set d = v.GetDocumentByKey("something")

There are probably a trillion lines of this type of code in production all over the world, and it has worked for years. Something close to 15 years (Domino 4.6?), if I recall. However, in 7.0.4, 8.5 FP1, 8.5.1 and beyond, this code may arbitrarily return the error "The collection has become invalid." What I typed above is LotusScript, but Java (and potentially SSJS) code is affected by what I'm about to describe too, in both client-run and server-side code and agents. Here's what happens in all prior versions:
When the lookup executes and a doc is found, it checks whether any document was modified after the view's index was last updated. If so, the collection is technically out-of-date, and the lookup is re-executed again and again until the collection is considered up-to-date. You can change this behavior with NotesView.AutoUpdate=False, but that will cause any other running code (e.g. agents) to pile up waiting for your original code to finish so the index can be updated. Besides, most code (even long-running agents) actually wants the most up-to-date information for each lookup anyway.
Here's what happens now:
Everything happens the same as it used to, except that instead of executing the lookup again and again, there is a limit of 10 tries. After 10 tries, Notes/Domino throws its hands up in the air and returns the error "The collection has become invalid."
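To make the difference concrete, here is a small Java model of the two behaviors. It is a conceptual sketch built only from the description above, not IBM's code: `LookupModel`, `ModelView`, `search()` and `isIndexStale()` are hypothetical names, and a `maxTries` of 0 stands for the old unbounded loop.

```java
// Conceptual model only: NOT Domino's actual implementation.
// A lookup that finds a document re-runs while the view index is stale,
// either without limit (older releases) or up to a fixed number of tries
// (7.0.4, 8.5 FP1, 8.5.1 as described in this post).
final class LookupModel {

    /** Stand-in for the Notes error "The collection has become invalid." */
    public static final class CollectionInvalidException extends RuntimeException {
        public static final int ID = 4678;

        public CollectionInvalidException() {
            super("The collection has become invalid. (" + ID + ")");
        }
    }

    /** Hypothetical view abstraction used only for this model. */
    public interface ModelView {
        String search(String key);   // run the lookup against the current index
        boolean isIndexStale();      // was a doc modified after the last index update?
    }

    public static final int UNLIMITED = 0;  // old behavior: retry until up-to-date
    public static final int NEW_LIMIT = 10; // limit described in this post

    /** Returns the hit (or null) once the index is considered up-to-date. */
    public static String lookup(ModelView view, String key, int maxTries) {
        for (int tries = 1; ; tries++) {
            String hit = view.search(key);
            // The freshness check only applies when a document was found.
            if (hit == null || !view.isIndexStale()) {
                return hit;
            }
            if (maxTries != UNLIMITED && tries >= maxTries) {
                throw new CollectionInvalidException();
            }
        }
    }
}
```

With a view that stays stale for 12 consecutive checks (think: a busy database), `lookup(view, key, UNLIMITED)` eventually returns the document, while `lookup(view, key, NEW_LIMIT)` throws on the 10th try.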
So your code that used to run fine in prior versions might now throw an error, sometimes immediately, sometimes after thousands of successful calls for days on end. All it takes is a single doc to be updated around the time your lookup occurs. Obviously, the busier your database, the higher the chance you'll encounter this new error. In a nutshell, NotesView.AutoUpdate=True (the default behavior) doesn't work anymore.
IBM/Lotus support says the "workaround" is to trap the error, call NotesView.Refresh() and try your lookup again. Never mind that you've probably got 488.223E+309 lines of this code across every app you've ever written; you get to work around this mess just to have your code work the way it used to. And never mind the fact that additional error catching is always fairly expensive performance-wise.
This change was introduced by two "fixes": SPRs #AHOE7JEKWY and #AJMO7LHMK9. They are both mislabeled in the Fix List Database as "This fix prevents functions from potentially going into an infinite loop when future TimeDates exist in views and folders." Apparently, what happens is that you might have a document that was created in the future (caused by a very rare Domino "time creep" bug when massive numbers of new docs are created rapidly). Your GetDocumentByKey call would then loop and loop until the server's time passed the latest doc's time, at which point the view would be considered "up-to-date". So your call would eventually complete, but it might take seconds or minutes depending on how much time creep your database experienced. If you had a doc created hours in the future, that could look like an "infinite" loop, and that is apparently why IBM decided to "fix" this problem by checking only 10 times and then throwing an error.
No, there's no documentation update mentioning this. No, there's nothing in the Readme file about this. No, there's no INI setting to revert this new behavior to the way things used to work.
This is "how it should have worked all along" according to IBM. I asked for an enhancement request to be created to add an INI setting to control this behavior, and apparently there are already other, similar enhancement requests (including one that has Notes/Domino just do a Refresh() and attempt the lookup again). We're not the only people who have hit this time bomb, and as people upgrade, I'm sure we won't be the last. That error number is 4678, by the way. Have fun trapping.
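If you do end up applying IBM's suggested workaround, it is at least worth keeping the trap/Refresh()/retry logic in one place rather than pasting it around every lookup. Below is a minimal Java sketch of such a helper. `RetryingLookup`, `withRefresh`, `Lookup`, `Refresh` and `ErrorFilter` are hypothetical names, and the helper is deliberately generic so it compiles without Notes.jar; deciding which exception means error 4678 is left to the caller.

```java
// Hypothetical helper: wraps the suggested workaround (trap the error,
// Refresh() the view, try the lookup again) so it lives in one place.
// Generic on purpose so it compiles without Notes.jar.
final class RetryingLookup {

    /** A lookup that may throw, e.g. view.getDocumentByKey(key, true). */
    interface Lookup<T> {
        T run() throws Exception;
    }

    /** A refresh action that may throw, e.g. view.refresh(). */
    interface Refresh {
        void run() throws Exception;
    }

    /** Decides whether an exception means "The collection has become invalid." */
    interface ErrorFilter {
        boolean matches(Exception e);
    }

    private RetryingLookup() {}

    /**
     * Runs the lookup. If it fails with an error the filter recognizes,
     * refreshes and retries, up to maxAttempts lookups in total. Other errors,
     * and the error from the final attempt, are rethrown unchanged.
     */
    static <T> T withRefresh(Lookup<T> lookup, Refresh refresh,
                             ErrorFilter isCollectionInvalid,
                             int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return lookup.run();
            } catch (Exception e) {
                if (!isCollectionInvalid.matches(e) || attempt >= maxAttempts) {
                    throw e;
                }
                refresh.run(); // bring the view index up to date, then loop
            }
        }
    }
}
```

In a Java agent, a call might look something like `withRefresh(() -> view.getDocumentByKey(key, true), view::refresh, e -> e instanceof NotesException && ((NotesException) e).id == 4678, 3)`, assuming the error code is exposed through NotesException's `id` field. Lambdas and method references need Java 8, which postdates the JVM bundled with 8.5.x, so on those releases you would pass anonymous inner classes instead. Capping the attempts keeps a view that never settles from looping forever, which is the same concern behind the 10-try limit, just with a Refresh() in between.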
Comments (31)
Erik, what SPR (or PMR) do you have for the problem you are working on?
Well, this should be easy to fix. You just need to run an analysis on your source control system to find all instances of the .get*ByKey() calls and then have a macro to insert the proper error handling and then check the code back in and do a global rebuild on all your production software. Easy peasy, right?
@Ed - Email sent.
Erik,
Since this is critical for a number of our databases, it makes our 8.5 to 8.5.1 migration a "no-go" until it is resolved.
I'm not sure who is making the decisions at IBM/Lotus lately, but the reputation for Total Compatibility seems to have gone out the window with the 8.x series of releases. I finally got my 8.5 Attachment problem addressed with a fix pack, but it wasn't fixed in a way that was particularly useful for me. I've given up and modified several database designs instead.
The enhancement request to improve handling of Error 4678 is being tracked as SPR OIHZ7XFKDV. To increase its weight, please open a PMR with Lotus Support and ask for an additional customer report to be added to that SPR.
My colleague and I have spent the past two weeks fixing this bug since we upgraded our servers to 7.0.4, on the road to R8 (although I have only seen it affect GetDocument(s), not GetViewEntries). That's over 150 man-hours of work and big delays in project rollouts for ONE SPR! I wonder if I should bill IBM?
My comment is the same as Christian's. Please open a PMR to log a customer report against SPR OIHZ7XFKDV.
Thx for bringing this matter to our attention, Erik. IBM could do a great service to the Notes development community if it were to consolidate all the issues like this in a central place where they were easy to find and absorb.
Well, that explains quite a bit... I thought it was odd that we were seeing this occasionally, but not with enough frequency to actually troubleshoot.
Erik,
I'm a colleague of Steve Newton and just want to add my comment. Thanks, Erik, for bringing this to our attention. We have been seeing these errors on production servers since we updated to 8.5.1, and as it stands we have no fix for this problem. The suggested workaround 1) will take weeks to implement and 2) will only reduce the problem, not fix it.
There are currently three open SPRs around this issue.
@14 - Sounds good, Chad. Please let me know when there's another update and I'll post a follow-up blog entry.