8.5.1 FAIL. Your code may just break.

Erik Brooks | Tags: domino code notes programming agents lotus java ibm lotusscript | Comments (31) | Visits (6,576)
 
8.5.1 is great. Really great. Except when it decides to make your code no longer work. And especially when that code involves NotesView.GetDocumentByKey, NotesView.GetAllDocumentsByKey, NotesView.GetEntryByKey or NotesView.GetAllEntriesByKey.
 
If you do Notes/Domino dev you've probably written something like this before:
 
Dim s As New NotesSession
Dim db As NotesDatabase
Dim v As NotesView
Dim d As NotesDocument
Set db = s.CurrentDatabase
Set v = db.GetView("myView")
Set d = v.GetDocumentByKey("something")
 
There's probably a trillion lines of this type of code in production all over the world, and it has worked for years. Something close to 15 years (Domino 4.6?), if I recall. However, in 7.0.4, 8.5FP1, 8.5.1 and beyond, this code may arbitrarily return the error "The collection has become invalid."
 
What I typed above is LotusScript, but Java (and potentially SSJS) is affected too by what I'm about to describe, in both client-run and server-side code and agents.
 
Here's what happens in all prior versions:
 
When the lookup executes and a doc is found, Notes/Domino checks whether any document was modified after the view's index was last updated. If so, the collection is technically out-of-date, and the lookup is re-executed again and again until it is considered up-to-date.
 
You can change this behavior with NotesView.AutoUpdate=False, but that will cause any other code (e.g. agents) running to pile up waiting for your original code to finish so the index can be updated. Besides, most code (even long-running agents) actually wants the most up-to-date information for each lookup anyway.
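
For reference, switching automatic updating off is a single property assignment on the view object. This is only a minimal sketch, assuming the code runs inside the database that holds the view and reusing the placeholder view name and key from the snippet above. Whether it is appropriate depends on how stale your lookups are allowed to be.

```lotusscript
Dim s As New NotesSession
Dim v As NotesView
Dim d As NotesDocument

' "myView" and "something" are the same placeholders used earlier in this post
Set v = s.CurrentDatabase.GetView("myView")

' Turn off automatic updating for this view object
v.AutoUpdate = False

Set d = v.GetDocumentByKey("something")
' ... work with d ...

' Bring the view object up to date explicitly, at a point you choose
Call v.Refresh()
```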
 
Here's what happens now:
 
Everything happens the same as it used to, except that instead of executing the lookup again and again there is a limit of 10 tries. After 10 tries Notes/Domino throws its hands up in the air and returns the error, "The collection has become invalid."
 
So your code that used to run fine in prior versions might now throw an error - sometimes immediately, sometimes after thousands of successful calls for days on end. All it takes is a single doc to be updated around the time your lookup occurs. Obviously the busier your database the higher the chance you'll encounter this new error. In a nutshell, NotesView.AutoUpdate=True (the default behavior) doesn't work any more.
 
IBM/Lotus support says the "workaround" is to trap the error, call NotesView.Refresh() and try your lookup again. Never mind that you've probably got 488.223E+309 lines of this code across every app you've ever written; you get to work around this mess just to have your code work the way it used to. And never mind the fact that additional error catching is always fairly expensive performance-wise.
 
This change was introduced by two "fixes": SPRs #AHOE7JEKWY and #AJMO7LHMK9. They are both mislabeled in the Fix List Database as "This fix prevents functions from potentially going into an infinite loop when future TimeDates exist in views and folders." Apparently what happens is that you might have a document that was created in the future (caused by a very rare Domino "time creep" bug when massive numbers of new docs are created rapidly). Your GetDocumentByKey call would then loop and loop until the server's time passed the latest doc's time, at which point the view would be considered "up-to-date". So your call would eventually complete, but it might take a few seconds or minutes depending on how much time creep your database experienced. If you had a doc created hours in the future, that could be seen as an "infinite" loop, which is apparently why IBM decided to "fix" this problem by checking only 10 times and then throwing an error.
 
No, there's no documentation update mentioning this.
No, there's nothing in the Readme file about this.
No, there's no INI setting to revert this new behavior back to the way things used to work. This is "how it should have worked all along" according to IBM.
 
I asked for an enhancement request to be created to add an INI setting to control this behavior, and apparently there are already other enhancement requests that are similar (including one that has Notes/Domino just do a Refresh() and attempt the lookup again). We're not the only people who have hit this timebomb, and as people upgrade I'm sure we won't be the last.
 
That error number is 4678, by the way. Have fun trapping.
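
For anyone who does end up trapping it, the workaround support describes (trap the error, Refresh, retry) could be wrapped in one helper so that each call site changes by a single line. This is only a sketch under stated assumptions: the function name GetDocumentByKeyWithRetry, the constant names and the cap of three attempts are all made up for illustration and do not come from IBM.

```lotusscript
Function GetDocumentByKeyWithRetry(v As NotesView, key As Variant, exact As Boolean) As NotesDocument
	' Hypothetical helper: the name and the retry cap are illustrative assumptions
	Const ERR_COLLECTION_INVALID = 4678   ' "The collection has become invalid."
	Const MAX_ATTEMPTS = 3                ' arbitrary cap so we never loop forever
	Dim attempts As Integer

	' Trap only error 4678; any other error propagates to the caller as usual
	On Error ERR_COLLECTION_INVALID GoTo retryLookup

tryLookup:
	Set GetDocumentByKeyWithRetry = v.GetDocumentByKey(key, exact)
	Exit Function

retryLookup:
	attempts = attempts + 1
	If attempts >= MAX_ATTEMPTS Then
		' Out of attempts: re-raise the same error to the caller
		Error Err, Error$
	End If
	' Refresh the view, then go back and repeat the lookup
	Call v.Refresh()
	Resume tryLookup
End Function
```

A call such as Set d = v.GetDocumentByKey("something") would then become Set d = GetDocumentByKeyWithRetry(v, "something", False). The other three lookup methods would each need a similar wrapper, and as the comments below note, retrying may only reduce how often the error surfaces, not eliminate it.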
 


Comments (31)

1 Ed Brill commented

Erik, what SPR do you have (or PMR) for the problem that you are working?

2 Nathan T Freeman commented

Well, this should be easy to fix. You just need to run an analysis on your source control system to find all instances of the .get*ByKey() calls and then have a macro to insert the proper error handling and then check the code back in and do a global rebuild on all your production software. Easy peasy, right?

3 Erik Brooks commented

@Ed - Email sent.

4 Brent Henry commented

Erik,

 
 
Thanks for posting this. We've seen this error showing up every few days in the logs when running scheduled agents and I had no idea what the issue was.
 
 
It wasn't causing a serious problem in our case because the agents would run correctly an hour later (which would have made it nearly impossible to debug).

5 Chad Scott commented

Erik,

 
 
I'm familiar with this issue. Drop me a line at chads@us.ibm.com.

6 Gavin J Bollard commented

Since this is critical for a number of our databases, it reduces our 8.5 --> 8.5.1 migration to a "no-go" until it is resolved.

 
 
I'm not sure who is making the decisions at IBM/Lotus lately but the reputation for Total Compatibility seems to have gone out the window with the 8.x series of releases. I finally got my 8.5 Attachment problem addressed with a fix pack but it wasn't fixed in a way that was particularly useful for me. I've given up and modified several database designs instead.

7 Christian Brandlehner commented

The enhancement request to improve handling of Error 4678 is being tracked as SPR OIHZ7XFKDV. To increase its weight, please open a PMR with Lotus Support and ask for an additional customer report to be added to that SPR.

8 Andy C Dempster commented

My colleague and I have spent the past two weeks fixing this bug since we upgraded our servers to 7.0.4, on the road to R8 (although I have only seen it affect GetDocument(s) not GetViewEntries). That's over 150 man hours of work and big delays in project rollouts for ONE SPR! I wonder if I should bill IBM?

 
 
This is not the first time QC @ IBM has let things slip - anyone remember the fun trying to get a database with a script library to recompile in 6.5.5 when it worked in 6.5.4?
 
 
It is safe to say that it has affected every single one of our systems and my faith in upgrading to R8 has been shaken.

9 Simon O'Doherty commented

My comment is the same as Christian's. Please open a PMR to log a customer report against SPR: OIHZ7XFKDV.

 
 
Also, here is the tech note related to that SPR.
 
 
Title: Error 4000: %a's Certification Log/Error 4678: The collection has become invalid
 
Doc #: 1396849
 
URL: http://www.ibm.com/support/docview.wss?rs=899&uid=swg21396849
 
 
 
The hotfix mentioned in that Tech Note only lowers the chances of the message appearing.

10 Peter Presnell commented

Thx for bringing this matter to our attention Erik. IBM could do a great service to the Notes development community if it were to consolidate all the issues like this in a central place where they were easy to find and absorb.

11 John L James commented

Well, that explains quite a bit... I thought it was odd that we were seeing this occasionally, but not with enough frequency to actually troubleshoot.

 
 
I've opened a PMR against that SPR OIHZ7XFKDV as well.

12 Steven Newton commented

Erik,

 
 
Thanks for your response on the forum. We have also raised a PMR about this issue (83402,019,866), and have been advised that the call should be linked to PMR 18734442000 and SPR CSCT836HFL.

Please, Ed, Chad and whoever else at IBM is looking at this, can we have a hot fix?

Also, could I ask that a response from IBM is added to the forum:
 
 
http://www-10.lotus.com/ldd/nd85forum.nsf/DateAllThreadedWeb/c503f82a40dbf3b1852576d4004f6fad?OpenDocument
 

13 Simon Lamb commented

I'm a colleague of Steve Newton, and just want to add my comment. Thanks, Erik, for bringing this to attention. We have been seeing these errors on production servers since we updated to 8.5.1, and as things stand we have no fix for this problem. The workaround suggested 1) will take weeks to do, and 2) will only reduce the problem, not fix it.

 
 
IBM, you have broken Domino and our application - that get*ByKey now errors under certain conditions is completely unacceptable - as is the fact that the SPR covering this is marked as closed with no indication of fix.
 
 
We have to have this fixed as a matter of urgency.

14 Chad Scott commented

There are currently three open SPRs around this issue.

 
 
1. AJMO7LHMK9 fixes a problem in which future-dated views and folders can lead to an effective hang condition as the code loops until real-world time catches up to the date on the view/folder, which can be months or years in the future. This fix has introduced the problems identified in this post. It is fixed in 802FP2, 85FP1, and 851.
 
 
2. OIHZ7XFKDV is an enhancement request to better handle the new error thrown by AJMO7LHMK9 when the 10 retry attempts are exhausted. This is currently closed pending additional demand.
 
 
3. CSCT836HFL is open as a regression of AJMO7LHMK9 that causes existing code to break as a result of the view update throttling. This one is new and awaiting evaluation.
 
 
I hope to have a better idea on the path forward for CSCT836HFL today and will report back with those findings.

15 Erik Brooks commented

@14 - Sounds good Chad. Please let me know when there's another update and I'll post a follow-up blog entry.
