The future of NSF? Part 3

Erik Brooks | Tags: views domino lotus performance readers nsf
This blog post is part of a series that focuses on some possibilities (some mine, some from inside IBM themselves) that could be a big deal to anybody who cares about view performance.  And if you had high hopes for DB2NSF, there's definitely some carryover here as well.


Note: Stephan Wissel beat me to the punch (I've been slowly working on this blog post for a few weeks) and posted a great article explaining how readers fields work within views.  His article gives a little more preliminary background (he's got a great picture, as well).

If you've read his article, some of this post will be redundant.  But please keep reading, there are some other ideas here as well.



Readers fields!

We all love them.  Readers and authors fields are one of NSF's greatest strengths.  Unfortunately, when it comes to view performance they are also one of NSF's greatest weaknesses.  One large view with readers fields can bring an application (and even an entire server) to its knees.

 

A little background:

When a user opens a view (whether in Notes or on the web) a "page" of the view is transferred.  The page contains a certain number of rows determined by several factors.  I'm not going to get into what those factors are here but the point is: the entire view isn't simply transferred to the client (or browser).

That's why as you scroll through a view you'll see network traffic from the little lightning bolt icon.  After scrolling for a certain number of rows, more of the view needs to be loaded, so another "page" is transferred.

When a user requests a page, Domino scans the view for documents they can access.  If the user has access, the doc is added to the page.  Only when Domino has enough rows to fill a page (or it hits the bottom of the view) does it then transfer the page to the user.
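
The page-filling behavior described above can be sketched in a few lines of Python.  This is only a toy model under my own assumptions, not Domino's actual code: the view is a plain list of rows, each row carries a hypothetical `readers` set, and `fill_page` is an illustrative name.

```python
from typing import Dict, List, Set, Tuple


def fill_page(view: List[Dict], user_names: Set[str], start: int,
              page_size: int) -> Tuple[List[Dict], int]:
    """Toy model of a server-side page scan (not Domino's real implementation).

    Walks the view from `start`, keeps only rows the user may read, and stops
    when the page is full or the view ends.  Returns the page and the position
    where the next scan would resume.
    """
    page = []
    pos = start
    while pos < len(view) and len(page) < page_size:
        row = view[pos]
        readers = row["readers"]
        # Assumption: an empty readers set models a doc without a readers
        # field, which any user with database access can see.
        if not readers or readers & user_names:
            page.append(row)
        pos += 1
    return page, pos


# Hypothetical sample data.
view = [
    {"subject": "Budget", "readers": {"alice"}},
    {"subject": "Memo", "readers": set()},
    {"subject": "HR file", "readers": {"bob"}},
    {"subject": "Roadmap", "readers": {"alice", "[Managers]"}},
]
page, next_pos = fill_page(view, {"alice"}, start=0, page_size=2)
print([row["subject"] for row in page], next_pos)  # ['Budget', 'Memo'] 2
```

The important detail is the loop condition: the scan only stops early when the page fills up, which is exactly what makes the scenarios below behave so differently.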

 

The Problem:

Imagine a scenario where a user opens a view.  Their client or web browser requests a view "page" of 100 rows.

How quickly does the user get their data?  That depends...

The Good Scenario - The user has access to all of the docs in the first 100 rows.

In this case Domino scans the first 100 rows, and - great - they all go into the page.  The page is shipped off to the client/browser, and everybody's happy.


The Not-So-Good Scenario - The user has access to all of the docs in the first 99 rows, but not the 100th.

In this case Domino scans the first 99 rows, and - great - they all go into the page.  But the user needs 1 more row to fill the page to 100 rows, so the server keeps scanning the view.

If the user can see another doc, say, 50 rows down, then Domino will scan those 50 rows, find the next doc, and add it to the page.  Great!  Now there are 100 rows in the page, so the page is sent down to the user.


It Gets Worse...

What if that 100th doc wasn't a mere 50 rows down?  What if it was 5000 rows down?

What if there is no "100th" doc because the user simply doesn't have reader access to any more docs in the view?

Domino doesn't know.  It will scan and scan and scan until it finds a 100th doc or it reaches the end of the view.  Until the scan is done it will keep a server CPU core completely maxed out.  In a large view this can easily take several seconds or more.  There will also be some disk I/O for reading the view index (though compared to the CPU usage this is minimal).

If you've got an 8-core server, all you need is 8 users to open this view within several seconds of each other and ALL of your server's CPU will be pegged at 100% for quite a while.
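
To put rough numbers on the scenarios above, here is a small Python sketch that counts how many rows a toy scan has to examine before it can return a page.  The view size of 100,000 rows is a made-up figure, and `rows_scanned` is a hypothetical helper; real servers do more work per row than this model suggests.

```python
def rows_scanned(readable, page_size):
    """Count the rows a toy page scan examines before it can return."""
    found = 0
    for scanned, can_read in enumerate(readable, start=1):
        if can_read:
            found += 1
            if found == page_size:
                return scanned
    return len(readable)  # page never filled: the whole view was scanned


VIEW_SIZE = 100_000  # hypothetical view size
PAGE = 100

# Each list marks, row by row, whether the user can read the doc.
good = [True] * VIEW_SIZE
not_so_good = [True] * 99 + [False] * 50 + [True] + [False] * (VIEW_SIZE - 150)
worse = [True] * 99 + [False] * 5000 + [True] + [False] * (VIEW_SIZE - 5100)
no_100th_doc = [True] * 99 + [False] * (VIEW_SIZE - 99)

for name, rows in [("good", good), ("not so good", not_so_good),
                   ("worse", worse), ("no 100th doc", no_100th_doc)]:
    print(name, rows_scanned(rows, PAGE))
# good 100
# not so good 150
# worse 5100
# no 100th doc 100000
```

The user sees at most 100 rows in every case, but the work the server does to produce them varies by three orders of magnitude.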


And Worse...

The results of the scan aren't stored on the server anywhere after the scan is complete.  So if the same user comes back in an hour and opens the view, the entire process is repeated - even though the view might not have changed at all.


And Even Worse...

During the scan, that view's index can't be updated.  That means that if a doc was created or edited during the scan and then somebody else tried to open the view, that second user's scan can't even start until the first user's scan is done AND the view index is updated.

If your database is a bit busy with writes this can quickly cause a massive snowball effect as users "wait in line".  Your app (and even the entire server) can quickly become very slow or completely unusable.


And Even Worse than THAT...

This doesn't just affect a user opening a view.  This problem affects ANY program code that uses view lookups, as well.  @DbColumn, in particular, is extremely vulnerable to this problem because there's no "page" involved - the entire view must be scanned, each and every time.
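
The difference from a paged scan is easy to see in a toy model.  The sketch below is not how @DbColumn is implemented; `column_lookup` and the row layout are assumptions made for illustration.  The point is simply that without a page size there is nothing to stop the loop early.

```python
def column_lookup(view, user_names, column):
    """Toy stand-in for a whole-column lookup: no page, so no early exit."""
    values = []
    scanned = 0
    for row in view:
        scanned += 1
        # Assumption: an empty readers set means the doc has no readers field.
        if not row["readers"] or row["readers"] & user_names:
            values.append(row[column])
    return values, scanned


# Hypothetical view: 10,000 rows, only 3 of them readable by "alice".
view = [{"subject": f"Doc {i}", "readers": {"bob"}} for i in range(10_000)]
for i in (10, 5_000, 9_999):
    view[i]["readers"] = {"alice"}

values, scanned = column_lookup(view, {"alice"}, "subject")
print(values, scanned)  # ['Doc 10', 'Doc 5000', 'Doc 9999'] 10000
```

Three values come back, but all 10,000 rows were checked to find them.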



Workarounds / How Can This Be Fixed?

I would bet that many "Notes doesn't scale" cases would be crushed if this bottleneck were addressed.  It's also a very common problem that bites new Notes developers, as their application that performed quickly for a small number of docs and users in testing suddenly becomes unusable as they start simulating a large production setting.


IBM might be able to better address this problem.  More on that in a minute.  In the meantime you can try to avoid this bottleneck if you're willing to make some tradeoffs:


1. Don't use readers/authors fields if you don't have to.

If you don't need to use them then you can avoid this problem entirely.


2. Set your view index refresh to something besides "Automatic".

Changing this setting might not even be an option if your application always requires current information to be shown in the view.  But if you set the view's refresh to "Manual" or "Auto, at most every X hours" then this can help avoid the "wait in line" snowball problem above.  This won't speed up the scanning process, though, so it's still possible for a small number of users to consume massive server-side resources, and even a single user will be waiting for a page much longer than desired.

3. Add a category column to the front of the view.  Set it to show the docs' readers.  Only use the view as an embedded view.

By making the view an embedded view you can use the "restrict to category" feature to restrict by @UserName or @UserRoles.  This will cause a somewhat-optimized scenario during the scan, since each user's readable docs will all be adjacent in the view.  Page scans will complete quickly.  Stephan Wissel covers this workaround in his blog (referenced at the top of this entry).

Unfortunately this means that you give up all resorting capabilities of the view for your users.  Your view index will get much larger (especially if there are multiple readers on the docs, as each doc will be repeated across many categories), and you probably won't be able to use it for lookups anymore (but you had a separate view for that anyway, right?).  @DbColumn performance would be much worse, since the view is even larger and the scanning process would take even longer for that function, assuming you don't run into the 64K @DbColumn limit first.

But most importantly, this workaround doesn't work under some complex scenarios, e.g.: if you're using roles and a user has access to multiple roles with docs in them.  To handle that you've got to jump through some code hoops, essentially hand-crafting your own dataset.  And if you need it sorted ACROSS those separate roles, Domino isn't storing that index for you.
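
Here is a rough Python sketch of why workaround #3 helps and what it costs.  It models the categorized view as a dictionary keyed by reader entry; the function names and sample docs are hypothetical, and a real view index is of course not a Python dict.

```python
from collections import defaultdict


def build_categorized_index(docs):
    """Group docs under each of their reader entries, like a first category column."""
    index = defaultdict(list)
    for doc in docs:
        for reader in doc["readers"]:
            index[reader].append(doc["subject"])  # doc repeats once per reader
    for entries in index.values():
        entries.sort()
    return index


def restricted_page(index, category, start, page_size):
    """'Restrict to category': every row in the category is readable, so nothing is skipped."""
    return index.get(category, [])[start:start + page_size]


# Hypothetical sample docs.
docs = [
    {"subject": "Budget", "readers": ["alice", "[Managers]"]},
    {"subject": "HR file", "readers": ["bob"]},
    {"subject": "Roadmap", "readers": ["alice", "[Managers]", "carol"]},
]
index = build_categorized_index(docs)
print(restricted_page(index, "alice", 0, 100))  # ['Budget', 'Roadmap']
print(sum(len(v) for v in index.values()), "index entries for", len(docs), "docs")
# 6 index entries for 3 docs
```

Fetching a page becomes a simple slice, but three docs already produce six index entries, and a user who holds both a name and a role still has two separate categories that nothing merges or sorts for them.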


A Real Solution


Many of this blog's readers will have already guessed the "correct" way to fix this bottleneck:

A view needs to track multiple indexes, ideally one per reader entry found in the set of docs.

WHAT? WHOA, THAT'S A LOT OF INDEXES!   Yes, it is.  Potentially.  Obviously if all of our view index sizes increased by 1000-fold we'd all be a bit concerned.

But it might not need to be *that* bad.  We don't need an index of each of the combinations of readers -- that would grow exponentially (O(2^n), in computer science lingo) and is therefore a no-go.  But if there were a single index for each reader in the view it wouldn't be hard for Domino to simply cycle through a user's multiple roles and piece together the multiple indexes on-the-fly (similar to workaround #3 above).  That algorithm is fairly straightforward, and fairly quick -- YEARS quicker than the way things work now.  And for Domino to run it at the compiled source level would be much faster than us having to "roll our own" workaround as per #3 above.

Another possible optimization technique is to have Domino perform an on-the-fly build-and-store of those readers-aware indexes.  I.e., Domino builds its standard view index as it does now.  Then, on first access, Domino could perform its page-scanning technique as it does now, and then store the results that it did calculate in a specific index for that user, helping to avoid future recalculations for that user.  The indexer could then update those per-user indexes as it updates the view's main index.  The per-user-generated indexes could follow the "Discard index" setting from the view.  E.g. if a per-user index wasn't used for X days, delete it.
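
To show that piecing the per-reader indexes together really is straightforward, here is a minimal Python sketch of the merge.  It is my own illustration, not anything IBM has described: each per-reader index is assumed to be a list of `(sort_key, doc_id)` tuples already in view order, and `merged_view` is a hypothetical name.

```python
import heapq
from itertools import islice


def merged_view(per_reader_index, user_entries):
    """Lazily merge the sorted per-reader indexes that apply to one user."""
    streams = [per_reader_index[e] for e in user_entries if e in per_reader_index]
    last = None
    for entry in heapq.merge(*streams):
        # A doc listed under two of the user's entries shows up as adjacent
        # duplicates after the merge, so comparing with the previous entry
        # is enough to emit it only once.
        if entry != last:
            yield entry
        last = entry


# Hypothetical per-reader indexes, each sorted by (sort_key, doc_id).
per_reader_index = {
    "alice": [("Budget", "D1"), ("Roadmap", "D3")],
    "[Managers]": [("Budget", "D1"), ("Forecast", "D4"), ("Roadmap", "D3")],
    "bob": [("HR file", "D2")],
}

# A "page" is just the first N entries of the merged stream.
page = list(islice(merged_view(per_reader_index, ["alice", "[Managers]"]), 100))
print(page)  # [('Budget', 'D1'), ('Forecast', 'D4'), ('Roadmap', 'D3')]
```

Because the merge is lazy, filling a page only touches about as many entries as the page needs, and every entry it touches is one the user can read.  With n distinct reader entries this needs n indexes, versus up to 2^n - 1 if every non-empty combination had its own.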
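
A build-and-store scheme along those lines might look something like the Python sketch below.  Everything here is hypothetical (the class, its method names, the explicit `now` parameter used to keep the example deterministic), and it simplifies the idea by scanning the whole view on first access instead of storing partial page scans.

```python
import time


class PerUserIndexCache:
    """Toy build-and-store cache of per-user readable rows (hypothetical design)."""

    def __init__(self, view, discard_after_seconds):
        self.view = view
        self.discard_after = discard_after_seconds
        self._cache = {}  # user -> (readable row positions, last-used timestamp)

    def readable_rows(self, user, now=None):
        now = time.time() if now is None else now
        cached = self._cache.get(user)
        if cached is None:
            # First access: pay for the scan once, then store the result.
            rows = [i for i, row in enumerate(self.view)
                    if not row["readers"] or user in row["readers"]]
        else:
            rows = cached[0]
        self._cache[user] = (rows, now)
        return rows

    def add_row(self, row):
        """Indexer update: extend the main view and each affected per-user index."""
        self.view.append(row)
        pos = len(self.view) - 1
        for user, (rows, _used) in self._cache.items():
            if not row["readers"] or user in row["readers"]:
                rows.append(pos)

    def discard_stale(self, now=None):
        """Drop per-user indexes that have not been used recently."""
        now = time.time() if now is None else now
        stale = [u for u, (_rows, used) in self._cache.items()
                 if now - used > self.discard_after]
        for user in stale:
            del self._cache[user]
        return stale


DAY = 86_400
view = [{"readers": {"alice"}}, {"readers": {"bob"}}, {"readers": set()}]
cache = PerUserIndexCache(view, discard_after_seconds=3 * DAY)
print(cache.readable_rows("alice", now=0))          # [0, 2]
cache.add_row({"readers": {"alice"}})
print(cache.readable_rows("alice", now=10))         # [0, 2, 3]
print(cache.discard_stale(now=10 + 4 * DAY))        # ['alice']
```

Note that `add_row` has to touch every cached per-user index on each write, which is exactly where the extra index size and write contention would come from.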



Final Thoughts

This idea is currently SPRed within IBM as SPR #SVRO44BU6Y.  Open a PMR and reference that SPR# if you're interested.

I don't, however, believe that the increased view index size and -- most importantly -- increased NSF write contention would keep performance acceptable as long as view indexes are still stored in the NSF.  They'll have to be moved to somewhere on the general file system to make this workable from a storage and performance standpoint.  If you've been following this series, though, you'll remember that was the topic of my previous entry.

