XPages are gaining traction, and this is a Good Thing (tm). IBM is obviously behind the concept, as 8.5.1 has some new goodies, and there's no doubt that 8.5.2 will also. The Dojo community is addressing IBM's needs (and vice-versa), and we're seeing a bit of fresh Domino/XPages developer blood pop up in the 8.5 forums as a result. And a really, really awesome thing is happening: People (including IBM) are contributing sidebar widgets and XPage custom controls to the community. This is huge, and hopefully I'll be donating some nice ones in a while.
XPages are truly revolutionizing Domino development. Aside from a few glaring barriers-to-entry (inability to consume LotusScript, complaints of inadequate documentation, lack of user-versus-signer ACL control, inability to "partial refresh" multiple controls) they really are pretty darn amazing. They let you do things with the interface that would simply be cost-prohibitive on legacy Domino.
One of the design paradigms that is becoming prevalent in XPage development is a decoupling of Domino data into separate documents, i.e. you don't use list fields (text list, number list, etc.) as often. Many pieces of XPages are built around handling a collection of documents, and to leverage that integration you're better off using individual documents for some things.
As a case-in-point here are two of my favorite examples of reusable controls -- both are File Upload managers: Here's one. Here's the other. They both let users upload attachments. Each attachment becomes a new document, with a few pieces of additional information (including the UNID of the "main" document). Cool right? I thought so too. The interface is definitely waaaay better than the old "File Upload Control" from legacy Domino dev.
But then I thought about it a bit, and realized that there's a big problem with this new paradigm: indexing. You've got a collection of "main" documents, and a bunch of "file upload" documents.
But as soon as you've decoupled your data in this way, many things get much more difficult. Try this: Get a collection of all "main" documents that have more than 2 files uploaded.
Now, if instead of XPages you were working with classic Domino dev you would be storing your attachments in the "main" docs themselves. And you would simply type this in your view selection formula: @Attachments > 2
But in this new decoupled paradigm of XPages where your data spans multiple docs, what do you do? You either get to hack in some sort of incrementing code to write to a field on the main document (and decrementing when you go to delete) and search on that, or you get to hack in a loop routine to do your own pseudo-JOIN and return the collection of docs that matches your criteria. The first would be potentially open to sync problems, the second would not scale as your dataset grew. A third possible option would be to manipulate the file after it's uploaded to shoehorn it back into the "main" doc, but that's got its own set of problems. And all of these approaches require a bunch of extra code. Ouch. Score one for legacy Domino dev.
I think that XPages are going to quickly bring to the forefront the #1 weakness of NSF - views. Don't get me wrong - views are still fairly powerful, and there's a lot you can do with them. But they haven't gotten any true indexing enhancements in years. Well, there's the "defer index build" feature which is more of an admin/disk space help (8.0), and also the feature where view index builds are integrated with the transaction log (6.0 I believe). But as far as helping you search/group/sort/evaluate your data, things haven't changed in a decade. The list of potential enhancements is huge, some of which we've discussed here:
- store readers' field info as part of the view index to help address the "many readers and docs" problem
- get view indexes out of the NSF, on an external path similar to DAOS' functionality
- implement IBM's patent from 2002 (which pertains to Notes/Domino specifically) to allow users to resort while inside of a category, or while using restrict-to-category
- sorting on category-row totals
- cross-column totaling (e.g. a category row has a 12 in column A and 15 in column B. Column C shows 12/15)
- allow developers to specify fields to be indexed at the NSF level, similar to what "Optimize Document Table Map" does for the Form field (though that feature was broken in 6.0).
- add additional column totaling algorithms (min, max, stddev, "custom" with an @Formula would also be nice)
...and of course, the classic problem I've exemplified in this blog post: JOINs. Part of me thinks that, with NSFDB2 being scrapped, IBM's got to at least be aware of the need for those. Hopefully there's work being done. With just a few more enhancements to views, there's really no stopping XPages.
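To make the pseudo-JOIN workaround from earlier in this post concrete, here's a minimal sketch in Python rather than in any Domino language. The dictionaries, the field names (unid, parent_unid) and the function name are all made up for illustration; they just stand in for "main" docs and "file upload" docs.

```python
from collections import Counter

# Hypothetical in-memory stand-ins for Domino documents (not a real Domino API).
main_docs = [
    {"unid": "A1", "subject": "Contract"},
    {"unid": "B2", "subject": "Invoice"},
    {"unid": "C3", "subject": "Memo"},
]
upload_docs = [
    {"parent_unid": "A1", "filename": "draft.pdf"},
    {"parent_unid": "A1", "filename": "final.pdf"},
    {"parent_unid": "A1", "filename": "signed.pdf"},
    {"parent_unid": "B2", "filename": "invoice.pdf"},
    {"parent_unid": "B2", "filename": "receipt.pdf"},
]


def mains_with_more_uploads_than(main_docs, upload_docs, threshold):
    """Hand-rolled JOIN: count upload docs per parent, then filter the main docs."""
    counts = Counter(upload["parent_unid"] for upload in upload_docs)
    return [main for main in main_docs if counts[main["unid"]] > threshold]


result = mains_with_more_uploads_than(main_docs, upload_docs, 2)
print([main["unid"] for main in result])
```

Note that the loop touches every upload doc on every query, with nothing indexed or stored in between. That's exactly why this approach stops scaling as the dataset grows.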
This blog post is part of a series that focuses on some possibilities (some mine, some from inside IBM themselves) that could be a big deal to anybody who cares about view performance. And if you had high hopes for NSFDB2, there's definitely some carryover here as well.
Note: Stephan Wissel beat me to the punch (I've been slowly working on this blog post for a few weeks) and posted a great article explaining how readers fields work within views. Click here to read it for a little more preliminary background (he's got a great picture, as well).
If you've read his article, some of this post will be redundant. But please keep reading, there are some other ideas here as well.
Readers fields! We all love them. Readers and authors fields are one of NSF's greatest strengths. Unfortunately when it comes to view performance they are also one of NSF's greatest weaknesses. One large view with readers fields can bring an application (and even an entire server) to its knees.
A little background: When a user opens a view (whether in Notes or on the web) a "page" of the view is transferred. The page contains a certain number of rows determined by several factors. I'm not going to get into what those factors are here but the point is: the entire view isn't simply transferred to the client (or browser). That's why as you scroll through a view you'll see network traffic from the little lightning bolt icon. After scrolling for a certain number of rows, more of the view needs to be loaded, so another "page" is transferred. When a user requests a page, Domino scans the view for documents they can access. If the user has access, the doc is added to the page. Only when Domino has enough rows to fill a page (or it hits the bottom of the view) does it then transfer the page to the user.
The Problem: Imagine a scenario where a user opens a view. Their client or web browser requests a view "page" of 100 rows. How quickly does the user get their data? That depends... The Good Scenario - The user has access to all of the docs in the first 100 rows. In this case Domino scans the first 100 rows, and - great - they all go into the page. The page is shipped off to the client/browser, and everybody's happy.
The Not-So-Good Scenario - The user has access to all of the docs in the first 99 rows. In this case Domino scans the first 99 rows, and - great - they all go into the page. But the user needs 1 more row to fill the page to 100 rows, so Domino is going to keep looking. So the server keeps scanning the view. If the user can see another doc say, 50 rows down, then Domino will scan those 50 rows, find the next doc, and add it to the page. Great! Now there's 100 rows in the page, so the page is sent down to the user.
It Gets Worse... What if that 100th doc wasn't a mere 50 rows down? What if it was 5000 rows down? What if there is no "100th" doc because the user simply doesn't have readers' access to any more docs in the view? Domino doesn't know. It will scan and scan and scan until it finds a 100th doc or it reaches the end of the view. During that time one server CPU core will be completely maxed out until the scan is done. In a large view this can easily take several seconds or more. There will also be some disk i/o for reading the view index (though compared to the CPU usage this is minimal). If you've got an 8-cpu-core server all you need is 8 users to open this view within several seconds of each other and ALL of your server's CPU will be pegged at 100% for quite a while.
And Worse... The results of the scan aren't stored on the server anywhere after the scan is complete. So if the same user comes back in an hour and opens the view, the entire process is repeated - even though the view might not have changed at all.
And Even Worse... During the scan, that view's index can't be updated. That means that if a doc was created or edited during the scan and then somebody else tried to open the view, that second user's scan can't even start until the first user's scan is done AND the view index is updated. If your database is a bit busy with writes this can quickly cause a massive snowball effect as users "wait in line". Your app (and even the entire server) can quickly become very slow or completely unusable.
And Even Worse than THAT... This doesn't just affect a user opening a view. This problem affects ANY program code that uses view lookups, as well. @DbColumn, in particular, is extremely vulnerable to this problem because there's no "page" involved - the entire view must be scanned, each and every time.
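To see how lopsided the scanning gets, here's a small Python simulation of the page-filling scan described above. It's only a sketch of the behavior as I've described it, not Domino's actual code: the view is a plain list, each row carries a set of reader names, and make_view and fetch_page are names I invented for the example.

```python
def make_view(total_rows, readable_rows, user="alice"):
    """Build a hypothetical view: rows listed in readable_rows name `user` as a
    reader, every other row is readable only by somebody else."""
    return [
        {"row": i, "readers": {user} if i in readable_rows else {"someone else"}}
        for i in range(total_rows)
    ]


def fetch_page(view_rows, user_names, page_size):
    """Scan from the top until page_size readable rows are found or the view ends.
    Returns the page and the number of rows that had to be examined."""
    page = []
    rows_scanned = 0
    for row in view_rows:
        rows_scanned += 1
        if row["readers"] & user_names:  # user is named in the readers field
            page.append(row)
            if len(page) == page_size:
                break
    return page, rows_scanned


ALICE = {"alice"}
VIEW_SIZE = 10_000

good = make_view(VIEW_SIZE, set(range(VIEW_SIZE)))           # can read everything
not_so_good = make_view(VIEW_SIZE, set(range(99)) | {149})   # 100th doc is 50 rows down
worst = make_view(VIEW_SIZE, set(range(99)))                 # there is no 100th doc

for label, view in [("good", good), ("not-so-good", not_so_good), ("worst", worst)]:
    page, scanned = fetch_page(view, ALICE, page_size=100)
    print(f"{label}: {len(page)} rows in page, {scanned} rows scanned")
```

With these made-up numbers the good case examines 100 rows, the not-so-good case 150, and the worst case all 10,000 rows just to return a 99-row page.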
Workarounds / How Can This Be Fixed?
I would bet that many "Notes doesn't scale" cases would be crushed if this bottleneck was addressed. It's also a very common problem that bites new Notes developers, as their application that performed quickly for a small number of docs and users in testing suddenly becomes unusable as they start simulating a large production setting.
IBM might be able to better address this problem. More on that in a minute. In the meantime you can try to avoid this bottleneck if you're willing to make some tradeoffs:
1. Don't use readers/authors fields if you don't have to.
If you don't need to use them then you can avoid this problem entirely.
2. Set your view index refresh to something besides "Automatic".
Changing this setting might not even be an option if your application always requires current information to be shown in the view. But if you set the view's refresh to "Manual" or "Auto, at most every X hours" then this can help avoid the "wait in line" snowball problem above. This won't speed up the scanning process, though, so it's still possible for a small number of users to consume massive server-side resources, and even a single user will be waiting for a page much longer than desired.
3. Add a category column to the front of the view. Set it to show the docs' readers. Only use the view as an embedded view.
By embedding the view you can use the "restrict to category" feature to restrict by @UserName or @UserRoles. This gives a somewhat-optimized scan, since all of each user's readable docs will be adjacent in the view. Page scans will complete quickly. Stephan Wissel covers this workaround in his blog (referenced at the top of this entry).
Unfortunately this means that you give up all resorting capabilities of the view for your users, your view index will get much larger (especially if there are multiple readers on the docs, as each doc will be repeated across many categories), and you probably won't be able to use it for lookups anymore (but you had a separate view for that anyway, right?). @DbColumn performance would be much worse, since the view is even larger and the scanning process would take even longer for that function, assuming you don't run into the 64K @DbColumn limit beforehand.
But most importantly, this workaround doesn't work under some complex scenarios, e.g.: if you're using roles and a user has access to multiple roles with docs in them. To handle that you've got to jump through some code hoops, essentially hand-crafting your own dataset. And if you need it sorted ACROSS those separate roles, Domino isn't storing that index for you.
A Real Solution
Many of this blog's readers will have already guessed the "correct" way to fix this bottleneck:
A view needs to track multiple indexes, ideally one per reader entry found in the set of docs.
WHAT? WHOA, THAT'S A LOT OF INDEXES! Yes, it is. Potentially. Obviously if all of our view index sizes increased by 1000-fold we'd all be a bit concerned.
But it might not need to be *that* bad. We don't need an index of each of the combinations of readers -- that would grow exponentially (the famous O(2^n) in computer science lingo) and is therefore a no-go. But if there was a single index for each reader in the view it wouldn't be hard for Domino to simply cycle through a user's multiple roles and piece together the multiple indices on-the-fly (similar to workaround #3 above). That algorithm is fairly straightforward, and fairly quick -- YEARS quicker than the way things work now. And for Domino to run it at the compiled source level would be much faster than us having to "roll our own" workaround as per #3 above.
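Here's roughly what I mean by piecing the indexes together on-the-fly, sketched in Python. Everything here is hypothetical (the row layout, build_reader_indexes, rows_for_user); it only illustrates the idea, and it assumes every doc has at least one entry in its readers field. With n distinct reader entries you keep n sub-indexes, versus up to 2^n - 1 if you indexed every combination.

```python
import heapq
from collections import defaultdict


def build_reader_indexes(main_index):
    """One pass over the already-sorted main index, producing one sub-index per
    reader entry. Each sub-index stays sorted because the input is sorted.
    Assumption: every row has at least one reader entry."""
    indexes = defaultdict(list)
    for row in main_index:
        for reader in row["readers"]:
            indexes[reader].append(row)
    return indexes


def rows_for_user(indexes, user_names):
    """Lazily merge the sub-indexes for a user's names and roles in view order,
    skipping docs that show up under more than one of them."""
    streams = [indexes.get(name, []) for name in user_names]
    order = lambda row: (row["sort_key"], row["unid"])  # total order, so duplicates are adjacent
    previous_unid = None
    for row in heapq.merge(*streams, key=order):
        if row["unid"] != previous_unid:
            previous_unid = row["unid"]
            yield row


# Hypothetical main view index, already sorted by (sort_key, unid).
main_index = [
    {"unid": "d1", "sort_key": "Alpha", "readers": {"[Sales]"}},
    {"unid": "d2", "sort_key": "Bravo", "readers": {"alice", "[Sales]"}},
    {"unid": "d3", "sort_key": "Charlie", "readers": {"bob"}},
    {"unid": "d4", "sort_key": "Delta", "readers": {"alice"}},
    {"unid": "d5", "sort_key": "Echo", "readers": {"[HR]"}},
]

indexes = build_reader_indexes(main_index)
alice_rows = list(rows_for_user(indexes, {"alice", "[Sales]"}))
print([row["unid"] for row in alice_rows])
```

Because the merge is lazy, filling a 100-row page only pulls about 100 rows (plus any duplicates) from the user's own sub-indexes, no matter how many unreadable docs sit in the main index.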
Another possible optimization technique is to have Domino perform an on-the-fly build-and-store of those readers-aware indexes, i.e. Domino builds its standard view index as it does now. Then, on first access, Domino could perform its page-scanning technique as it does now, and then store the results that it did calculate in a specific index for that user, helping to avoid future recalculations for that user. The indexer could then update those per-user indexes as it updates the view's main index. The per-user-generated indexes could follow the "Discard index" setting from the view. E.g. if a per-user index wasn't used for X days, delete it.
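The build-and-store idea could look something like this in Python. Again, this is a sketch under my own assumptions: PerUserIndexCache and its methods are invented names, the "scan" is just a callable that returns a user's rows, and the part where the indexer keeps the stored indexes in step with the main index is left out entirely.

```python
from datetime import datetime, timedelta


class PerUserIndexCache:
    """Hypothetical store of per-user scan results with a discard window."""

    def __init__(self, discard_after_days):
        self.discard_after = timedelta(days=discard_after_days)
        self._entries = {}  # user name -> (rows, last_used)

    def get(self, user, full_scan, now):
        """Return the stored rows for `user`, running full_scan() only on a miss."""
        if user in self._entries:
            rows, _ = self._entries[user]
        else:
            rows = full_scan()
        self._entries[user] = (rows, now)  # record the access time
        return rows

    def purge(self, now):
        """Delete per-user indexes that were not used within the discard window."""
        stale = [
            user
            for user, (_, last_used) in self._entries.items()
            if now - last_used > self.discard_after
        ]
        for user in stale:
            del self._entries[user]
        return stale


scan_calls = []  # records every expensive scan that actually runs


def scan_for_alice():
    scan_calls.append("alice")
    return ["d1", "d2", "d4"]


cache = PerUserIndexCache(discard_after_days=45)
start = datetime(2010, 1, 1)
cache.get("alice", scan_for_alice, now=start)                       # first access: full scan
cache.get("alice", scan_for_alice, now=start + timedelta(hours=1))  # an hour later: no rescan
purged = cache.purge(now=start + timedelta(days=60))                # unused for ~60 days
print(len(scan_calls), purged)
```

The second access an hour later is served from the stored result, which is the whole point. The purge step mirrors the "Discard index" idea: once a per-user index goes unused past the window, it's dropped and the next access pays for one fresh scan.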
Final Thoughts
This idea is currently SPRed within IBM as SPR #SVRO44BU6Y. Open a PMR and reference that SPR# if you're interested.
I don't, however, believe that performance would stay acceptable as long as view indexes are still stored in the NSF, given the increased view index size and -- most importantly -- the increased NSF write contention. They'll have to be moved to somewhere on the general file system to make this workable from a storage and performance standpoint. If you've been following this series, though, you'll remember that was the topic of my previous blog entry in this series.
This blog post is the first of a 3-part series that focuses on some possibilities (some mine, some from inside IBM themselves) that could be a big deal to anybody who cares about view performance.
Views. They're the core component of any Domino database. You may argue that forms are more important, but with XPages you don't even technically need those anymore. Views are the only real-time-capable index you can create. If you need to sort or categorize data with anything remotely resembling scalability or speed, you need views.
As a result we all want them to be fast. REAL fast. And we want lots of them, with all the sortable options enabled, and with as many different columns showing as many different pieces of data as possible. But views are expensive.
One of the best things IBM could do for views (besides adding JOIN capabilities) would be to store view indexes outside of NSFs. A view "index" is essentially two items, $Collation and $Collection, that get stored on the view design docs. Go pull up the document properties on a view design doc in Designer, and you'll see them. But you won't see the sizes. For that, you'll need to use the Administrator client. Simply open the "Files" tab, right-click on your database and select "Manage Views".
If these fields were moved to be outside of the NSFs, then we'd realize some significant benefits:
1. Many NSF sizes would be drastically reduced -- many NSFs have over half of their size allocated to view indexes. And smaller NSFs perform much faster than larger ones.
2. View indexing could be offloaded to separate drives from the NSFs altogether -- even more speed!
3. Notes queues writes. A db is completely write-locked with *any* write that occurs to an NSF. So while a view index is being written to inside an NSF a doc cannot be simultaneously added to the db, user activity can't be recorded, etc. With this change this bottleneck could be removed - the view index could be written to externally while a doc is added to the NSF.
Here's the kicker: With DAOS in 8.5 I'd bet money that the groundwork is likely already there for abstracting an item out to the filesystem. View indexes aren't replicated anyway - why store them in the view design doc at all? You could argue that for security reasons they should stay in the NSF, but that's easily addressable by encryption similar to how DAOS encrypts its files.
Sound interesting? There are two ways to let IBM know that you care:
1. Open a support ticket with IBM, asking to be added to SPR#: DCOE7N32FM / APAR#: LO36534
2. Go vote on the topic on IdeaJam: http://www.ideajam.net/IdeaJam/P/ij.nsf/0/BA723B6332ED0C67862573B4003E9AE2?OpenDocument
Thanks for reading. There's more fun stuff to come, including an IBM patent that will make some of us Domino people say "WHAT? WHEN?"