Mindoo Blog - Cutting edge technologies - About Java, Lotus Notes and iPhone

  • New on OpenNTF: Geospatial indexing for IBM Notes/Domino data

    Karsten Lehmann  30 July 2013 23:11:37
    Last weekend I created a new project on OpenNTF.org, which is part of a pretty big "pet project" that I have been working on for several months and that will hopefully be ready for primetime someday.

    My original plan was to submit this pet project for the last XPages development contest, either in addition to or instead of the Mindoo FTP Server, but the project got bigger and bigger over time - and an FTP server was finally easier to polish and explain than my other idea.

    This idea has to do with alternative indexing techniques for IBM Notes/Domino data, something like "Notes Views on steroids":
    Building an external indexer for IBM Notes/Domino that is more powerful than classic Notes Views, but still easy to use and scalable for large amounts of data.

    While I was investigating different open source indexers and database engines, I once again came across the topic of geospatial indexing, which I had already discussed in the article "XPages series #14: Using MongoDB’s geo-spatial indexing in XPages apps".

    Geospatial indexing basically solves the task of finding locations stored in a database that are close to a given set of coordinates, specified as a latitude/longitude pair, and of sorting the results by distance.
    With all those smartphones out there that carry a GPS chip, a common requirement nowadays is to "find the next Italian restaurant" or "find friends nearby", both of which can be solved with geospatial indexing.
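
    To make the "sort by distance" part concrete, here is a minimal sketch in Java (assuming Java 8 or later). It computes the great-circle distance with the haversine formula and sorts a list of points by their distance to a reference point. The class and method names are made up for this illustration and are not taken from the demo project, and the Earth is treated as a sphere with a mean radius of 6,371 km.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class NearestFirst {
    // Assumption: spherical Earth with the mean radius in meters
    static final double EARTH_RADIUS_METERS = 6_371_000.0;

    /** Great-circle distance in meters between two points given in degrees. */
    static double haversineMeters(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        // Math.min guards against rounding errors pushing sqrt(a) above 1
        return 2 * EARTH_RADIUS_METERS * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /** Returns a copy of the points (each a [lat, lon] array), nearest first. */
    static List<double[]> sortByDistance(List<double[]> points, double lat, double lon) {
        List<double[]> sorted = new ArrayList<>(points);
        sorted.sort(Comparator.comparingDouble(p -> haversineMeters(lat, lon, p[0], p[1])));
        return sorted;
    }
}
```

    Applied to every document in a database, this is exactly the kind of full scan that does not scale, which is why an index is needed to narrow down the candidates first.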

    In my XPages series article I demonstrated how to use an external MongoDB database to do this kind of search from XPages applications, but this stuff gets even more interesting and realistic if we can solve it with pure Notes/Domino technologies - and it is possible.

    There are a few obvious ways to implement geospatial searches with Notes/Domino APIs, e.g. Database.search(String), fulltext searching or just manually scanning through all view entries to find the relevant documents.
    The main problem is that these approaches either do not scale well, because every document in the database has to be scanned, or they require a fulltext index, which I personally try to avoid for this kind of lookup (it takes a lot of disk space, is often not up to date, and sometimes has issues with date searches when Domino treats a field as text rather than as a date/time).

    The solution: Geohashes

    After a few hours of searching, I found a document that explains how MongoDB has implemented Geospatial Indexes.
    They convert latitude/longitude pairs to a single string value, a so-called Geohash.
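
    As an illustration, the following sketch implements the commonly documented Geohash encoding: the latitude and longitude ranges are halved alternately (starting with longitude), each decision yields one bit, and every five bits become one character of a base-32 alphabet. It is a minimal version written for this explanation; the class name is an assumption, and neither MongoDB's internal format nor the code of the demo project is necessarily identical to it.

```java
final class Geohash {
    // Standard Geohash base-32 alphabet (no a, i, l, o)
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    /** Encodes a latitude/longitude pair (degrees) as a Geohash of the given length. */
    static String encode(double lat, double lon, int length) {
        double latMin = -90, latMax = 90, lonMin = -180, lonMax = 180;
        StringBuilder hash = new StringBuilder(length);
        boolean lonBit = true; // bits alternate, starting with longitude
        int bits = 0;
        int value = 0;
        while (hash.length() < length) {
            if (lonBit) {
                double mid = (lonMin + lonMax) / 2;
                if (lon >= mid) { value = (value << 1) | 1; lonMin = mid; }
                else { value <<= 1; lonMax = mid; }
            } else {
                double mid = (latMin + latMax) / 2;
                if (lat >= mid) { value = (value << 1) | 1; latMin = mid; }
                else { value <<= 1; latMax = mid; }
            }
            lonBit = !lonBit;
            if (++bits == 5) { // five bits make one base-32 character
                hash.append(BASE32.charAt(value));
                bits = 0;
                value = 0;
            }
        }
        return hash.toString();
    }
}
```

    For example, Geohash.encode(57.64911, 10.40744, 5) yields "u4pru". A longer hash for the same point always starts with the shorter one, which is the property that makes prefix lookups possible.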

    This way, a single prefix lookup is enough to search for both values. All you have to do is to compute the list of Geohash boxes that intersect the search area and find view entries that start with the right Geohash prefix:

    Image:New on OpenNTF: Geospatial indexing for IBM Notes/Domino data
    (screenshot taken from the Geohash demonstrator website)
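
    The prefix lookup itself can be sketched without any Notes API. In the example below a java.util.TreeMap stands in for a view sorted by Geohash; this substitution is an assumption made to keep the example self-contained, and the keys in the usage note are illustrative strings rather than verified hashes of real places. The list of prefixes (the boxes that intersect the search area) is assumed to be computed elsewhere.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

final class PrefixLookup {
    /**
     * Collects the values of all entries whose key starts with one of the prefixes.
     * Assumption: keys are unique, e.g. a Geohash followed by a document id.
     */
    static List<String> lookup(TreeMap<String, String> index, List<String> prefixes) {
        List<String> hits = new ArrayList<>();
        for (String prefix : prefixes) {
            // In a sorted index, all keys starting with the prefix form one
            // contiguous range: prefix <= key < prefix + '\uffff'
            SortedMap<String, String> range =
                    index.subMap(prefix, prefix + Character.MAX_VALUE);
            hits.addAll(range.values());
        }
        return hits;
    }
}
```

    With an index containing the keys "u33db2", "u33dc0", "u33e00" and "dr5ru7", a lookup for the prefixes "u33db" and "u33dc" touches only the first two entries; the rest of the index is never read.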


    Mindoo Geohash Demo

    The new project on OpenNTF that demonstrates the Geohash technique is called "Mindoo Geohash Demo" and it looks like this:

    Image:New on OpenNTF: Geospatial indexing for IBM Notes/Domino data


    Project description

    The sample database can be used to store and search real-world locations. A location document consists of a name, a type (e.g. "Restaurant" or "Supermarket"), address information with street/zip/city/country and a field for other custom data.

    When a location is entered via the web interface, we use the Google Geocoding API to retrieve geo coordinates (latitude/longitude) for the address.
    These coordinates are stored alongside the other location data in the database.
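
    For readers who have not used the Geocoding API before, this is roughly what such a request looks like. The sketch only builds the request URL for the JSON endpoint; the class name and the API key parameter are assumptions for this illustration and do not show how the demo database performs the call. In the JSON response, the coordinates are found under results[0].geometry.location as lat and lng.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class GeocodeRequest {
    private static final String ENDPOINT =
            "https://maps.googleapis.com/maps/api/geocode/json";

    /** Builds the request URL for an address; apiKey is a placeholder for your own key. */
    static String buildUrl(String address, String apiKey) {
        try {
            return ENDPOINT
                    + "?address=" + URLEncoder.encode(address, "UTF-8")
                    + "&key=" + URLEncoder.encode(apiKey, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to be available on every Java platform
            throw new IllegalStateException(e);
        }
    }
}
```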
    Location documents can also be created via a REST API call.

    The database also provides search functionality via web UI and REST API to quickly find the nearest locations for a given point (entered either as an address or as a latitude/longitude pair), sorted by ascending distance.

    To get started, simply sign the database, copy it to your IBM Domino R9 server and open it in a browser.
    The database contains a sample dataset (all Starbucks stores in New York and Berlin, all Apple Stores in Germany) as a starting point, but this data can be deleted to start from scratch.
    To search for locations, enter an address (e.g. "Brandenburger Tor, Berlin, Germany") and the maximum distance in meters (e.g. 1000) in the search form and click the search button.

    You can further restrict the result set by specifying a location type (e.g. "Coffee"). Just select a type and leave the address field empty to see all locations with that type in the database.

    For a visual representation of the search results, select up to 25 rows in the result list and they will be displayed via the Google Maps API.

    Hope you like the demo! All code and required libraries are available under the Apache 2.0 license.

    Comments

    1. Mark Barton  21.08.2013 10:25:24

    Karsten,

    I will definitely be checking this out, thanks.

    We originally had a REST API which used Java and the haversine formula ({ Link }) to calculate the nearest points from the long/lat values. The problem was that it was relatively slow, so there was a trade-off between the size of the dataset returned as JSON and the processing time.

    In the end it worked out better for us to serialise all of our Geolocation information per continent and load the matching file depending on what the user selected. If the dataset changed the file was recreated.

    An unzipped file for approx. 10,000 POI categorised into 9 product groups was 2.3 MB. Of course the browser then caches the file as well. This is for an internal application and the browser seems to cope (even IE8).

    We use markercluster - { Link } to control how many POI are displayed, depending on the zoom level.

    2. Karsten Lehmann  21.08.2013 10:28:51

    Mark,

    thanks for the feedback!

    I also use the haversine formula, but the Geohash-based lookup is the key to scanning only a small subset of the data.