As many of you may know, we have had a problem with Dogear here for a while. At one point, I started referring to it as Old Yeller (I could not resist the dead dog/yellow reference; sorry). We did a lot of upgrades (to IHS/WAS/Connections), DB troubleshooting, and completely deleted the DB (which has since been restored). We also enabled advanced Windows logging (which did not help much) and even moved to a fresh VM with a clean install of WAS, maintenance, etc. Nothing seemed to work. Across all the crashes of the Dogear JVM, only once did we get a usable Dr. Watson dump. In one of those crashes, though, we noticed what looked like a strange search of the Dogear database. After some further digging, I took a closer look at this with IBM. It turns out we were seeing a lot of these in the IHS access.log:

It was not always the same search; it varied by user, etc. The IP address, though, belonged to Googlebot. So, in walks robots.txt to see if he can help! After adding it to the server, I wondered how long it would take Google to honor the settings, since we kept seeing the crashes. As part of the move to a new box with Deployment Manager, the Dogear JVM would automagically restart, so that helped uptime a bit. ;) I noticed that the last crawl of Dogear by Google was around 8:25 PM on 6/12/09, and Dogear was last restarted at 8:29 PM. So it appears that Google took around 12 hours to start honoring the robots.txt.
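If you want to sanity-check a robots.txt before putting it on the server, Python's standard `urllib.robotparser` can parse the rules from a string and tell you whether a given crawler may fetch a URL. The sketch below is a minimal example under stated assumptions. The actual rules we used are not shown in this post, so the blanket `User-agent: *` entry, the `/dogear/` path prefix, and the `connections.example.com` host are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: not the robots.txt from this post.
# Assumes Dogear is served under a /dogear/ context root on the IHS host.
ROBOTS_TXT = """\
User-agent: *
Disallow: /dogear/
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def googlebot_may_fetch(parser: RobotFileParser, url: str) -> bool:
    """Return True if the parsed rules allow Googlebot to crawl the URL."""
    return parser.can_fetch("Googlebot", url)


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    # Hypothetical URLs on a hypothetical host.
    for url in (
        "http://connections.example.com/dogear/html?lang=en",
        "http://connections.example.com/profiles/html/",
    ):
        verdict = "allowed" if googlebot_may_fetch(rules, url) else "blocked"
        print(url, "->", verdict)
```

Python's parser only does plain prefix matching on paths. If your rules rely on the `*` or `$` wildcards that Googlebot understands, this check will not reflect how Google reads them.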
After running for almost a week now, I think we can confidently say that Dogear has issues with this type of string in the crawl. After some further crunching of the data from 6/12/09 (when we logged 7 Dogear crashes between 7:11 PM and 8:30 PM), here are some quick stats on when we saw Dogear get crawled with this string in the IHS access.log and when the server recovery message was logged in the WAS SystemOut.log:
Times the string "lang=en?ref=sex%" was found in the IHS access.log:
6/12 19:07
6/12 19:12
6/12 19:25
6/12 19:43
6/12 19:50, 19:52
6/12 20:06
6/12 20:25
Restarts logged in the WAS SystemOut.log:
6/12 19:11
6/12 19:15
6/12 19:29
6/12 19:44
6/12 19:57
6/12 20:11
6/12 20:29
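To make the pattern in the two lists above easier to see, here is a small Python sketch. It pairs each restart with the most recent matching crawl hit at or before it and reports the gap in minutes. The times are copied from the lists. The pairing rule (latest hit before each restart) is an assumption about how to line the two logs up; the logs themselves do not link entries.

```python
from datetime import datetime

# Times from 6/12/09 as listed above (24-hour clock, minute resolution).
# 19:50 and 19:52 are both hits that preceded the 19:57 restart.
CRAWL_HITS = ["19:07", "19:12", "19:25", "19:43", "19:50", "19:52", "20:06", "20:25"]
RESTARTS = ["19:11", "19:15", "19:29", "19:44", "19:57", "20:11", "20:29"]


def to_time(hhmm: str) -> datetime:
    """Turn an HH:MM string into a datetime on 6/12/2009."""
    return datetime.strptime("2009-06-12 " + hhmm, "%Y-%m-%d %H:%M")


def lag_before_restart(crawls, restarts):
    """Pair each restart with the latest crawl hit at or before it.

    Returns a list of (crawl HH:MM, restart HH:MM, minutes between them).
    Restarts with no earlier crawl hit are skipped.
    """
    crawl_times = sorted(to_time(c) for c in crawls)
    pairs = []
    for r in restarts:
        restart = to_time(r)
        earlier = [c for c in crawl_times if c <= restart]
        if not earlier:
            continue
        crawl = earlier[-1]  # most recent hit before this restart
        minutes = int((restart - crawl).total_seconds() // 60)
        pairs.append((crawl.strftime("%H:%M"), r, minutes))
    return pairs


if __name__ == "__main__":
    for crawl, restart, minutes in lag_before_restart(CRAWL_HITS, RESTARTS):
        print(f"crawl {crawl} -> restart {restart}: {minutes} min")
```

With this data, every one of the seven restarts lands between 1 and 5 minutes after the latest matching hit in the access.log.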
Since those times on 6/12, the crawl has not recurred and Dogear has not crashed. Here's to hoping this is finally resolved!
Comments (1)
So, if I go to Dogear and paste "&lang=en?ref=sex%" into the URL, am I going to kill the server?
Just curious.