Mailserver: server task hangs at 100% CPU - problem with ulimit -n too low
Our Domino mail server crashed, or more precisely became unresponsive, with the server task at 100% CPU. During that time the log showed:
- Unable to access notes.ini. Determine what application or hardware fault is preventing access. Previous cache values used.
- Error updating local ID file: Could not open the ID file
There were also a few "Too many open files" errors in DDM (mostly on full-text index files, sometimes also on databases), which at first I didn't relate to this. It started after the 8.0.2 update in the fall and got worse from then on. Restarting Domino "fixed" the problem, but after the Christmas holidays we had to do this every day :-(
The server was running Ubuntu Linux (the latest LTS release). First-level IBM Support couldn't help, and second-level Support declined to help us because we are running an unsupported OS :-(
It turns out that the problem was a too-low "ulimit -n" of 1024. This limits the number of files a process may have open and can be set via PAM configuration files. In normal operation our Domino server process has about 700 open databases plus other files:
# lsof | grep " lotus " | grep "^server" | wc -l
1135
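As a cross-check of the lsof count above, here is a minimal sketch that compares a process's open descriptors with its soft "Max open files" limit. It assumes Linux with /proc mounted and must run as root or as the process owner; the function names fd_usage and fd_warn and the 80% threshold are hypothetical choices for illustration. Note that lsof also lists entries that do not use a descriptor (memory-mapped libraries, the working directory), so counting /proc/PID/fd is closer to what "ulimit -n" actually limits.

```shell
#!/bin/sh
# Sketch: compare a process's open file descriptors with its soft
# "Max open files" limit (the value "ulimit -n" shows).
# Assumes Linux with /proc; run as root or as the owner of the process.

# fd_usage PID -> prints "<open descriptors> <soft limit>"
fd_usage() {
    # Each entry in /proc/PID/fd is one open descriptor.
    used=$(ls /proc/"$1"/fd | wc -l)
    # Row format: "Max open files  <soft>  <hard>  files"; field 4 is the soft limit.
    soft=$(awk '/^Max open files/ {print $4}' /proc/"$1"/limits)
    echo "$used $soft"
}

# fd_warn PID [PERCENT] -> status 1 (plus a message on stderr) if the process
# uses at least PERCENT (default 80) percent of its soft limit, 2 if /proc
# could not be read, 0 otherwise.
fd_warn() {
    pid=$1
    percent=${2:-80}
    # Split "used soft" into positional parameters.
    set -- $(fd_usage "$pid")
    used=$1
    soft=$2
    [ -n "$soft" ] || return 2
    if [ "$soft" != unlimited ] && [ $((used * 100)) -ge $((soft * percent)) ]; then
        echo "PID $pid: $used of $soft open-file descriptors in use" >&2
        return 1
    fi
    return 0
}
```

For the Domino server from the lsof line above (assuming, as that output suggests, user lotus and process name server), something like `fd_warn "$(pgrep -o -x -u lotus server)"` could run from cron and complain well before the server reaches the limit.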
I suspect that the I/O changes in 8.0.x, together with the few new databases we had to add at the start of term (we are a student organization and get new members at that time), are the reason we never saw this before, and also why we don't see it on another Domino instance on the same server that has far fewer mail files.
The changes to fix it were:
In /etc/security/limits.conf, add:
lotus hard nofile 4096
lotus soft nofile 4096
Also add this line in /etc/pam.d/su:
session required pam_limits.so
The last change is needed because we use the rcdomino start script, which uses su to start the Domino server.
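Since the fix only works if the su session really goes through pam_limits, a start script could check the limit before launching the server. A minimal sketch, assuming a POSIX sh whose ulimit builtin supports -S and -n (bash and dash do); the name check_nofile is hypothetical, and 4096 would simply mirror the limits.conf values above.

```shell
#!/bin/sh
# Sketch: refuse to start if the soft open-file limit of this session is too low.
# check_nofile REQUIRED -> status 0 if "ulimit -Sn" >= REQUIRED, 1 otherwise.
check_nofile() {
    required=$1
    current=$(ulimit -Sn)
    # "unlimited" can never be too low.
    [ "$current" = unlimited ] && return 0
    if [ "$current" -lt "$required" ]; then
        echo "open-file limit is $current, need at least $required" >&2
        return 1
    fi
    return 0
}
```

Called as the server user after su, e.g. `check_nofile 4096 || exit 1`, it turns a silent hang into a refusal to start. To see what an su session actually gets, `su - lotus -c 'ulimit -Sn; ulimit -Hn'` should print 4096 twice once pam_limits is active.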
After that: no more hangs...
Anyway, I still suspect a bug, as Domino should not hang without an error when this problem occurs. Let's see whether the report in our PMR gets a reaction :-)
Another question: what is this limit on Red Hat or SUSE systems? We are not very big, so others should have run into this much earlier unless the limit is higher there... Or it really is an Ubuntu problem...
1 Simon O'Doherty
Hi Jan,
Just to clarify the level 2 comment: you are correct that unsupported operating systems are beyond the scope of support, but if you test on a supported platform and get the same issue, then it would be supported, up to the point where the issue no longer occurs on the supported platform (which often also resolves it on the unsupported platform).
I've pointed the crash/performance L2 guy who sits beside me to this blog entry. :)
2 Liam Harpur
Hi Jan.
Yes... I would increase the nofiles limit still further. I would generally recommend the following OS limits (these are recommendations for Solaris, but the equivalent settings may also work for Ubuntu):
Resource Limits:
================
Resource               Soft/Current   Hard
time(seconds)          unlimited      unlimited
file(blocks)           unlimited      unlimited
data(kbytes)           unlimited      unlimited
stack(kbytes)          8192           unlimited
coredump(blocks)       unlimited      unlimited
nofiles(descriptors)   256            65536
memory(kbytes)         unlimited      unlimited
Liam.
3 Jan Schulz
@Simon: Yep, that was the reply: try on a supported platform. I understand that supporting everything is not possible: we have external laptops in our organization, and writing a foolproof installation guide for Notes seems impossible. Not so much because of Notes (although it could be better, and it would be if I tried to understand how to customize the installer, but that's another blog post) but because of the people using it...
Unfortunately we did not have such an alternative system, and in that case we would probably have gotten a "works on RH"... But migrating would not have been possible, as we only "know" Debian-based systems.
So, although I understand it, I still needed a soundproof room at that time :-) especially after the nice first-level support.
4 Henning Heinz
No, this is not an Ubuntu problem. The enterprise distributions may handle this for you now, but in the past it was quite common to apply such tweaks yourself.
This is from the Domino 7.0.2 release notes:
1. Domino is started from a login session
For this case, the default must be overridden by modifying the file /etc/security/limits.conf AND insuring it is respected by the login. Edit /etc/security/limits.conf using root and add or modify the lines:
notes soft nofile 20000
notes hard nofile 49152
blablabla...
I have been running Domino on Debian (now 8.5 on Lenny) for years. I am not going back to Novell or Red Hat. I also tried Ubuntu LTS last year, but apt sometimes left broken packages, and I am used to everything just working.