OOM in CentOS: analysis and thoughts
1. Condition
- Java(TM) SE Runtime Environment (build 1.8.0_171-b11)
- Jira

```
JVM_MINIMUM_MEMORY="2g"
JVM_MAXIMUM_MEMORY="3g"
```

- Confluence

```
CATALINA_OPTS="-Xms2g -Xmx3g -XX:+UseG1GC ${CATALINA_OPTS}"
```

- BitBucket

```
JVM_MINIMUM_MEMORY=2g
JVM_MAXIMUM_MEMORY=2g
```

and elasticsearch:

```
-Xms1g -Xmx1g -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+AlwaysPreTouch -Xss320k
```

- Artifactory

```
export JAVA_OPTIONS="-server -Xms512m -Xmx1g -Xss256k -XX:+UseG1GC -XX:OnOutOfMemoryError=\"kill -9 %p\""
```

- Jenkins

```
JENKINS_JAVA_OPTIONS="-Djava.awt.headless=true -Xmx1024m"
```
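For reference, the configured maximum heaps add up to roughly 3 g (Jira) + 3 g (Confluence) + 2 g (Bitbucket) + 1 g (elasticsearch) + 1 g (Artifactory) + 1 g (Jenkins) = 11 g, which should fit comfortably inside the 32 GB of physical RAM reported in the crash logs below.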
2. Problem
```
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 1062024 bytes for Chunk::new
# Possible reasons:
#   The system is out of physical RAM or swap space
#   In 32 bit mode, the process size limit was hit
# Possible solutions:
#   Reduce memory load on the system
#   Increase physical memory or swap space
#   Check if swap backing store is full
#   Use 64 bit Java on a 64 bit OS
#   Decrease Java heap size (-Xmx/-Xms)
#   Decrease number of Java threads
#   Decrease Java thread stack sizes (-Xss)
#   Set larger code cache with -XX:ReservedCodeCacheSize=
# This output file may be truncated or incomplete.
#
#  Out of Memory Error (allocation.cpp:390), pid=24614, tid=0x1c8f9b40
......
Memory: 4k page, physical 33554432k(20680264k free), swap 0k(0k free)

vm_info: Java HotSpot(TM) Server VM (25.171-b11) for linux-x86 JRE (1.8.0_171-b11), built on Mar 28 2018 12:00:40 by "java_re" with gcc 4.3.0 20080428 (Red Hat 4.3.0-8)
```
```
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGBUS (0x7) at pc=0xb775b41e, pid=23390, tid=0xb754f700
#
# JRE version: Java(TM) SE Runtime Environment (8.0_171-b11) (build 1.8.0_171-b11)
# Java VM: Java HotSpot(TM) Server VM (25.171-b11 mixed mode linux-x86 )
# Problematic frame:
# C  [+0x41e]  __kernel_vsyscall+0x1e
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
......
Memory: 4k page, physical 33554432k(22882124k free), swap 0k(0k free)
```
As you can see, the system still had about 20 GB of free memory when Bitbucket or Artifactory crashed.
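To double-check those numbers outside the crash log, memory and swap can be inspected directly on the host with standard CentOS commands:

```
free -h       # physical memory and swap usage at a glance
swapon -s     # configured swap devices; empty output matches the "swap 0k(0k free)" above
```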
3. Analysis
3-1. Try 1
After reading Bitbucket Server is reaching resource limits, I tried setting JVM_MAXIMUM_MEMORY to 3g for Bitbucket. But then Bitbucket would not start, failing with an error like: cannot allocate 3xxxxxxk.
3-2. Try 2
I guessed that some system background process was killing the Bitbucket process unexpectedly, and I found this post:
Bitbucket Server Process Dies Unexpectedly Due to Linux OOM-Killer
The symptoms are 100% the same as mine:
- Bitbucket Server is installed on a Linux host.
- The entire Bitbucket Server process suddenly terminates without warning. That is to say, the process ID (pid) is gone.
- The browser shows a generic "cannot connect" or similar error, indicating that it is not able to reach the webpage.
- Nothing out of the ordinary appears in the Bitbucket Server application logs ($BITBUCKET_HOME/logs/atlassian-bitbucket.log), since the application was terminated without properly shutting down.
- Similarly, Tomcat logs (<Bitbucket Server installation directory>/logs/catalina.out) also do not show errors.
The diagnosis from this article says:

Linux OOM-Killer is a feature on some Linux installations that will sacrifice processes to free up memory if the operating system experiences memory exhaustion for its own operations. Please note that this is different from Bitbucket Server running out of memory. In this case, the OS itself is in danger of running out of memory and thus starts terminating processes to avoid it.

On the host machine, look in the /var/log/ directory for the syslog or messages files.
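Where those files (or dmesg/journalctl) are available, OOM-Killer activity can be confirmed with something like:

```
# Kernel ring buffer (or journalctl -k on systemd hosts):
dmesg | grep -iE "killed process|out of memory"

# Or grep the log files directly, if they exist:
grep -i "killed process" /var/log/messages /var/log/syslog
```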
But on my host server, I could not find these files. However, according to CentOS 7: Disabling OOMKiller for a process, I found /proc/<pid>/oom_score_adj, where pid is the pid of the dead Bitbucket process. You can adjust this score so that a process is not killed by the OOM Killer. This could be one solution, but you need to set the value every time Bitbucket is restarted, manually or automatically, for example:
```
........
echo -1000 > /proc/$PID/oom_score_adj; done;
```
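The snippet above is truncated; a minimal sketch of the full loop, assuming the Bitbucket JVM processes can be matched with pgrep -f atlbitbucket (adjust the pattern for your installation) and run as root:

```
# Tell the OOM-Killer to leave the Bitbucket processes alone (-1000 = never kill).
for PID in $(pgrep -f atlbitbucket); do
    echo -1000 > /proc/$PID/oom_score_adj
done
```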
3-3. Try 3
The host is a VPS, so I looked at its resource counters; the output below looks like the /proc/user_beancounters format used by OpenVZ/Virtuozzo containers:

```
Version: 2.5
      uid  resource           held     maxheld              barrier                limit  failcnt
 2791714:  kmemsize      313088859   463237120  9223372036854775807  9223372036854775807        0
           lockedpages           0           0              8388608              8388608        0
           privvmpages     4316480     5586119  9223372036854775807  9223372036854775807        0
           shmpages          36541       36542  9223372036854775807  9223372036854775807        0
           dummy                 0           0  9223372036854775807  9223372036854775807        0
           numproc            1130        1300                 1300                 1300    25497
           physpages       3148540     3942884              8388608              8388608        0
           vmguarpages           0           0              8388608              8388608        0
           oomguarpages    3021652     3772121              8388608              8388608        0
           numtcpsock          311         633                 1600                 1600        0
           numflock             20          24  9223372036854775807  9223372036854775807        0
           numpty                2           4  9223372036854775807  9223372036854775807        0
           numsiginfo            1         126  9223372036854775807  9223372036854775807        0
           tcpsndbuf       6127784    11155032  9223372036854775807  9223372036854775807        0
           tcprcvbuf       5815912    11133776  9223372036854775807  9223372036854775807        0
           othersockbuf     289000     1068472  9223372036854775807  9223372036854775807        0
           dgramrcvbuf           0      207880  9223372036854775807  9223372036854775807        0
           numothersock        230         378                 1800                 1800        0
           dcachesize    161201527   314460571  9223372036854775807  9223372036854775807        0
           numfile           17606       18380  9223372036854775807  9223372036854775807        0
           dummy                 0           0  9223372036854775807  9223372036854775807        0
           dummy                 0           0  9223372036854775807  9223372036854775807        0
           dummy                 0           0  9223372036854775807  9223372036854775807        0
           numiptent           328         328  9223372036854775807  9223372036854775807        0
```
I noticed the numproc field: the limit is 1300 and held is 1130, and its failcnt is 25497, which means this limit has already been hit many times.
The field held shows the current counter for the private virtual server (resource "usage").
The field maxheld shows the counter's maximum for the lifetime of the virtual server. The lifetime of the virtual server/node/vserver is usually just the time between the start and stop of your VPS.
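Since each JVM here runs hundreds of threads, and numproc on OpenVZ counts tasks (processes plus their threads, as far as I know), it is worth comparing the current task count against that 1300 limit:

```
ps -e | wc -l                          # processes only
ps -eLf | wc -l                        # processes plus threads (closer to what numproc counts)
grep numproc /proc/user_beancounters   # current held / barrier / limit / failcnt
```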
3-4. Try 4
An idea flashed through my mind: I could try running Bitbucket with less memory.
```
JVM_MINIMUM_MEMORY=1500m
JVM_MAXIMUM_MEMORY=1500m
```
And a miracle happened: Bitbucket now runs stably. WTF?
The reason is unknown. Is 2 GB really too much on a host with 20 GB of memory free??
3-5. Try 5
To check all the running applications, enter:

```
ps -e
```
Another good option is:

```
ps aux
```
And this is part of the output:
```
USER       PID %CPU %MEM      VSZ     RSS TTY  STAT START  TIME COMMAND
postfix    957  0.0  0.0    89868    1520 ?    S    Nov09  0:00 qmgr -l -t unix -u
jenkins   1341  0.1  2.2  1690600  766604 ?    Ssl  Nov09  7:16 /var/alternatives/java -Dcom
jira      1570  0.4  7.6 13667216 2556880 ?    Sl   Nov09 22:55 /var/atlassian/jira/jre//bin
conflue+ 20440  1.1  9.0 13560908 3026800 ?    Sl   01:17 14:54 /var/atlassian/confluence/jr
conflue+ 20728  0.6  1.6  8908396  541156 ?    Sl   01:19  8:12 /var/atlassian/confluence/jr
atlbitb+ 25300  0.2  4.0  1683760 1365452 ?    Sl   01:52  2:44 /var/jdk1.8.0_171/jre/bin/ja
atlbitb+ 25322  1.0  6.0  2764100 2020532 ?    Sl   01:52 13:28 /var/jdk1.8.0_171/jre/bin/ja
```
And I found there are many postgres processes created by Confluence; I suspect Confluence is maintaining a database connection pool.
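A rough way to see how many connections that pool is holding open (assuming each connection shows up as a postgres backend process; the c3p0 property name below comes from older Confluence versions and may differ on yours):

```
# Count postgres backend processes (a rough proxy for open DB connections):
ps aux | grep [p]ostgres | wc -l

# Confluence's pool size is configured in confluence.cfg.xml, e.g.:
#   <property name="hibernate.c3p0.max_size">60</property>
```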
Maybe there is some tuning to be done for each application.