Monday, October 15, 2012

JVM Tuning for Low Latency High Throughput on Multi Core Linux Box

Recently, I have been tuning JVM parameters to achieve low latency and high throughput on the HotSpot VM. For server-side Java applications, low latency is a very important requirement, and in many cases garbage collector pauses are the most serious threat to it. Under a strict SLA, say a response time of 100 ms or less, it takes a good amount of engineering effort to make the application behave in a high-stress environment.

In my experience, it pays to identify and remove thread contention points in the application before tuning any JVM parameters, and I spent a lot of time on that first. My development environment is a 4-core Linux machine with JDK 1.6 and JBoss, and the required SLA is a response time under 100 ms. HotSpot has more than 600 JVM parameters. This article is about tuning GC for low latency in server-side Java applications, and it focuses on the parameters that have the biggest impact on low latency and high throughput. The full list of parameters, with their final values, can be printed with:


    
$ /usr/java/jdk1.6.0_29/bin/java -XX:+PrintFlagsFinal -version 
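The same flag values can also be read from inside a running JVM. The sketch below is only an illustration: it assumes a HotSpot JVM and Java 7 or later (for `ManagementFactory.getPlatformMXBean`), and the class name `VmFlagReader` is made up for this example.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

/** Hypothetical helper: reads HotSpot flag values at runtime (HotSpot only). */
final class VmFlagReader {

    /** Returns the current value of a HotSpot flag such as "MaxHeapSize". */
    static String flagValue(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // getVMOption throws IllegalArgumentException if the flag does not exist.
        VMOption option = bean.getVMOption(name);
        return option.getValue();
    }

    public static void main(String[] args) {
        String[] names = {"MaxHeapSize", "MaxNewSize", "MaxTenuringThreshold"};
        for (String name : names) {
            System.out.println(name + " = " + flagValue(name));
        }
    }
}
```

This is handy for confirming that the flags passed on the command line are the ones the JVM actually ended up with.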
     


The Java heap is divided into three main sections: Young Generation, Old Generation and the Permanent Generation.











Young Generation: The Eden space of the Young Generation holds all newly created objects. When Eden fills up, the scavenge garbage collector (a minor GC) reclaims all objects that are no longer referenced. Objects that survive the scavenge are copied into the "To" survivor space. The survivor spaces are the part of the Young Generation that holds these intermediate-life objects: two equally sized subspaces, "To" and "From", which the copying algorithm uses for fast switching and cleanup. Once the scavenge is complete, the roles of the two spaces are reversed: "To" becomes "From" and "From" becomes "To".


Old Generation: Once an object has survived a given number of scavenges, it is promoted (or tenured) out of the survivor spaces into the Old Generation. Objects in this space are garbage collected in only two cases: a full garbage collection or a Concurrent Mark-Sweep (CMS) collection. If the Old Generation is full and the heap cannot expand, an OutOfMemoryError (OOME) is thrown, and the application will usually fail.
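The aging and promotion cycle described above can be sketched as a toy model. This is not how HotSpot is implemented; the class `ScavengeModel`, its fields, and the rule "promote once the age reaches the threshold" are simplifying assumptions made only to illustrate the survivor-space swap and tenuring.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of young-generation aging; an illustration, not HotSpot's algorithm. */
final class ScavengeModel {

    static final class Obj {
        final String name;
        boolean live = true; // whether the object is still referenced
        int age = 0;         // number of scavenges survived so far
        Obj(String name) { this.name = name; }
    }

    final int tenuringThreshold;
    List<Obj> eden = new ArrayList<Obj>();
    List<Obj> from = new ArrayList<Obj>();
    List<Obj> to = new ArrayList<Obj>();
    final List<Obj> old = new ArrayList<Obj>();

    ScavengeModel(int tenuringThreshold) {
        this.tenuringThreshold = tenuringThreshold;
    }

    Obj allocate(String name) {
        Obj o = new Obj(name);
        eden.add(o); // new objects always start in Eden
        return o;
    }

    /** One minor GC: copy survivors, empty Eden and "From", then swap roles. */
    void scavenge() {
        copyLive(eden);
        copyLive(from);
        eden.clear();
        from.clear();
        List<Obj> tmp = from; // "To" becomes "From" and vice versa
        from = to;
        to = tmp;
    }

    private void copyLive(List<Obj> space) {
        for (Obj o : space) {
            if (!o.live) {
                continue; // unreferenced objects are simply left behind
            }
            o.age++;
            if (o.age >= tenuringThreshold) {
                old.add(o); // survived enough scavenges: promote (tenure)
            } else {
                to.add(o);  // otherwise keep aging in the survivor space
            }
        }
    }

    public static void main(String[] args) {
        ScavengeModel heap = new ScavengeModel(2);
        heap.allocate("session");
        heap.allocate("temp").live = false;
        heap.scavenge();
        System.out.println("after 1st scavenge: from=" + heap.from.size() + " old=" + heap.old.size());
        heap.scavenge();
        System.out.println("after 2nd scavenge: from=" + heap.from.size() + " old=" + heap.old.size());
    }
}
```

In this model a short-lived object never leaves Eden, which is the behaviour that makes young collections cheap.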


Permanent Generation: The Permanent Generation is where class metadata is kept, that is, the loaded form of compiled classes and JSP pages. If this space fills up, it triggers a full garbage collection. If the full collection cannot unload old unreferenced classes and there is no room left to expand the Permanent Generation, an OutOfMemoryError (OOME) is thrown, and the application will usually fail.
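To see how these sections show up in a running JVM, the standard `java.lang.management` API can list the memory pools. The following is a minimal sketch; the class name `MemoryPools` is made up, and the pool names it prints depend on the JVM version and the selected collector (with ParNew and CMS on JDK 6 I would expect names along the lines of "Par Eden Space", "Par Survivor Space", "CMS Old Gen" and "CMS Perm Gen", but treat that as an assumption).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: lists the JVM memory pools and their current usage. */
final class MemoryPools {

    static List<String> describePools() {
        List<String> lines = new ArrayList<String>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            // getMax() returns -1 when the maximum is undefined.
            String max = usage.getMax() < 0 ? "undefined" : (usage.getMax() / 1024) + "K";
            lines.add(pool.getName() + " [" + pool.getType() + "]"
                    + " used=" + (usage.getUsed() / 1024) + "K"
                    + " max=" + max);
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describePools()) {
            System.out.println(line);
        }
    }
}
```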



The HotSpot JVM may use one of the six garbage collector combinations listed below.


Young collector                 Old collector                            JVM option
Serial (DefNew)                 Serial Mark-Sweep-Compact                -XX:+UseSerialGC
Parallel scavenge (PSYoungGen)  Serial Mark-Sweep-Compact (PSOldGen)     -XX:+UseParallelGC
Parallel scavenge (PSYoungGen)  Parallel Mark-Sweep-Compact (ParOldGen)  -XX:+UseParallelOldGC
Serial (DefNew)                 Concurrent Mark Sweep                    -XX:+UseConcMarkSweepGC -XX:-UseParNewGC
Parallel (ParNew)               Concurrent Mark Sweep                    -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
G1                              G1                                       -XX:+UseG1GC
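Which combination a given JVM is actually running can be checked at runtime through the garbage collector MXBeans. This is a small sketch using the standard `java.lang.management` API; the class name `ActiveCollectors` is made up, and the bean names it prints vary by collector and JVM version.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: reports the collectors registered in this JVM. */
final class ActiveCollectors {

    static List<String> collectorNames() {
        List<String> names = new ArrayList<String>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Count and time are cumulative since JVM start; -1 means "undefined".
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", total time=" + gc.getCollectionTime() + " ms");
        }
    }
}
```

Typically one bean covers the young collector and another the old collector, so the output maps onto one row of the table.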




A List of Stop the World Pauses:

  • Young space collections 
  • Full GCs – All collectors 
  • System GCs – Called via JMX or the application 
  • CMS Initial Mark Phase 
  • CMS Remark Phase 
  • CMS Concurrent Mode Failure
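One rough way to observe such pauses from inside the application is a probe thread that sleeps for a short, fixed interval and records how much longer than requested each sleep took. The sketch below assumes this approach is acceptable for a first look; the class `PauseProbe` is made up, and the overshoot it reports also includes OS scheduling delays, not only GC pauses, so the GC log remains the authoritative source.

```java
/** Hypothetical probe: measures how far short sleeps overshoot their target. */
final class PauseProbe {

    /**
     * Sleeps sleepMillis at a time, iterations times, and returns the worst
     * overshoot in milliseconds (never negative).
     */
    static long worstOvershootMillis(int iterations, long sleepMillis) {
        long worst = 0;
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop probing
                break;
            }
            long elapsedMillis = (System.nanoTime() - start) / 1000000L;
            worst = Math.max(worst, elapsedMillis - sleepMillis);
        }
        return worst;
    }

    public static void main(String[] args) {
        // Probe for roughly one second in 1 ms steps.
        System.out.println("worst overshoot: " + worstOvershootMillis(1000, 1) + " ms");
    }
}
```

Run in a background thread under load, a worst overshoot close to the SLA budget is a hint to look at the GC log for the matching stop-the-world event.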



In my experiments, CMS gave the best results for low latency and high throughput. Here is a summary of what I have learned.

  1. JVM tuning is application specific. In-depth knowledge of the application helps, and one needs to take a holistic view when tuning. 
  2. Young collections are fast and efficient. It is important to give objects the opportunity to die young. A smaller young space helps keep each young collection short. 
  3. CMS is concurrent: it needs CPU and competes with the application for it during collections.
  4. CMS fragments the old space, which makes object allocation there more complicated. 
  5. Sizing the heap correctly is critical. An undersized heap makes CMS work overtime and, worse, can cause a CMS concurrent mode failure. 
  6. Sizing the young generation is important: 1) size the survivor spaces appropriately, and 2) configure the tenuring threshold appropriately. 
  7. Configure CMS to wait for a young GC before it starts. 

Here is the list of the recommended JVM settings for low latency & high throughput:


-server
-Xms2048m 
-Xmx2048m 
-XX:+UseConcMarkSweepGC 
-XX:+UseParNewGC
-XX:+AggressiveOpts
-XX:+CMSParallelRemarkEnabled
-XX:+CMSScavengeBeforeRemark
-XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSInitiatingOccupancyFraction=65
-XX:CMSWaitDuration=300000
-XX:GCTimeRatio=19
-XX:NewSize=128m
-XX:MaxNewSize=128m
-XX:PermSize=64m
-XX:MaxPermSize=64m
-XX:SurvivorRatio=88
-XX:TargetSurvivorRatio=88
-XX:MaxTenuringThreshold=15
-XX:MaxGCMinorPauseMillis=1
-XX:MaxGCPauseMillis=5
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=./gc_heap_dump/
-XX:+PrintGCDateStamps
-XX:+PrintGCDetails
-XX:+PrintTenuringDistribution
-Xloggc:./gc_log.log
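To make the sizing flags above concrete, here is a small back-of-the-envelope calculation. It assumes the usual reading of `-XX:SurvivorRatio` as the ratio of Eden to one survivor space (so the young generation is Eden plus two survivor spaces) and of `-XX:CMSInitiatingOccupancyFraction` as a percentage of the old generation. The class `HeapLayout` is made up, and the real JVM aligns these sizes, so the numbers are approximations.

```java
/** Hypothetical calculator for the layout implied by the settings above. */
final class HeapLayout {

    static final long MB = 1024L * 1024L;

    /** Size of one survivor space: young = eden + 2 * survivor, eden = ratio * survivor. */
    static long survivorBytes(long youngBytes, int survivorRatio) {
        return youngBytes / (survivorRatio + 2);
    }

    static long edenBytes(long youngBytes, int survivorRatio) {
        return youngBytes - 2 * survivorBytes(youngBytes, survivorRatio);
    }

    /** Old generation is what remains of the heap after the young generation. */
    static long oldBytes(long heapBytes, long youngBytes) {
        return heapBytes - youngBytes;
    }

    /** Old-generation occupancy at which a CMS cycle is started. */
    static long cmsTriggerBytes(long oldBytes, int occupancyPercent) {
        return oldBytes * occupancyPercent / 100;
    }

    public static void main(String[] args) {
        long heap = 2048 * MB;   // -Xms2048m -Xmx2048m
        long young = 128 * MB;   // -XX:NewSize=128m -XX:MaxNewSize=128m
        int survivorRatio = 88;  // -XX:SurvivorRatio=88
        int cmsFraction = 65;    // -XX:CMSInitiatingOccupancyFraction=65

        long old = oldBytes(heap, young);
        System.out.println("eden         ~ " + edenBytes(young, survivorRatio) / 1024 + " KB");
        System.out.println("one survivor ~ " + survivorBytes(young, survivorRatio) / 1024 + " KB");
        System.out.println("old          = " + old / MB + " MB");
        System.out.println("CMS starts at ~ " + cmsTriggerBytes(old, cmsFraction) / MB + " MB of old gen used");
    }
}
```

With these settings almost the whole 128 MB young generation is Eden, each survivor space is only about 1.4 MB, and CMS starts a cycle when roughly 1248 MB of the 1920 MB old generation is in use.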




Compared with the 600-plus parameters available, this is a much shorter list. I hope you like this post, and enjoy your journey.