In my experience, it pays to identify and remove thread contention points in the application before tuning any JVM parameters, and I spent a lot of time doing exactly that. My development environment is a 4-core Linux machine running JDK 1.6 and JBoss, and the required SLA is under 100 ms. This article is about tuning GC for low latency in server-side Java applications, and it focuses on the parameters that have the biggest impact on achieving low latency and high throughput. There are more than 600 JVM parameters; the following command lists them all with their final values:
$ /usr/java/jdk1.6.0_29/bin/java -XX:+PrintFlagsFinal -version
The Java heap is divided into three main sections: Young Generation, Old Generation and the Permanent Generation.
Young Generation: The Eden Space of the Young Generation holds all newly created objects. When Eden fills, the Scavenge Garbage Collector removes all unreferenced objects from memory. Objects that survive the scavenge are copied to the Survivor Space, a section of the Young Generation for these intermediate-life objects. It has two equally sized subspaces, "To" and "From", which the algorithm uses for fast switching and cleanup: survivors are copied into the empty "To" space, and once the Scavenge GC is complete the pointers to the two spaces are swapped, so "To" becomes "From" and "From" becomes "To".
Old Generation: Once an object survives a given number of Scavenge GCs, it is promoted (or tenured) from the Survivor Space to the Old Generation. Objects in this space are garbage collected only in two cases: a Full Garbage Collection or a Concurrent Mark-and-Sweep Garbage Collection. If the Old Generation is full and there is no way for the heap to expand, an Out-of-Memory error (OOME) is thrown, which usually brings the application down.
Permanent Generation: The Permanent Generation is where class metadata is kept, the result of loading compiled classes and JSP pages. If this space is full, it triggers a Full Garbage Collection. If the Full Garbage Collection cannot clean out old unreferenced classes and there is no room left to expand the Permanent Space, an Out-of-Memory error (OOME) is thrown, which usually brings the application down.
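To see how these sections appear in a running JVM, the standard `java.lang.management` API can list the memory pools. The following is a minimal sketch, not part of the original setup: the class name `HeapPools` is made up for illustration, and the pool names it prints depend on the collector and JVM version (on Java 8 and later, the Permanent Generation no longer exists and a Metaspace pool is reported instead).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, for illustration only.
class HeapPools {

    private static final long MB = 1024L * 1024L;

    /** Builds one description line per memory pool reported by the running JVM. */
    static List<String> describePools() {
        List<String> lines = new ArrayList<String>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // getUsage() returns null for a pool that is no longer valid
            }
            // getMax() returns -1 when the pool has no defined maximum.
            String max = usage.getMax() < 0 ? "undefined" : (usage.getMax() / MB) + " MB";
            lines.add(pool.getName() + " [" + pool.getType() + "] used="
                    + (usage.getUsed() / MB) + " MB, max=" + max);
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describePools()) {
            System.out.println(line);
        }
    }
}
```

Running it with different collector options is a quick way to check which Eden, Survivor and Old Generation pools the JVM actually created.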
The HotSpot JVM may use one of the six combinations of garbage collectors listed below.
| Young collector | Old collector | JVM option |
| --- | --- | --- |
| Serial (DefNew) | Serial Mark-Sweep-Compact | -XX:+UseSerialGC |
| Parallel scavenge (PSYoungGen) | Serial Mark-Sweep-Compact (PSOldGen) | -XX:+UseParallelGC |
| Parallel scavenge (PSYoungGen) | Parallel Mark-Sweep-Compact (ParOldGen) | -XX:+UseParallelOldGC |
| Serial (DefNew) | Concurrent Mark Sweep | -XX:+UseConcMarkSweepGC -XX:-UseParNewGC |
| Parallel (ParNew) | Concurrent Mark Sweep | -XX:+UseConcMarkSweepGC -XX:+UseParNewGC |
| G1 | G1 | -XX:+UseG1GC |
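To confirm which combination a running JVM actually selected, the collectors can be queried through `java.lang.management`. This is a small sketch under the assumption that the platform MXBeans are available; the class name `ActiveCollectors` is made up, and the reported names are chosen by the JVM and may differ between versions.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, for illustration only.
class ActiveCollectors {

    /** Returns the names of the collectors the running JVM reports. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<String>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Count and time are cumulative since JVM start; -1 means "undefined".
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", time=" + gc.getCollectionTime() + " ms");
        }
    }
}
```

Starting the program with one of the options from the table, for example `-XX:+UseConcMarkSweepGC -XX:+UseParNewGC` on a JVM that still supports it, should list one young and one old collector.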
A List of Stop the World Pauses:
- Young space collections
- Full GCs – All collectors
- System GCs – Called via JMX or the application
- CMS Initial Mark Phase
- CMS Remark Phase
- CMS Concurrent Mode Failure
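One rough way to see how the pauses listed above affect an application is to measure how late a sleeping thread wakes up. The sketch below is an assumption-laden illustration rather than a GC-specific tool: the class `PauseProbe` is hypothetical, it would normally run in a background thread inside the application, and it records every stall it sees, including scheduler jitter, not only GC pauses.

```java
// Hypothetical helper class, for illustration only.
class PauseProbe {

    /**
     * Sleeps in 1 ms steps for roughly durationMs and returns the largest
     * overshoot beyond the requested sleep, in milliseconds.
     */
    static long maxStallMillis(long durationMs) {
        long worst = 0;
        long last = System.nanoTime();
        long end = last + durationMs * 1000000L;
        while (last - end < 0) { // subtraction form is safe against nanoTime overflow
            try {
                Thread.sleep(1);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop early
                break;
            }
            long now = System.nanoTime();
            long stall = (now - last) / 1000000L - 1; // elapsed time minus the 1 ms we asked for
            if (stall > worst) {
                worst = stall;
            }
            last = now;
        }
        return worst;
    }

    public static void main(String[] args) {
        System.out.println("Largest stall seen: " + maxStallMillis(5000) + " ms");
    }
}
```

Comparing the largest stall against the GC log for the same period gives a first indication of whether a missed SLA comes from a collection or from something else.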
In my experiments, CMS gave the best results for low latency and high throughput. Here is a summary of what I have learned.
- JVM tuning is application specific. In-depth knowledge of the application helps, and one needs to take a holistic view when tuning.
- Young collections are fast and efficient. It is important to give objects the opportunity to die young. A smaller young space helps.
- CMS is concurrent and requires CPU, so it competes with the application during collections.
- CMS fragments the old space, which makes object allocation more complicated.
- Sizing the heap correctly is critical. An undersized heap makes CMS work overtime and, worse, can cause a CMS Concurrent Mode Failure.
- Sizing the young generation is important: 1) size the survivor spaces appropriately, 2) configure the tenuring threshold appropriately.
- Configure CMS to wait for a young GC before starting.
Here is the list of JVM settings I recommend for low latency and high throughput:
-server
-Xms2048m
-Xmx2048m
-XX:+UseConcMarkSweepGC
-XX:+UseParNewGC
-XX:+AggressiveOpts
-XX:+CMSParallelRemarkEnabled
-XX:+CMSScavengeBeforeRemark
-XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSInitiatingOccupancyFraction=65
-XX:CMSWaitDuration=300000
-XX:GCTimeRatio=19
-XX:NewSize=128m
-XX:MaxNewSize=128m
-XX:PermSize=64m
-XX:MaxPermSize=64m
-XX:SurvivorRatio=88
-XX:TargetSurvivorRatio=88
-XX:MaxTenuringThreshold=15
-XX:MaxGCMinorPauseMillis=1
-XX:MaxGCPauseMillis=5
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=./gc_heap_dump/
-XX:+PrintGCDateStamps
-XX:+PrintGCDetails
-XX:+PrintTenuringDistribution
-Xloggc:./gc_log.log
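To make the sizing settings above concrete, the arithmetic behind them can be sketched in a few lines. This is an approximation under stated assumptions: `SurvivorRatio` is taken as the ratio of Eden to one survivor space, the old generation is taken as the heap minus the young generation, and HotSpot's internal size alignment is ignored, so the real values will differ slightly. The class name `HeapSizing` is made up for illustration.

```java
// Hypothetical helper class, for illustration only.
class HeapSizing {

    static final long MB = 1024L * 1024L;

    /** Size of one survivor space: the young generation is Eden plus two survivors. */
    static long survivorBytes(long youngBytes, int survivorRatio) {
        return youngBytes / (survivorRatio + 2);
    }

    /** Eden gets whatever the two survivor spaces leave over. */
    static long edenBytes(long youngBytes, int survivorRatio) {
        return youngBytes - 2 * survivorBytes(youngBytes, survivorRatio);
    }

    /** Old-generation occupancy at which a CMS cycle is started. */
    static long cmsTriggerBytes(long heapBytes, long youngBytes, int occupancyPercent) {
        return (heapBytes - youngBytes) * occupancyPercent / 100;
    }

    public static void main(String[] args) {
        long heap = 2048 * MB;   // -Xms2048m -Xmx2048m
        long young = 128 * MB;   // -XX:NewSize=128m -XX:MaxNewSize=128m
        int survivorRatio = 88;  // -XX:SurvivorRatio=88
        int occupancy = 65;      // -XX:CMSInitiatingOccupancyFraction=65

        System.out.println("survivor (each): " + survivorBytes(young, survivorRatio) / 1024 + " KB");
        System.out.println("eden:            " + edenBytes(young, survivorRatio) / MB + " MB");
        System.out.println("old generation:  " + (heap - young) / MB + " MB");
        System.out.println("CMS starts at:   " + cmsTriggerBytes(heap, young, occupancy) / MB + " MB");
    }
}
```

Under these assumptions each survivor space comes out at roughly 1.4 MB, the old generation at 1920 MB, and CMS starts once the old generation holds about 1248 MB. The output of -XX:+PrintTenuringDistribution in the GC log shows whether survivor spaces of that size are large enough for a given application.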
Compared with the 600-plus parameters available, this is a much shorter list. I hope you like this post, and enjoy your journey.