Nov 13, 2008

Spring conference in Paris, part 2

[Updated on 2009/01/09] Videos are available here.

Part 1 here.

Here's the transcript of Mark Thomas' presentation on Tomcat optimization & performance tuning.

The process
  • Understand the system architecture
  • Stabilize the system (no one should be messing with it while investigation is in progress)
  • Set performance targets
    • All requests under 3 seconds ?
    • This type of request under 3 seconds, this one under 10 ?
    • 90% of requests under 3 seconds, 100% under 10 ?
  • Measure current performance
  • Identify the current bottleneck
  • Fix the root cause of the bottleneck, not the symptom, e.g. if your system runs out of memory, don't just add extra memory : find where the memory consumption occurs
  • Repeat the whole process until you meet the performance target
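To make targets such as "90% of requests under 3 seconds, 100% under 10" checkable after each measurement round, here is a minimal Python sketch. The `percentile` and `meets_targets` helpers and the sample timings are hypothetical and only illustrate the idea; they are not from the talk.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile of a list of response times (seconds)."""
    ordered = sorted(samples)
    # Smallest value such that at least pct% of the samples are at or below it.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def meets_targets(samples, targets):
    """targets maps a percentile (e.g. 90) to a time limit in seconds."""
    return all(percentile(samples, pct) < limit for pct, limit in targets.items())

# Hypothetical measurements for ten requests, in seconds.
response_times = [0.4, 0.8, 1.1, 1.3, 1.9, 2.2, 2.4, 2.7, 2.9, 7.5]

# "90% of requests under 3 seconds, 100% under 10"
targets = {90: 3.0, 100: 10.0}
print(meets_targets(response_times, targets))  # True
```

Running the same check after every fix tells you when to stop repeating the process.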
Common errors
  • Optimizing code that doesn't need it
  • Insufficient testing
    • realistic data volumes
    • realistic user load
  • Lack of clear performance targets
  • Guessing where the bottleneck is
  • Fixing the symptom rather than the cause
Tuning options
  • Applications typically account for over 80% of a request's processing time, so look at your application first.
  • Tomcat logs are also a good place to start, as their default configuration is too generic
    • The catch-all logger writes both to a file and to stdout... but on Linux stdout is redirected to a file, so everything gets written twice.
    • The catch-all file has no overflow protection. It will just grow and grow.
    • Logging is synchronous, which shouldn't be a problem for local disks but it does add overhead if logs are written through the network (NAS).
    • Solutions :
      • Remove logging to stdout, i.e. remove java.util.logging.ConsoleHandler from the handlers list in the file
      • Add log rotation to the catch-all logger
      • Asynchronous logging has not been implemented yet...
  • First, you need to understand your application usage patterns
    • One request every now and then ?
    • Short bursts of requests ?
    • One request every 3 seconds ?
  • TCP/HTTP/SSL connections
    • TCP connection setup is expensive, especially over WANs with high latency
      • HTTP keep-alives allow a TCP connection to be kept open and reused for other HTTP requests
    • SSL connection setup is very expensive
      • HTTP keep-alives are a must if SSL is heavily used
  • 3 connectors are available
    • Java Blocking I/O (BIO)
      • oldest one
      • most stable
      • JSSE-based SSL implementation : very slow
    • Java Non-Blocking I/O (NIO)
      • JSSE-based SSL implementation : very slow
    • Native (APR)
  • Picking the right connector for a given requirement (from best to worst)
    • Stability : BIO, APR, NIO
    • SSL : APR, NIO, BIO
    • Low concurrency : BIO, APR, NIO
    • High concurrency, no keep-alives : BIO, APR, NIO
    • High concurrency, keep-alives : APR, NIO, BIO
  • Why use NIO at all ? It never comes first !
    • APR is unstable on Solaris
    • NIO is a pure Java solution
    • Switching between BIO and NIO is straightforward
      • Same configuration files
      • Same certificates
  • maxThreads
    • Maximum number of threads servicing HTTP requests
      • For BIO, this value really is the maximum number of concurrent client requests
    • Typical value : 200-800
    • 400 is a good starting point, which may be adjusted depending on CPU load
  • maxKeepAliveRequests
    • Maximum number of HTTP requests served over a single TCP connection before it is closed
      • 1 : no keep-alives
      • Typical value is 100
  • connectionTimeout
    • Typical value : 3000 ms
    • Also used for keep-alive timeout
    • Increase for :
      • slow clients, such as mobile phones
      • layer 7 load balancer with keep-alives
    • Decrease for faster timeouts (!)
  • Content cache
    • element
      • cacheMaxSize : typical value is 10240 KB
      • cacheTTL : typical value is 5 s
    • NIO/APR can use sendfile for large static files
      • OS “bypass”
      • File sent by a different thread, which doesn't tie up an HTTP request processing thread
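The keep-alive advice above (connection setup is expensive, so reuse the connection) comes down to simple arithmetic: the setup cost is paid once and shared by every request on that connection. The sketch below uses made-up timings, and the function name is hypothetical; only the values 1 and 100 for maxKeepAliveRequests come from the notes.

```python
def average_request_ms(setup_ms, processing_ms, requests_per_connection):
    """Average time per request when one connection setup is shared by several requests."""
    return processing_ms + setup_ms / requests_per_connection

# Assumed figures, for illustration only.
SSL_SETUP_MS = 300.0   # TCP + SSL connection setup over a high-latency link
PROCESSING_MS = 50.0   # time spent serving one request

no_keep_alive = average_request_ms(SSL_SETUP_MS, PROCESSING_MS, 1)      # maxKeepAliveRequests = 1
with_keep_alive = average_request_ms(SSL_SETUP_MS, PROCESSING_MS, 100)  # maxKeepAliveRequests = 100
print(no_keep_alive, with_keep_alive)  # 350.0 53.0
```

The gain only materializes if clients really do send many requests over the same connection, which is why the usage pattern has to be understood first.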
JVM tuning
  • Memory
    • -Xms / -Xmx flags
      • Used to define the size of the Java heap
      • Aim to set as low as possible
      • Setting the heap too high wastes memory and can cause long GC pauses
    • -XX:NewSize / -XX:NewRatio
      • Set the new generation to 25-33% of the total Java heap
      • Setting the ratio too high/low leads to inefficient GC
  • Garbage collection
    • GC pauses the application
    • From milliseconds to seconds
    • -XX:MaxGCPauseMillis / -XX:MaxGCMinorPauseMillis
      • these set goals, which are not guaranteed
      • they lead to more frequent, shorter pauses
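As a worked example of the 25-33% guideline, the Python sketch below derives the matching flag values. -XX:NewRatio is the ratio of the old generation to the new generation, so a new generation of 25% of the heap corresponds to a ratio of 3, and one third of the heap to a ratio of 2. The helper names, the 512 MB heap and the choice of identical -Xms and -Xmx values are assumptions for illustration, not recommendations from the talk.

```python
def new_ratio(young_fraction):
    """-XX:NewRatio value (old size / new size) for a given new-generation share of the heap."""
    return round((1 - young_fraction) / young_fraction)

def jvm_memory_flags(heap_mb, young_fraction):
    """Heap flags with the new generation sized as a fraction of the heap."""
    young_mb = round(heap_mb * young_fraction)
    return [
        f"-Xms{heap_mb}m",           # initial heap (assumed equal to the maximum here)
        f"-Xmx{heap_mb}m",           # maximum heap
        f"-XX:NewSize={young_mb}m",  # initial new generation size
    ]

print(new_ratio(0.25), new_ratio(1 / 3))        # 3 2
print(" ".join(jvm_memory_flags(512, 0.25)))    # -Xms512m -Xmx512m -XX:NewSize=128m
```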
Load balancing
  • Basic configuration : 1 httpd, 2 Tomcat instances, mod_proxy_http or mod_jk
  • Stateless requests
    • they are routed purely according to load balancing algorithm
    • this doesn't allow HTTP sessions
  • HTTP sessions
    • sticky sessions must be set up, i.e. all HTTP requests from a single session must go to the same Tomcat instance
    • this is done by adding a session cookie to the HTTP requests, which the load balancer tracks
  • Session replication between Tomcat instances can be added through clustering
  • Replication is asynchronous by default (done once the answer has been sent back to the client)
  • Single line configuration by default.
  • Additional configuration needed for real-life production
    • Automatic discovery of new Tomcat instances (through IP multicast)
    • Synchronous replication (watch out for the performance impact)
    • Session replication to a specific node (instead of all nodes)
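To illustrate the difference between stateless routing and sticky sessions, here is a toy balancer in Python. It assumes the session ID carried by the cookie ends with ".<instance name>", a convention loosely modelled on route suffixes; the class, the instance names and the session IDs are hypothetical, and a real load balancer does considerably more (failover, weighting, health checks).

```python
import itertools

class StickyBalancer:
    def __init__(self, instances):
        self.instances = list(instances)
        self._round_robin = itertools.cycle(self.instances)

    def route(self, session_id=None):
        """Return the name of the instance that should handle the request."""
        if session_id and "." in session_id:
            # Assumed convention: the session ID ends with ".<instance name>".
            route = session_id.rsplit(".", 1)[1]
            if route in self.instances:
                return route
        # No session, or unknown instance: fall back to the balancing algorithm.
        return next(self._round_robin)

balancer = StickyBalancer(["tomcat1", "tomcat2"])
print(balancer.route())                  # tomcat1 (stateless, round robin)
print(balancer.route())                  # tomcat2
print(balancer.route("A1B2C3.tomcat2"))  # tomcat2 (sticky)
print(balancer.route("A1B2C3.tomcat2"))  # tomcat2 again
```

If tomcat2 goes down, the session is lost unless it has been replicated to another instance, which is what clustering adds.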
Hints & tips
  • Use a minimum of 3 Tomcat instances
  • Test your application with load balancing and clustering before going to production (it may not behave the same)
  • Redeployment can cause memory leaks
    • Include it in your testing
    • Safer option : do a stop/start upgrade on each individual Tomcat instance in your cluster
