K8s Java

Overview

Tuning K8s pods for your Java Spring Boot application seems easy, right? Well, it is not. Let's dive into it.

Java applications have always been resource hungry. For instance, I have never seen a Java application starting in 1 second and consuming less than 64MB of RAM. PHP can do both of these things for very small web pages, and so can Python and Node.js.

Even a small Java Spring Boot microservice has a huge infrastructure under the hood, with a lot of components asking for threads (and RAM). If your application provides a REST interface and uses a database connection pool, you are already spinning up threads in the background just to manage the HTTP connections.
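To see this for yourself, you can ask the JVM for its live threads. A minimal sketch (plain Java, runnable standalone or dropped into any Spring Boot app; even a bare JVM already shows several housekeeping threads):

```java
import java.util.Map;

// Counts and lists the JVM's live threads. Run it inside your Spring Boot
// application (e.g. from a CommandLineRunner) to see how many threads exist
// before your own code does any real work.
public class ThreadCensus {
    public static void main(String[] args) {
        Map<Thread, StackTraceElement[]> all = Thread.getAllStackTraces();
        System.out.println("Live threads: " + all.size());
        all.keySet().stream()
           .map(Thread::getName)
           .sorted()
           .forEach(name -> System.out.println("  " + name));
    }
}
```

In a typical Spring Boot service this list quickly grows to dozens of entries: Tomcat workers, connection-pool housekeepers, schedulers, and so on.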

So, when you have to configure the pod of your Java microservice, there are some things to watch for. Let's take an example, with an explanation below.

Threads

Below is a typical K8s resource snippet:

resources:
    requests:
        cpu: 1000m
        memory: "1024Mi"
    limits:
        cpu: 2000m
        memory: "1024Mi"

CPU is considered a “compressible” resource. If your app starts hitting its CPU limit, Kubernetes starts throttling your container: the CPU is artificially restricted. In simple scenarios you should not specify a CPU limit, because if your service needs a bit more CPU, it can borrow it from the spare CPU time on the node, and K8s will not throttle it.

On the other hand, on big clusters the K8s admins may require you to specify limits to keep the worker nodes stable, and specifying CPU limits is in any case a good idea. Let's follow this route and understand how to set the CPU limit. The CPU limit is really a CPU-time-per-period limit: by default, every 100ms K8s checks whether your application has already consumed all its CPU credit for that period. If so, the application is paused for the rest of the period. This is the meaning of “throttling”.
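The arithmetic behind this is worth making concrete. A sketch (plain Java, illustrative numbers only: a 2000m limit and the default 100ms period):

```java
// Back-of-envelope CFS throttling math: how fast busy threads burn the
// CPU budget of a container with cpu: 2000m and the default 100ms period.
public class CfsQuotaMath {
    public static void main(String[] args) {
        double limitCores = 2.0;     // cpu: 2000m
        double periodMs = 100.0;     // default CFS accounting period
        double quotaMs = limitCores * periodMs; // 200ms of CPU time per period

        // If 8 threads are all runnable at once, together they consume
        // 8ms of CPU time per millisecond of wall time:
        int busyThreads = 8;
        double msUntilThrottled = quotaMs / busyThreads;  // budget gone after 25ms
        double pausedMs = periodMs - msUntilThrottled;    // paused 75ms of every 100ms

        System.out.println("Budget per period: " + quotaMs + "ms of CPU time");
        System.out.println(busyThreads + " busy threads: throttled after "
                + msUntilThrottled + "ms, paused " + pausedMs + "ms per period");
    }
}
```

In other words, with 8 CPU-hungry threads and a 2-CPU limit, the container spends three quarters of every period frozen: this is the "too many threads for your limit" problem described next.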

This can slow your application down. The problem becomes critical if you have too many threads relative to your CPU limit.

Java Threads everywhere: how to tune Executors

A Spring Boot application spins up a lot of threads for different needs:

  • The web container, to serve REST calls
  • Various Spring modules may need ‘daemons’, for instance for
    • cache housekeeping
    • @Async annotations
    • Web Socket management (consumer/produce threads)
    • JMS management (similar to previous)
    • etc

Often many of those threads spend a lot of time not actually using the CPU. Based on our experience, a limit of 2 CPUs is a good starting point if your service does something non-trivial. As a first step, refrain from tuning the executor pool threads. Limit yourself to the -XX:ActiveProcessorCount JVM flag (see below), which changes the number of processors the JVM thinks it has. The reason is simple: your worker node may have 4 CPUs while your CPU-per-period limit is lower (or higher); if you tune just the limit and ActiveProcessorCount, you have fewer things to manage.
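You can verify what the JVM actually sees after setting the flag. A small sketch (the class name and launch line are just examples):

```java
import java.util.concurrent.ForkJoinPool;

// Prints the processor count the JVM uses to size its internal pools
// (GC worker threads, ForkJoinPool.commonPool(), some library defaults).
// Try launching it with:  java -XX:ActiveProcessorCount=2 CpuCheck
public class CpuCheck {
    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println("JVM sees " + cpus + " processor(s)");
        // commonPool parallelism is derived from this value (cpus - 1, min 1)
        System.out.println("commonPool parallelism: "
                + ForkJoinPool.commonPool().getParallelism());
    }
}
```

Running it with different -XX:ActiveProcessorCount values shows how the flag ripples into the sizing of JVM-managed pools without you touching any executor configuration.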

Summarizing:

  • On average a Spring Boot Java application needs at least 2 CPUs to start with. You can decrease this number, but only for truly tiny services (more below).
  • If possible, do not specify a CPU limit: the pod will use the shared CPU on the node, and on average this will let it elastically increase its performance. BUT if you specify the limit for every pod, it is a lot easier to guarantee stability and you will be on the safe side. Do it for mission-critical 24x7 services (like ‘online trading’ and… ‘Netflix billing services’ :)
  • Measure performance in a consistent way. If you want to increase parallelism, try more replicas first. If you think more replicas are a big deal and/or you cannot add them, try to tune ActiveProcessorCount (below I show some examples).

Memory

Memory is linked to thread count. Every platform thread consumes a lot of memory (with the exception of virtual threads, but let's focus on simpler scenarios).
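A rough estimate of the cost, assuming a ~1MB stack reservation per platform thread (a typical -Xss default on 64-bit Linux, but it varies by OS and architecture, and the 200-thread figure below is just an illustrative pool size):

```java
// Back-of-envelope: stack memory reserved by platform threads.
// This memory lives OUTSIDE the Java heap, so it competes with the
// container memory left over after -XX:MaxRAMPercentage.
public class StackBudget {
    public static void main(String[] args) {
        int threads = 200;      // e.g. a web container's worker pool (assumption)
        double stackMb = 1.0;   // typical default -Xss reservation (assumption)
        double reservedMb = threads * stackMb;
        System.out.println(threads + " threads reserve ~" + reservedMb
                + "MB of stack space outside the heap");
    }
}
```

Two hundred mostly idle worker threads can thus claim on the order of 200MB that never shows up in heap metrics, which is one reason the heap cannot take 100% of the container.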

Set your maximum heap size to 75-80% of the memory available in the container, with the -XX:MaxRAMPercentage directive.

A value of 80% can be too big for small containers and too small for very big ones, but on average it is a good starting point. For instance, with a 4096MB (4GB) limit, the remaining 100-80 = 20% is around 820MB, which should be enough for the Linux OS plus the JVM's extra memory needs.

Reduce it to 70% if your container is less than 4GB.
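The percentage translates into concrete numbers like this (pure arithmetic, mirroring the 4GB example above):

```java
// How -XX:MaxRAMPercentage splits a container's memory limit between
// the Java heap and everything else (OS, thread stacks, metaspace, ...).
public class HeapMath {
    public static void main(String[] args) {
        int containerMb = 4096;                 // container memory limit (4GB)
        int pct = 80;                           // -XX:MaxRAMPercentage=80
        int heapMb = containerMb * pct / 100;   // max heap: 3276MB
        int leftoverMb = containerMb - heapMb;  // ~820MB for OS + JVM off-heap
        System.out.println("max heap = " + heapMb + "MB, leftover = " + leftoverMb + "MB");
    }
}
```

Redoing the same math for a 1GB container shows why 80% is too aggressive there: only ~205MB would remain for everything outside the heap, hence the 70% advice for small containers.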

Launch lines

Okay, so what? Below are some small suggestions.

For standard containers (i.e. 2-4GB RAM, 2+ CPUs)

    java -XX:MaxRAMPercentage=80 -XX:+UseParallelGC \
            -XX:ActiveProcessorCount=<2x yourCpuLimit> -verbose:gc \
            -jar macro-service.jar

Very small container

If you have a very small container, with little RAM and less than 2 CPUs, you should tune the parameters like this:

    java -XX:MaxRAMPercentage=70 -XX:+UseSerialGC -verbose:gc -jar tiny-service.jar

The SerialGC collects with a single thread, and so requires fewer CPU resources to work.

The -verbose:gc flag is a small trick to get a summary of the GC's work into the log without flooding it.

Last but not least, take a look at “latencies every programmer should know”, which is a companion to this article.

References