- Kubernetes HPA couples recommendation (calculating the desired number of replicas) with execution (adjusting the number of replicas) inside the HPA controller. However, operator-based applications are increasingly common in Kubernetes clusters, and Kubernetes HPA is not well suited to auto-scaling them. Users may want only the recommendations so that they can run customized executions separately.
- If the metrics used to calculate the desired replicas are not chosen appropriately, performance can suffer. Users must take extra care to find a suitable metric, often by trial and error.
In this article, we show that the native Kubernetes HPA algorithm (the K8sHPA mechanism) yields only modest savings and much larger lags (latency). Federator.ai from ProphetStor uses machine learning to predict and analyze the Kafka workload, and its recommender then recommends the number of consumers by weighing the benefit of an adjustment against its cost. This achieves much better performance (reduced latency) with far fewer resources (fewer Kafka consumers), all without changing the K8sHPA mechanism or a single line of Kafka code. Users can therefore take Federator.ai’s recommendations and customize their executions more flexibly.
In general, a topic accumulates lag in the initial stage, when producers send messages, and the consumers then work the lag down. Kubernetes HPA periodically checks the lag and determines the number of consumers in the monitored consumer group according to equation (1). It may scale the number of consumers up or down drastically based ONLY on the observed currentMetricValue. However, whenever consumers are added to or removed from the group, the cluster starts a rebalance and reassigns the topic’s partitions among the consumers. During a rebalance, consumers cannot consume messages, and some partitions may be moved from one consumer to another.
After the rebalance, each consumer may be assigned a new set of partitions. If the committed offset of a newly assigned partition is smaller than the offset of the last message the client processed, the messages between the committed offset and the last processed offset will be processed twice. If the committed offset is larger than the offset of the last message the client processed, all messages between the last processed offset and the committed offset will be missed. Consumers need additional time to handle these issues. This additional time, together with the resources it consumes, is called the auto-scaling cost.
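The two offset cases above can be illustrated with a minimal sketch. The function name `rebalance_effect` and its return labels are hypothetical, and the message count is a simplified difference of offsets rather than an exact account of Kafka's offset semantics.

```python
from typing import Tuple


def rebalance_effect(committed_offset: int, last_processed_offset: int) -> Tuple[str, int]:
    """Classify what happens on one partition after a rebalance.

    committed_offset: the offset most recently committed for the partition.
    last_processed_offset: the offset of the last message the previous
        owner of the partition actually processed.
    Returns a (label, affected_message_count) pair. Hypothetical helper,
    for illustration only.
    """
    if committed_offset < last_processed_offset:
        # The new owner resumes from the committed offset, so messages that
        # were already processed are handled a second time.
        return ("duplicated", last_processed_offset - committed_offset)
    if committed_offset > last_processed_offset:
        # The new owner resumes past messages that were never processed.
        return ("missed", committed_offset - last_processed_offset)
    return ("clean", 0)


if __name__ == "__main__":
    print(rebalance_effect(100, 120))  # committed behind processed
    print(rebalance_effect(150, 120))  # committed ahead of processed
```

Either outcome forces the consumer to spend extra time on deduplication or recovery, which is part of the auto-scaling cost described above.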
The Kubernetes HPA controller determines the number of consumers from the current lag alone, without considering the auto-scaling cost. It may add many consumers at the next time interval, but because of the rebalance these new consumers may only be created and become effective 30 seconds later. Such fluctuation in creating and deleting consumers may therefore not help and can instead add to the lag (queue length), which is undesirable for the operation of a Kafka application.
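To make the behavior concrete, the following sketch assumes that equation (1) is the standard formula from the Kubernetes documentation, desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), with the documented default tolerance of 0.1. The function name and the lag figures are illustrative assumptions, not measurements from this article.

```python
import math


def hpa_desired_replicas(current_replicas: int,
                         current_metric_value: float,
                         desired_metric_value: float,
                         tolerance: float = 0.1) -> int:
    """Desired replica count per the documented Kubernetes HPA formula.

    Assumption: this is what equation (1) refers to. The rebalance delay
    (the auto-scaling cost) does not appear anywhere in the calculation.
    """
    ratio = current_metric_value / desired_metric_value
    # Within the tolerance band, the controller leaves the replica count as is.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


if __name__ == "__main__":
    # Illustrative numbers: target lag of 1000 messages.
    n = hpa_desired_replicas(2, 10000, 1000)  # lag spike: scale 2 up by 10x
    print(n)
    n = hpa_desired_replicas(n, 500, 1000)    # lag drained: scale down by half
    print(n)
```

Each of these changes in group size triggers a rebalance, yet the formula reacts only to the observed metric value.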
How ProphetStor’s Federator.ai Helps
Users can use the Kafka consumer API or Kafka client tools to form a consumer group, run as a deployment in a Kubernetes cluster. Each consumer in the group can be connected to external services, such as MySQL or Elasticsearch, through custom Kafka connectors. We have devised Federator.ai cost functions to recommend the best auto-scaling interval and thereby reduce the auto-scaling cost. In addition, Federator.ai recommends the best number of consumers during execution according to the total of the HPA benefit and the cost functions, so that system managers can focus on what they do best and enjoy much lower cost with improved performance.
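The idea of weighing benefit against cost can be sketched with a toy model. This is not Federator.ai’s actual cost function, which is not given here. Every name, weight, and rate below is a hypothetical assumption chosen only to show how a rebalance pause can change the preferred number of consumers.

```python
def recommend_consumers(current_consumers: int,
                        current_lag: int,
                        predicted_produce_rate: int,
                        consume_rate_per_consumer: int,
                        interval_s: int,
                        rebalance_pause_s: int,
                        max_consumers: int,
                        lag_weight: float = 1.0,
                        consumer_weight: float = 1000.0) -> int:
    """Toy recommender: pick the group size with the lowest combined score.

    Score = lag_weight * (predicted lag at the end of the interval)
          + consumer_weight * (number of consumers).
    Changing the group size is assumed to pause all consumption for
    rebalance_pause_s seconds. Hypothetical model, for illustration only.
    """
    best_n, best_score = current_consumers, float("inf")
    produced = predicted_produce_rate * interval_s
    for n in range(1, max_consumers + 1):
        # A change in group size triggers a rebalance with no consumption.
        pause = rebalance_pause_s if n != current_consumers else 0
        active_s = max(interval_s - pause, 0)
        consumed = n * consume_rate_per_consumer * active_s
        end_lag = max(current_lag + produced - consumed, 0)
        score = lag_weight * end_lag + consumer_weight * n
        if score < best_score:
            best_n, best_score = n, score
    return best_n


if __name__ == "__main__":
    # Steady load that 4 consumers already absorb: no change is recommended.
    print(recommend_consumers(4, 0, 400, 100, 300, 30, 10))
    # A backlog of 60000 messages justifies paying the rebalance pause.
    print(recommend_consumers(4, 60000, 400, 100, 300, 30, 10))
```

In this toy model a longer interval spreads the rebalance pause over more useful consumption time, which is one reason the choice of auto-scaling interval matters.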
- “Autoscaling Kafka Streams applications with Kubernetes,” https://blog.softwaremill.com/autoscaling-kafka-streams-applications-with-kubernetes-9aed2e37d3a0
- “Kubernetes HPA Autoscaling with Kafka metrics,” https://medium.com/google-cloud/kubernetes-hpa-autoscaling-with-kafka-metrics-88a671497f07
- “Kafka At Scale in the Cloud,” https://www.slideshare.net/ConfluentInc/kafka-at-scale-in-the-cloud
- “Kafka: The Definitive Guide,” https://www.oreilly.com/library/view/kafka-the-definitive/9781491936153/ch04.html
- “Documentation,” https://kafka.apache.org/documentation/
- “Apache Kafka Helm Chart,” https://github.com/helm/charts/tree/master/incubator/kafka
- “Strimzi, Run Apache Kafka on Kubernetes and OpenShift,” https://github.com/strimzi/strimzi-kafka-operator
- “Setup Kafka with Debezium using Strimzi in Kubernetes,” https://medium.com/@sincysebastian/setup-kafka-with-debezium-using-strimzi-in-kubernetes-efd494642585
- “Apache Kafka® on Kubernetes®,” https://blog.kubernauts.io/apache-kafka-on-kubernetes-4425e18daba5
- “Alibaba Trace,” https://github.com/alibaba/clusterdata, 2017.