where desiredMetricValue can be a targetAverageValue or targetAverageUtilization. Kubernetes 1.6 adds support for scaling applications based on multiple metrics and custom metrics. Take Kafka as an example, targetAverageValue can be 1,000 lags in a topic for a consumer group. The targetAverageValue is based on users’ experience. However, according to the scale of Kafka in Netflix, it should be able to manage about 4,000 brokers and process 700 billion unique events per day. During a rebalance, consumers cannot consume messages, and some partitions may be moved from one consumer to another.
After the rebalance, each consumer may be assigned a new set of partitions. If the committed offset in the new partitions is smaller than the offset of the latest messages that the client processed, the messages between the last processed offset and the committed offset will be processed twice. If the committed offset in the new partitions is larger than the offset of the latest messages that the client processed, all messages between the last processed offset and the committed offset will be missing. Consumers need to take additional time to handle the above issues. The additional time and resources needed are called auto-scaling cost.
The Kubernetes HPA controller determines the number of consumers based on the current value of lags without considering the auto-scaling cost. It may increase many consumers at the next time interval, but these new consumers may only be created and come to be effective 30 seconds later, due to the rebalance. The fluctuation in creating/deleting consumers might not be effective and will result in added lags (queue length), which is not desirable for the operation of the Kafka application.