The Kubernetes native HPA, on the other hand, has neither predictive analytics capability nor the ability to fuse multiple metrics intelligently. At best, it selects the maximum replica count among the recommendations produced by each configured metric. KEDA, similarly, triggers autoscaling based on a KPI such as consumer lag offset to increase or reduce the number of replicas. In both cases, this is akin to driving while watching the rear-view mirror, since the lag offset reflects the difference between past production and consumption rates.
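The per-metric recommendation and max-over-metrics selection described above follow the standard HPA formula, desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue). A minimal sketch (the metric values below are illustrative, not from the tests):

```python
import math

def hpa_desired_replicas(current_replicas: int, current_value: float, target_value: float) -> int:
    # Core HPA scaling formula:
    # desired = ceil(currentReplicas * currentMetricValue / targetMetricValue)
    return math.ceil(current_replicas * current_value / target_value)

def hpa_multi_metric(current_replicas: int, metrics: list) -> int:
    # With multiple configured metrics, the HPA computes one recommendation
    # per metric and applies the maximum of those recommendations.
    return max(hpa_desired_replicas(current_replicas, c, t) for c, t in metrics)

# Illustrative values: a lag offset of 2,000 against a target of 1,000
# dominates a CPU metric at 50% of a 100% target, so the lag metric wins.
print(hpa_multi_metric(4, [(2000, 1000), (50, 100)]))  # 8
```

Note that the formula is purely reactive: it scales on the current metric value, with no notion of where the workload is heading.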
In contrast with the Kubernetes native HPA driven by consumer lag offset metrics, Federator.ai's consumer replicas and consumption rates track the production rate closely, thanks to its prediction of future workloads. The sharp oscillation of consumer replicas observed in the tests using the native HPA with consumer lag offset can be explained as follows. If the target lag offset is set too low (e.g., 1,000), any sudden large increase in consumer lag offset (e.g., to 2,000) leads to a large number of desired replicas (e.g., twice the original count), which in our test results over-provisions the deployment relative to the actual workload; this can happen when the production rate rises beyond the processing capacity of the consumer pods. Once over-provisioned, a similar workload produces a very low lag offset (e.g., 100 or less), and the number of desired replicas drops sharply (e.g., to one tenth of the original or less). Equally, the target lag offset cannot be set too high, as that results in unacceptable consumer lag. The cool-down and upscale-delay settings of the HPA can mitigate this oscillation, but it remains difficult to determine the best values for them without a real production environment.
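The oscillation described above can be reproduced numerically with the same HPA formula; the replica counts and lag values below mirror the examples in the text and are illustrative only:

```python
import math

def desired_replicas(current: int, lag_offset: float, target_lag: float) -> int:
    # HPA formula: desired = ceil(current * observedMetric / targetMetric)
    return math.ceil(current * lag_offset / target_lag)

# A lag spike to 2,000 against a target of 1,000 doubles the replicas...
replicas = desired_replicas(10, 2000, 1000)
print(replicas)  # 20

# ...which over-provisions the consumers; the same workload now yields a
# very low lag (e.g., 100), so the next evaluation cuts replicas to 1/10.
print(desired_replicas(replicas, 100, 1000))  # 2
```

Each evaluation over-corrects the previous one, which is exactly the sawtooth pattern a low lag-offset target produces.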