

Use Case Descriptions
Requirements and Challenges
Geophysical data is processed in parallel across all selected servers in the data center. Hundreds or thousands of nodes must run uninterrupted, because even a single node failure can force some or all of a job's tasks to be reloaded. Hardware failures, especially disk failures, are unavoidable in large-scale, high-density clusters because of the intensive data access during computation. To minimize disruption from hardware failures, the company can rely only on new and abundant hardware to process its jobs. These selection criteria result in more than 30% waste in hardware utilization.

Solution Benefits
With Federator.ai® disk failure prediction, the HPC data center reliably selects qualified hardware for every job without delaying service delivery. Integration with task schedulers prevents jobs from being loaded onto risky nodes before a task starts, guaranteeing the health of the entire cluster for the lifespan of the job. Data center operators perform hardware maintenance between jobs to prepare servers for upcoming tasks. Federator.ai® also retains performance metrics for every host and disk, which can be used to trace unusual performance patterns at any point in time.
1. Shorten data processing time by more than 30% by eliminating task reloading
2. Leverage aged hardware by having accurate disk data predictions. No more swapping out aged but healthy hardware
3. Save money by reducing redundancy, so that other nodes can be used for active production tasks
4. Simplify hardware management and maintenance by transforming unexpected failures into planned events
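
The scheduler integration described above can be illustrated with a minimal sketch. It is not the Federator.ai® API: the `NodeHealth` record, the `disk_failure_risk` score, the `max_risk` threshold, and both function names are hypothetical, and it assumes only that each node comes with a predicted disk failure risk between 0 and 1. Nodes above the threshold are kept out of the job and listed for maintenance between jobs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeHealth:
    name: str
    # Hypothetical score: highest predicted failure probability
    # among the node's disks, in the range 0.0 to 1.0.
    disk_failure_risk: float


def select_nodes(nodes, needed, max_risk=0.05):
    """Pick the `needed` lowest-risk nodes whose risk is at or below `max_risk`."""
    healthy = sorted(
        (n for n in nodes if n.disk_failure_risk <= max_risk),
        key=lambda n: n.disk_failure_risk,
    )
    if len(healthy) < needed:
        # Refuse to start rather than load the job onto a risky node.
        raise RuntimeError(
            f"only {len(healthy)} healthy nodes available, {needed} required"
        )
    return [n.name for n in healthy[:needed]]


def maintenance_candidates(nodes, max_risk=0.05):
    """List nodes to service between jobs (risk above `max_risk`)."""
    return [n.name for n in nodes if n.disk_failure_risk > max_risk]


# Illustrative data only; the names and scores are made up.
cluster = [
    NodeHealth("node-01", 0.01),
    NodeHealth("node-02", 0.40),
    NodeHealth("node-03", 0.02),
    NodeHealth("node-04", 0.03),
]

job_nodes = select_nodes(cluster, needed=2)
to_service = maintenance_candidates(cluster)
```

In this sketch an aged node is excluded only if its predicted risk is high, not because of its age, which mirrors the point about keeping aged but healthy hardware in service.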