Predicting Node Failure in Cloud Service Systems, ESEC/FSE 2018.
Qingwei Lin, Ken Hsieh, Yingnong Dang, Hongyu Zhang, Kaixin Sui, Yong Xu, Jian-Guang Lou, Chenggang Li, Youjiang Wu, Randolph Yao, Murali Chintalapati, and Dongmei Zhang. ESEC/FSE 2018.
A cloud service system typically contains a large number of computing nodes. In reality, nodes may fail and affect service availability. In this paper, we propose a failure prediction technique, which can predict the failure-proneness of a node in a cloud service system based on historical data, before node failure actually happens. The ability to predict faulty nodes enables the allocation and migration of virtual machines to the healthy nodes, therefore improving service availability.