A Design and Implementation of Cluster Heartbeat Network for Efficient Fault Detection
Keywords:
Heartbeat Network, Fault Detection, High Availability, Concurrency
Abstract
To achieve fault tolerance in a server cluster, fault detection capability is a primary prerequisite. Efficient fault detection is prompt, correct and complete. This paper revisits the technique called Reactive Failure Detection (RFD), which dynamically predicts the heartbeat delay of a cluster node. We also identify the requirements for deploying RFD on actual servers. A new concurrent cluster heartbeat network is proposed that uses push and pull interaction to monitor nodes live and determine each node's status. A prototype of the new model was tested on a platform running multiple independent web applications and analyzed for the correctness of its design and implementation.
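To make the push and pull interaction concrete, the following is a minimal Python sketch, not the paper's RFD algorithm. It assumes a simple predictor: the next heartbeat is expected one mean inter-arrival interval after the last one, plus a fixed safety margin. The class name `HeartbeatMonitor`, its parameters (`window`, `margin`, `initial_interval`), and the `probe` callback are all hypothetical names introduced for illustration.

```python
from collections import deque


class HeartbeatMonitor:
    """Tracks one node's pushed heartbeats and predicts when the next is due.

    Illustrative only: the moving-average predictor and the fixed margin
    are assumptions, not the prediction rule used by RFD in the paper.
    """

    def __init__(self, window=5, margin=0.5, initial_interval=1.0):
        self.intervals = deque(maxlen=window)  # recent inter-arrival times (s)
        self.margin = margin                   # assumed safety margin (s)
        self.initial_interval = initial_interval
        self.last_arrival = None

    def on_heartbeat(self, now):
        """Push path: the node delivered a heartbeat at time `now`."""
        if self.last_arrival is not None:
            self.intervals.append(now - self.last_arrival)
        self.last_arrival = now

    def predicted_deadline(self):
        """Latest time by which the next heartbeat is expected."""
        if self.last_arrival is None:
            return None
        if self.intervals:
            mean = sum(self.intervals) / len(self.intervals)
        else:
            mean = self.initial_interval
        return self.last_arrival + mean + self.margin

    def check(self, now, probe):
        """Return the node status at time `now`.

        If the pushed heartbeat is overdue, fall back to the pull path:
        `probe()` actively queries the node and returns True if it answers.
        """
        deadline = self.predicted_deadline()
        if deadline is None:
            return "unknown"
        if now <= deadline:
            return "alive"
        if probe():
            self.on_heartbeat(now)  # a successful pull counts as a heartbeat
            return "alive"
        return "failed"


monitor = HeartbeatMonitor(window=3, margin=0.5)
for t in (0.0, 1.0, 2.0, 3.0):
    monitor.on_heartbeat(t)

print(monitor.predicted_deadline())         # deadline for the next heartbeat
print(monitor.check(4.0, lambda: False))    # still within the deadline
print(monitor.check(5.0, lambda: False))    # overdue and the pull probe fails
```

In this sketch a node is declared failed only after both the pushed heartbeat is overdue and the pull probe goes unanswered, which is one plausible way to trade a little detection latency for fewer false suspicions.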
TRANSFER OF COPYRIGHT AGREEMENT
The manuscript is herewith submitted for publication in the Journal of Telecommunication, Electronic and Computer Engineering (JTEC). It has not been published before, and it is not under consideration for publication in any other journals. It contains no material that is scandalous, obscene, libelous or otherwise contrary to law. When the manuscript is accepted for publication, I, as the author, hereby agree to transfer to JTEC, all rights including those pertaining to electronic forms and transmissions, under existing copyright laws, except for the following, which the author(s) specifically retain(s):
- All proprietary right other than copyright, such as patent rights
- The right to make further copies of all or part of the published article for my use in classroom teaching
- The right to reuse all or part of this manuscript in a compilation of my own works or in a textbook of which I am the author; and
- The right to make copies of the published work for internal distribution within the institution that employs me
I agree that copies made under these circumstances will continue to carry the copyright notice that appears in the original published work. I agree to inform my co-authors, if any, of the above terms. I certify that I have obtained written permission for the use of text, tables, and/or illustrations from any copyrighted source(s), and I agree to supply such written permission(s) to JTEC upon request.