Paper Title
Fault Tolerance In Grid – An Overview
Abstract
Grid computing has emerged as a distributed methodology that coordinates resources spread across a heterogeneous
distributed environment. These resources can be categorized as computational resources and storage resources.
A grid is a collection of heterogeneous systems, such as workstations, servers, and other computers, that provides access to
computing power, data sharing, memory use, software applications, hardware peripherals, etc. Grid scheduling is a software
framework with which the scheduler collects resource state information, selects appropriate resources, predicts the potential
performance for each candidate schedule, and determines the best schedule for the applications to be executed on a Grid
system subject to some performance goals. The scheduler acts as an intermediary resource manager, serving as the interface
between consumers and the underlying resources. In a Grid, the probability of a failure is much greater than in traditional
parallel computing, and the failure of a resource can be fatal to job execution. It is therefore necessary to investigate the
application of fault-tolerant techniques for the Grid. Fault tolerance is an important property in Grid computing because the
dependability of individual Grid resources cannot always be guaranteed; moreover, as resources are used outside organizational
boundaries, it becomes increasingly difficult to guarantee that a resource being used is not malicious in some way.
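
The scheduling steps described above (collect resource state, select appropriate resources, predict performance for each candidate, and pick the best schedule) can be illustrated with a minimal sketch. It is not taken from any particular Grid middleware: the `Resource` record, its fields, and the completion-time estimate (queue wait plus job length divided by processing speed) are assumptions made purely for illustration, and the single performance goal assumed here is the earliest predicted completion time.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Resource:
    # Hypothetical resource state record; all field names are illustrative.
    name: str
    speed_mips: float     # processing speed, million instructions per second
    queue_seconds: float  # estimated wait before a new job can start
    available: bool       # False if the resource is down or unreachable


def predict_completion(job_mi: float, resource: Resource) -> float:
    """Assumed performance model: queue wait plus execution time in seconds."""
    return resource.queue_seconds + job_mi / resource.speed_mips


def schedule(job_mi: float, resources: List[Resource]) -> Optional[Resource]:
    """Return the available resource with the earliest predicted completion."""
    # Select appropriate resources: here, simply those reported as available.
    candidates = [r for r in resources if r.available]
    if not candidates:
        return None  # no usable resource; the caller must retry or fail the job
    # Predict performance for each candidate and keep the best one.
    return min(candidates, key=lambda r: predict_completion(job_mi, r))


if __name__ == "__main__":
    # Collected resource state (made-up values).
    state = [
        Resource("workstation-1", speed_mips=500.0, queue_seconds=10.0, available=True),
        Resource("server-1", speed_mips=2000.0, queue_seconds=60.0, available=True),
        Resource("server-2", speed_mips=4000.0, queue_seconds=0.0, available=False),
    ]
    best = schedule(100_000.0, state)
    print(best.name if best else "no resource available")
```

In this sketch an unavailable resource is simply skipped, so the fastest machine is ignored when it is down. A fault-tolerant scheduler would additionally have to handle a resource failing after the job has been dispatched, which this example does not attempt.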