Paper Title
Fault Tolerance In Grid – An Overview

Abstract
Grid computing has emerged as a distributed computing methodology that coordinates resources spread across a heterogeneous distributed environment. These resources can be categorized as computational resources and storage resources. A grid is composed of a collection of heterogeneous systems, such as workstations, servers, and other computers, that provide access to computing power, shared data, memory, software applications, hardware peripherals, and so on. Grid scheduling is a software framework in which the scheduler collects resource state information, selects appropriate resources, predicts the potential performance of each candidate schedule, and determines the best schedule for the applications to be executed on a Grid system, subject to some performance goals. The scheduler is an intermediary resource manager that serves as the interface between consumers and the underlying resources. The probability of failure in a Grid is much greater than in traditional parallel computing, and the failure of a resource can be fatal to job execution. It is therefore necessary to investigate the application of fault-tolerant techniques to the Grid. Fault tolerance is an important property in Grid computing because the dependability of individual Grid resources cannot always be guaranteed; moreover, because resources are used outside organizational boundaries, it becomes increasingly difficult to guarantee that a resource being used is not malicious in some way.
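
The scheduling steps described above (collect resource state, select candidates, predict performance, pick the best schedule) and the need to tolerate resource failure can be illustrated with a minimal Python sketch. It is not taken from any particular Grid middleware: the `Resource` fields, the completion-time estimate, and the `execute` callback are hypothetical and chosen only for illustration, and the fault-tolerance step shown is simple rescheduling on failure.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """Snapshot of one Grid resource's state (all fields are illustrative)."""
    name: str
    speed: float          # work units processed per second (assumed metric)
    queue_length: float   # work units already queued on the resource
    available: bool = True


def predict_completion_time(resource, job_size):
    """Estimate the seconds until a job of job_size work units would finish."""
    return (resource.queue_length + job_size) / resource.speed


def select_resource(resources, job_size, excluded=()):
    """Return the available, non-excluded resource with the lowest
    predicted completion time, or None if there is no candidate."""
    candidates = [
        r for r in resources
        if r.available and r.name not in excluded
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda r: predict_completion_time(r, job_size))


def run_with_failover(resources, job_size, execute):
    """Run the job on the best predicted resource; if execution fails,
    exclude that resource and reschedule on the next best one."""
    failed = set()
    while True:
        chosen = select_resource(resources, job_size, excluded=failed)
        if chosen is None:
            raise RuntimeError("no usable resource left for this job")
        try:
            return chosen.name, execute(chosen, job_size)
        except RuntimeError:
            # Treat a RuntimeError from execute() as a resource failure.
            failed.add(chosen.name)
```

Here `execute` stands in for whatever mechanism actually dispatches the job. A real scheduler would also need failure detection, for example heartbeats or timeouts, rather than relying on an exception raised by the call itself.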