Fault Tolerant Task Scheduling: A Comprehensive Approach to Optimized Resource Usage

Keywords

Fault Tolerance, Ant Colony Optimization (ACO), Load Balancing, Task Scheduling, Resource Optimization.

Abstract

Fault tolerance and load balancing present significant challenges in dynamic, heterogeneous cloud environments. This research aims to keep the workload balanced across diverse resource demands, even in the presence of faulty nodes. The proposed algorithm uses Ant Colony Optimization (ACO) as a decentralized, probabilistic task assignment mechanism that dynamically distributes tasks based on both the computational power and the bandwidth of the Virtual Machines (VMs). Integrated fault tolerance mechanisms detect faults and recover from them by reassigning tasks to healthy VMs without compromising overall system performance. ACO's pheromone-based decision-making process enables effective task scheduling, and rescheduling in the event of failures, while balancing the workload. The proposed fault-tolerant scheduling framework combines ACO with migration-based fault tolerance methods to provide resource optimization, high availability, reliability, and effective load balancing. The study tested the framework by distributing 1,000 tasks of diverse lengths and file sizes across 50 VMs of various configurations, with random faults introduced at failure rates of 10% and 20%. A multi-objective ACO-based optimization algorithm assigned tasks exclusively to healthy VMs, balancing two key resources, computing time and bandwidth, each weighted at 50%. The analysis recorded resource utilization for each VM. Results showed slight variations in resource usage among VMs, but overall task completion time was only minimally affected by VM failures, which the study attributes to the balanced multi-objective optimization strategy. The study's dual focus on resource utilization and on minimal impact on task completion time despite VM failures ensures reliable task scheduling and enhanced system resilience, setting a benchmark for managing dynamic and heterogeneous cloud systems.
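
The scheduling idea described in the abstract can be illustrated with a minimal Python sketch. It is not the study's implementation: the `VM` and `Task` classes, the units, the per-(task, VM) pheromone table, the parameters `alpha`, `beta`, and `rho`, and the functions `cost`, `inject_faults`, and `schedule` are all assumptions made for illustration. Only the exclusion of faulty VMs and the 50/50 weighting of computing time and bandwidth come from the abstract.

```python
import random
from dataclasses import dataclass


@dataclass
class VM:
    mips: float            # computational power (assumed unit: MIPS)
    bandwidth: float       # assumed unit: MB/s
    healthy: bool = True   # faulty VMs are excluded from assignment


@dataclass
class Task:
    length: float          # assumed unit: million instructions
    file_size: float       # assumed unit: MB


def cost(task, vm, w_cpu=0.5, w_bw=0.5):
    """Weighted sum of compute time and transfer time (50% each by default)."""
    return w_cpu * task.length / vm.mips + w_bw * task.file_size / vm.bandwidth


def inject_faults(vms, rate, rng):
    """Mark a random fraction `rate` of the VMs as faulty (hypothetical helper)."""
    for vm in rng.sample(vms, int(rate * len(vms))):
        vm.healthy = False


def schedule(tasks, vms, n_ants=5, n_iters=10, alpha=1.0, beta=2.0, rho=0.1, seed=0):
    """Return (assignment, max_load); assignment[t] is the VM index for task t."""
    rng = random.Random(seed)
    healthy = [i for i, vm in enumerate(vms) if vm.healthy]
    if not healthy:
        raise RuntimeError("no healthy VM available")
    # One pheromone value per (task, VM) pair, initialised uniformly.
    tau = [[1.0] * len(vms) for _ in tasks]
    best, best_load = None, float("inf")
    for _ in range(n_iters):
        for _ in range(n_ants):
            load = [0.0] * len(vms)
            assign = []
            for t, task in enumerate(tasks):
                # Attractiveness: pheromone times inverse of the projected VM load.
                weights = [
                    tau[t][i] ** alpha * (1.0 / (load[i] + cost(task, vms[i]))) ** beta
                    for i in healthy
                ]
                i = rng.choices(healthy, weights=weights, k=1)[0]
                load[i] += cost(task, vms[i])
                assign.append(i)
            if max(load) < best_load:
                best, best_load = assign, max(load)
        # Evaporate all trails, then reinforce the best assignment found so far.
        for row in tau:
            for i in range(len(row)):
                row[i] *= 1.0 - rho
        for t, i in enumerate(best):
            tau[t][i] += 1.0 / best_load
    return best, best_load
```

In this sketch, rescheduling after a failure amounts to calling `inject_faults` and then running `schedule` again on the affected tasks, so that they land only on VMs still marked healthy. The returned `max_load` is the largest weighted cost accumulated on any VM, which serves here as a stand-in for completion time.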

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Copyright (c) 2024 African Journal of Biomedical Research