Keynote Lecture-2

The Robustness of Resource Allocation in Computer Systems

H. J. Siegel

Colorado State University

Abstract

Performing computing and communication tasks on parallel and distributed systems involves the coordinated use of different types of machines, networks, interfaces, and other resources. These resources should be allocated to the tasks in a way that maximizes some system performance measure. However, resource allocation decisions are often based on estimated values of task and system parameters, and the same estimates are used to predict the system performance that will result from a given allocation. The actual values of these parameters may differ from the estimates for many reasons; for example, the estimates may represent only average values, the models used to generate the estimates may have limited accuracy, and the environment may change. Thus, an important research problem is the development of resource management strategies that can guarantee a particular system performance despite such uncertainties. To address this problem, we have designed a methodology for deriving the degree of robustness of a resource allocation, defined as the maximum amount of collective uncertainty in system parameters within which a user-specified level of system performance (quality of service, QoS) can be guaranteed. The foundation of this methodology is our mathematical formulation of a metric that evaluates the robustness of a resource allocation. We will present our four-step procedure for deriving a robustness metric for an arbitrary system and illustrate it by deriving robustness metrics for some example distributed systems. Furthermore, we will demonstrate that the robustness metric can select the most robust resource allocation from among those that otherwise perform similarly with respect to the primary performance criterion.
These methods are applicable to the allocation of resources for many different types of computing and communication tasks in many types of environments, including parallel, distributed, cluster, grid, Internet, embedded, wireless, and reconfigurable systems. This research was performed jointly with Prof. Shoukat Ali, Prof. Anthony A. Maciejewski, and Mr. Jong-Kook Kim.
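
As a rough illustration of the robustness notion described in the abstract, the sketch below works through one deliberately simple case. Everything specific in it is an assumption made for illustration and is not taken from the abstract: tasks are independent and run on identical machines, the only uncertain parameters are the task execution times, collective uncertainty is measured as the Euclidean norm of the execution-time errors, and the QoS requirement is that every machine finishes by a deadline tau. Under these assumptions, the largest tolerable error norm for one machine is the distance from the estimated execution times to the hyperplane where that machine's finishing time equals tau, and the allocation's robustness is the smallest such distance over all machines. The function names and the numbers are hypothetical.

```python
import math


def robustness_radius(finish_time, num_tasks, tau):
    """Distance from the estimated execution times on one machine to the
    boundary sum(times) == tau, i.e. the largest Euclidean norm of
    execution-time errors that cannot push this machine past tau.
    A negative value means the estimate already violates the deadline."""
    return (tau - finish_time) / math.sqrt(num_tasks)


def robustness(allocation, est_times, tau):
    """Robustness of an allocation under the assumptions stated above.

    allocation[i] is the machine assigned to task i (hypothetical encoding);
    est_times[i] is the estimated execution time of task i.
    """
    finish = {}  # machine -> estimated finishing time
    count = {}   # machine -> number of tasks assigned
    for task, machine in enumerate(allocation):
        finish[machine] = finish.get(machine, 0.0) + est_times[task]
        count[machine] = count.get(machine, 0) + 1
    # The allocation is only as robust as its most fragile machine.
    return min(robustness_radius(finish[m], count[m], tau) for m in finish)


# Two allocations of the same six tasks; both have an estimated makespan of 8.
times = [4.0, 4.0, 2.0, 2.0, 2.0, 2.0]
alloc_a = [0, 0, 1, 1, 1, 1]  # machine 1 runs four short tasks
alloc_b = [0, 1, 0, 0, 1, 1]  # each machine runs three tasks
tau = 10.0

rob_a = robustness(alloc_a, times, tau)
rob_b = robustness(alloc_b, times, tau)
print(rob_a, rob_b)
```

In this toy case the two allocations are indistinguishable by makespan, yet the second tolerates a larger collective error in the execution-time estimates before the deadline is violated, which is the kind of distinction the abstract says the robustness metric is meant to expose.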