Keynote
Lecture-2
The Robustness of Resource Allocation in Computer Systems

H. J. Siegel

Performing computing and
communication tasks on parallel and distributed systems involves the
coordinated use of different types of machines, networks, interfaces, and other
resources. These resources should be allocated to the tasks in a way that
maximizes some system performance measure. However, decisions about how to
allocate resources are often based on estimated values of task and system
parameters. These estimated values are also used as a basis for predicting the
system performance that will result from a given resource allocation. The
actual values of these parameters may differ from the estimates for many
reasons; for example, the estimates may represent only average values, the
models used to generate the estimates may have limited accuracy, and there may
be changes in the environment. Thus, an important research problem is the
development of resource management strategies that can guarantee a particular
system performance given such uncertainties. To address this problem, we have
designed a methodology for deriving the degree of robustness of a resource
allocation, defined as the maximum amount of collective uncertainty in system
parameters within which a user-specified level of system performance, or
quality of service (QoS), can be guaranteed. The foundation of this
methodology is our mathematical formulation of
a metric that evaluates the robustness of a resource allocation. Our four-step
procedure for deriving a robustness metric for an
arbitrary system will be presented. We will illustrate this procedure by using
it to derive robustness metrics for some example distributed systems.
Furthermore, we will demonstrate the ability of the robustness metric to select
the most robust resource allocation from among those that otherwise perform
similarly (based on the primary performance criterion). These methods are applicable
to the allocation of resources for many different types of computing and
communication tasks in many types of environments, including parallel,
distributed, cluster, grid, Internet, embedded, wireless, and reconfigurable
systems. This research was performed jointly with Prof. Shoukat
Ali, Prof. Anthony A. Maciejewski, and Mr. Jong-Kook Kim.
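
To make the idea of a robustness metric concrete, the sketch below works
through one simple case. It is an illustration under stated assumptions, not
necessarily the formulation presented in the talk: tasks are independent, the
primary performance criterion is makespan, the uncertain parameters are the
estimated task execution times, the QoS requirement is that no machine
finishes later than a tolerance factor times the estimated makespan, and
collective uncertainty is measured as the Euclidean distance between actual
and estimated times. Under those assumptions, the smallest collective error
that can push a machine past the bound is its slack divided by the square root
of its task count. All function and variable names are hypothetical.

```python
import math

def robustness_radius(finish_time, n_tasks, bound):
    # Euclidean distance from the estimated times to the hyperplane where
    # this machine's finish time equals the bound (assumed QoS limit).
    return (bound - finish_time) / math.sqrt(n_tasks)

def robustness_metric(allocation, est_times, tolerance):
    # allocation[i] is the machine assigned to task i (hypothetical encoding).
    loads, counts = {}, {}
    for task, machine in enumerate(allocation):
        loads[machine] = loads.get(machine, 0.0) + est_times[task]
        counts[machine] = counts.get(machine, 0) + 1
    makespan = max(loads.values())
    bound = tolerance * makespan
    # The allocation is only as robust as its most fragile machine.
    return min(robustness_radius(loads[m], counts[m], bound) for m in loads)

# Illustrative data: six tasks, two machines, 25% tolerance on makespan.
est_times = [4.0, 4.0, 2.0, 2.0, 2.0, 2.0]
alloc_a = [0, 0, 1, 1, 1, 1]  # machine loads 8 and 8, task counts 2 and 4
alloc_b = [0, 1, 0, 0, 1, 1]  # machine loads 8 and 8, task counts 3 and 3

rho_a = robustness_metric(alloc_a, est_times, 1.25)
rho_b = robustness_metric(alloc_b, est_times, 1.25)
print(rho_a, rho_b)
```

Both allocations have the same estimated makespan, so the primary criterion
cannot separate them. The metric can: the second allocation tolerates a larger
collective error in the execution-time estimates before the bound is violated,
because it avoids concentrating four tasks on a single machine.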