Document Type

Dissertation

Date of Award

Spring 5-31-2008

Degree Name

Doctor of Philosophy in Computing Sciences - (Ph.D.)

Department

Computer Science

First Advisor

Alexander Thomasian

Second Advisor

David Nassimi

Third Advisor

Ali Mili

Fourth Advisor

Cristian Borcea

Fifth Advisor

Jian Yang

Abstract

There has been an explosion in the amount of generated data, which has to be stored reliably because it is not easily reproducible. Some datasets require frequent read and write access. like online transaction processing applications. Others just need to be stored safely and read once in a while, as in data mining. This different access requirements can be solved by using the RAID (redundant array of inexpensive disks) paradigm. i.e., RAIDi for the first situation and RAID5 for the second situation. Furthermore rather than providing two disk arrays with RAID 1 and RAID5 capabilities, a controller can be postulated to emulate both. It is referred as a heterogeneous disk array (HDA).

Dedicating a subset of disks to RAID 1 results in poor disk utilization, since RAIDi vs RAID5 capacity and bandwidth requirements are not known a priori. Balancing disk loads when disk space is shared among allocation requests, referred to as virtual arrays - VAs poses a difficult problem. RAIDi disk arrays have a higher access rate per gigabyte than RAID5 disk arrays. Allocating more VAs while keeping disk utilizations balanced and within acceptable bounds is the goal of this study.

Given its size and access rate a VA's width or the number of its Virtual Disks -VDs is determined. VDs allocations on physical disks using vector-packing heuristics, with disk capacity and bandwidth as the two dimensions are shown to be the best. An allocation is acceptable if it does riot exceed the disk capacity and overload disks even in the presence of disk failures. When disk bandwidth rather than capacity is the bottleneck, the clustered RAID paradigm is applied, which offers a tradeoff between disk space and bandwidth.

Another scenario is also considered where the RAID level is determined by a classification algorithm utilizing the access characteristics of the VA, i.e., fractions of small versus large access and the fraction of write versus read accesses.

The effect of RAID 1 organization on its reliability and performance is studied too. The effect of disk failures on the X-code two disk failure tolerant array is analyzed and it is shown that the load across disks is highly unbalanced unless in an NxN array groups of N stripes are randomly rotated.

Share

COinS
 
 

To view the content in your browser, please download Adobe Reader or, alternately,
you may Download the file to your hard drive.

NOTE: The latest versions of Adobe Reader do not support viewing PDF files within Firefox on Mac OS and if you are using a modern (Intel) Mac, there is no official plugin for viewing PDF files within the browser window.