Electronic Theses and Dissertations

Date of Award

2014

Document Type

Dissertation

Degree Name

Ph.D. in Engineering Science

Department

Computer and Information Science

First Advisor

Philip J. Rhodes

Second Advisor

Greg Easson

Third Advisor

Byunghyun Jang

Relational Format

dissertation/thesis

Abstract

The size of spatial scientific datasets is steadily increasing due to improvements in instruments and availability of computational resources. Scientific datasets today are often far too large to fit into a single machine's memory or even a single disk. However, much of the research on efficient storage and access to spatial datasets has focused on large multidimensional arrays. In contrast, unstructured grids consisting of collections of simplices (e.g., triangles or tetrahedra) present special challenges that have received less attention. Data values found at the vertices of the simplices may be dispersed throughout a datafile, producing especially poor disk locality. Partitioning multidimensional arrays across several machines or disks has become increasingly necessary. However, relatively little work has been done for unstructured grids. We address this important problem of poor locality in two major ways. First, we reorganize the unstructured grid to improve locality in both the dataset space and in the data file on disk using a specialized chunking approach that maintains the spatial neighborhood relationships inherent in the unstructured grid. This reorganization produces significant gains in performance by reducing the number of accesses made to the data file. We examine the effects of different chunking configurations on data retrieval performance. A major motivation for reorganizing the unstructured grid is to allow the application of iteration-aware prefetching. Second, we describe a prefetching method that takes advantage of prior knowledge of the user's access pattern. Applying this prefetching method to unstructured grids produces further performance gains over and above the gains seen from reorganization alone. In addressing the poor locality, we investigated partitioning unstructured grids at the disk level and its effect on overall system performance.
We build upon this and investigate the effect of an in-core partitioning performed on top of the existing disk-level partitioning. We also examine the performance benefits of declustering unstructured grids across several disks. Given this declustered dataset, we describe and explore a parallel data retrieval method that takes advantage of prior knowledge of the user's access pattern. Our test results demonstrate very significant performance gains. Lastly, we present guidelines for choosing effective partitionings of datasets when the access pattern is known in advance.

Concentration/Emphasis

Emphasis: Computer Science
