Reducing Cache Contention On GPUs

Kyoshin Choo, University of Mississippi

Date of Award

2016

Document Type

Dissertation

Degree Name

Ph.D. in Engineering Science

Department

Computer and Information Science

First Advisor

Byunghyun Jang

Second Advisor

Robert J. Doerksen

Third Advisor

Feng Wang

Relational Format

dissertation/thesis

Abstract

The use of Graphics Processing Units (GPUs) as application accelerators has become increasingly popular because, compared to traditional CPUs, they are more cost-effective and more energy efficient, and their highly parallel nature complements a CPU. As GPUs have grown in popularity, many compute-intensive general-purpose GPU (GPGPU) applications have shown significant performance improvements over traditional CPU-based implementations. Caches, which significantly improve CPU performance, have been introduced to GPUs to further enhance application performance. However, in many cases caches provide little benefit on GPUs, and in some cases they even degrade performance.
The massive parallelism of the GPU execution model and the resulting memory accesses cause the GPU memory hierarchy to suffer from significant contention for memory resources among threads. One cause of cache contention is the column-strided memory access pattern that many data-intensive GPU applications generate. When such access patterns are mapped to hardware thread groups (warps), they become memory-divergent instructions whose memory requests cannot be coalesced efficiently by the hardware, resulting in serialized accesses and performance degradation. Cache contention also arises from cache pollution caused by lines with low reuse. For a cache to be effective, a cached line must be reused before it is evicted. Unfortunately, the streaming characteristics of GPGPU workloads and the massively parallel GPU execution model increase the reuse distance of data, or equivalently reduce its reuse frequency. On a GPU, the pollution caused by data with large reuse distances is significant. Memory request stalls are another source of contention. A stalled Load/Store (LDST) unit does not execute memory requests from any ready warps in the issue stage, so those ready warps lose potential cache hits.
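As a rough illustration of why column-strided access patterns produce memory-divergent instructions and concentrated set contention, the Python sketch below computes the cache lines and cache sets touched by one 32-thread warp reading a row-major float matrix. The 128-byte line size, the 32-set cache with simple modulo set indexing, and the helper names `warp_addresses` and `analyze` are assumptions for illustration, not parameters taken from the dissertation; real GPU L1 caches may use other geometries or hashed set indexing.

```python
from collections import Counter

# Assumed cache geometry (illustrative only, not from the dissertation).
LINE_BYTES = 128   # cache-line size in bytes
NUM_SETS = 32      # number of sets, indexed as line_number % NUM_SETS
WARP_SIZE = 32     # threads per warp
ELEM_BYTES = 4     # float32 element size


def warp_addresses(row_width, col, column_strided):
    """Byte addresses one warp generates when reading a row-major matrix."""
    if column_strided:
        # Thread t reads A[t][col]: consecutive threads are a full row apart.
        return [(t * row_width + col) * ELEM_BYTES for t in range(WARP_SIZE)]
    # Thread t reads A[0][col + t]: consecutive threads touch adjacent elements.
    return [(col + t) * ELEM_BYTES for t in range(WARP_SIZE)]


def analyze(addresses):
    """Return the number of distinct cache lines and a per-set line count."""
    lines = sorted({addr // LINE_BYTES for addr in addresses})
    sets = Counter(line % NUM_SETS for line in lines)
    return len(lines), sets


for strided in (False, True):
    n_lines, sets = analyze(warp_addresses(row_width=1024, col=0, column_strided=strided))
    print(f"column_strided={strided}: {n_lines} line request(s), "
          f"busiest set receives {max(sets.values())}")
```

With a 1,024-float (4 KB) row, the row-wise warp needs a single line, while every line requested by the column-strided warp maps to the same set under these assumptions, so 32 separate requests compete for that one set's few ways.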
This dissertation proposes three novel architectural modifications to reduce this contention: 1) contention-aware selective caching, which detects the memory-divergent instructions caused by column-strided access patterns, calculates the contending cache sets and locality information, and then caches selectively; 2) locality-aware selective caching, which dynamically calculates reuse frequency with efficient hardware and caches lines based on that frequency; and 3) memory request scheduling, which queues memory requests from the warp-issue stage, relieves LDST unit stalls, and schedules queued requests to the LDST unit by probing the cache multiple times. Through systematic experiments and comprehensive comparisons with existing state-of-the-art techniques, this dissertation demonstrates the effectiveness of these techniques and the viability of reducing cache contention through architectural support. Finally, this dissertation suggests other promising opportunities for future research on GPU architecture.
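A minimal software sketch of reuse-frequency-based selective caching, in the spirit of the locality-aware technique summarized above, is shown below in Python. It is not the dissertation's hardware design: the fully associative LRU cache, the unbounded per-line `reuse_count` table (real hardware would need a small, bounded structure), the `reuse_threshold` parameter, and the synthetic trace from `mixed_trace` are all hypothetical choices for illustration. A missing line is allocated only once it has been requested at least `reuse_threshold` times; otherwise the request bypasses the cache.

```python
from collections import OrderedDict, defaultdict


def simulate(trace, capacity, reuse_threshold):
    """Count hits for an LRU cache that allocates a missing line only after
    that line has been requested at least `reuse_threshold` times."""
    cache = OrderedDict()            # line -> None, ordered from LRU to MRU
    reuse_count = defaultdict(int)   # hypothetical per-line reuse-frequency table
    hits = 0
    for line in trace:
        reuse_count[line] += 1
        if line in cache:
            hits += 1
            cache.move_to_end(line)        # refresh the line's LRU position
        elif reuse_count[line] >= reuse_threshold:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used line
            cache[line] = None             # allocate (fill) the missing line
        # Otherwise the request bypasses the cache and goes to memory.
    return hits


def mixed_trace(rounds):
    """Four hot lines reused every round, interleaved with streaming lines used once."""
    trace = []
    for r in range(rounds):
        trace += [0, 1, 2, 3]                  # high-reuse working set
        trace += [1000 + 2 * r, 1001 + 2 * r]  # streaming lines, never reused
    return trace


trace = mixed_trace(rounds=100)
print("always cache:     ", simulate(trace, capacity=4, reuse_threshold=1), "hits")
print("reuse threshold 2:", simulate(trace, capacity=4, reuse_threshold=2), "hits")
```

On this trace, always caching (`reuse_threshold=1`) lets each pair of streaming lines evict hot lines, so no access hits; with `reuse_threshold=2`, the streaming lines bypass the cache and the four hot lines stay resident from their second use onward.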

Recommended Citation

Choo, Kyoshin, "Reducing Cache Contention On GPUs" (2016). Electronic Theses and Dissertations. 454.
https://egrove.olemiss.edu/etd/454

Concentration/Emphasis

Emphasis: Computer Science

Included in

Computer Sciences Commons
