Program Locality Models and Their Use in Memory Performance Optimization

A Half-Day Tutorial at PACT 2005

Tutorial Organizers
Steve Carr Professor Michigan Technological University
Trishul Chilimbi Research Group Manager Microsoft Research
Chen Ding Assistant Professor University of Rochester
Youfeng Wu Principal Engineer Intel Microprocessor Technology Labs

Abstract

Caching is widely used in many computer programs and systems. The effect of caching depends on the program locality, which can be analyzed using the following four models.
  • The compiler model analyzes array subscript patterns to determine when array references contained within loops access the same memory location.
  • The hot-stream model identifies hot data streams, which are variable size, frequently repeated sequences of consecutive data accesses, and treats these as the basic exploitable locality units to be optimized.
  • The distance model predicts the change in the distance of data reuses for different program inputs and identifies the reference affinity among data groups.
  • The stride model explores the ``near constant stride'' among successive memory references to determine the future address of a memory location for data prefetching
The tutorial will give a survey of these four locality models and their uses in manual data restructuring, loop transformations (permutation, fusion, and tiling), automatic cache-conscious heap allocation, dynamic software prefetching, cache miss-rate prediction, array regrouping, and structure splitting.

Tutorial Outline

  1. Introduction of speakers by Ding
  2. Compiler model and its uses by Carr
    1. Dependence analysis basics
    2. Dependence-based locality analysis
    3. Uniformly generated set analysis
    4. Locality analysis using uniformly generated sets
    5. Application to loop transformations
  3. Hot-stream model and its uses by Chilimbi
    1. Framework
    2. Efficient Implementation
    3. Applications to
      1. Guided Data Restucturing
      2. Cache-conscious Heap allocation
      3. Dynamic Software Prefetching
  1. Distance model and its uses by Ding
    1. Reuse distance model
    2. Measurement methods
    3. Prediction of reuse-distance patterns
    4. Miss-rate prediction across all program inputs
    5. Reference affinity and its use in array regrouping and structure splitting
  2. Stride model by Wu
    1. Stride patterns in irregular programs
    2. Stride detection methods
    3. Stride-based prefetching
    4. Performance improvement
    5. Other usages of stride patterns
  3. Summary by Ding

Biographies

Steve Carr is a Professor of Computer Science in the Department of Computer Science at Michigan Technological University. His research focuses on memory hierarchy performance using static and dynamic compiler analysis and compiler/micro-architecture cooperation. Dr. Carr is a recipient of a National Science Foundation Information Technology Research award (ITR) and has served on numerous conference committees. Dr. Carr's teaching interests include compiler design, programming languages and discrete mathematics.

Trishul Chilimbi is researcher at Microsoft Research, where he leads the Runtime Analysis & Design Group. His research focuses on runtime analyses, including hybrid static-dynamic approaches to enhance software reliability, resilience, performance, and security. He also investigates memory system performance and garbage collection. He has published widely in premier conferences including PLDI, POPL, ASPLOS, and has been awarded several patents for his inventions. He is a co-founder of the ACM SIGPLAN Workshop on Memory System Performance.

Chen Ding is an assistant professor in Computer Science Department at University of Rochester. His research focuses on understanding whether a complex program has an inherent pattern of data access and, if so, to what degree that pattern can be modeled, measured, and modified (improved). He is a recipient of a Young Investigator Award from the Office of Science of the US Department of Energy, a Career Award from National Science Foundation, and a best paper award from the IEEE IPDPS'01 conference. In 2004, he was the general chair of the second ACM SIGPLAN Workshop on Memory System Performance.

Youfeng Wu is a principal engineer and a group manager at Intel's Programming Systems Research Labs. He has spent more than 15 years honing his expertise in advanced compiler transformations, profile-guided optimizations, instruction-level parallelism, performance analysis, and software/hardware collaborative techniques to enhance future generations of micro-processors. His current research focuses on dynamic binary translation and related software strategy and hardware supports. He has more than 20 patents and pending applications. He is a member of IEEE and ACM.