Overview
Modern microprocessor designs continue to obtain impressive performance gains through increasing clock rates and advances in the parallelism obtained via micro-architecture design. Unfortunately, memory technology has not improved at the same pace, resulting in latencies of over 100 cycles between processors and main memory. This ever-increasing speed gap has pushed the current memory-hierarchy approach to its limit. Additionally, exploiting a large amount of instruction-level parallelism (ILP) requires issuing a large number of loads and stores in parallel, while power and energy requirements limit how much speculation is possible. The larger cache sizes demanded by scientific applications to limit memory delays, and the additional ports required for sufficient load/store parallelism, conflict with the need for high clock rates and lower energy requirements. Together these issues form the Memory Wall.
Traditional approaches to the memory wall have not yielded satisfactory results. Hardware-only solutions require more power and energy than desired and do not scale well. Compiler-managed solutions tend to miss too many optimization opportunities because of limited compile-time knowledge of run-time behavior. This research proposes a fundamentally different approach. The project will explore combining the best of both techniques by making use of the static knowledge obtained by the compiler in the dynamic decision making of the micro-architecture. In this research, the compiler fully exposes its compile-time analysis to the micro-architecture, which uses this information to make critical scheduling and memory optimization decisions. This approach allows the examination of solutions that are impossible with a hardware-only or fully compiler-managed design.
To make our approach feasible, the compiler must communicate its analysis in a concise and effective manner. Condensing the vast amount of compiler knowledge to a small number of sets meets these criteria. Using set membership as a framework for communicating compiler information, this research will pursue the following problems related to the memory wall:
Cost-effective run-time memory disambiguation of load/store operations to increase the number of parallel memory operations through the interaction of the compiler and micro-architecture via set-based dependence information;
Scalable cache and load/store queue designs that can sustain multiple memory accesses every cycle for wide-issue superscalar processors and utilize set-based dependence information;
Novel cache design formed in cooperation with compiler-generated working-set information to reduce the number of conflict and capacity misses; and,
Working-set based prefetching techniques to reduce the number of misses and the miss penalty.
Of special note, effective solutions to the last two items require the proposed compiler/micro-architecture cooperation.
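To illustrate the set-membership idea behind the first item, the following is a minimal sketch rather than the project's actual design. It assumes the compiler tags every load and store with a small integer set ID such that operations in different sets never access the same address. The names `MemOp`, `alias_set`, `may_conflict`, and `ops_free_to_issue` are hypothetical, and the sketch treats every earlier operation in the window as still pending.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemOp:
    kind: str       # "load" or "store"
    alias_set: int  # hypothetical compiler-assigned set ID
    name: str = ""  # label used only for this illustration


def may_conflict(a: MemOp, b: MemOp) -> bool:
    """Two operations can depend on each other only if they share a set
    and at least one of them is a store."""
    return a.alias_set == b.alias_set and "store" in (a.kind, b.kind)


def ops_free_to_issue(window: list[MemOp]) -> list[MemOp]:
    """Return the operations (in program order) that cannot conflict with
    any earlier operation in the window, so they may issue in parallel.
    Assumption: all earlier operations are treated as still pending."""
    ready = []
    for i, op in enumerate(window):
        if not any(may_conflict(earlier, op) for earlier in window[:i]):
            ready.append(op)
    return ready


window = [
    MemOp("store", 0, "s0"),
    MemOp("load", 1, "l1"),
    MemOp("load", 0, "l0"),   # shares set 0 with the earlier store
    MemOp("load", 1, "l1b"),  # shares set 1 only with another load
]
print([op.name for op in ops_free_to_issue(window)])
```

In this sketch the dependence check is a comparison of small set IDs instead of a comparison of full addresses, which is the kind of cost reduction that set-based dependence information is meant to enable.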
The proposed research will provide novel and cost-effective solutions to the memory wall problem, improving the performance of scientific applications. The solutions obtained will be scalable and fit within reasonable power and energy requirements. On a broader scale, this research will fundamentally change compiler and micro-architecture design. Specifically, this research will encourage greater cooperation between the compiler and micro-architecture in solutions addressing many difficult problems facing system design. This greater cooperation will, in turn, open up a new array of design options that are not otherwise possible.
This project is supported by the National Science Foundation under grant number CCR-0312892.