Fall 2009 Colloquium Series
Peng Zhou, PhD Dissertation
Department of Computer Science
Fine-grain State Processors
August 27, 2009
139 Fisher
9:30 A.M.
Abstract:
Proper manipulation of processor state is crucial for high-performance speculative superscalar processors. This dissertation presents a new state paradigm in which the processor is aware of the in-order, speculative, and architectural states on an individual data location basis, rather than with respect to a particular point in the program’s execution. We refer to traditional processors, which adopt a lump-sum approach to processor state, as Coarse-grain State Processors (CSP), and to those that can classify individual data locations as belonging to a particular state as Fine-grain State Processors (FSP). Fine-grain State Processors break the atomic state set into finer granularity at the individual value level. As a result, they can utilize correct values upon a mis-speculation. Furthermore, they can continue execution with a partially correct state and still maintain correct program semantics. Performing state recovery without stopping the execution of future instructions can potentially hide the latency of the recovery process, resulting in zero-penalty speculation under ideal conditions.
This dissertation also presents a taxonomy of FSP. The taxonomy categorizes existing fine-grain state handling techniques and outlines the design space of future FSP designs. Based on the general framework developed, the dissertation explores applications of FSP to sophisticated uni-processor as well as multi-core/multi-threaded organizations. Two detailed FSP models, EMR and FSG-RA, are evaluated for control speculation and value speculation, respectively. In both models, the FSP technique handles processor state more efficiently and obtains much higher performance than traditional mechanisms. For example, EMR achieves an average of 9.0% and up to 19.9% better performance than traditional coarse-grain state handling on the SPEC CINT2000 benchmark suite, while FSG-RA obtains an average of 38.9% and up to 160.0% better performance than a comparably equipped CSP processor on the SPEC CFP2000 benchmark suite.