Publications

Compiler/Micro-architecture Cooperation
    1. C. Fang, S. Carr, S. Onder and Z. Wang. "Feedback-directed Memory Disambiguation Through Store Distance Analysis", In Proceedings of the 20th ACM International Conference on Supercomputing, Queensland, Australia, June 2006.
    2. C. Fang, S. Carr, S. Onder and Z. Wang. "Path-based Reuse Distance Analysis", In Proceedings of the 15th International Conference on Compiler Construction, Vienna, Austria, March 2006.
    3. C. Fang, S. Carr, S. Onder and Z. Wang. "Instruction Based Memory Distance Analysis and Its Application to Optimization", In Proceedings of the Fourteenth ACM/IEEE International Conference on Parallel Architectures and Compilation Techniques, St. Louis, MO, September 2005.
    4. S. Carr and S. Onder. "A Case for a Working-set-based Memory Hierarchy", In Proceedings of the 2005 ACM International Conference on Computing Frontiers, Ischia, Italy, May 2005.
    5. C. Fang, S. Carr, S. Onder and Z. Wang. "Reuse-distance-based Miss-rate Prediction on a Per Instruction Basis", In Proceedings of the 2004 ACM Workshop on Memory System Performance, Washington, D.C., June 2004.
    6. Z. Wang, K.S. McKinley and D. Burger. "Combining Cooperative Software/Hardware Prefetching and Cache Replacement", IBM Austin CAS Center for Advanced Studies Conference, Austin, TX, February 2004.
    7. Z. Wang, D. Burger, S.K. Reinhardt, K.S. McKinley and C.C. Weems, "Guided Region Prefetching: A Cooperative Hardware/Software Approach", In Proceedings of the Thirtieth International Symposium on Computer Architecture (ISCA'03), San Diego, CA, June 9-11, 2003 (This version contains a couple of non-critical corrections to our published one).
    8. Z. Wang, K.S. McKinley, A.L. Rosenberg and C.C. Weems, "Using the Compiler to Improve Cache Replacement Decisions", In Proceedings of International Conference on Parallel Architectures and Compilation Techniques (PACT), Charlottesville, Virginia, September 22-25, 2002.
    9. X. Huang, Z. Wang, K.S. McKinley, "Compiling for the Impulse Memory Controller", In Proceedings of International Conference on Parallel Architectures and Compilation Techniques (PACT), Barcelona, Spain, September 8-12, 2001.
    10. O.S. Unsal, Z. Wang, I. Koren, C. M. Krishna and C.A. Moritz, "On Memory Behavior of Scalars in Embedded Multimedia Systems", In Proceedings of WMPI'01, Workshop on Memory Performance Issues, Goteborg, Sweden, June 2001.
    11. Z. Wang, K.S. McKinley and A.L. Rosenberg, "Improving Replacement Decisions in Set-Associative Caches", In Proceedings of MASPLAS'01, The Mid-Atlantic Student Workshop on Programming Languages and Systems, IBM Watson Research Center, Hawthorne, NY, April 2001.
    12. M. Bedy, S. Carr, S. Onder and P. Sweany. "Improving Software Pipelining by Hiding Memory Latency with Combined Loads and Prefetches", In Interaction between Compilers and Computer Architectures, G. Lee and P.-C. Yew ed., Kluwer Academic Publishers, 2001.
    13. S. Carr and P. Sweany. "Improving Software Pipelining with Hardware Support for Self-Spatial Loads", In Proceedings of the Third Workshop on Interaction between Compilers and Computer Architecture (INTERACT-3), San Jose, CA, October 1998.
Optimization for DSP Architectures
  1. S. Carr and P. Sweany. "Automatic Data Partitioning for the Agere Payload Plus Network Processor", In Proceedings of the ACM/IEEE 2004 International Conference on Compilers, Architecture and Synthesis for Embedded Systems, Washington D.C., September 2004.
  2. P. Sweany and S. Carr. "Building a C Compiler Retargetable for DSP Processors", In The 1st Workshop on Optimizations for DSP and Embedded Systems (ODES), San Francisco, California, March 2003.
  3. Y. Qian, S. Carr and P. Sweany. "Optimizing Loop Performance for Clustered VLIW Architectures", In Proceedings of the Eleventh IEEE International Conference on Parallel Architectures and Compiler Techniques (PACT-2002), Charlottesville, Virginia, September 22-25, 2002.
  4. Y. Qian, S. Carr and P. Sweany. "Loop Fusion for Clustered VLIW Architectures", In Proceedings of the ACM 2002 Joint Conference on Languages, Compilers and Tools for Embedded Systems and Software and Compilers for Embedded Systems, Berlin, Germany, June 19-21, 2002.
  5. D. Sule, S. Carr, and P. Sweany. "Evaluating Register Partitioning with Genetic Algorithms", In Proceedings of the Fourth International Conference on Massively Parallel Computing Systems, Ischia, Italy, April 2002.
  6. X. Huang, S. Carr and P. Sweany. "Loop Transformations for Architectures with Partitioned Register Banks", In Proceedings of the 2001 ACM Workshop on Languages, Compilers and Tools for Embedded Systems (LCTES '2001), Snowbird, Utah, June 22-23, 2001.
  7. J. Hiser, S. Carr and P. Sweany. "Global Register Partitioning", In Proceedings of the 2000 IEEE International Conference on Parallel Architectures and Compiler Techniques, Philadelphia, PA, October 15-19, 2000.
  8. J. Hiser, S. Carr, P. Sweany, and S.J. Beaty. "Register Assignment for Software Pipelining with Partitioned Register Banks", In Proceedings of the 2000 IEEE International Parallel and Distributed Processing Symposium, Cancun, Mexico, May 1-4, 2000.
  9. D. Kuras, S. Carr and P. Sweany. "Value Cloning for Architectures with Partitioned Register Banks", In The 1998 Workshop on Compiler Support for Embedded Systems (CASES98), Washington D.C., December 1998.
  10. S. Jang, S. Carr, P. Sweany, and D. Kuras, "A Code Generation Framework for VLIW Architectures with Partitioned Register Files", In Proceedings of the Third International Conference on Massively Parallel Computing Systems, Colorado Springs, Colorado, April 1998.
Cache Optimization
  1. X. Huang, S.M. Blackburn, K.S. McKinley, J.E.B. Moss, Z. Wang and P. Cheng. "The Garbage Collection Advantage: Improving Program Locality", In Proceedings of the 19th ACM Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA'04), Vancouver, Canada, October 2004.
  2. S. Carr and Y. Guan. "Unroll-and-Jam Using Uniformly Generated Sets", In Proceedings of the 30th IEEE International Symposium on Microarchitecture (MICRO-30), Research Triangle Park NC, December 1997.
  3. S. Carr and R.B. Lehoucq, "Compiler Blockability of Dense Matrix Factorizations", ACM Transactions on Mathematical Software 23(3), September 1997.
  4. C. Ding, S. Carr, and P. Sweany. "Modulo Scheduling with Cache-Reuse Information", Lecture Notes in Computer Science 1300, Springer-Verlag, Proceedings of Europar 97, Passau, Germany, August 1997.
  5. S. Carr. "Combining Optimization for Cache and Instruction-Level Parallelism", In Proceedings of the 1996 IEEE International Conference on Parallel Architectures and Compiler Techniques (PACT 96), Boston MA, October 1996.
  6. K. McKinley, S. Carr and C.-W. Tseng, "Improving Data Locality with Loop Transformations", ACM Transactions on Programming Languages and Systems 18(4), July 1996.
  7. S. Carr and R.B. Lehoucq, "A Compiler Blockable Algorithm for QR Decomposition", In Proceedings of the 7th SIAM Conference on Parallel Processing for Scientific Computing, San Francisco CA, February 1995.
  8. S. Carr, K.S. McKinley and C.-W. Tseng, "Compiler Optimizations for Improving Data Locality", In Proceedings of the Sixth ACM International Conference on Architectural Support for Programming Languages and Compilers (ASPLOS-VI), San Jose CA, October 1994.
  9. S. Carr and K. Kennedy, "Compiler Blockability of Numerical Algorithms", In Proceedings of Supercomputing '92, Minneapolis MN, November 1992.
  10. S. Carr and K. Kennedy, "Compiling Scientific Code for Complex Memory Hierarchies", In Proceedings of the Twenty-Fourth Annual Hawaii International Conference on System Sciences, Kauai HI, January 1991.
  11. S. Carr and K. Kennedy, "Blocking Linear Algebra Codes for Memory Hierarchies", In Proceedings of the Fourth SIAM Conference on Parallel Processing for Scientific Computing, Chicago IL, December 1989.
Register Allocation
  1. Y. Ma, S. Carr and R. Ge. "Low-cost Register-pressure Prediction for Scalar Replacement using Pseudo-schedules", In Proceedings of the 2004 International Conference on Parallel Processing, Montreal, Canada, August 15-18, 2004.
  2. D. Callahan, S. Carr and K. Kennedy. "Retrospective: Improving Register Allocation for Subscripted Variables", In 20 Years of the ACM SIGPLAN Conference on Programming Language Design and Implementation (1979 - 1999): A Selection, Kathryn S. McKinley, Editor, ACM SIGPLAN Notices, Volume 39, Number 4, April 2004.
  3. S. Carr and P. Sweany. "An Experimental Evaluation of Scalar Replacement on Scientific Benchmarks", Software - Practice & Experience 33(15), December 2003.
  4. T. Brasier, P. Sweany, S. Beaty and S. Carr, "CRAIG: A Practical Framework for Combining Instruction Scheduling and Register Assignment", In Proceedings of the 1995 IEEE International Conference on Parallel Architectures and Compilation Techniques (PACT 95), Cyprus, June 1995.
  5. S. Carr and K. Kennedy, "Scalar Replacement in the Presence of Conditional Control Flow", Software - Practice & Experience 24(1), January 1994.
  6. S. Carr, D. Callahan and K. Kennedy, "Improving Register Allocation for Subscripted Variables", In Proceedings of the ACM SIGPLAN 1990 Conference on Programming Language Design and Implementation (PLDI 90), White Plains NY, June 1990.
Instruction-Level Parallelism
  1. P. Zhou, S. Onder and S. Carr. "Fast Branch Misprediction Recovery in Out-of-order Superscalar Processors", To appear in Proceedings of the 2005 ACM International Conference on Supercomputing, Boston, MA, June 2005.
  2. S. Onder, "Cost Effective Memory Dependence Prediction using Speculation Levels and Color Sets", In Proceedings of the 2002 ACM International Conference on Parallel Architectures and Compilation Techniques, Charlottesville, Virginia, September 22-25, 2002.
  3. S. Onder and R. Gupta, "Dynamic Memory Disambiguation in the Presence of Out-of-Order Store Issuing", The Journal of Instruction Level Parallelism, vol. 4, June 2002.
  4. S. Rele, S. Pande, S. Onder, and R. Gupta, "Optimization of Static Power Dissipation by Functional Units in Superscalar Processors", In Proceedings of the 2002 International Conference on Compiler Construction, Grenoble, France, April 2002.
  5. S. Onder and R. Gupta, "Instruction Wake-up in Wide Issue Superscalars", In Proceedings of the 7th European Conference on Parallel Computing, LNCS 2150, Springer Verlag, pages 418-427, Manchester, UK, August 2001.
  6. S. Onder and R. Gupta, "Load and Store Reuse Using Register File Content", In Proceedings of the ACM 15th International Conference on Supercomputing, pages 289-302, Sorrento, Naples, Italy, June 2001.
  7. S. Onder and R. Gupta, "Dynamic Memory Disambiguation in the Presence of Out-of-order Store Issuing", In Proceedings of the 32nd Annual IEEE/ACM International Symposium on Microarchitecture, Haifa, Israel, November 1999.
  8. S. Onder, J. Xu, and R. Gupta, "Caching and Predicting Branch Sequences for Improved Fetch Effectiveness", In Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques, Newport Beach, California, October 1999.
  9. S. Onder and R. Gupta, "Superscalar Execution with Direct Data Forwarding", In Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques, pages 130-135, Paris, France, October 1998.
  10. S. Carr, C. Ding and P. Sweany, "Improving Software Pipelining with Unroll-and-Jam", In Proceedings of the Twenty-Ninth Annual Hawaii International Conference on System Sciences, Maui HI, January 1996, pp. 183-192.
  11. S. Carr and K. Kennedy, "Improving the Ratio of Memory Operations to Floating-Point Operations in Loops", ACM Transactions on Programming Languages and Systems 16(6), November 1994.
Simulators and Domain-specific Languages
  1. J. Bastian and S. Onder. "Specification of the Intel IA-32 using an Architecture Description Language", In Proceedings of the 2004 Workshop on Architecture Description Languages, Toulouse, France, August 2004.
  2. S. Onder and R. Gupta, "Automatic Generation of Microarchitecture Simulators", In Proceedings of the 1998 IEEE International Conference on Computer Languages, pages 80-89, Chicago, Illinois, May 1998.