Energy savings in simultaneous multi-threaded processors through dynamic resizing of datapath resources

Designers of systems ranging from high-performance servers to battery-operated handheld devices aim for reliability, high performance and longevity. Central to these aims, processor power consumption is becoming increasingly important. In this study, we adapt our already-proven method for single-threaded superscalar processors to simultaneous multi-threaded (SMT) processors in order to save energy. The original method resizes datapath resources according to the demands of the running applications. To achieve this, the targeted resources are physically divided into multiple partitions, which are turned on and off according to the needs of the applications. Since the energy consumption of turned-off datapath resources is quite low, substantial energy savings become possible within the processor. However, special care must be taken when multiple threads compete for access to shared datapath resources. Our proposed microarchitectural technique achieves a 0.5% improvement in Instructions Per Cycle (IPC) and a 3.2% improvement in Total number of instructions Per Cycle (TPC), while turning off 45% of the Reorder Buffer (ROB), 59% of the Load-Store Queue (LSQ), 43% of the Issue Queue (IQ), 30% of the integer Physical Register File (PRF) and 48% of the floating-point PRF, on average across all simulated benchmarks. According to our estimates, total processor power is reduced by 12% on average.
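The partition-based resizing described above can be illustrated with a minimal sketch. The abstract does not give the actual resizing rules, so everything below is an assumption made for illustration: the class name `PartitionedResource`, the occupancy-based downsizing rule, the overflow-based upsizing rule, and the threshold value are all hypothetical. In an SMT setting, the occupancy passed in is assumed to be the combined occupancy of all threads sharing the resource.

```python
from dataclasses import dataclass


@dataclass
class PartitionedResource:
    """A datapath structure (e.g. ROB, IQ, LSQ) split into equal partitions.

    Hypothetical model: names and resizing rules are illustrative
    assumptions, not the policy evaluated in the paper.
    """
    name: str
    num_partitions: int
    partition_size: int
    active_partitions: int = 0  # 0 means "start with everything turned on"

    def __post_init__(self) -> None:
        if self.active_partitions == 0:
            self.active_partitions = self.num_partitions

    @property
    def capacity(self) -> int:
        """Number of entries currently powered on."""
        return self.active_partitions * self.partition_size

    def resize(self, avg_occupancy: float, overflows: int,
               overflow_threshold: int) -> None:
        """Adjust the active size at the end of a sampling period.

        avg_occupancy: mean number of occupied entries in the period,
                       summed over all threads sharing the resource.
        overflows:     number of cycles dispatch stalled because the
                       resource was full (assumed upsizing trigger).
        """
        if overflows > overflow_threshold:
            # Demand exceeds the active size: turn one partition back on.
            if self.active_partitions < self.num_partitions:
                self.active_partitions += 1
        elif avg_occupancy <= self.capacity - self.partition_size:
            # At least one whole partition was unused on average:
            # turn one partition off, but always keep one active.
            if self.active_partitions > 1:
                self.active_partitions -= 1

    def fraction_off(self) -> float:
        """Fraction of the resource that is currently turned off."""
        return 1 - self.active_partitions / self.num_partitions


def combined_occupancy(per_thread_occupancy: list[float]) -> float:
    """Total occupancy of a shared resource across all SMT threads."""
    return sum(per_thread_occupancy)
```

As a usage example, a 128-entry ROB modeled as `PartitionedResource("ROB", 8, 16)` and called with `resize(combined_occupancy([25.0, 15.0]), overflows=0, overflow_threshold=4)` would turn one partition off, leaving 112 entries active. Shrinking by one partition per period is a deliberately conservative choice in this sketch; a real policy would also have to ensure that a partition is empty before powering it down.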
