Dynamic issue queue capping for simultaneous multithreaded processors
A simultaneous multithreaded (SMT) processor mixes multiple instruction streams in its superscalar out-of-order execution core to achieve higher throughput. To do so, a superscalar processor is modified so that some of its resources are duplicated per thread and the rest are shared among the threads. The issue queue (IQ), which holds waiting instructions until they become ready and are scheduled for execution, is among these shared resources. A baseline unmanaged IQ can deliver unexpectedly low performance, since a resource-hungry thread can tie up most of the IQ entries. This scenario also hurts fairness, since some of the threads may starve. Earlier studies propose both static capping and a limited form of dynamic capping of the IQ entries to regulate IQ traffic and improve SMT throughput and fairness. In this study, we propose an efficiency-based dynamic capping (EDC) algorithm that calculates an efficiency metric for each thread and allocates the IQ entries so as to maximize the throughput and fairness metrics. On average, EDC gives 3.6% better throughput and 3.9% better fairness than the current state-of-the-art algorithms.
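The general idea of per-thread IQ capping can be illustrated with a minimal sketch. This is not the EDC algorithm itself: the abstract does not define the efficiency metric or the allocation rule, so the code below simply assumes that a non-negative efficiency score per thread is already available and splits the IQ entries in proportion to it. The names `cap_issue_queue`, `can_dispatch`, and `min_cap` are hypothetical and chosen only for illustration.

```python
from typing import List


def cap_issue_queue(efficiency: List[float], iq_size: int, min_cap: int = 1) -> List[int]:
    """Return a per-thread cap on IQ entries (illustrative, not the paper's rule).

    Assumption: `efficiency` holds one non-negative score per thread; how such
    a score is measured is outside this sketch. Each thread is guaranteed
    `min_cap` entries so that no thread starves; the remaining entries are
    split in proportion to the scores.
    """
    n = len(efficiency)
    if n == 0:
        return []
    if iq_size < n * min_cap:
        raise ValueError("IQ too small to give every thread min_cap entries")

    spare = iq_size - n * min_cap
    total = sum(efficiency)
    if total <= 0:
        # No usable signal: fall back to an even split.
        shares = [spare / n] * n
    else:
        shares = [spare * e / total for e in efficiency]

    caps = [min_cap + int(s) for s in shares]

    # Hand out entries lost to rounding, largest fractional share first.
    leftover = iq_size - sum(caps)
    order = sorted(range(n), key=lambda i: shares[i] - int(shares[i]), reverse=True)
    for i in order[:leftover]:
        caps[i] += 1
    return caps


def can_dispatch(occupancy: int, cap: int) -> bool:
    """A thread may place another instruction in the IQ only while below its cap."""
    return occupancy < cap


if __name__ == "__main__":
    # Two threads, 32-entry IQ, thread 0 scored three times as efficient.
    print(cap_issue_queue([3.0, 1.0], iq_size=32, min_cap=4))
```

In a dynamic scheme, caps like these would be recomputed periodically as the scores change, which is what distinguishes dynamic capping from a static split fixed in advance.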