High-performance processors are becoming increasingly power bound with technology scaling. Dynamic voltage and frequency scaling (DVFS) has emerged as an efficient method of reducing power consumption by lowering the operating voltage and frequency of a processor. We propose a multicore, memory-aware DVFS scheme based on VSV, a uniprocessor DVFS algorithm that throttles a core based on L2 cache misses. The key observation is that an L2 miss may leave the processor pipeline stalled while it waits for data. These stalls offer an excellent opportunity for power savings with DVFS. Care must be taken, however, to ensure that the pipeline is actually stalled during an L2 miss and that the processor has sufficient work to complete when transitioning out of low-power mode. Using SPEC2K benchmarks, we evaluate both single-core and multicore VSV over a range of DVFS transition latencies (12 ns, 100 ns, and 8.9 µs), which are representative of different voltage regulator configurations. We show that fast-switching, on-chip voltage regulators for DVFS are necessary to see benefits under the energy delay squared metric. If low latencies, on the order of 12 ns, are indeed possible, we show power benefits of 28%, performance costs of 35%, and energy delay squared improvements of 28% for a quad-core CMP. Increasing the latency to 100 ns yields power savings of 45% at a 45% performance loss and an energy delay squared degradation of 28%, though this is the result of poor prediction of instruction-level parallelism. Latencies of 8.9 µs proved to be entirely infeasible.
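
The energy delay squared metric used above penalizes slowdown much more heavily than it rewards power savings. As a minimal sketch, assuming energy is average power multiplied by execution time, the metric relative to a baseline reduces to the power ratio times the cube of the delay ratio. The function name `relative_ed2` and the sample numbers are illustrative assumptions and are not taken from the evaluation.

```python
def relative_ed2(power_ratio: float, delay_ratio: float) -> float:
    """Return energy delay squared relative to a baseline (< 1.0 is an improvement).

    Assumption: energy = average power * execution time, so
    E * D^2 = (P * D) * D^2 = P * D^3.
    Both arguments are ratios of the scaled run to the baseline run.
    """
    if power_ratio <= 0 or delay_ratio <= 0:
        raise ValueError("ratios must be positive")
    return power_ratio * delay_ratio ** 3


# Hypothetical operating point (not a measured result):
# 10% lower average power at a 2% longer execution time.
ratio = relative_ed2(power_ratio=0.90, delay_ratio=1.02)
print(f"relative energy delay squared = {ratio:.3f}")
```

Because delay enters cubed, a small fractional slowdown costs roughly three times as much as the same fractional power saving gains, which is why long DVFS transition latencies are hard to justify under this metric.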


Saugata Ghose and Jonathan Tse