10.07.2015 Views

高速かつ正確なキャッシュシミュレーション法とその評価 Fast ... - 九州大学

高速かつ正確なキャッシュシミュレーション法とその評価 Fast ... - 九州大学

高速かつ正確なキャッシュシミュレーション法とその評価 Fast ... - 九州大学

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

THE INSTITUTE OF ELECTRONICS,INFORMATION AND COMMUNICATION ENGINEERSTECHNICAL REPORT OF IEICE. † †† ††† 819–0395 744 †† 819–0395 744 E-mail: †ono@c.csce.kyushu-u.ac.jp, ††{inoue,murakami}@i.kyushu-u.ac.jp 81.7% 34.6%<strong>Fast</strong>, Accurate Cache SimulationTakatsugu ONO † , Koji INOUE †† , and Kazuaki MURAKAMI ††† Graduate School of Information Science and Electrical Engineering, Kyushu University Motooka 744,Nishiku, Fukuoka, 819–0395 Japan†† Faculty of Information Science and Electrical Engineering, Kyushu University Motooka 744, Nishiku,Fukuoka, 819–0395 JapanE-mail: †ono@c.csce.kyushu-u.ac.jp, ††{inoue,murakami}@i.kyushu-u.ac.jpAbstractThis paper proposes a fast, accurate cache simulation technique for efficient design space exploration,and shows its efficiency by means of comparing with a related approach. Trace-driven simulation is a well knownmethodology to measure memory-system performance, e.g. cache hit rates. One of advantages of this method is thehigh-speed of simulations. Since the trend increases the complexity of microprocessor chips, e.g. CMPs, however, itis strongly required to achieve much faster simulations without sacrificing the accuracy of performance prediction.The proposed approach first attempts to characterize the memory-access patters, and then generates a small butwell-constructed memory-access trace as a stimulus of cache simulators. In our evaluation, it is observed that theproposed technique reduces the trace size by 81.7% while the accuracy of cache miss rates is improved by 34.6%,compared with SimPoint approach.Key wordsPerformance evaluation, Simulation, Memory system1. — 1 —


[1]2. 3. 4. 5. 2. [2] [3] 3 1 2 3 (1) MinneSPEC [4] (2) [5], [6] 1 (3) SMARTS [7] SimPoint [8], [9] SMARTS SimPoint SimPoint 1 SMARTS SimPoint 3. 3. 1 3. 31 3. 2 — 2 —


Adres2302STEP12アドレス1インターバル アドレス1STEP22302215.5390Memory access200 400 600 800 10000 20 40 60 80 100 Clock cycles 0(a) 連 続 したメモリアクセス 1アクセス 時 間 の 平 均 アクセス 時 間 の 標 準 偏 差 値 アドレス 間 距 離サブ・トレース 1 1 STEP1 2 2 STEP2 STEP1 3 1 0 N x i ¯x 1¯x =∑xiN 2 0 σ 2√ ∑(xi− ¯x)σ =(2)N 3 2 2 2(a) 2(b) 2(a)(b) (1)200 600 1000 Memory400 800 accessAdres 60 80 100 Clock cycles 40 20 0 0(b) 離 散 的 なメモリアクセス 2 2(a) 3(b) 3. 3 3. 2 K- [10] 1 miss rate 3— 3 —


1 () 10,000 1,024 8,192 K- 500 2 SimPoint MPEG2 carphone 1,740 360foreman 1,200 250SPEC2000 175.vpr 4,260 1,270176.gcc 2,350 850183.equake 3,600 1,360∑ ki=1miss rate =(miss i × weight i )× 100 (3)access i access miss i weight i k 4. 4. 1 3. L1 SimPoint SimpleScalar3.0 [11] MPEG2 [12] SPEC2000 [13] SimPointSimPoint3.1 [14], [15] Sim-Point3.1 10M [15] 16KB 32B 1 16KB 32B 4 MPEG2 carphone foreman QCIF 150 SPEC2000 175.vpr 176.gcc 2 183.equake SPEC2000 test 1 1 K- [10] K- Cluster3.0 [16] k∑S = (N i + W ) (4)i=1 k N i i W N i k W k W 97%500 16,000 1 10 10 4. 2 4. 3 10 — 4 —


0.04 0.12 0.16 0.2正 規 化 トレースサイズ 提 案 手 法 (クラスタ 数 500) 提 案 手 法 (SimPointと 同 サイズ) SimPoint00.04 0.05キャッシュ・ミス 率 予 測 誤 差(パーセンテージ・ポイント) 0.030.1920.064 3 ( KB/ W)32KB/1W 32KB/1W 32KB/2W 32KB/4W16KB/1W 16KB/1W 16KB/2W 16KB/4W8KB/1W 8KB/1W 8KB/2W 8KB/4W4KB/1W 4KB/1W 4KB/2W 4KB/4W 3SimPoint 令 データ 命 令 データ 命 令 データ 命 令 データ 命 令 データ carphone foreman 175.vpr 176.gcc 183.equake ベンチマーク 命0.08 50.01 0.0232K1W16K1W8K1W4K1W32K1W16K1W8K1W4K1W32K1W16K1W8K1W4K1W32K1W16K1W8K1W4K1W32K1W16K1W8K1W4K1W00.01 0.017 0.09 0.05 0 0.2 0.4 0.6 0.5 0.3データ 令 命 データ 令 0.06 0.013 0.07 予 0.013 測 誤 差(パーセンテージ・ポイント)(クラスタ 数 500) 提 案 手 法 (SimPointと 同 サイズ) SimPoint 0.85 1.6320.07 率キャッシュ・ミス データ 命 令 データ 命 令 命 データ 令 命 0.1ベンチマーク176.gcc183.equake175.vpr foreman carphone 4 SimPoint 4KB 64B 1 4KB 64B 4 500 16,000 SimPoint 64,000 SimPoint 2 4. 2 SimPoint 3 1 500 SimPoint SimPoint SimPoint 18.3%SimPoint 4 carphone foreman SimPoin 0.001 0.004 SimPoint SimPoint 500 carphone 175.vpr 176.gcc 175.vpr 176.gcc183.equake ベンチマーク carphoneforemancarphone foreman 183.equake 176.gcc SimPointSimPoint SimPoint 4. 3 3 64B 5 5 5 0.024 6 6 0.143 carphone foreman MPEG2 — 5 —


0.5キャッシュ・ミス 率 予 測 誤 差 (パーセンテージ・ポイント) 0.4 60.1 0.2 0.30176.gcc 4 5. 81.7%34.6% 98.6% 0.083 , ( A: 17680005) .[1] , , “”, 5 SACSIS(2007).[2] , , , “”, , 46, 12, pp. 84–97 (2005).[3] J. J. Yi and D. J. Lilja: “Simulation of computer architectures:Simulators, benchmarks, methodologies, and recommendations.”,IEEE Trans. Computers, 55, 3, pp. 268–280(2006).[4] A. J. KleinOsowski and D. J. Lilja: “Minnespec: A newspec benchmark workload for simulation-based computerarchitecture research.”, Computer Architecture Letters, 1,(2002).[5] J. Robert H. Bell and L. K. John: “The case for automaticsynthesis of miniature benchmarks”, Workshop on Modeling,Benchmarking and Simulation, Wisconsin, Madison,pp. 88–97 (2005).[6] J. Robert H. Bell and L. K. John: “Improved automatictestcase synthesis for performance model validation”, InternationalConference on Supercomputing, pp. 111–120(2005).[7] R. E. Wunderlich, T. F. Wenisch, B. Falsafi and J. C. Hoe:“Smarts: Accelerating microarchitecture simulation via rigorousstatistical sampling.”, International Symposium onComputer Architecture, pp. 84–95 (2003).[8] T. Sherwood, E. Perelman, G. Hamerly and B. Calder:“Automatically characterizing large scale program behavior”,International Conference on Architectural Support forProgramming Languages and Operating Systems, pp. 45–57(2002).[9] T. Sherwood, E. Perelman and B. Calder: “Basic block distributionanalysis to find periodic behavior and simulationpoints in applications”, International Conference on ParallelArchitectures and Compilation Techniques, pp. 3–14(2001).[10] , , “”, (2006).[11] SimpleScalar Simulation Tools for Microprocessor andSystem Evaluation: “http://www.simplescalar.org/”.[12] MPEG Software Simulation Group: “http://www.mpeg.org/mpeg/mssg/”.[13] SPEC: “http://www.specbench.org/”.[14] SimPoint3.1: “http://www.cse.ucsd.edu/ calder/simpoint/”.[15] G. Hamerly, E. Perelman, J. Lau and B. Calder: “Simpoint3.0: <strong>Fast</strong>er and more flexible program analysis”, Workshopon Modeling, Benchmarking and Simulation (2005).[16] Cluster3.0: “http://bonsai.ims.u-tokyo.ac.jp/ mdehoon/software/cluster/software.htm#source”.carphone foreman 175.vpr 176.gcc 183.equake ベンチマーク 32K1W32K2W32K4W16K1W16K2W16K4W8K1W8K2W8K4W4K1W4K2W4K4W32K1W32K2W32K4W16K1W16K2W16K4W8K1W8K2W8K4W4K1W4K2W4K4W32K1W32K2W32K4W16K1W16K2W16K4W8K1W8K2W8K4W4K1W4K2W4K4W32K1W32K2W32K4W16K1W16K2W16K4W8K1W8K2W8K4W4K1W4K2W4K4W32K1W32K2W32K4W16K1W16K2W16K4W8K1W8K2W8K4W4K1W4K2W4K4W— 6 —

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!