Question 1(a) 4 pts
What kind of dependences does register renaming eliminate?
Register renaming eliminates write-after-write (WAW) and write-after-read (WAR) dependences.
EECS 4340 · Spring 2024
Historically numbered as EECS 4824.
In the P6 microarchitecture, an instruction’s source operand value may come from three places before being placed in the reservation station. Name all three.
The value may come from the architectural register file, from the reorder buffer (ROB) if the producing instruction has completed but not yet retired, or be captured off the result/bypass bus (CDB) once the producer executes.
What are the main advantages/disadvantages of the R10k in comparison to the P6-style architecture?
(Both rely on register renaming and a ROB for precise interrupts; the difference is where operand values live.)
In the R10k, values live only in the unified physical register file, so data is never copied at completion or retirement and ROB entries stay small. The disadvantages are a more complex mechanism for freeing physical registers and recovering the map table on mispredictions, and a register file large enough to hold both architectural and in-flight state.
Name a technique that can eliminate compulsory cache misses.
Prefetching (hardware or software). Larger block sizes also reduce the number of compulsory misses by bringing in more data on each first-reference miss.
A repeating branch pattern of T, T, T, T, T, N, N is predicted with a PC-indexed 2-bit saturating-counter predictor. What is the steady-state prediction accuracy?
The steady-state accuracy is:
4 / 7 = 0.5714 = 57.14%
Reasoning in steady state:
| Actual outcome | Predictor state before outcome | Prediction | Result | State after outcome |
|---|---|---|---|---|
| T1 | weakly not taken | N | wrong | weakly taken |
| T2 | weakly taken | T | correct | strongly taken |
| T3 | strongly taken | T | correct | strongly taken |
| T4 | strongly taken | T | correct | strongly taken |
| T5 | strongly taken | T | correct | strongly taken |
| N1 | strongly taken | T | wrong | weakly taken |
| N2 | weakly taken | T | wrong | weakly not taken |
Per 7-outcome period, there are 3 mispredictions and 4 correct predictions.
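The steady-state count can be checked with a short simulation. This is a minimal sketch: a single 2-bit saturating counter (states 0–1 predict not-taken, 2–3 predict taken), with the first period skipped as warm-up.

```python
# Simulate one 2-bit saturating counter on the repeating pattern
# T,T,T,T,T,N,N and measure steady-state prediction accuracy.
def simulate(pattern, periods=100, state=3):
    correct = total = 0
    for p in range(periods):
        for taken in pattern:
            pred = state >= 2  # states 2,3 predict taken
            if p > 0:  # skip the warm-up period
                correct += pred == taken
                total += 1
            # Saturating update: increment on taken, decrement on not-taken.
            state = min(3, state + 1) if taken else max(0, state - 1)
    return correct / total

acc = simulate([True] * 5 + [False] * 2)
print(f"steady-state accuracy = {acc:.4f}")  # 4/7 ≈ 0.5714
```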
CPI is 1.2; ideal CPI is 1; 20% of instructions are branches; branch-prediction accuracy is 90%. How many cycles does each misprediction cost? What accuracy is needed to reduce CPI to 1.01 only by improving branch prediction?
Current CPI overhead:
1.2 - 1 = 0.2 cycles/instruction
Mispredictions per instruction:
0.20 branches/instruction * (1 - 0.90) = 0.02 mispredictions/instruction
Misprediction cost:
0.2 / 0.02 = 10 cycles per misprediction
For CPI = 1.01:
1.01 = 1 + 0.20 * (1 - accuracy) * 10
0.01 = 2 * (1 - accuracy)
1 - accuracy = 0.005
accuracy = 0.995 = 99.5%
A two-level/correlated branch predictor uses 8 PC bits to select a BHT entry; each BHT entry holds an 8-bit branch history; the PHT is indexed by that 8-bit history and stores 2-bit saturating counters. How many storage bits are needed for the BHT and the PHT?
BHT: 2^8 entries * 8 bits/entry = 2048 bits
PHT: 2^8 entries * 2 bits/entry = 512 bits
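The arithmetic above can be sketched as a few lines of Python, assuming the stated parameters (base CPI 1, 20% branches, 90% accuracy), along with the predictor storage sizing:

```python
# CPI / misprediction-cost arithmetic from the problem statement.
base_cpi, measured_cpi = 1.0, 1.2
branch_frac, accuracy = 0.20, 0.90

mispred_per_instr = branch_frac * (1 - accuracy)              # 0.02
mispred_cost = (measured_cpi - base_cpi) / mispred_per_instr  # ≈ 10 cycles

# Accuracy needed for CPI = 1.01 at the same per-misprediction cost:
target_cpi = 1.01
needed_acc = 1 - (target_cpi - base_cpi) / (branch_frac * mispred_cost)

# Two-level predictor storage (8 PC bits, 8-bit history, 2-bit counters):
bht_bits = (2 ** 8) * 8  # 256 entries x 8-bit history  = 2048 bits
pht_bits = (2 ** 8) * 2  # 256 entries x 2-bit counter = 512 bits

print(mispred_cost, needed_acc, bht_bits, pht_bits)
```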
Compute the cache index bits for L1i, L1d, L2, and L3 caches with the following parameters. The block size is 64 B for all caches.
| Cache | Size | Associativity | Block size |
|---|---|---|---|
| L1 instruction | 32 KB | 8-way | 64 B |
| L1 data | 32 KB | 8-way | 64 B |
| L2 | 256 KB | 8-way | 64 B |
| L3 | 20 MB | 20-way | 64 B |
Formula:
number of sets = cache size / (block size * associativity)
index bits = log2(number of sets)
Answers:
| Cache | Sets | Index bits |
|---|---|---|
| L1 instruction | 32 KB / (64 B * 8) = 64 | 6 |
| L1 data | 32 KB / (64 B * 8) = 64 | 6 |
| L2 | 256 KB / (64 B * 8) = 512 | 9 |
| L3 | 20 MB / (64 B * 20) = 16384 | 14 |
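The table can be reproduced mechanically; a minimal sketch using the sizes and associativities above:

```python
# Sets and index bits for each cache: sets = size / (block * ways).
from math import log2

BLOCK = 64  # bytes, same for all caches
caches = {
    "L1i": (32 * 1024, 8),
    "L1d": (32 * 1024, 8),
    "L2":  (256 * 1024, 8),
    "L3":  (20 * 1024 * 1024, 20),
}

for name, (size, ways) in caches.items():
    sets = size // (BLOCK * ways)
    print(f"{name}: {sets} sets, {int(log2(sets))} index bits")
```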
Compute the average access time (AAT) for each associativity given the per-associativity hit access time, a 10 ns miss access time, 0.3 data references per instruction, and the table’s MPKI values. Which associativity performs best?
Miss rate conversion:
miss rate = misses / accesses = (MPKI / 1000) / 0.3
AAT = hit_rate * hit_access_time + miss_rate * miss_access_time = (1 - miss_rate) * hit_access_time + miss_rate * 10 ns

| Associativity | Hit access time | MPKI | Miss rate | Average access time |
|---|---|---|---|---|
| Direct-mapped | 0.86 ns | 6.64 | 0.02213 | ~1.062 ns |
| 2-way | 1.12 ns | 3.66 | 0.01220 | ~1.228 ns |
| 4-way | 1.37 ns | 0.987 | 0.00329 | ~1.398 ns |
| 8-way | 2.03 ns | 0.266 | 0.000887 | ~2.037 ns |
The direct-mapped cache performs best by average access time.
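The AAT column follows directly from the formula; a short sketch assuming the stated 0.3 data references per instruction and 10 ns miss access time:

```python
# AAT = (1 - miss_rate) * hit_time + miss_rate * miss_time,
# with miss_rate derived from MPKI and references per instruction.
REFS_PER_INSTR = 0.3
MISS_TIME = 10.0  # ns

configs = [  # (associativity, hit time ns, MPKI)
    ("direct-mapped", 0.86, 6.64),
    ("2-way", 1.12, 3.66),
    ("4-way", 1.37, 0.987),
    ("8-way", 2.03, 0.266),
]

results = {}
for name, hit_time, mpki in configs:
    miss_rate = (mpki / 1000) / REFS_PER_INSTR
    results[name] = (1 - miss_rate) * hit_time + miss_rate * MISS_TIME
    print(f"{name}: miss rate {miss_rate:.5f}, AAT {results[name]:.3f} ns")

print("best:", min(results, key=results.get))
```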