Computer Organization and Structure
Homework
#5
Due:
2010/1/4
1. For a
direct-mapped cache design with 32-bit address, the following bits of the
address are used to access the cache.
|
Tag |
Index |
Offset |
a. |
31-10 |
9-4 |
3-0 |
b. |
31-13 |
12-5 |
4-0 |
a.
What is the cache line size (in words)?
b.
How many entries does the cache have?
c.
What is the ratio between total bits
required for such a cache implementation over the data storage bits?
Starting
from power on, the following byte-addressed cache references are recorded.
Address |
0 |
4 |
16 |
132 |
232 |
160 |
1024 |
30 |
140 |
3100 |
180 |
2180 |
d.
How many blocks are replaced?
e.
What is the hit ratio?
f.
List the final state of the cache, with
each valid entry represented as a record of <index, tag, data>.
2. In
general, cache access time is proportional to capacity. Assume that main memory
accesses take 70ns and that memory accesses are 36% of all instructions. The
following table shows data for L1 caches attached to each of two processors P1
and P2.
|
|
L1
size |
L1
miss rate |
L1
hit time |
a. |
P1 |
1KB |
11.4% |
0.62ns |
P2 |
2KB |
8.0% |
0.66ns |
|
b. |
P1 |
8KB |
4.3% |
0.96ns |
P2 |
16KB |
3.4% |
1.08ns |
a.
Assuming that the L1 hit time determines
the cycle times for P1 and P2, what are their respective clock rates?
b.
What is the AMAT for each of P1 and P2?
c.
Assuming a base CPI of 1.0, what is the
total CPI for each of P1 and P2? Which processor is faster?
For
the next three sub-problems, we will consider the addition of an L2 cache to P1
to presumably make up for its limited L1 cache capacity. Use the L1 cache
capacities and hit times from the previous table when solving these
sub-problems. The L2 miss rate indicate is its local miss rate.
|
L2
size |
L2
miss rate |
L2
hit time |
a. |
512KB |
98% |
3.22ns |
b. |
4MB |
73% |
11.48ns |
d.
What is the AMAT for P1 with the
addition of an L2 cache? Is the AMAT better or worse with the L2 cache?
e.
Assuming a base CPI of 1.0, what is the
total CPI for P1 with the addition of an L2 cache?
f.
Which processor is faster, now that P1
has an L2 cache? If P1 is faster, what miss rate would P2 need in its L1 cache
to match P1fs performance? If P2 is faster, what miss rate would P1 need in its
L1 cache to match P2fs performance?
3. To
support multiple virtual machines, two levels of memory virtualization are
needed. Each virtual machine still controls the mapping of virtual address (VA)
to physical address (PA), while the hypervisor maps the physical address (PA)
of each virtual machine to the actual machine address (MA). To accelerate such
mappings, a software approach called gshadow pagingh duplicates each virtual
machinefs page tables in the hypervisor, and intercepts VA to PA mapping
changes to keep both copies consistent. To remove the complexity of shadow page
tables, a hardware approach called nested page table (or extended page table)
explicitly support two classes of page tables (VAÞPA
and PAÞMA) and can walk
such tables purely in hardware. Consider the following sequence of operations:
(1)
Create process; (2) TLB miss; (3) page fault; (4) context switch;
a.
What would happen for the given
operation sequence, for shadow page table, and nested page table respectively?
b.
Assuming an x86-based four-level page
table in both guest and nested page table, how many memory references are
needed to service a TLB miss ofr
native versus nested page table?
c.
Among TLB miss rate, TLB miss latency,
page fault rate, and page fault handler latency, which metrics are more
important for shadow page table? What are important for nested page table?
The
following table shows parameters for a shadow paging system.
TLB misses per
1000 instruction |
NPT TLB miss latency |
Page faults
per 1000 instruction |
Shadowing page
fault overhead |
0.2 |
200
cycles |
0.001 |
30000
cycles |
d.
For a benchmark with native execution
CPI of 1, what are the CPI numbers if using shadow page tables versus NPT (assuming
only page table virtualization overhead)?
e.
What techniques can be used to reduce
page table shadowing induced overhead?
f.
What techniques can be used to reduce
NPT induced overhead?