CS610 – Solved – Assignment Hub

Description

5/5 – (1 vote)

CS610 – Assignment 1

Shivam Sharma (210983, sshivam21@iitk.ac.in)
1 Problem 1
Total Cache Size = 256KB = 218 bytes Line Size = 64B = 26 bytes Word Size = 4 bytes
Total Number of Sets = sets
Number of Elements in A = 32K = 215 bytes
Each cache line contains 16 floats. Without loss of generality, we will assume that A[0] is block aligned and maps to the 0th set.
1.1 Stride = 1
Consider the inner loop. The first access to A[0] will result in a cold miss and the cache block will be stored in S0. Subsequent missess will be at A[16]− > S1, A[32]− > S2 and so on after every 16 iterations of the inner loop. In this manner all sets will store one cache line, until A[213], which maps back to S0. In general we can say that A[213∗ j], where j >= 0 and 213∗ j < 215 will map to S0.
Here, j ∈{0,1,2,3}. That means 4 out of 8 available lines in S0 get filled in. Similar analysis holds true for all remaining sets. By the end of one iteration of the outer loop, entire A is stored in cache without any conflict, hence for it > 0, there will be no misses.
Cold Misses Capacity Misses Conflit Misses
0 0
1.2 Stride = 4
When it = 0 every 4th access with lead to a cold miss. After this, entire A is cached, so no misses for it > 0.
Cold Misses Capacity Misses Conflit Misses
0 0
1.3 Stride = 16, 64, 2K, 8K, 16K, 32K
Now our stride is greater the number of floats that fit inside the cache line, so for it = 0, every access will lead to a cold miss. Now the required cache lines fit inside the cache, so no further misses for it > 0.
Stride Cold Misses Capacity Misses Conflit Misses
16 0 0
64 0 0
2K 0 0
8K 0 0 16K 0 0
32K 0 0
2 Problem 2
Cache Size = 216 words Line Size = 23 words
Total Cache Lines = Cache SizeLine Size = 213 Lines
N = 1024 = 210
2.1 kij form
2.1.1 Direct Mapped Cache
• Matrix C
– j – Stride 1 access of a single row, misses.
– i – Accessing new row for the first time, N misses.
– k – The matrix C contains 220 words, which is greater than the cache size, N misses.
• Matrix A
– j – No dependency on j.
– i – Columnwise(Stride = 1024 > Line Size) access, miss on every access, N misses.
– k – If the entire column fits in the cache, there would be no misses for the next 7 iterations. Let us check,
If A[0][0] maps to S0, then A[1][0] would map to Line SizeRow Size = 128th set. In general,
• Matrix B
– j – Stride 1 access of a single row, misses.
– i – Row fits in the cache, 1 miss.
– k – Columnwise(Stride = 1024 > Line Size) access, miss on every access, N misses.
C A B
k i j
Total
Table 1: Direct Mapped Cache
2.1.2 Fully Associative Cache
• Matrix C – No difference in analysis from direct mapped cache.
• Matrix A
– j – No dependency on j.
– i – Columnwise(Stride = 1024 > Line Size) access, miss on every access, N misses.
– k – Now since the the cache is fully associative, 8 elements for every row are stored in the cache for every miss in i(Total Lines occupied = 210 < Availabe Lines) without any conflit, misses.
• Matrix B – No difference in analysis from direct mapped cache.
k i j
Total
Table 2: Fully Associative Cache
2.2 jik form
2.2.1 Direct Mapped Cache
• Matrix C
– k – No relation
– i – Columnwise access, N misses.
• Matrix A
– k – Rowwise access, misses.
– i – Columnwise(Stride = 1024 > Line Size) access, miss on every access, N misses. – j – The entire matrix does not fit in the cache, N misses.
• Matrix B
– k – Columnwise access, N misses.
– i – Entire column does not fit in cache, N misses.
– j – Entire column does not fit in cache, N misses.
C A B
j i k
Total
Table 3: Direct Mapped Cache
2.2.2 Fully Associative Cache
• Matrix C
– k – No relation
– i – Columnwise access, N misses.
– j – Now entire column fits in cache, misses.
• Matrix A – No difference in analysis from direct mapped cache.
• Matrix B
– k – Columnwise access, N misses.
– i – Entire column fits in cache, we are accessing same elements in every iteration, 1 miss.
– j – Now entire column fits in cache, misses.
j i k
Total
Table 4: Fully Associative Cache
3 Problem 3
Cache Size = 16MB = 224 bytes Line Size = 32B = 25 bytes
Word Size = 8 bytes N = 4096 = 212
Total Number of Sets =
4 words fit in a cache line
• Array y
– i – Stride 1 access, misses.
– j – The entire array occupies contiguous lines, which is less than the available lines. So, after one iteration for j, the entire array is cached.
– k – Array is cached after 1st interation, 1 miss;
• Matrix A
– i – Column access(Stride = N), N misses.
– k – Again, since the matrix does not fit in the cache, there is a miss on every iteration, N misses.
• Matrix X
– i – No relation.
– j – Row access( misses.
– k – Access for a new row for every k, N misses.
y A X k j
i
Total
4 Problem 4
All experiments were performed on the csews25 machine from the KD lab, with N = 2048. The machine has an Intel i7-8700 CPU with the following cache hierarchy,
Cache Size Line Size Number of Lines Associativity
L1 dcache(Private) 32KB 64B 512 8
L1 icache(Private) 32KB 64B 512 8
L2 cache(Private) 256KB 64B 4096 4
L3 cache(Shared) 12MB 64B 196608 16
Table 5: Cache hierarchy of system
On average, time taken by the non-blocking version was 95.34 seconds. PAPI 7.1.0.0 was used to measure L1 and L2 data cache misses using the events PAPI_L1_DCM and PAPI_L2_DCM.
A Block B Block C Block Time Taken(s) Speedup L1 misses L2 misses
4×4 4×4 4×4 39.48 2.41 1186301215 1639951927
8×8 8×8 8×8 29.49 3.23 668970399 941555202
16×16 16×16 16×16 29.15 3.27 8748386352 2197286503
32×32 32×32 32×32 33.60 2.83 8655644307 8309173839
64×64 64×64 64×64 42.95 2.21 8627149814 23330600520
8×8 8×32 8×32 26.90 3.54 507503367 518608249
8×32 32×8 8×8 34.81 2.73 8727855999 8705780910
8×8 8×16 8×16 28.13 3.38 581711524 783756764
32×32 32×8 32×8 33.56 2.84 8826605740 6907131695
Table 6: Experimental Results

Reviews

There are no reviews yet.

Be the first to review “CS610 – Solved”

CS610 – Solved

Description

Reviews

Related products

CS610 – Solved

CS610 – Solved

CS610 – Solved

CS610 – General Policies Solved

CS610 – General Policies Solved