100% Guaranteed Results


CS610 – Solved
$ 25.00
Category:

Description

5/5 – (1 vote)

CS610 – Assignment 2

Shivam Sharma (210983, sshivam21@iitk.ac.in)
1 Problem 1
NOTE – The orignal testcase given was too small for any meaningful benchmarking and perf c2c could not sample enough times from it, so every benchmark here is on a file split into 5 files of 1MB each, the input is biginput. The mentioned file is also submitted.
The given code runs in about 145ms on my machine. Running perf c2c gives the following output –

1.1 False Sharing of tracker.word_count
Fix – Seperate the cache lines by adding 56 byte padding and aligning the start address to 64. This is implemented in problem1_pad_shared.cpp and reduced the runtime to 110ms. Running perf now
gives,

We have successfully avoided one false sharing.
1.2 Unnecessary Contention of Locks
The threads continuously try to accquire tracker.word_count_mutex and line_count_mutex, but the critical section just contains of incrementing an integer. This leads to unnecessary contention of these locks and false sharing of the cache lines containing tracker.total_words_processed and tracker.total_lines_processed.
Fix – Maintain the counts in a local variable for each thread. When the thread is about to exit, acquire the locks and update the global tracker. This is implemented in problem1_less_contention.cpp and reduced the runtime to 39ms. Note that this run only contains of this optimization, not the previous one. Running perf now gives,

Finally, both of these are implemented in problem1_opt.cpp and the runtime is 18ms. The speedup from the original code is 8.05 times. Running perf now gives,

2 Problem 2
The following points describe the various synchronization issues in the problem and the approach taken to solve them,
2.1 Starting of Threads
It is mentioned in the problem that the threads should contend for the file after all of them are created. A pthread_barrier_t reference is passed to all threads and they call wait on it.
2.2 Atomic Read of L Lines
The threads are required to read L lines atomically. For this a pointer to AtomicFile is passed to all threads, which is a wrapper around a file. A thread has to accquire a mutex before reading and releases it when the read of L lines is complete.
2.3 Atomic Write of L lines into the Buffer
3 Problem 3
For every pair of addresses, that contain at least one write, let’s perform the delta test,
3.1 A(i,j − i) and A(i,j − i − 1)
Checking for flow dependence,
2
i = i + ∆i =⇒ ∆i = 0
j − i = (j + ∆j) − (i + ∆i) − 1 =⇒ ∆j = 1
The direction vector is (o,+), which is valid and the Flow Dependence is carried by loop j. Checking for anti dependence,
i + ∆i = i =⇒ ∆i = 0
(j + ∆j) − (i + ∆i) = j − i − 1 =⇒ ∆j = −1
The direction vector is (o,−), which is invalid, so No anti dependence.
3.2 A(i,j − i) and A(i + 1,j − i)
Checking for flow dependence,
i = i + ∆i + 1 =⇒ ∆i = −1
j − i = (j + ∆j) − (i + ∆i) =⇒ ∆j = −1
The direction vector is (−,−), which is invalid, so No Flow dependence. Checking for anti dependence,
i + ∆i = i + 1 =⇒ ∆i = 1
(j + ∆j) − (i + ∆i) = j − i =⇒ ∆j = 1
The direction vector is (+,+), which is valid, so Anti Dependence is carried by loop i.
3.3 A(i,j − i) and A(i − 1,j + i − 1)
Checking for flow dependence,
i = i + ∆i − 1 =⇒ ∆i = 1
j − i = (j + ∆j) + (i + ∆i) − 1 =⇒ ∆j = −2 ∗ i
The direction vector is (+,−), which is valid, so Flow dependence is carried by loop i. Checking for anti dependence,
i + ∆i = i − 1 =⇒ ∆i = −1
(j + ∆j) − (i + ∆i) = j + i − 1 =⇒ ∆j = 2 ∗ i − 2
The direction vector is (−,+), which is invalid, so No Anti Dependence.
3.4 A(i,j − i) and A(i,j − i)
Checking for output dependence,
i = i + ∆i =⇒ ∆i = 0
j − i = (j + ∆j) + (i + ∆i) =⇒ ∆j = 0
So, No output dependence.
3

Reviews

There are no reviews yet.

Be the first to review “CS610 – Solved”

Your email address will not be published. Required fields are marked *

Related products