CSED211 Homework 3 Solved

Description

5/5 – (1 vote)

20190084 권민재
1. Exercise 3.69
You are charged with maintaining a large C program, and you come across the following code:

The declarations of the compile-time constant CNT and the structure a_struct are in a file for which you do not have the necessary access privilege. Fortunately, you have a copy of the .o version of code, which you are able to disassemble with the objdump program, yielding the following disassembly:

Using your reverse engineering skills, deduce the following:
A. The value of CNT
Solution

Answer
7
B. A complete declaration of structure a_struct. Assume that the only fields in this structure are idx and x, and that both of these contain signed values.
Solution
mov 0x8(%rax),%rdx movslq %ecx,%rcx mov %rcx,0x10(%rax,%rdx,8)
mov 0x8(%rax),%rdx에서, ap->idx가 ap+8에 존재하는 것을 알 수 있고 (배열의 인덱스로 사용되므로), movslq에서 이의 자료형
은 long 인 것을 알 수 있다. 또한, mov %rcx,0x10(%rax,%rdx,8)에서, ap->x는 ap+0x10부터 존재하는 8바이트 단위의 배열임 을 알 수 있는데, a_struct의 크기가 40바이트 이므로, ap->x의 크기는 로 추정할 수 있므로, 여기서 배열에 8바이트 자료 형이 4번 들어갈 수 있다는 것을 알 수 있다.
Answer

2. Exercise 3.72
(a) C code

Solution
leaq 30(,%rdi,8), %rax # rax = n * 8 + 30
andq $-16, %rax # rax = (n * 8 + 30) & 0xFFFFFFFFFFFFFFF0, rax의 값이 16 이상 239 이하 임 을 보장하는 동시에, 메모리 얼라인을 맞춘다.
subq %rax, %rsp # s2 = s1 – ((n * 8 + 30) & 0xFFFFFFFFFFFFFFF0)
Answer

B. Explain, in mathematical terms, the logic in the computation of .
Solution

Answer

C. Find values of and that lead to minimum and maximum values of .
Minimum
이 16의 배수이고, 이 짝수이면, 메모리 alignment를 맞추기위해 더 소모하는 바이트가 없으므로 이 최소화된다.
Maximum
이 16의 배수보다 1 큰 수이고, 이 홀수라면, 메모리 alignment를 맞추기 위해 각각 15바이트와 8바이트를 더 쓰게 되어 이 최대화
된다.
D. What alignment properties does this code guarantee for the values of and ?
를 할당하고 지정하는데에는 rax , r8의 값이 그 주소로 쓰이는데, andq $-16, %rax , andq $-16, %r8에 의해 rax , r8이 16
의 배수로 보장되므로, 는 메모리 alignment를 맞출 수 있게 된다. 이후, subq %rax, %rsp에 의해 rsp는 메모리 alignment된 값을 가지게 되며, 이것은 곧 이므로 는 메모리 align이 맞춰지게 된다.
3. Exercise 5.13

Our measurements show that this function has CPEs of 1.50 for integer data and 3.00 for floating-point data. For data type double, the x86-64 assembly code for the inner loop is as follows:
# Inner loop of inner4. data_t = double, OP = *
# udata in %rbp, vdata in %rax, sum in %xmm0
# i in %rcx, limit in %rbx .L15: # loop:
vmovsd 0(%rbp,%rcx,8), %xmm1 # Get udata[i] vmulsd (%rax,%rcx,8), %xmm1, %xmm1 # Multiply by vdata[i] vaddsd %xmm1, %xmm0, %xmm0 # Add to sum addq $1, %rcx # Increment i cmpq %rbx, %rcx # Compare i:limit jne .L15 # If !=, goto loop
Assume that the functional units have the characteristics listed in Figure 5.12.
A. Diagram how this instruction sequence would be decoded into operations and show how the data dependencies between them would create a critical path of operations, in the style of Figures 5.13 and 5.14.

B. For data type double, what lower bound on the CPE is determined by the critical path?
Critical path는 sum을 업데이트하는 과정에서 형성되며, 이때 이 path에 floating point addition이 존재하므로, lower bound는 floating point addition의 CPE에 의해 결정된다.
C. Assuming similar instruction sequences for the integer code as well, what lower bound on the CPE is determined by the critical path for integer data?
Integer-add 연산에 의해 결정되며, 이것의 CPE는 1.0이다.
D. Explain how the floating-point versions can have CPEs of 3.00, even though the multiplication operation requires 5 clock cycles.
곱셈 연산이 더 큰 CPE 값을 가지지만, 곱셈 연산은 critical path에 포함되지 않기 때문에 muliplier를 통해 파이프라이닝 될 수 있다.
4. Exercise 5.17
The library function memset has the following prototype: void *memset(void *s, int c, size_t n);
This function fills n bytes of the memory area starting at s with copies of the low- order byte of c. For example, it can be used to zero out a region of memory by giving argument 0 for c, but other values are possible. The following is a straightforward implementation of memset:

Implement a more efficient version of the function by using a word of data type unsigned long to pack eight copies of c, and then step through the region using word-level writes. You might find it helpful to do additional loop unrolling as well. On our reference machine, we were able to reduce the CPE from 1.00 for the straightforward implementation to 0.127. That is, the program is able to write 8 bytes every clock cycle.
Answer

5. Exercise 5.19
In Problem 5.12, we were able to reduce the CPE for the prefix-sum computation to 3.00, limited by the latency of floating-point addition on this machine. Simple loop unrolling does not improve things.
Using a combination of loop unrolling and reassociation, write code for a prefix sum that achieves a CPE less than the latency of floating-point addition on your machine. Doing this requires actually increasing the number of additions performed. For example, our version with two-way unrolling requires three ad- ditions per iteration, while our version with four-way unrolling requires five. Our best implementation achieves a CPE of 1.67 on our reference machine. Determine how the throughput and latency limits of your machine limit the minimum CPE you can achieve for the prefix-sum operation.
Answer

6. Exercise 6.24
Suppose that a 2 MB file consisting of 512-byte logical blocks is stored on a disk drive with the following characteristics:
Parameter Value
Rotational rate 18,000 RPM
8 ms
Average number of sectors/track 2,000
Surfaces 4
Sector size 512 bytes
For each case below, suppose that a program reads the logical blocks of the file sequentially, one after the other, and that the time to position the head over the first block is .
A. Best case: Estimate the optimal time (in ms) required to read the file given the best possible mapping of logical blocks to disk sectors (i.e., sequential).

B. Random case: Estimate the time (in ms) required to read the file if blocks are mapped randomly to disk sectors.
Because blocks are not arranged sequentially, they need to be approached randomly for each block, which is like head over for every block.

Reviews

There are no reviews yet.

Be the first to review “CSED211 Homework 3 Solved”

Description

Reviews

Related products

CSED211 – Instructor: Prof. Jisung Park Solved

CSED211 – Instructor: Prof. Jisung Park Solved

CSED211 Homework 5 Solved

CSED211 Homework 4 Solved

CSED211 – Malloc Lab: Writing a Dynamic Storage Allocator Solved