Statistical NLP – Solved
Deliverables:
• Answers to problem set questions. If you use good handwriting, you are welcome to handwrite them and email us a scanned copy.
1 Course Rules / Cheating Policy
I understand that most students would never consider cheating. There is, however, a fraction of students for whom this is not the case. To make sure we have a common understanding of what the course rules are, I ask you to print the last page of this assignment and acknowledge the rules by signing it. Please hand in your signed copy in class.
2 Problem Set (45 points)
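Consider n-gram language modeling, in which each word is predicted from the preceding n − 1 words:
P(w_i \mid w_1, \dots, w_{i-1}) \approx P(w_i \mid w_{i-n+1}, \dots, w_{i-1})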

Let V denote the vocabulary size (typically around a million), n the n-gram order (typically around 5), and C the number of tokens in the corpus used to compute the relevant counts (typically around 1 billion).
1. (15 points) Characterize the memory complexity of Kneser-Ney language models in big-O notation. You should decide which variables are important to model (Hint: think about how to store the relevant counts efficiently).
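As one possible reading of the hint, the sketch below stores counts sparsely, keeping an entry only for k-grams that actually occur in the corpus. The function name `count_ngrams` and the toy sentence are illustrative assumptions, not part of the assignment.

```python
from collections import Counter


def count_ngrams(tokens, n):
    """Return one Counter per order k = 1..n, keyed by k-gram tuples."""
    tables = []
    for k in range(1, n + 1):
        # zip over k shifted views of the token list yields every k-gram.
        grams = zip(*(tokens[i:] for i in range(k)))
        tables.append(Counter(grams))
    return tables


tokens = "the cat sat on the mat the cat ran".split()
tables = count_ngrams(tokens, 3)
for k, table in enumerate(tables, start=1):
    # A table of order k can never hold more than len(tokens) - k + 1 entries.
    print(k, len(table), len(tokens) - k + 1)
```

Only observed k-grams take up space here, so the table sizes are tied to the corpus rather than to the number of possible word combinations.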
2. (15 points) The term inference time refers to one call to the language model, i.e., computing P(w_i \mid w_{i-n+1}, \dots, w_{i-1}) under the language model for one choice of w_{i-n+1}, \dots, w_i. Characterize the inference time complexity of Kneser-Ney language models in big-O notation. You should decide which variables are important to model (Hint: think about which quantities can be cached efficiently to speed up inference time).
3. (15 points) Recall absolute discounting for n = 2:
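A common textbook form of the interpolated bigram estimate is given here as an assumption about the intended definition. The symbols d (a fixed discount with 0 < d ≤ 1), c(·) (corpus counts) and λ (the interpolation weight) are introduced for illustration, and P_b is taken to be the unigram relative frequency:

```latex
P_{ad}(w_i \mid w_{i-1})
  = \frac{\max\bigl(c(w_{i-1} w_i) - d,\; 0\bigr)}{c(w_{i-1})}
  + \lambda(w_{i-1})\, P_b(w_i),
\qquad
\lambda(w_{i-1}) = \frac{d}{c(w_{i-1})}\,
  \bigl|\{\, w : c(w_{i-1} w) > 0 \,\}\bigr|
```

With this choice of λ, the mass removed by discounting is exactly the mass handed to P_b, so P_ad(· | w_{i-1}) sums to one for every context.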

For n = 2 (a bigram model), give an example of a small vocabulary V and corpus C where absolute discounting does not preserve the marginal constraint i.e.
P_b(w_i) \neq \sum_{w_{i-1}} P_{ad}(w_i \mid w_{i-1}) \, P_b(w_{i-1})    (1)
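A candidate corpus can be checked by computing both sides of (1) exactly. The sketch below does this under stated assumptions that do not come from the assignment: the interpolated form of absolute discounting with the unigram relative frequency as P_b, a fixed discount d (default 1/2), and a corpus treated as circular so that every token occurs exactly once as a bigram context. The name `marginal_check` is hypothetical.

```python
from collections import Counter
from fractions import Fraction


def marginal_check(tokens, d=Fraction(1, 2)):
    """Map each word w to (P_b(w), sum over prev of P_ad(w | prev) * P_b(prev)).

    Assumed model: interpolated absolute discounting with discount d and
    unigram relative frequencies as the lower-order distribution P_b.
    """
    total = len(tokens)
    unigrams = Counter(tokens)
    # Circular bigrams (assumption): the last token is followed by the first.
    bigrams = Counter(zip(tokens, tokens[1:] + tokens[:1]))
    # Number of distinct words observed after each context word.
    followers = Counter(prev for prev, _ in bigrams)
    p_b = {w: Fraction(c, total) for w, c in unigrams.items()}

    def p_ad(w, prev):
        c_prev = unigrams[prev]  # equals the context count in a circular corpus
        discounted = max(Fraction(bigrams[(prev, w)]) - d, Fraction(0)) / c_prev
        backoff_mass = d * followers[prev] / c_prev
        return discounted + backoff_mass * p_b[w]

    return {
        w: (p_b[w], sum(p_ad(w, prev) * p_b[prev] for prev in unigrams))
        for w in unigrams
    }


# Usage: in this toy corpus the two sides happen to agree for every word.
for word, (lhs, rhs) in marginal_check("a b a c".split()).items():
    print(word, lhs, rhs, lhs == rhs)
```

Exact fractions are used instead of floats so that equality between the two sides can be tested without rounding error. Passing a different token list shows whether that corpus preserves the constraint under these assumptions.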
3 Course Rules / Cheating Policy
I understand that most students would never consider cheating in any form. There is, however, a fraction of students for whom this is not the case. To make sure we have a common understanding of what the course rules are, I ask you to print this page and acknowledge the rules by signing it. Please hand in your signed copy in class.
The rules below are adapted from Smith & Dyer at CMU (see their class Natural Language Processing (11-{4,6}11)).
• If you find an assignment’s answer, partial answer, or helpful material in published literature or on the Web, you must cite it appropriately. Don’t claim to have come up with an idea that wasn’t originally yours; instead, explain it in your own words and make it clear where it came from.
• On the course project, you are encouraged to use existing NLP tools. You must acknowledge these appropriately in all documentation, including your final report. If you aren’t sure whether a tool or data resource is appropriate for use on the project, because it appears to solve a major portion of the assignment or because the license for its use is not clear to you, or if you aren’t sure how to acknowledge a tool appropriately, you must speak with the course staff.
Clear examples of cheating include (but are not limited to):
• Showing a draft of a written solution to another student.
• Showing your code to another student.
• Getting help from someone or some resource that you do not acknowledge on your solution.
• Copying someone else’s solution to an assignment.
• Receiving class-related information from a student who has already taken the exam.
• Attempting to hack any part of the course infrastructure.
• Lying to the course staff.
I hereby acknowledge that I have read and understood the course rules.
Name:
Signature:
