Assignment 4: What are you asking me?
Jordan Boyd-Graber
Introduction
As always, check out the Github repository with the course homework templates:
git://github.com/ezubaric/cl1-hw.git
The code for this homework is in the hw4 directory.
The aim of this assignment is to do text classification on trivia questions, sorting them into their appropriate category. We’ll be using the Naïve Bayes classifier provided by nltk.
Unlike previous assignments, the code provided with this assignment has all of the functionality required. Your job is to make the functionality better by improving the features the code uses for text classification.
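Since the work here is in the features, it may help to see what a richer feature function could look like. The following is a minimal sketch, not the template's code: the function name `question_features` and the particular features (lowercased unigrams, adjacent-word bigrams, a coarse length bucket) are assumptions chosen for illustration. It returns a plain dict, which is the featureset shape that nltk's `NaiveBayesClassifier.train` accepts in `(featureset, label)` pairs.

```python
import re


def question_features(text):
    """Map a question string to a dict of features (hypothetical helper).

    The feature choices below are illustrative assumptions, not the
    features shipped with the homework template.
    """
    # Lowercase and keep runs of letters, digits, and apostrophes as tokens.
    tokens = re.findall(r"[a-z0-9']+", text.lower())

    features = {}

    # Unigram presence features.
    for token in tokens:
        features["word=" + token] = True

    # Bigram presence features over adjacent tokens.
    for first, second in zip(tokens, tokens[1:]):
        features["bigram=" + first + "_" + second] = True

    # Coarse length bucket (0 to 4), in steps of 25 tokens.
    features["length_bucket"] = min(len(tokens) // 25, 4)

    return features
```

Pairs of the form `(question_features(question), category)` could then be collected into a list and passed to `nltk.NaiveBayesClassifier.train`. Whether bigrams or length actually help on these data is something to check empirically.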
About the Data
First, visit the Kaggle site and download the two csv files with the training and test data and place them in your hw4 directory.
Quiz bowl is an academic competition between schools in English-speaking countries; hundreds of teams compete in dozens of tournaments each year. Quiz bowl is different from Jeopardy, a recent application area. While Jeopardy also uses signaling devices, these are only usable after a question is completed (interrupting Jeopardy’s questions would make for bad television). Thus, Jeopardy is rapacious classification followed by a race—among those who know the answer—to punch a button first. Here’s an example of a quiz bowl question:
Relativity.
answer: Albert Einstein
Two teams listen to the same question. Teams interrupt the question at any point by “buzzing in”; if the answer is correct, the team gets points and the next question is read. Otherwise, the team loses points and the other team can answer.
Classifying Category
There are many kinds of questions asked in these tournaments: science (as above), literature, history, etc. The goal of this project is to create an automated system that predicts the category of a question as accurately as possible.
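To judge whether a feature change helps before submitting, one option is to score predictions against known labels on held-out questions. The sketch below assumes plain accuracy is the quantity of interest (the Kaggle leaderboard may score differently) and uses a hypothetical helper name, `accuracy_by_category`. It also reports accuracy per gold category, which can show where the classifier confuses, say, history with literature.

```python
from collections import defaultdict


def accuracy_by_category(gold, predicted):
    """Return (overall accuracy, {category: accuracy}) for two label lists.

    Hypothetical helper for local evaluation; not part of the template.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    totals = defaultdict(int)   # number of questions per gold category
    correct = defaultdict(int)  # number predicted correctly per gold category

    for gold_label, predicted_label in zip(gold, predicted):
        totals[gold_label] += 1
        if gold_label == predicted_label:
            correct[gold_label] += 1

    overall = sum(correct.values()) / len(gold) if gold else 0.0
    per_category = {label: correct[label] / totals[label] for label in totals}
    return overall, per_category
```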
These data will be the subject of the final project (you’ll help to answer the questions), so this will be a useful warmup to help you get to know these data a little bit better.
Submission
In addition to turning in your code on Moodle, you’ll also need to submit your predictions on Kaggle, an online tournament site for machine learning competitions.
Please also turn in a file called explanation.txt that explains how you created your additional features. Make sure you state your username in that file.
Your username should be of the form CU Firstname.Lastname so that we can easily map it to your grade.
How this Assignment is Graded (35+ points)
You’ll get full credit on this assignment (35 points) if you can significantly improve on the baseline system (as reported by the Kaggle system). If you can do much better than your peers, you can earn extra credit (up to 15 points).
Questions / Hints
• Don’t use all the data until you’re ready. Use the --subsample option to use a subset of the data to see how you’re doing on smaller datasets.
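The idea behind subsampling can be sketched as follows. This is an illustration only: the text does not say how the template's subsample option is implemented, so the helper name `subsample`, the choice of a fraction between 0 and 1, and the fixed seed are all assumptions.

```python
import random


def subsample(examples, fraction, seed=0):
    """Return a reproducible random subset of a list of examples.

    Hypothetical helper: `fraction` is assumed to be in (0, 1], and the
    seed is fixed so repeated runs compare features on the same subset.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    if not examples:
        return []

    rng = random.Random(seed)
    # Keep at least one example so tiny fractions still yield something.
    keep = max(1, int(len(examples) * fraction))
    return rng.sample(examples, keep)
```

Fixing the seed means two feature sets are compared on the same questions, so a change in accuracy reflects the features rather than the draw.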
