ASSIGNMENT-4: DECISION TREE
(Read all the instructions carefully & adhere to them.)
Total Credit: 20
Instructions:
1. Marking will be based on the correctness and soundness of the outputs.
2. Proper indentation and appropriate comments are mandatory.
3. Properly document all results and observations, along with their analysis.
4. Zip all the required files and name the zip file after the roll numbers of all group members, e.g., 1501cs11_1201cs03_1621cs05.zip.
5. Upload your assignment (the zip file) at the following link:
https://www.dropbox.com/request/bY81nxr2GILp5apbmkFl
For any queries regarding this assignment, you can contact:
Aizan Zafar ( aizanzafar@gmail.com ) or
Kshitij Mishra ( kmishra.kings@gmail.com )
Write a Python program that implements question classification using a Decision Tree classifier.
Example
Question: What is the temperature at the center of the earth ?
Class: NUM, i.e., a question that expects a numeric answer.
Dataset
Training Set: http://cogcomp.org/Data/QA/QC/train_5500.label
Test Set: http://cogcomp.org/Data/QA/QC/TREC_10.label
Use only the coarse-grained class labels to build your model. For more details about the dataset, refer to this paper: https://goo.gl/jAJFKQ
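Each line of the label files starts with a "coarse:fine" label followed by the question text (e.g., "NUM:temp What is the temperature at the center of the earth ?"). A minimal parsing sketch, assuming this line format, could keep only the coarse label like this (the function name `parse_line` is illustrative):

```python
# Sketch: extract the coarse-grained class and the question text
# from one line of the TREC QC label files.
def parse_line(line):
    label, question = line.strip().split(" ", 1)
    coarse = label.split(":")[0]  # keep only the coarse-grained part
    return coarse, question

coarse, question = parse_line(
    "NUM:temp What is the temperature at the center of the earth ?"
)
# coarse is "NUM"
```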
Features
(a) Length of the question
(b) Lexical features: word n-grams.
(c) Syntactic features: part-of-speech tag unigrams.
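One possible way to combine the three feature groups, sketched with scikit-learn (an assumption; any library may be used). In practice the POS tags would come from a real tagger such as `nltk.pos_tag`; here a tiny placeholder tagger (`toy_tag`) stands in so the sketch stays self-contained:

```python
from sklearn.feature_extraction.text import CountVectorizer
from scipy.sparse import hstack, csr_matrix

questions = [
    "What is the temperature at the center of the earth ?",
    "Who wrote Hamlet ?",
]

# (a) length of the question, in tokens
lengths = csr_matrix([[len(q.split())] for q in questions])

# (b) word n-grams (here unigrams and bigrams)
ngram_vec = CountVectorizer(ngram_range=(1, 2))
ngrams = ngram_vec.fit_transform(questions)

# (c) POS-tag unigrams -- replace toy_tag with e.g. nltk.pos_tag in practice
def toy_tag(q):
    return " ".join("WH" if w.lower() in {"what", "who", "when"} else "TOK"
                    for w in q.split())

pos_vec = CountVectorizer()
pos = pos_vec.fit_transform(toy_tag(q) for q in questions)

# stack all feature groups into one sparse matrix
X = hstack([lengths, ngrams, pos])
```

Keeping each group in its own matrix also makes the feature ablation study straightforward: rebuild `X` with one group left out and re-evaluate.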
Result and Evaluation
● Report the 10-fold cross-validation results in terms of precision, recall, and F-score.
● Report the results of a feature ablation study and state which feature contributed most toward correctly predicting a particular class.
● Report precision, recall, and F-score on the test set using models based on the gini index, misclassification error, and cross-entropy.
● Show whether errors made by one model are corrected by the other models. If so, report what percentage of samples is corrected.
E.g., observe how many samples are misclassified by the gini-index-based model but correctly classified by the misclassification-error- and cross-entropy-based models.
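The evaluation above can be sketched with scikit-learn, under the assumption that the extracted features are already in matrices `X` and `y` (iris data stands in here). Note that scikit-learn's `DecisionTreeClassifier` supports the "gini" and "entropy" criteria directly, but a misclassification-error split criterion is not built in and would need a custom implementation:

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_validate
from sklearn.datasets import load_iris  # stand-in data; use the TREC features

X, y = load_iris(return_X_y=True)
scoring = ["precision_macro", "recall_macro", "f1_macro"]

for criterion in ("gini", "entropy"):
    clf = DecisionTreeClassifier(criterion=criterion, random_state=0)
    # 10-fold cross-validation, reporting precision, recall, and F-score
    scores = cross_validate(clf, X, y, cv=10, scoring=scoring)
    print(criterion,
          scores["test_precision_macro"].mean(),
          scores["test_recall_macro"].mean(),
          scores["test_f1_macro"].mean())
```

For the error-propagation analysis, fit each model once on the training set, predict on the test set, and compare the boolean masks of misclassified samples across models.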