Stat656 – Week

Assignment: You can complete one of the two parts to this assignment, or both. If you do both parts, you can obtain extra points for this assignment. You are expected to complete at least one part successfully.

Data Files:
TextFiles -> A directory of 8 text files, each a different book. SAS and Python will use the same data files.

Part 1: Create a SAS EM project named “Week 9 Homework”. In that project, read the data files using the file import node from the text mining tab in SAS EM. Then construct the term/document matrix for each of the following four scenarios using the Parse node in SAS EM.

Scenario   Remove Stop Words   POS   Stem
1          Yes                 Yes   Yes
2          Yes                 No    Yes
3          Yes                 No    No
4          No                  No    No

Report a screenshot of the diagram and of the file import property window.

For each of the four scenarios, report the following.
a. A screenshot of the parse node properties.
b. The total number of terms extracted.
c. A table showing the top 20 terms along with the document counts for each term. The top 20 terms are the 20 terms with the highest frequencies (term counts).
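
The distinction in item c between a term's frequency (term count) and its document count can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not the SAS EM implementation: the function names, the two sample texts, and the simple letters-only tokenizer are all hypothetical.

```python
import re
from collections import Counter

def term_document_counts(docs):
    """Return (term_freq, doc_freq) for a dict of {document name: text}."""
    term_freq = Counter()  # total occurrences of each term across all documents
    doc_freq = Counter()   # number of documents in which each term appears
    for text in docs.values():
        # Assumed tokenizer: lowercase runs of letters only.
        tokens = re.findall(r"[a-z]+", text.lower())
        term_freq.update(tokens)
        doc_freq.update(set(tokens))  # count each term once per document
    return term_freq, doc_freq

def top_terms(term_freq, doc_freq, n=20):
    """Top n terms by term count, each paired with its document count."""
    return [(term, count, doc_freq[term]) for term, count in term_freq.most_common(n)]

# Hypothetical sample texts standing in for the 8 books.
docs = {
    "book_a.txt": "The whale and the sea. The whale swam.",
    "book_b.txt": "The sea was calm.",
}
tf, df = term_document_counts(docs)
for term, count, n_docs in top_terms(tf, df, n=3):
    print(f"{term:10s} term count={count}  document count={n_docs}")
```

Here "whale" and "sea" both occur twice, but "whale" appears in only one document while "sea" appears in two, which is why the table needs both columns.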

Part 2: Do the same assignment as Part 1 using Python.

Use pandas to read the 8 text documents, and NLTK to prepare the term/document matrices described in Part 1.

Report the following:

a. Your Python code.
b. Run the 4 scenarios described in Part 1. For each scenario, report:
1. The total number of terms extracted for that scenario.
2. The top twenty terms, sorted by the number of times each term occurs among the 8 documents.
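
One possible way to organize the four scenarios in Python is to drive a single preprocessing function from a table of flags. The sketch below is a starting point under loud assumptions, not a reference solution: `STOP_WORDS`, `toy_stem`, and `toy_pos` are hypothetical placeholders chosen so the example runs without any downloads. In an actual submission they would presumably be replaced by NLTK's English stop word list, a stemmer such as `PorterStemmer`, and `nltk.pos_tag`. Treating a tagged word such as `whale/X` as its own term is also an assumption about how the POS option should affect the term list.

```python
import re
from collections import Counter

# Hypothetical stand-ins; swap in NLTK equivalents for the real assignment.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "was", "is"}

def toy_stem(word):
    """Crude suffix stripper, a placeholder for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def toy_pos(word):
    """Placeholder tagger that labels every token 'X'."""
    return "X"

# The four scenarios from the table in Part 1.
SCENARIOS = {
    1: dict(remove_stop=True, pos=True, stem=True),
    2: dict(remove_stop=True, pos=False, stem=True),
    3: dict(remove_stop=True, pos=False, stem=False),
    4: dict(remove_stop=False, pos=False, stem=False),
}

def extract_terms(text, remove_stop, pos, stem):
    """Tokenize one document and apply the options of a single scenario."""
    words = re.findall(r"[a-z]+", text.lower())
    if remove_stop:
        words = [w for w in words if w not in STOP_WORDS]
    # Tag the original words before stemming changes their form.
    tags = [toy_pos(w) for w in words] if pos else [None] * len(words)
    if stem:
        words = [toy_stem(w) for w in words]
    return [f"{w}/{t}" if t else w for w, t in zip(words, tags)]

# Hypothetical sample texts standing in for the 8 books.
docs = {
    "a.txt": "The whales were swimming in the sea.",
    "b.txt": "A whale swims.",
}

results = {}
for number, flags in SCENARIOS.items():
    term_freq = Counter()
    for text in docs.values():
        term_freq.update(extract_terms(text, **flags))
    results[number] = term_freq
    print(f"Scenario {number}: {len(term_freq)} terms; top 20: {term_freq.most_common(20)}")
```

Keeping the scenario flags in one dictionary means the same loop produces all four reports, so any difference in the term totals comes only from the options being compared.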
