DATA8001 Assignment 1 Solved

Summary
The DATA8001 Assignment 1 is worth 50% of your overall module score.
Download the zip file corresponding to your student ID from Canvas, unzip its contents into your local assignment folder, and check that your files match the layout shown in Figure 1.

Figure 1 – Example Assignment Folder & Files
Assignment Sections (50%)

Data ETL – 10%
Clean the provided dataset (data/R00000000_original.csv) and save it as data/R00000000_processed.csv, replacing R00000000 with your CIT student number.
All code required to reproduce the data ETL process should be placed in the Python library file lib/R00000000_util.py (at the bottom, where indicated) and be callable from the Jupyter Notebook R00000000_A1_Notebook.ipynb.
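The brief does not prescribe specific cleaning rules, so the following is only a minimal sketch of the kind of helper that could sit in the library file, assuming pandas is used. The function name clean_strings_and_price and the sample rows are hypothetical. Deriving year and month is left out because the brief does not say how those values are obtained.

```python
import pandas as pd

# Text columns named in the brief's "Original Data Headings" table.
TEXT_COLUMNS = ["car_reg", "county", "make", "model", "type", "colour", "tax_band"]


def clean_strings_and_price(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: trim and uppercase text columns, coerce price to float."""
    out = df.copy()
    for col in TEXT_COLUMNS:
        # The pandas string dtype keeps missing values as missing.
        out[col] = out[col].astype("string").str.strip().str.upper()
    # Drop everything except digits, the decimal point and a minus sign
    # (e.g. currency symbols, thousands separators, stray spaces).
    price_text = out["price"].astype(str).str.replace(r"[^0-9.\-]", "", regex=True)
    # Values that still cannot be parsed become NaN instead of raising.
    out["price"] = pd.to_numeric(price_text, errors="coerce").astype(float)
    return out


# Made-up rows, only to illustrate the call.
raw = pd.DataFrame(
    {
        "car_reg": [" 12-c-1234 ", "151-d-999"],
        "county": ["cork ", " Dublin"],
        "make": ["toyota", "Ford"],
        "model": ["yaris", "focus"],
        "type": ["hatchback", "Hatchback"],
        "colour": ["red", " blue"],
        "tax_band": ["a", "b"],
        "price": ["€12,500.00", " 9000 "],
    }
)
clean = clean_strings_and_price(raw)
```

In the notebook, a function like this would be imported from the library file and applied to the DataFrame read from the original CSV before the result is written to the processed CSV.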

Original Data Headings
Column Name   Column Description
car_reg       The car registration plate
county        The county where the car was purchased and registered
make          The car manufacturer's name
model         The car model name
type          The type of car (e.g., saloon, hatchback)
colour        The colour of the car
tax_band      The tax band of the car
price         The purchase price of the car in euros

Processed Data Headings & Expected Data Types
Column Name   Column Description                       Data Type
car_reg       Cleaned car registration plate           String (uppercase)
year          The year the car was purchased           Int
month         The month the car was purchased          Int
county        Cleaned county name                      String (uppercase)
make          Cleaned car manufacturer's name          String (uppercase)
model         Cleaned car model name                   String (uppercase)
type          Cleaned car type                         String (uppercase)
colour        Cleaned colour of the car                String (uppercase)
tax_band      Cleaned tax band of the car              String (uppercase)
price         The purchase price of the car in euros   Float
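A quick way to compare a processed DataFrame against the table above is a small schema check. This is a sketch under assumptions: the function name schema_problems is hypothetical, pandas is assumed, and "String (uppercase)" is read as "every non-missing value is a str equal to its own uppercase form".

```python
import pandas as pd
from pandas.api.types import is_float_dtype, is_integer_dtype

# Expected kind per column, taken from the processed-data table.
EXPECTED = {
    "car_reg": "str",
    "year": "int",
    "month": "int",
    "county": "str",
    "make": "str",
    "model": "str",
    "type": "str",
    "colour": "str",
    "tax_band": "str",
    "price": "float",
}


def schema_problems(df: pd.DataFrame) -> list:
    """Hypothetical helper: return human-readable schema mismatches (empty if none)."""
    problems = [f"{col}: column missing" for col in EXPECTED if col not in df.columns]
    for col, kind in EXPECTED.items():
        if col not in df.columns:
            continue
        series = df[col]
        if kind == "int" and not is_integer_dtype(series):
            problems.append(f"{col}: expected int, got {series.dtype}")
        elif kind == "float" and not is_float_dtype(series):
            problems.append(f"{col}: expected float, got {series.dtype}")
        elif kind == "str":
            # Check every non-missing value is an uppercase Python string.
            if not all(isinstance(v, str) and v == v.upper() for v in series.dropna()):
                problems.append(f"{col}: expected uppercase strings")
    return problems


# Made-up rows: `good` matches the table, `bad` breaks two of its rules.
good = pd.DataFrame(
    {
        "car_reg": ["12-C-1234", "151-D-999"],
        "year": [2012, 2015],
        "month": [1, 7],
        "county": ["CORK", "DUBLIN"],
        "make": ["TOYOTA", "FORD"],
        "model": ["YARIS", "FOCUS"],
        "type": ["HATCHBACK", "HATCHBACK"],
        "colour": ["RED", "BLUE"],
        "tax_band": ["A", "B"],
        "price": [12500.0, 9000.0],
    }
)
bad = good.assign(county=["Cork", "DUBLIN"], price=[12500, 9000])

good_report = schema_problems(good)
bad_report = schema_problems(bad)
```

Running a check like this just before saving the processed CSV gives an early warning if a column was left in mixed case or if price was stored as an integer.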

Example

Data Visualisation – 10%
Load the processed dataset (data/R00000000_processed.csv) into the assignment notebook (R00000000_A1_Notebook.ipynb) and answer the 5 questions. For each question, include exactly one visualisation of your choice, the one that best answers it. Show your workings in the Jupyter Notebook for each question.
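The five questions are not reproduced here, so the sketch below uses a hypothetical one ("Which make has the highest mean purchase price?") purely to illustrate the pattern of showing the workings and then a single chart. It assumes pandas and matplotlib; the rows are made up, and in the notebook the DataFrame would come from the processed CSV instead.

```python
import matplotlib.pyplot as plt
import pandas as pd

# In the notebook this would be:
#   df = pd.read_csv("data/R00000000_processed.csv")
# Made-up rows are used here so the sketch runs on its own.
df = pd.DataFrame(
    {
        "make": ["FORD", "FORD", "TOYOTA", "TOYOTA", "BMW"],
        "price": [15000.0, 17000.0, 20000.0, 22000.0, 40000.0],
    }
)

# Workings: mean price per make, highest first.
mean_price = df.groupby("make")["price"].mean().sort_values(ascending=False)

# One visualisation for the question: a bar chart of the aggregated values.
fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(list(mean_price.index), list(mean_price.values))
ax.set_xlabel("Make")
ax.set_ylabel("Mean price (EUR)")
ax.set_title("Mean purchase price by make")
fig.tight_layout()
```

Keeping the aggregation in its own variable makes the workings visible in the notebook cell output before the chart is drawn.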

Data Modelling – 10%
Create a Linear Regression model, together with any transformations required to give it the best accuracy. Using the Python class provided in lib/R00000000_util.py, save the object to the model folder as: model/R00000000.pkl.
All code required to reproduce the modelling process should be placed in the Python library file lib/R00000000_util.py and be callable from the Jupyter Notebook R00000000_A1_Notebook.ipynb.
The pickled model file should be loaded and called from the Jupyter Notebook, and it must be able to process unseen test data, applying any transformations the model needs in order to work. Note: the unseen test data will have the same headings & datatypes as your data/R00000000_processed.csv file.
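One way to keep the transformations and the regression together in a single pickled object is a scikit-learn Pipeline. The sketch below is an assumption-laden illustration, not the provided class: that class is not shown in the brief, so a plain Pipeline stands in for it. The split into categorical and numeric features, the decision to leave out car_reg, and all sample rows are hypothetical choices.

```python
import pickle
import tempfile
from pathlib import Path

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Assumed feature split; car_reg is left out as an identifier.
CATEGORICAL = ["county", "make", "model", "type", "colour", "tax_band"]
NUMERIC = ["year", "month"]
FEATURES = CATEGORICAL + NUMERIC


def build_pipeline() -> Pipeline:
    """One-hot encode the categorical columns, then fit a linear regression."""
    prep = ColumnTransformer(
        [
            # handle_unknown="ignore" lets unseen categories pass through as all zeros.
            ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
            ("num", "passthrough", NUMERIC),
        ]
    )
    return Pipeline([("prep", prep), ("reg", LinearRegression())])


# Made-up training rows in the processed-data layout.
train = pd.DataFrame(
    {
        "county": ["CORK", "DUBLIN", "CORK", "GALWAY", "DUBLIN", "GALWAY"],
        "make": ["FORD", "TOYOTA", "FORD", "BMW", "TOYOTA", "BMW"],
        "model": ["FOCUS", "YARIS", "FIESTA", "320", "COROLLA", "520"],
        "type": ["HATCHBACK", "HATCHBACK", "HATCHBACK", "SALOON", "SALOON", "SALOON"],
        "colour": ["RED", "BLUE", "BLACK", "BLACK", "RED", "BLUE"],
        "tax_band": ["A", "A", "B", "C", "B", "D"],
        "year": [2015, 2016, 2014, 2018, 2017, 2019],
        "month": [1, 7, 1, 7, 1, 7],
        "price": [15000.0, 14000.0, 11000.0, 35000.0, 21000.0, 48000.0],
    }
)

pipe = build_pipeline()
pipe.fit(train[FEATURES], train["price"])

# Round-trip through a pickle file (a temporary folder is used for the demo).
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "R00000000.pkl"
    with open(path, "wb") as fh:
        pickle.dump(pipe, fh)
    with open(path, "rb") as fh:
        loaded = pickle.load(fh)

# Unseen rows, including a colour that never appeared in training.
unseen = pd.DataFrame(
    {
        "county": ["CORK", "GALWAY"],
        "make": ["TOYOTA", "BMW"],
        "model": ["YARIS", "320"],
        "type": ["HATCHBACK", "SALOON"],
        "colour": ["PURPLE", "BLACK"],
        "tax_band": ["A", "C"],
        "year": [2016, 2018],
        "month": [7, 1],
    }
)
predictions = loaded.predict(unseen[FEATURES])
```

Because the encoder is stored inside the pickled pipeline, the loaded object can be given raw processed-format rows directly, with no separate preprocessing step in the notebook.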

Report & Questions (15%)
Write a report of at most 2 pages outlining the steps taken to complete the assignment. Identify any areas you feel are worth mentioning from the ETL, visualisation or modelling steps, including any insights developed.

Presentation (5%)
Note: DO NOT submit any PowerPoint files as part of your project submission, they will not be graded.

Submission Details
Students should upload a zip file with the same name as the downloaded zip file (e.g., R00000000.zip). It should contain the completed work and ONLY the folders & files listed in Figure 2.
Files:
• R00000000_A1_Notebook.ipynb – completed notebook that calls the ETL process and contains the visualisations, answers and modelling.
• R00000000_A1_Report.docx – 2-page report and answers to the 2 exam-type questions
• data/R00000000_processed.csv – clean dataset
• lib/R00000000_util.py – all the Python code required to recreate your work.
• models/R00000000.pkl – your pickled model object (ML model and transformations)
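Since the zip must contain only the listed files, one option is to build it from an explicit allow-list instead of zipping the whole folder. The sketch below uses only the Python standard library. The function names are hypothetical, the paths follow the file list above, and storing the files at the top level of the archive (with no enclosing folder) is an assumption.

```python
import tempfile
import zipfile
from pathlib import Path


def expected_files(student_id: str) -> list:
    """Relative paths from the submission file list."""
    return [
        f"{student_id}_A1_Notebook.ipynb",
        f"{student_id}_A1_Report.docx",
        f"data/{student_id}_processed.csv",
        f"lib/{student_id}_util.py",
        f"models/{student_id}.pkl",
    ]


def build_submission(root: Path, student_id: str, out_dir: Path) -> Path:
    """Hypothetical helper: zip only the expected files, failing if any is missing."""
    missing = [rel for rel in expected_files(student_id) if not (root / rel).is_file()]
    if missing:
        raise FileNotFoundError(f"Missing required files: {missing}")
    zip_path = out_dir / f"{student_id}.zip"
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for rel in expected_files(student_id):
            # arcname keeps the relative folder structure inside the archive.
            zf.write(root / rel, arcname=rel)
    return zip_path


# Demo in a temporary folder with placeholder files plus one extra file
# (the original CSV) that should NOT end up in the zip.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "assignment"
    for rel in expected_files("R00000000") + ["data/R00000000_original.csv"]:
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text("placeholder")
    zip_path = build_submission(root, "R00000000", Path(tmp))
    with zipfile.ZipFile(zip_path) as zf:
        archived = sorted(zf.namelist())
```

Working from an allow-list means stray files such as the original CSV or notebook checkpoints are excluded by construction.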

Figure 2 – Example submission Folder & Files
