CSC343 – Project Phase 2
Schema Implementation, Data Cleaning, and Import
Schema implementation
• Define a primary key for every table.
• Wherever data in one table references another table, define a foreign key in SQL.
• Consider a NOT NULL constraint for every attribute in every table. It doesn’t always make sense, but it usually does — especially if your design is good.
• Express any other constraints that make sense for your domain.
• Every table must have a comment describing what it means for something to be in the table.
To facilitate repeated importing of the schema as you correct and revise it, begin your DDL file with our standard three lines:
drop schema if exists projectschema cascade; -- You can choose a different schema name.
create schema projectschema;
set search_path to projectschema;
Be sure that you can import your schema without errors.
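As a rough sketch of what such a DDL file might look like, here is a small schema following the rules above. The table and attribute names (Student, Took, and so on) are hypothetical illustrations, not part of your project:

```sql
-- Standard header lines, so the schema can be re-imported repeatedly.
drop schema if exists projectschema cascade;
create schema projectschema;
set search_path to projectschema;

create table Student (
    sID integer primary key,          -- a primary key on every table
    firstName varchar(50) not null,   -- NOT NULL wherever it makes sense
    lastName varchar(50) not null,
    campus varchar(25) not null,
    email varchar(100) not null,
    gpa numeric(3, 2) not null
        check (gpa >= 0.0 and gpa <= 4.0)  -- a domain-specific constraint
);
comment on table Student is
    'A student enrolled at the university; one row per student.';

-- A table whose data references another table gets a foreign key.
create table Took (
    sID integer not null references Student(sID),
    course varchar(10) not null,
    grade integer not null check (grade between 0 and 100),
    primary key (sID, course)
);
comment on table Took is
    'Student sID completed course with the given final grade.';
```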
Record your design decisions
If you decided on some changes of your own accord, explain those as well.
If you changed nothing, explain why. Saying that the schema was already good is not sufficient; you must explain how you know it is good.
Data cleaning and import
In this step, you will create the SQL statements necessary to import your data.
You have learned how to insert a row into a table using an INSERT INTO statement such as this:
INSERT INTO Student VALUES (00157, 'Leilani', 'Lakemeyer', 'UTM', 'lani@cs', 3.42);
You could populate an entire database with a long series of these statements. However, there is an overhead cost associated with executing a SQL statement, and you would incur that cost for every individual INSERT INTO. A more efficient approach is to use the PostgreSQL command COPY. It lets you load all the rows of a table in one statement, so you incur the overhead cost only once, for the whole table. This is not only faster than INSERT INTO; it is also more convenient. You probably already have your data in a csv or formatted text file, and COPY lets you load that data directly (rather than having to convert the data into a series of INSERT INTO statements). For instance, if you had data in a comma-separated csv file called data.csv, you might say:
COPY Student from 'data.csv' with csv;
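In practice, a few details matter. Server-side COPY resolves the filename on the database server, so you typically need a path the server can read; if your file has a header row, say so. When you only have client access, psql's \copy meta-command reads the file on your own machine instead. A fuller version might look like this (the path is hypothetical):

```sql
-- Server-side: the path is resolved on the database server.
-- Skip the header row of the csv file with the HEADER option.
COPY Student FROM '/path/to/data.csv' WITH (FORMAT csv, HEADER true);

-- Client-side alternative inside psql (reads the file on your machine):
-- \copy Student from 'data.csv' with csv header
```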
More about cleaning the data
Please re-read the section “Resources for data cleaning” from the phase 1 handout for some tips.
One way to find the constraint violations is to define all the constraints, import the data, and watch the errors fly by. But an early error can influence subsequent errors, making the process laborious. An alternative is to omit some or all of the constraints from the schema at first, import the data, and then run queries to find data that would violate if the constraint were present. Once you have resolved all the issues, you can clear out the database, import the full schema with constraints, and then import the cleaned up data.
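For example, after importing into a constraint-free version of a hypothetical Student table (and a Took table that should reference it), queries like these would surface rows that violate an intended primary key, NOT NULL, or foreign key constraint:

```sql
-- Values that would violate the intended primary key on sID:
SELECT sID, count(*)
FROM Student
GROUP BY sID
HAVING count(*) > 1;

-- Rows that would violate an intended NOT NULL constraint:
SELECT * FROM Student WHERE email IS NULL;

-- Rows that would violate an intended foreign key (no matching Student):
SELECT *
FROM Took
WHERE sID NOT IN (SELECT sID FROM Student WHERE sID IS NOT NULL);
```

Each query should return zero rows once the data is clean; any rows it does return point you at exactly the data that needs fixing.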
Record your cleaning process
Keep a written record of the steps you take to clean and import your data, as well as the decisions you make along the way (and why). This should be detailed enough that someone else with the same data could follow your steps and end up with exactly the same database.
Hand in the following:
• A file called schema.ddl containing the schema expressed in the SQL Data Definition Language.
• The data itself: Pick one data file that is representative of the format your data is in, paste a few lines of that data into another file, and hand that in. Name it whatever is appropriate, but include "data" somewhere in the filename (e.g., sample-data.csv or country-data.json).
• A file called phase2.pdf with a short report containing the following sections: Design Decisions, and Cleaning Process. (See above for what to put in each section.) Your report should be roughly 2 pages long.
