Overview & Goals
This is the hands-on section of the course. The practice of data science largely boils down to writing code. So although this is not a programming class per se, the majority of the sections and assignments will center around using Python to do data science.
Not everything you are expected to do will be explicitly mapped out, step by step. Not only would that level of direction be prohibitively long to prepare, it would ultimately be dishonest and unhelpful; data science is often about figuring things out. If clear, step-by-step instructions could be written down that always worked, data science would be automated and this class would be moot. The human level of figuring things out when the answers are not entirely clear and there is uncertainty in how to proceed is the actual “practice” part of “data science in practice.” You are going to have to be okay with trying things out that don’t work, with getting stuck, with not quite knowing what is going on all the time. That’s the job.
The goal of this class is not to teach you (all of) data science; we can’t possibly do that in 10 weeks. The goal is to give you a meaningful introduction to, and hands-on experience of, what data science is, and to provide you with a basis from which you can continue to learn data science. There is an amazing amount of resources out there for this topic. The difficulty, as a newcomer, is to figure out what it is you’re looking for, and how to find it; that is, figuring out what you don’t know, what you need to learn, and where you can find those answers. Technical experts don’t know all the answers; they just know where to find the answers and how to implement them.
A major goal of these sections and assignments is to show you what is available so that you know where to look if you want to keep going in data science.
Section Topics & Hands-On Materials
The tutorials are not intended, or written, as in-depth guides covering everything about the topics. Instead they are more like an index, or a map. For each topic, we aim to give a cursory overview and a simple demonstration of the topic, and to guide you to bigger and better resources for really diving into it. You are not expected to, and will not be able to, follow every link for every topic; pick the ones that interest you most or that are most helpful for getting unstuck on the assignments or project.
Section Attendance & Switching
Supported Tools
Officially, we will use and support the following tools, and the assignments will require them:
– Python 3 with the Anaconda platform
– Jupyter Notebooks
– git & GitHub (optionally using the SourceTree GUI)
The assignments must be completed using these tools. You are welcome to explore other tools as you explore these topics, and to use different tools for the project. Note, however, that we offer no guarantees that we can help with other languages, modules, or tools.
Assignments
Assignments will be done in Jupyter Notebooks. They will be released on GitHub (https://github.com/COGS108) and submitted to TritonED.
Assignment Schedule
The (tentative) schedule for assignments is as follows:
A2 – Data Exploration (12%) 11:59 pm, TBD
A3 – Data Privacy (12%) 11:59 pm, TBD
A4 – Data Analysis (12%) 11:59 pm, TBD
A5 – Natural Language Processing (12%) 11:59 pm, TBD
Using Jupyter Notebooks for Class Assignments
We will be using a system that automatically grades notebook submissions. Notebooks will be released with step-by-step instructions on what code to enter. Follow these instructions when working with these notebooks:
– Whenever you see ‘# YOUR CODE HERE’, replace it with code to answer the question.
– Also, remove the ‘raise’ line, or the notebook will raise an error.
– Do not edit or delete any cell that has ‘assert’ lines in it.
– These lines are used to check your code. Editing them will be flagged as attempted cheating, and they will be reset to the original versions before grading.
– You can add new cells, and write extra code, as long as you follow what is written above.
– Your grade is based partly on the public tests (the ‘asserts’) that are released with the assignment, and partly on a hidden set of tests.
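As a rough illustration of the rules above, here is a minimal sketch of what a completed answer cell and its test cell might look like. The function name add_numbers, the exact form of the ‘raise’ line, and the specific tests are all hypothetical, made up for this example; the actual assignment cells will differ.

```python
# Hypothetical answer cell as it might be released (shown as comments):
#
#     def add_numbers(a, b):
#         # YOUR CODE HERE
#         raise NotImplementedError()
#
# The same cell after completion: the placeholder comment has been
# replaced with working code and the 'raise' line has been removed.
def add_numbers(a, b):
    """Return the sum of a and b (illustrative example only)."""
    return a + b


# Hypothetical test cell: cells containing 'assert' lines like these
# are left exactly as released. An assert passes silently when the
# condition is True and raises AssertionError when it is False.
assert add_numbers(2, 3) == 5
assert add_numbers(-1, 1) == 0
```

If the ‘raise’ line were left in place, calling the function would raise an error before any of the ‘assert’ lines could pass. Extra cells for experimenting can be added freely, as long as the test cells stay unchanged.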
Assignment Questions & Using Piazza
Grades
Assignment Regrades & Solutions



