Here is the study guide for the third (final) exam.
What you need to know about Week 13
1. Read the following:
- Model Training with Machine Learning – Data Science Primer. Note that the Dropbox PDF looks off, so use the link.
- UNSW – 2020 – Descriptive, Predictive, Prescriptive Analytics
2. Take Quiz
3. Remember that group projects are due to me April 26 at 3 pm. Groups will present to the class April 29. We will also do the exam 3 review then.
4. Bonus: Confidence Interval Explained
- There are numerous online resources explaining the idea of confidence intervals. Here is a short video without too many statistical nuances.
- Here’s a machine learning overview that’s pretty handy.
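For the confidence interval bonus above, here is a minimal sketch of how a 95% interval for an average can be computed. It assumes a normal approximation (for small samples a t-distribution would give a somewhat wider interval), and the quiz scores are made up for illustration, not taken from class data.

```python
import math
import statistics


def mean_confidence_interval(values, confidence=0.95):
    """Return (low, high) for the mean, using a normal approximation."""
    n = len(values)
    mean = statistics.mean(values)
    # Standard error of the mean, based on the sample standard deviation.
    std_err = statistics.stdev(values) / math.sqrt(n)
    # z is roughly 1.96 for a 95% interval.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * std_err, mean + z * std_err


# Hypothetical quiz scores, made up for illustration.
scores = [72, 85, 90, 66, 78, 88, 95, 70, 81, 77]
low, high = mean_confidence_interval(scores)
print(f"mean = {statistics.mean(scores):.1f}, 95% CI = ({low:.1f}, {high:.1f})")
```

Asking for more confidence (say 99%) widens the interval, which is the trade-off the video walks through.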
What you need to know for Week 12
1. Read the following:
- Feldman – 2013 – Techniques and applications for sentiment analysis
- Grubbs et al. – 2020 – Understanding Political Twitter
- Hanna et al. – 2021 – What is unstructured data?
2. Take Quiz
3. Group project deliverable due by end of class April 15. You need the topic fleshed out and an outline of what each person in the group is doing.
4. Bonus
- Take a look at the fascinating video of how Tesla sees the traffic environment. This is an example of extremely sophisticated real-time analysis of unstructured data.
Handy links
- Sentiment Viz
- StockTwits
- Corporate efforts: IBM Tone Analyzer, AWS Comprehend
Assignment 3: Final (Group) Project due April 26, presentations April 29
This is the team project.
Here are the assignment instructions and grading criteria. Groups MUST be 4 to 5 members. Form your groups in this Google Doc. You may not do this assignment on your own.
Once we form groups the following incremental deadlines will apply:
April 8 (end of class): I need the idea you’ll examine in the assignment for approval.
April 15 (end of class): I need a note that your group has met and set individual deliverables.
For these interim deadlines, all I need is an email from each group leader detailing the team, the topic and a rough plan. The main goal is to account for idea changes (many of you will course-correct after exploring the data). I’m here to help you focus, refine, find sources, etc.
One more thing: COVID-19 can’t be a topic.
The assignment is due Friday, April 26 at 3 pm. We’ll do the presentations Monday, April 29.
What you need to know about Week 11
1. Read the following:
- CFI – Pivot Table Guide
- Durcevic – 2020 – Move Beyond Excel, PowerPoint & Static Business Reporting with Powerful Interactive Dashboards
2. Take Quiz
3. Make sure you’re on a team for the group assignment.
Note that by the end of class April 8, I will need a topic that the group will focus on (you can change or tweak it later if warranted, but I need you to think about the narrative, etc.). We will give you time at the beginning of class to select a team if you’re not on one.
4. Bonus: Great Tutorial on Excel Pivot Table Feature
This video provides a great introduction to doing pivot tables in Excel.
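If it helps to see what a pivot table does under the hood, here is a minimal sketch in plain Python: group the rows by one field, spread a second field across columns, and sum a value for each combination. The sales rows and the `pivot_sum` helper are hypothetical, made up for illustration; Excel does the same grouping for you when you drag fields into Rows, Columns and Values.

```python
from collections import defaultdict

# Hypothetical sales rows, made up for illustration.
rows = [
    {"region": "East", "product": "Widgets", "sales": 100},
    {"region": "East", "product": "Gadgets", "sales": 150},
    {"region": "West", "product": "Widgets", "sales": 200},
    {"region": "East", "product": "Widgets", "sales": 50},
    {"region": "West", "product": "Gadgets", "sales": 75},
]


def pivot_sum(rows, row_field, col_field, value_field):
    """Sum value_field for every (row_field, col_field) combination."""
    table = defaultdict(lambda: defaultdict(int))
    for row in rows:
        table[row[row_field]][row[col_field]] += row[value_field]
    # Convert the nested defaultdicts to plain dicts for easy printing.
    return {key: dict(cols) for key, cols in table.items()}


pivot = pivot_sum(rows, "region", "product", "sales")
for region, totals in sorted(pivot.items()):
    print(region, totals)
```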
Study guide for Exam 2
Here is the study guide for the second exam, which will be held on Canvas with availability at 5:30 pm Monday, April 1.
In addition, Sajin will hold an exam review session. Look for a separate email.
What you need to know about Week 9
1. Read the following:
2. Take Quiz
3. Send me extra credit 2 if you haven’t already
4. More background on KPIs
Wikipedia has fairly good articles on KPIs (https://en.wikipedia.org/wiki/Performance_indicator) and the SMART criteria (https://en.wikipedia.org/wiki/SMART_criteria).
What you need to know about Week 8
1. Read the following:
- IBM – 2020 – What is ETL (Extract, Transform, Load)?
- Extra but not required: Apache Hadoop projects. In 8.2 we’ll talk a bit about Hadoop and MapReduce. It’s worth noting the more up-to-date projects and the concepts in this video.
2. Take Quiz
3. Download Tableau Prep here (you’ll need it for the ETL in-class exercise).
4. See extra credit opportunity
Extra credit No. 2: Due 3/25
For the second extra credit, I’m looking to connect your passion to data sets. Here’s the assignment, which is worth two points toward your final grade.
- Think about something you’re passionate about and want to explore more. Then think about two or three data sets that would apply to that passion. From there, map out a plan of what you’d look for.
Deliverable: Two or three paragraphs on your passion, why it matters and how data matters to it.
- Look into those data sets, form some ideas and map out how it could alter your career, enhance it or at least help you get others as passionate about your topic as you are.
Deliverable: Another two or three paragraphs.
The goal here is to get you thinking about how data connects with pretty much anything, but especially your career path. In recent semesters, we’d get to the final project and I’d see topics like food insecurity, Social Security and various social issues, and I’d think that I wish I had known about those interests earlier.
A bonus with this extra credit: if you have a passion and find three other like-minded people, you have your team for assignment 3.
Happy to be a sounding board so reach out as needed.
What you need to know for Week 7
1. Read the following:
- Moss – 2021 – Five Times Excel Led to Disaster
- Redman – 2013 – Data’s Credibility Problem
- Rosenblum and Dorsey – 2014 – Knowing Just Enough about Relational Databases
- Tableau – Guide to data cleaning
2. Take Quiz.
3. Look over class materials for 3/11. Note that we have an assignment due that Friday, but we are unlikely to provide much class time for completing it this time. It may be beneficial to look over the 7.2 in-class assignment as well as assignment 2.
4. Assignment 2: Data cleansing in Excel (Vandelay spreadsheet), due 3/15 at 3 p.m. I will need the spreadsheet as well as the answer sheet emailed to me.
5. Additional stuff.
What is a Relational Database?
This video demonstrates the differences between a relational database and an Excel spreadsheet for storing data:
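To make the contrast concrete, here is a minimal sketch using Python’s built-in `sqlite3` module. The `customers` and `orders` tables and their contents are hypothetical, made up for illustration. In a flat spreadsheet the customer’s city would be repeated on every order row; here it is stored once and linked by `customer_id`, so a single update fixes it everywhere.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        city TEXT NOT NULL
    );
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount REAL NOT NULL
    );
""")

# Hypothetical data, made up for illustration.
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ana", "Philadelphia"), (2, "Ben", "Camden")],
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 1, 20.0), (2, 1, 35.5), (3, 2, 12.0)],
)

# The city lives in one place, so one UPDATE covers all of Ana's orders.
conn.execute("UPDATE customers SET city = 'Pittsburgh' WHERE id = 1")

# A JOIN stitches the two tables back together when we need a report.
result = conn.execute("""
    SELECT c.name, c.city, SUM(o.amount)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.name, c.city
    ORDER BY c.id
""").fetchall()
print(result)
conn.close()
```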
How Bad Data Ruins Your Day
Here are some of my favorite horror stories of how things went wrong due to bad data:
https://www.simscale.com/blog/2017/12/nasa-mars-climate-orbiter-metric/
(A blog post with examples – including the classic NASA mishap – of how things go wrong when people get units and data types wrong.)
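The Mars Climate Orbiter story comes down to one system reporting a value in pound-force seconds while another read it as newton-seconds. Here is a minimal sketch of that kind of mistake and one way to guard against it: carry the unit with the number instead of passing around a bare float. The `Impulse` class and the reading of 10.0 are hypothetical, made up for illustration, and are not NASA’s actual software.

```python
from dataclasses import dataclass

# 1 pound-force second expressed in newton-seconds.
LBF_S_TO_N_S = 4.4482216152605


@dataclass(frozen=True)
class Impulse:
    """An impulse value that is always stored in newton-seconds."""
    newton_seconds: float

    @classmethod
    def from_pound_force_seconds(cls, value):
        # Convert at the boundary so the rest of the code sees one unit.
        return cls(value * LBF_S_TO_N_S)


raw_reading = 10.0  # hypothetical value reported in pound-force seconds

# The mistake: treating the raw number as if it were already newton-seconds.
wrong = Impulse(raw_reading)
# The fix: say what unit the number came in and convert explicitly.
right = Impulse.from_pound_force_seconds(raw_reading)

print(f"wrong: {wrong.newton_seconds:.2f} N*s, right: {right.newton_seconds:.2f} N*s")
```

The two numbers differ by a factor of about 4.45, which is a big error to carry silently through a calculation.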