Your reading for 11.1, 11.2
A note about that first reading: it’s a bit dated, and Hadoop has advanced since the article was written. Much of the focus in the open source community has been on side projects tied to Hadoop, and one common theme is that analytics and better user interfaces are being layered on top of it. Most companies use Hadoop through vendors like Cloudera and Hortonworks, which package Hadoop and sell services and support. To see what I mean about Hadoop and its related projects, see the primary Apache page. For our purposes, we’ll keep Hadoop high level, but when it comes to data science, internship interviews and the like, you may want to know about projects like Hive, Cassandra, Pig and Spark.
Assignment 4: Group Project: Due April 27 at 3 p.m.; Presentations April 30
Here are the assignment instructions. Groups MUST have 4 to 5 members. You may not do this assignment on your own or in a group smaller than 4. Note that the date on the assignment is incorrect.
Once we form groups on April 2, the following deadlines will apply:
April 9: I need the idea you’ll examine in the assignment, for approval.
April 16: I need a note that your group has met and set individual deliverables for each member.
For these interim deadlines, all I need is an email from each group leader detailing the team, the topic and a rough plan. The main goal is to account for idea changes (many of you will course-correct after exploring the data). I’m here to help you focus, refine, find sources, etc.
The assignment is due April 27, 2018 at 3 p.m. We’ll do the presentations Monday, April 30.
Study Guide for Exam 2
Here is the study guide for the second exam. And here’s the more detailed version.
Agenda for the exam will be to:
–Re-form groups for the last group project at the beginning.
–Take the test
In-Class Exercise 9.2: Creating Interactive Dashboards
Here is the exercise.
And here is the Excel workbook you’ll need [Pew Story Data (Jan – May 2012).xlsx]
In-Class Exercise 9.1: Connecting Diverse Data
Here is the exercise.
And here are the workbooks [2012 Presidential Election Results by District.xlsx and Portrait 113th Congress.xlsx]
Weekly Question #7: Complete by March 26, 2018
Leave your response as a comment on this post by the beginning of class on March 26, 2018. Remember, it only needs to be three or four sentences. For these weekly questions, I’m mainly interested in your opinions, not so much particular “facts” from the class!
Here is the question:
Give an example of a KPI – some sort of metric for performance – that you use on a regular basis. Briefly discuss how it conforms to the SMART criteria.
(For example, my car tells me its average gas mileage. This is specific and measurable – gas mileage is a precise measure. It is achievable – I can alter my driving to try to get better mileage. It’s relevant – gas mileage has an impact on my costs! And it’s time-variant – I can look at gas mileage over a day, a week, or a month.)
Reading Quiz #7: Complete by March 26, 2018
Some quick instructions:
- You must complete the quiz by the start of class on March 26, 2018.
- When you click on the link, you may see a Google sign in screen. Use your AccessNet ID and password to sign in. It will then take you to the quiz.
  If it says you don’t have access, make sure you’re signed out of your regular Gmail (non-TUMail) account!
- You can only do the quiz once. If you submit multiple times, I’ll only use the first (oldest) one.
- This is “open book” – you can use the articles to answer the questions – but do not get help from anyone else.
Ready? Take the quiz by clicking this link.
Your reading for the week (data integration)
In-Class Exercise 8.2: Visualizing KPIs
Here is the exercise.
And here is the spreadsheet to complete the exercise [In-Class Exercise 8.2 – OnTime Airline Stats [Jan 2014].xlsx].
Data is Beautiful: a data science and visualization link worth checking out
I came across this story about someone on Reddit visualizing his Tinder experience, only to find that another person had done the same with their 500-day OKCupid outcomes.
Both of the data sets (along with a bunch of others) are on the Data is Beautiful Reddit. The thread highlights the democratization of data visualization. Worth checking out for giggles.