MIS2502: Data Analytics (Spring 2017)

Instructor: Jing Gong, Section 002/004

Assignment #9: Association Rules (Due by Tuesday, May 2, 2017 at 11:59 pm)

Here are the assignment instructions and an answer sheet to submit (in Word format).

And here is the data set for the assignment (Groceries.csv).

This assignment is due by Tuesday, May 2, 2017 at 11:59 pm!

For this assignment, you’ll need to modify the R script you used in ICA #12.2 (aRules.r). To do this, you should finish the related in-class activity first (ICA #12.2- Association Rule Mining Using R).

Final Exam Study Guide

Here is the study guide for the final exam.

There will be an in-class review session on April 27. Below is the final exam date/time for each section.

Date/Time:

  • Section 002 (the 3:30 – 4:50 class): Tuesday 5/9 at 1:00-2:30
  • Section 004 (the 2:00 – 3:20 class): Thursday 5/4 at 1:00-2:30

(We will have the room until 3:00 pm but we will only use 1.5 hours for the final exam)

Place: Regular classroom

Assignment #8: Clustering Using R (Due Tuesday, April 25, 2017 at 11:59 pm)

Here are the assignment instructions and an answer sheet to submit (in Word format).

Here is the data file you’ll need: Jeans.csv. (Right click to download the file. Make sure that the name doesn’t change)

This assignment is due Tuesday, April 25, 2017 at 11:59 pm.

For this assignment you’ll need to modify the R script you used in In-Class Exercise #11 (Clustering.R). To do this, you should finish the related in-class exercise first (ICA #11 – Clustering Using R).

Need Extra Credit? Or Professional Achievement Points? Read on…

If you’re an MIS major you need Professional Achievement Points. And everybody likes extra credit.

So see instructions for this Bonus Assignment: MIS2502 Extra Credit and PAP Assignment

Do a write-up on a data-related topic and submit it by April 28, 2017 at 11:59 PM and you’ll receive 50 professional achievement points (which only matters to MIS majors) and 3 points extra credit on your final exam (which is good for everyone).

Follow the instructions carefully. If you don’t follow the instructions you will not receive credit.

Assignment #7: Decision Trees in R (Due Tuesday, April 18, 2017 at 11:59 pm)

Here are the assignment instructions and an answer sheet to submit (in Word format).

Here is the data file you’ll need: BankLoan.csv. (Right click to download the file. Make sure that the name doesn’t change)

This assignment is due Tuesday, April 18, 2017 at 11:59 pm.

For this assignment you’ll need to modify the R script you used in ICA #10 (dTree.r). To do this, you should finish the related in-class exercise first (ICA #10 – Decision Trees in R).

Assignment #6 Hints – A Checklist

Dear R experts,

While working on Assignment #6, you may run into errors. To get the script to run without error messages, you will often have to “debug” it.

Here is a list of things you can check:

1. If you get an error message like this:

Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
  cannot open file 'OnTimeAirport-Jan14.csv': No such file or directory

ask yourself the following questions:

  • Did you have all the files (both Descriptives.r file and OnTimeAirport-Jan14.csv file) in the same directory/folder?
  • Are the file names correct? Sometimes when you download a file to your disk multiple times, the system automatically renames it (like OnTimeAirport-Jan14(1).csv). If this is the case, you have to change the name back.
  • Did you set the working directory to source file location? (How to do that? Read instructions on ICAs #9.1 and #9.2.)
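
The checks above can also be run from the R console. The following is only a minimal sketch: the helper name check_input_file is hypothetical and is not part of Descriptives.r. It reports whether the data file is visible from a given directory and, if not, lists the CSV files that are there so you can spot a renamed copy.

```r
# Hypothetical helper (not part of Descriptives.r): reports whether a data
# file is visible from a given directory (the working directory by default).
check_input_file <- function(filename, dir = getwd()) {
  path <- file.path(dir, filename)
  found <- file.exists(path)
  if (!found) {
    # List the CSV files that are present, to spot renamed copies
    # such as "OnTimeAirport-Jan14(1).csv".
    candidates <- list.files(dir, pattern = "\\.csv$", ignore.case = TRUE)
    message("Cannot find '", filename, "' in ", dir)
    message("CSV files found: ", paste(candidates, collapse = ", "))
  }
  found
}

# Usage (after setting the working directory to the source file location):
# check_input_file("OnTimeAirport-Jan14.csv")
```

If the function returns FALSE, compare the listed file names with the value of INPUT_FILENAME in your script.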

2. Under the VARIABLES section, did you change the following variable values accordingly for the new dataset?
(1) INPUT_FILENAME
(2) HISTLABEL
(3) HIST_TITLE

3. Were you able to install the “psych” package? (Do lines 30-31 work?)
If not, close RStudio, then open R (instead of RStudio), copy and paste lines 30-31 into the R Console, press Enter, and see if the package installs correctly. You may be asked to pick a mirror; just pick one in the US. Once you finish, re-open RStudio and re-run the code to see if it works now.
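
If you want to test the installation yourself, here is a rough sketch of an "install only if missing" pattern. It is not necessarily what lines 30-31 of Descriptives.r contain; the helper name ensure_package and the mirror URL are assumptions made for illustration.

```r
# Hypothetical helper: installs a package only if it is missing, then loads it.
# The default mirror URL is an assumption; any CRAN mirror should work.
ensure_package <- function(pkg, repos = "https://cloud.r-project.org") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = repos)
  }
  # character.only = TRUE lets library() take the package name as a string.
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

# Usage:
# ensure_package("psych")
```

Passing repos explicitly avoids the prompt that asks you to pick a mirror.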

4. In the original Descriptives.r file, our analysis was based on the NBA data. Specifically, we looked at two columns, dataSet$Salary and dataSet$Position. For the assignment, because we are using a different dataset called OnTimeAirport-Jan14.csv, you need to make the following changes accordingly:

(1) Do a thorough search of your R script and make sure that you changed every occurrence of dataSet$Salary and dataSet$Position to the columns we are interested in from the new dataset.

(2) Make sure that you changed line 87:

subset <- dataSet[ which(dataSet$Position=='PG' |  dataSet$Position=='SF'), ];

into:

subset <- dataSet[ which(dataSet$Origin=='PHL' |  dataSet$Origin=='PIT'), ];

The reason is that now we are interested in looking at Philadelphia and Pittsburgh as origin airports (as asked in Question 5 of the answer sheet).
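
To see what this filter does, here is a small sketch on made-up data. The toy data frame and its Delay column are assumptions for illustration only; the real columns come from OnTimeAirport-Jan14.csv.

```r
# Toy data standing in for OnTimeAirport-Jan14.csv. The Delay column is
# made up for illustration.
dataSet <- data.frame(
  Origin = c("PHL", "PIT", "JFK", "PHL", "ORD"),
  Delay  = c(12, 5, 30, 0, 18)
)

# Keep only the rows whose origin airport is PHL or PIT.
subset <- dataSet[which(dataSet$Origin == "PHL" | dataSet$Origin == "PIT"), ]

# An equivalent filter written with %in%.
subset2 <- dataSet[dataSet$Origin %in% c("PHL", "PIT"), ]

print(subset)
```

Both versions keep the same rows; note that a misspelled column name (such as Originn) does not match any column, so the filter would not select the rows you expect.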

If you have checked everything listed above, you should be able to complete at least the first six steps of the assignment.

5. To complete Steps 7-9, you should add a few more lines between the t.test() and the sink() functions. More specifically, I recommend that you add the lines right before the following line (line 95 of the original Descriptives.r file).

# This stops R from writing any more to the text output file.
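
Why does the position matter? As a rough sketch (with made-up data and a temporary output file, not the actual assignment code), anything printed between the opening sink(file) call and the closing sink() call goes to the text output file, while anything printed afterwards goes to the console only.

```r
# Toy data; the real script uses columns from the assignment data set.
set.seed(1)
groupA <- rnorm(20, mean = 10)
groupB <- rnorm(20, mean = 12)

# Hypothetical output file; the real script defines its own file name.
outFile <- tempfile(fileext = ".txt")

sink(outFile)                  # start writing console output to the file
print(t.test(groupA, groupB))
# Lines added here, before the closing sink(), still go to the output file.
print(summary(groupA))
sink()                         # stop writing to the file

print(summary(groupB))         # this is printed to the console only
```

After running it, outFile contains the t-test result and the summary of groupA, but not the summary of groupB.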

That’s all I can think of so far. Good luck!

Assignment #6 – Introduction to Working with R and RStudio (Due Tuesday, April 11, 2017 at 11:59 pm)

Update as of 4/4/2017 at 4:46 pm: I made a small change to the assignment instructions. Step 2 on Page 2 is slightly modified. (That is, you do not need to add a “#” sign before line 66, as the previous version stated.)

Here are the assignment instructions and an answer sheet to submit (in Word format).

Here is the data file you’ll need: OnTimeAirport-Jan14.csv. (Right click to download the file. Make sure that the name doesn’t change.)

This assignment is due Tuesday, April 11, 2017 at 11:59 pm.

For this assignment, you’ll need to modify the R script you used in ICA #9.2 (Descriptives.r). To do this, you should finish the related in-class exercise first.

Companies Using R

What companies use R for data analysis? Check the list compiled by listendata.com. Here are some examples:

  1. Facebook – For behavior analysis related to status updates and profile pictures.
  2. Google – For advertising effectiveness and economic forecasting.
  3. Twitter – For data visualization and semantic clustering.
  4. Microsoft – Acquired the Revolution R company and uses it for a variety of purposes.
  5. Uber – For statistical analysis.
  6. Airbnb – For scaling data science.
  7. IBM – Joined R Consortium Group
  8. ANZ – For credit risk modeling
  9. HP
  10. Ford
  11. Novartis
  12. Roche
  13. New York Times – For data visualization
  14. McKinsey
  15. BCG
  16. Bain

Hint for Assignment #5 (Pivot Tables)

I got quite a few questions about Q4 and Q5 of Assignment #5, both of which ask you to create a pivot table first, and then use the Excel AVERAGE function in a separate cell outside the pivot table.

Here is an example of what I mean by “use the Excel AVERAGE function to average those values … in a separate cell outside the Pivot Table.”

Link: Hint for HW5 – Q4 and Q5

Let me know if you have any questions.

-Jing