MIS2502: DATA ANALYTICS (SPRING 2018)

INSTRUCTOR: JAEHWUEN JUNG, SECTION 001/003

Assignment #9: Association Mining Using R [Due Friday, 4/27/18 at 11:59 pm]

Here are the assignment instructions and an answer sheet to submit (in Word format).

Here is the data file you’ll need: Groceries.csv. (Right-click to download the file. Make sure that the file name doesn’t change.)

For this assignment, you’ll need to modify the R script you used in ICA #15 (aRules.r). To do this, you should finish the related in-class exercise first.

In addition, no late submissions will be accepted for this assignment. (I will post the solution right after the deadline so you can prepare for Exam #3.)

Due date: Friday, 4/27/2018, 11:59 pm.

Assignment #8: Clustering Using R [Due Friday, 4/20/18 at 11:59 pm]

Here are the assignment instructions and an answer sheet to submit (in Word format).

Here is the data file you’ll need: Jeans.csv. (Right-click to download the file. Make sure that the file name doesn’t change.)

For this assignment, you’ll need to modify the R script you used in ICA #13 (Clustering.r). To do this, you should finish the related in-class exercise first.

Due date: Friday, 4/20/2018, 11:59 pm.


Assignment #7: Decision Tree Using R [Due Friday, 4/13/18 at 11:59 pm]

Here are the assignment instructions and an answer sheet to submit (in Word format).

Here is the data file you’ll need: BankLoan.csv. (Right-click to download the file. Make sure that the file name doesn’t change.)

For this assignment, you’ll need to modify the R script you used in ICA #12 (dTree.r). To do this, you should finish the related in-class exercise first.

Due date: Friday, 4/13/2018, 11:59 pm.

Assignment #6 Hints – A Checklist

Hello everyone,

While working on Assignment #6, you may run into errors. To get the script to run without error messages, you will often have to “debug” it, that is, track down and fix the cause of each error.

Here is a list of things you can check:

1. If you get an error message like this:

Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
  cannot open file 'OnTimeAirport2017Dec.csv': No such file or directory

ask yourself the following questions:

  • Are both files (Descriptives.r and OnTimeAirport2017Dec.csv) in the same directory/folder?
  • Are the file names correct? When you download a file to your disk multiple times, the system sometimes renames it automatically (e.g., OnTimeAirport2017Dec(1).csv). If this is the case, change the name back.
  • Did you set the working directory to the source file location? (For how to do that, see the instructions for ICA #11.)

2. Under the VARIABLES section, did you change the following variable values accordingly for the new dataset?
(1) INPUT_FILENAME
(2) HISTLABEL
(3) HIST_TITLE

3. Were you able to install the “psych” package? (Did lines 30-31 work? Did you get any error message about the package?)
If you were not able to install the “psych” package in RStudio, try installing it in R first. To do so, close RStudio, open R (instead of RStudio), copy and paste lines 30-31 into the R Console, press Enter, and check whether the package installs correctly. If you are asked to pick a mirror, just pick one in the US. Once you finish, re-open RStudio and re-run the code to see whether it works now.

4. In the original Descriptives.r file, our analysis was based on the NBA data. Specifically, we looked at two columns: dataSet$Salary and dataSet$Position. Because the assignment uses a different dataset, OnTimeAirport2017Dec.csv, you need to change these column references accordingly.

5. To complete Steps 7-9, you should add a few more lines between the t.test() and sink() function calls. More specifically, I recommend adding them right before the following line (line 95 of the original Descriptives.r file):

# This stops R from writing any more to the text output file.
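If you would rather check items 1 and 3 from the RStudio Console before re-running the whole script, a small helper like the one below may save some time. This is a minimal sketch and is not part of Descriptives.r. check_setup() is a hypothetical name, and it uses only base R functions (getwd(), file.exists(), list.files(), requireNamespace()). It reports problems but does not fix them, so it never installs anything on its own.

```r
# Hypothetical helper (not part of Descriptives.r): reports the two most common
# Assignment #6 problems, a missing/renamed data file and a missing package.
check_setup <- function(input_filename, pkg = "psych") {
  # Where R is currently looking for files
  cat("Working directory:", getwd(), "\n")

  file_ok <- file.exists(input_filename)
  if (!file_ok) {
    # Look for renamed copies such as "OnTimeAirport2017Dec(1).csv"
    stem <- tools::file_path_sans_ext(basename(input_filename))
    similar <- grep(stem, list.files(), fixed = TRUE, value = TRUE)
    message("Cannot find '", input_filename, "' in the working directory.")
    if (length(similar) > 0) {
      message("Similar files found: ", paste(similar, collapse = ", "))
    }
    message("Fix the file name, or use Session > Set Working Directory > ",
            "To Source File Location in RStudio.")
  }

  # TRUE if the package is installed and can be loaded
  pkg_ok <- requireNamespace(pkg, quietly = TRUE)
  if (!pkg_ok) {
    message("Package '", pkg, "' is not installed. Try: install.packages(\"",
            pkg, "\")")
  }

  invisible(list(file_ok = file_ok, pkg_ok = pkg_ok))
}

# Usage: run in the RStudio Console before re-running the script
result <- check_setup("OnTimeAirport2017Dec.csv")
```

If result$file_ok is FALSE, go back to the questions under item 1. If result$pkg_ok is FALSE, follow item 3.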

That’s all I can think of so far.

Good luck!

Bonus Assignment (Optional): One page write-up [Due Saturday, 4/28/18 at 11:59 pm]

This is a bonus assignment for which you can earn extra credit (30 points) as well as 50 professional achievement points (if you are an MIS major).

It is optional and is due Saturday, April 28, at 11:59 pm.

As this assignment is for extra credit, late submissions will not be accepted.

Here are the instructions: Bonus Assignment – Extra Credit and PRO assignment

Submit your write-up as a Word or PDF document through Canvas > Assignment > To-Do before the deadline.

Assignment #6: Getting Familiar with R/RStudio [Due Friday, 4/6/18 at 11:59 pm]

Here are the assignment instructions and an answer sheet to submit (in Word format).

Here is the data file you’ll need: OnTimeAirport2017Dec.csv. (Right-click to download the file. Make sure that the file name doesn’t change.)

For this assignment, you’ll need to modify the R script you used in ICA #11 (Descriptives.r). To do this, you should finish the related in-class exercise first.

Due date: Friday, 4/6/2018, 11:59 pm.

Exam #2 Study Guide: SQL/Data Visualization/Dimensional Data Modeling [During class time on 3/28]

  • The exam is closed book, closed notes. It will be a combination of multiple-choice and short-answer questions.
  • You will NOT be able to use a computer during the exam.
  • Exam #2 covers all the lectures after “Advanced queries” and before advanced analytics in R. We will have review sessions on Friday (3/23) and Monday (3/26).
  • Study Guide for Exam #2 (word)
  • Study Guide for Exam #2 (pdf) (updated: 3/25)

Practice questions. Here are some practice questions, in case you want more exercises.


Setting up R and RStudio

R is a widely used, open-source statistical analysis platform. RStudio is an integrated development environment for R – that means it makes using R easier!

  • You should install both software packages – R and RStudio! Don’t just install R or your life will be difficult! 

We’ll be using this software to do some advanced analytics in the second half of the semester! You can get a full copy of the software – PC or Mac – for free!

First, download and install R:

  • Download the installation package for R.
    • Choose the link for your operating system (Windows or MacOS).
      • If you have Windows, choose the “base” installation file.
      • If you have a Mac, you’ll have to choose the one that corresponds to your version of MacOS.
    • Download the latest version. (Currently the latest version is R 3.4.4, but if there is a newer version, simply download that one.)
    • Install the software, accepting the default options.

Now, download and install RStudio: (You need to have R installed first!)

  • Download the appropriate installer from the RStudio website.
    • We will use the RStudio Desktop (Open Source License) version, which is free.
    • Scroll down to the bottom of the page, and choose the link for your operating system (Windows or MacOS).
    • Download the latest version. (Currently the latest version is RStudio 1.1.442, but if there is a newer version, simply download that one.)
    • Install the software, accepting the default options.

After both are installed, you’re always going to run RStudio, which will use R behind the scenes to give you a pleasing analytics experience!
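Once both are installed, you can confirm that RStudio found your R installation by typing the following into the RStudio Console. This is a small sketch. The 3.4.4 threshold is simply the version mentioned above, so any newer R will also report TRUE.

```r
# Print the version of R that RStudio is running behind the scenes
cat(R.version.string, "\n")

# TRUE if this R is at least version 3.4.4 (the version mentioned above)
is_recent_enough <- getRversion() >= "3.4.4"
print(is_recent_enough)
```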