MIS2502: DATA ANALYTICS (Fall 2018)

INSTRUCTOR: JAEHWUEN JUNG, SECTION 004

Assignment #8: Association Mining Using R [Due Tuesday, 12/4/18 at 12:29 pm, before class!]

Here are the assignment instructions and an answer sheet to submit (in Word format).

Here is the data file you’ll need: Groceries.csv. (Right-click to download the file. Make sure that the name doesn’t change.)

For this assignment, you’ll need to modify the R script you used in ICA #15 (aRules.r). To do this, you should finish the related in-class exercise first.

In addition, no late submissions will be accepted for this assignment. (I will post the solution right after the deadline so you can prepare for the exam.)

Due date: Tuesday, 12/4/2018, 12:29 pm.

Assignment #7: Clustering Using R [Due Tuesday, 11/27/18 at 5:59 pm]

Here are the assignment instructions and an answer sheet to submit (in Word format).

Here is the data file you’ll need: Jeans.csv. (Right-click to download the file. Make sure that the name doesn’t change.)

For this assignment, you’ll need to modify the R script you used in ICA #13 (Clustering.r). To do this, you should finish the related in-class exercise first.

Due date: Tuesday, 11/27/2018, 5:59 pm.

Assignment #6: Decision Tree Using R [Due Thursday, 11/15/18 at 5:59 pm]

Here are the assignment instructions and an answer sheet to submit (in Word format).

Here is the data file you’ll need: BankLoan.csv. (Right-click to download the file. Make sure that the name doesn’t change.)

For this assignment, you’ll need to modify the R script you used in ICA #12 (dTree.r).

Due date: Thursday, 11/15/2018, 5:59 pm.

Assignment #5: Getting Familiar with R/RStudio [Due Tuesday, 11/6/18 at 5:59 pm]

Here are the assignment instructions.

Here is the data file you’ll need: OnTimeAirport2017Dec.csv. (Right-click to download the file. Make sure that the name doesn’t change.)

For this assignment, you’ll need to modify the R script you used in ICA #11 (Descriptives.r). To do this, you should finish the related in-class exercise first.

Due date: Tuesday, 11/6/2018, 5:59 pm.

 

Exam #2 Study Guide: SQL/Data Visualization/Dimensional Data Modeling [During class time on 10/30]

  • The exam is closed book, closed notes. It will be a combination of multiple-choice and short-answer questions.
  • You will NOT be able to use a computer during the exam.
  • Exam #2 covers all the lectures after “Advanced queries” and before advanced analytics in R. We will have a review session on Thursday (10/25) and an optional one on Monday (10/29, 2:30-4:30 pm, Speakman 200, led by Joe Deng).
  • Study Guide for Exam #2 (Word)
  • Study Guide for Exam #2 (PDF)
  • Practice questions. Here are some practice questions, in case you want more exercises.

Setting up R and RStudio

R is a widely used, open-source statistical analysis platform. RStudio is an integrated development environment for R – that means it makes using R easier!

  • You should install both software packages – R and RStudio! Don’t just install R or your life will be difficult! 

We’ll be using this software to do some advanced analytics in the second half of the semester! You can get a full copy of the software – PC or Mac – for free!

First, download and install R:

  • Download the installation package for R.
    • Choose the link for your operating system (Windows or macOS).
      • If you have Windows, choose the “base” installation file.
      • If you have a Mac, you’ll have to choose the one that corresponds to your version of macOS.
    • Download the latest version. (Currently that is R 3.5.1, but if there is a newer version, simply download that one instead.)
    • Install the software, accepting the default options.

Now, download and install RStudio: (You need to have R installed first!)

  • Download the appropriate installer from the RStudio website.
    • We will use the RStudio Desktop (Open Source License) version, which is free.
    • Scroll down to the bottom of the page, and choose the link for your operating system (Windows or macOS).
    • Download the latest version. (Currently that is RStudio 1.1.456, but if there is a newer version, simply download that one instead.)
    • Install the software, accepting the default options.

After both are installed, you’re always going to run RStudio, which will use R behind the scenes to give you a pleasing analytics experience!

Assignment #4: ETL/Pivot Table in Excel [Due Tuesday, 10/23/18 at 5:59 pm]

Here are the instructions: Assignment #4 – ETL/Pivot table

For the ETL-related questions (Parts 1-4), you’ll need to complete the assignment using this Excel file: ETL Workbook.xlsx. Please submit the final version of your “ETL Workbook.xlsx” file to Canvas.

For the pivot-table questions (Part 5), complete the assignment using this Excel workbook: VandelaySales.xlsx. Submit your answers as a Word or PDF file.

Due date: Tuesday, 10/23/2018, 5:59 pm.

 

Group Assignment: Temple Analytics Challenge! [Due Wednesday, 10/31/18 at 11:59 pm]

The group project is based on the Temple Analytics Challenge (http://ibit.temple.edu/analytics/), a University-wide data analysis and visualization competition.

Here are the instructions for the group project: CLICK

The group project is due by October 31, 2018.

  • You should work in teams of up to four members.
    • Each member of the team will receive the same grade. Team members can be anyone at Temple University.
  • For each group, please email me the names and TUIDs (e.g., tuz12345) of all group members before noon on Friday, October 12.
  • You will complete the two deliverables for the challenge (both clearly displaying the names of your team members):
    • The graphic (or series of graphics) as a PDF.
    • A brief summary of no more than one page explaining your graphic and why you think it is effective – also as a PDF.
  • Your deliverables should be submitted via Canvas. Each group will only need to submit once.