Instructor: Jing Gong. Class Time: M/W/F 2:00-2:50 pm


Assignment #6: Decision Trees in R (Due Monday, November 27, 2017 at 11:59 pm)

Here are the assignment instructions and an answer sheet to submit (in Word format).

Here is the data file you’ll need: BankLoan.csv. (Right-click to download the file. Make sure that the name doesn’t change.)

This assignment is due Monday, November 27, 2017 at 11:59 pm.

For this assignment, you’ll need to modify the R script you used in ICA #10 (dTree.r). To do this, you should finish the related in-class exercise first (ICA #10 – Decision Trees in R).

Assignment #5 Hints – A Checklist

Dear R experts,

While working on Assignment #5, you may run into errors. To get the script to at least run without error messages, you will often have to “debug” it and fix those errors.

Here is a list of things you can check:

1. Error Message. If you get an error message like this:

Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
  cannot open file 'OnTimeAirport-Jan14.csv': No such file or directory

ask yourself the following questions:

  • Are both files (Descriptives.r and OnTimeAirport-Jan14.csv) in the same directory/folder?
  • Are the file names correct? Sometimes when you download a file to your disk multiple times, the system automatically renames it (e.g., OnTimeAirport-Jan14(1).csv). If this is the case, you have to change the name back.
  • Did you set the working directory to the source file location? (How do you do that? See the instructions in ICA #9.)
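If the error persists, a few lines of R can show where R is actually looking for the file. The sketch below is a hypothetical helper (check_data_file() is not part of Descriptives.r); it uses only the base R functions getwd(), file.exists() and list.files() to print the working directory and the CSV files in it, which makes a renamed copy easy to spot.

```r
# Hypothetical helper (not part of Descriptives.r): before calling read.csv(),
# check that the data file can be found from R's current working directory.
check_data_file <- function(fileName) {
  cat("Working directory:", getwd(), "\n")
  if (file.exists(fileName)) {
    return(TRUE)
  }
  # List the CSV files that are there, so a renamed copy such as
  # "OnTimeAirport-Jan14(1).csv" is easy to spot.
  cat("Cannot find", fileName, "in this folder. CSV files here:\n")
  print(list.files(pattern = "\\.csv$"))
  FALSE
}

check_data_file("OnTimeAirport-Jan14.csv")
```

If it returns FALSE, either rename the file or point R at the right folder, for example with setwd() or, in RStudio, Session > Set Working Directory > To Source File Location.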

2. Changing Variable Values. Under the VARIABLES section, did you update the variable values for the new dataset?


3. Package Installation. Were you able to install the “psych” package?
If not, check the solutions here: Problems Installing Packages?
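One common pattern is to install a package only when it is missing and then load it. The sketch below wraps that in a hypothetical load_package() function built on base R's requireNamespace(), install.packages() and library(); it assumes you have an internet connection, and it names the main CRAN mirror explicitly so that no mirror prompt appears.

```r
# Hypothetical helper: install a package only if it is not already installed,
# then load it. Installing requires internet access.
load_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = "https://cloud.r-project.org")
  }
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}
```

For this assignment you would call load_package("psych") near the top of the script, in place of separate install.packages() and library() lines.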

4. Modifying R Script. In the original Descriptives.r file, our analysis was based on the NBA data. Specifically, we looked at two columns, dataSet$Salary and dataSet$Position. Because this assignment uses a different dataset (OnTimeAirport-Jan14.csv), you need to make the following changes:

  • Search your R script thoroughly and make sure that you replaced every occurrence of dataSet$Salary and dataSet$Position with the columns we are interested in from the new dataset.
  • Make sure that you changed line 87 from:
subset <- dataSet[ which(dataSet$Position=='PG' |  dataSet$Position=='SF'), ];

to:

subset <- dataSet[ which(dataSet$Origin=='PHL' |  dataSet$Origin=='PIT'), ];

This is because we now want to look at Philadelphia (PHL) and Pittsburgh (PIT) as origin airports (as asked in Question 5 of the answer sheet).

  • Make sure that “dataSet” has the right capitalization (i.e., the letter “S” should be upper case). This is because R is case-sensitive.
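To see why exact spelling matters, the corrected line 87 can be tried on a small made-up data frame. The rows and the Delay column below are invented for illustration; only the Origin column name comes from the assignment. A misspelled column such as dataSet$Originn does not raise an error: it returns NULL, so the condition is empty and the subset silently has zero rows.

```r
# Made-up rows; only the Origin column name mirrors OnTimeAirport-Jan14.csv.
dataSet <- data.frame(Origin = c("PHL", "PIT", "ATL", "PHL"),
                      Delay  = c(5, 12, 3, 20))

# Corrected line 87: keep flights whose origin is PHL or PIT.
subset <- dataSet[ which(dataSet$Origin=='PHL' | dataSet$Origin=='PIT'), ]
nrow(subset)   # 3

# Misspelled column: dataSet$Originn is NULL, so no rows are selected.
typo <- dataSet[ which(dataSet$Originn=='PHL' | dataSet$Originn=='PIT'), ]
nrow(typo)     # 0

# Equivalent, shorter condition using %in%.
subset2 <- dataSet[ dataSet$Origin %in% c("PHL", "PIT"), ]
nrow(subset2)  # 3
```

If your subset comes back empty, check the column names with names(dataSet) before looking anywhere else.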

If you have checked everything listed above, you should be able to complete at least the first six steps of the assignment.

5. Adding More Commands. To complete Steps 7-9, you should add a few more lines between the t.test() and sink() function calls. More specifically, I recommend adding them right before the following line (line 95 of the original Descriptives.r file):

# This stops R from writing any more to the text output file.
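The overall shape of the end of the script can be sketched with made-up data: the first sink() call redirects printed output to a text file, the t.test() result and any extra commands for Steps 7-9 are printed while the redirection is active, and the final sink() call closes the file. The file name, the data and the summary() line are placeholders, not the actual contents of Descriptives.r or the answers to Steps 7-9. When a script is run with source(), results only reach the file if they are printed explicitly, hence the print() calls.

```r
# Made-up delays for two origin airports (illustration only).
subset <- data.frame(Origin = c("PHL", "PHL", "PHL", "PIT", "PIT", "PIT"),
                     Delay  = c(5, 20, 8, 12, 3, 7))

sink("descriptiveOutput.txt")   # placeholder file name: start writing output here

# Two-sample t-test comparing mean Delay between the two origins.
print(t.test(Delay ~ Origin, data = subset))

# Lines for Steps 7-9 go here, before the closing sink() call.
# (summary() is only a placeholder for whatever those steps ask for.)
print(summary(subset$Delay))

# This stops R from writing any more to the text output file.
sink()
```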

That’s all I can think of so far. Good luck!

Assignment #5 – Introduction to Working with R and RStudio (Due Monday, November 13, 2017 at 11:59 pm)

Here are the assignment instructions and an answer sheet to submit (in Word format, same as the last page of the assignment instructions).

Here is the data file you’ll need: OnTimeAirport-Jan14.csv. (Right-click to download the file. Make sure that the name doesn’t change.)

This assignment is due Monday, November 13, 2017 at 11:59 pm.

For this assignment, you’ll need to modify the R script you used in ICA #9 (Descriptives.r). To do this, you should finish the related in-class exercise first.

Hint for Assignment #4 (ETL and Pivot Tables)

I have received quite a few questions about Part 4, Question (b) of Assignment #4, which asks you to first create a pivot table and then use the Excel AVERAGE function in a separate cell outside the pivot table.

Here is an example of what I mean by “use the Excel AVERAGE function to average those values … in a separate cell outside the Pivot Table.”

Link: Hint for HW4 – Part4Qb

Let me know if you have any questions.


Assignment #4: ETL and Pivot Tables in Excel (due by Wednesday, 10/25, 11:59 pm)

Here is the assignment: Assignment #4 – ETL and Pivot Tables in Excel

And here is the Excel workbook you’ll need to complete the assignment: ETL Workbook.xlsx

This assignment is due by Wednesday, 10/25, 11:59 pm.

Group Project: Data Visualization and the QVC Analytics Challenge (due by 10/31 at 11:59 pm)

Here are the assignment instructions: Group Project – Data Visualization [f17].

The group project for this course is due October 31 at 11:59 pm.

  • You can work in teams of up to four people.
  • For each group, please email me the names and AccessNet IDs (e.g., tuz12345) of all group members before noon on October 20. (This way I can enter the groups in Blackboard in advance.)

The group project is based on the Temple Analytics Challenge, a University-wide data analysis and visualization competition.

  • You should enter the challenge as well: you can earn extra credit and professional achievement points, plus a chance to win up to $2,500.
  • To enter the challenge, you must also submit your entry by October 31 at 11:59 pm.


More on Assignment 3 (SQL Part 2)

1. To start Assignment 3, you need to complete ICA #6 first. Want to check whether you did ICA #6 correctly? Take a look at the solution key that I posted under Course Materials. If your solution has any mistakes, follow these steps:

  1. In MySQL Workbench, delete your Company table;
  2. Copy and paste the statements from the file “ICA #6 SQL statements” to your MySQL Workbench;
  3. Replace the schema name “mxxws” with your own schema name (mxx should be your MySQL username);
  4. Run all the statements. Make sure you do not get any error messages;
  5. If everything works, you can start Assignment 3 now.

2. BOOLEAN data type. In the Contact table, the field IsMain has a BOOLEAN data type. If you are not sure how to handle the BOOLEAN data type, here is a short tutorial that might help: More on MySQL BOOLEAN Data Type

Assignment #3: SQL Part 2 – Putting Information into a Database (Due Friday, 10/13/2017, 11:59 pm)

See instructions here: Assignment #3: SQL Part 2

It’s due on Friday, 10/13/2017, 11:59 pm. Please submit your assignment on Blackboard.

For this assignment, you will need to use MySQL Workbench to finish building the contact management database for MarketCo that you started during ICA #6. Therefore, you must complete the in-class activity before you do this assignment. (After Friday 10/6, I will release the solution for ICA #6.)