Instructor: Jing Gong. Class Time: M/W/F 2:00-2:50 pm

Jing Gong

Jing Gong joins the Fox School on a tenure-track appointment from Carnegie Mellon University, where she studied for a PhD in Information Systems and Management. She is interested in using empirical models to analyze firm and consumer behavior in online markets, with a primary focus on electronic commerce, digital marketing, two-sided online markets, online labor markets, and business analytics. Her research uses interdisciplinary approaches such as econometrics, Bayesian statistics, economic structural modeling, field experiments, and text analytics. Jing’s work has appeared in several major conferences and workshops, including the International Conference on Information Systems (ICIS), the Workshop on Information Systems Economics (WISE), the Conference on Information Systems and Technology (CIST), the Marketing Science Conference, and the China Summer Workshop on Information Management (CSWIM). She was the 2014 recipient of the Best Student Paper Award at the Conference on Information Systems and Technology and the 2014 Best Paper Award runner-up at the China Summer Workshop on Information Management. Jing holds a Bachelor's Degree in Information Management and Information Systems from Tsinghua University in Beijing, China.

Assignment #6: Decision Trees in R (Due Monday, November 27, 2017 at 11:59 pm)

Here are the assignment instructions and an answer sheet to submit (in Word format).

Here is the data file you’ll need: BankLoan.csv. (Right-click to download the file. Make sure that the file name doesn’t change.)

This assignment is due Monday, November 27, 2017 at 11:59 pm.

For this assignment, you’ll need to modify the R script you used in ICA #10 (dTree.r). To do this, you should finish the related in-class exercise first (ICA #10 – Decision Trees in R).

Agenda for Week 12 (Week of 11/13)

Class Schedule:

  • Monday — ICA #10 (Decision Trees in R)
  • Wednesday — ICA #10 (Decision Trees in R) continued
  • Friday — Slide Deck: Clustering

Deadlines (unless otherwise mentioned, the due time is 11:59 pm):

  • Assignment #5 (Introduction to working with R and RStudio): 11/13 (Monday).
  • ICA #10 (Decision Trees in R): 11/17 (Friday).
  • Assignment #6 (Decision Trees): 11/27 (Monday).

Assignment #5 Hints – A Checklist

Dear R experts,

While working on Assignment #5, you may run into errors. To get the script to run without error messages, you will often have to “debug” it, that is, find and fix the errors.

Here is a list of things you can check:

1. Error Message. If you get an error message like this:

Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
  cannot open file 'OnTimeAirport-Jan14.csv': No such file or directory

ask yourself the following questions:

  • Did you have all the files (both Descriptives.r file and OnTimeAirport-Jan14.csv file) in the same directory/folder?
  • Are the file names correct? Sometimes when you download the file to your disk multiple times, the system automatically renames the files (like OnTimeAirport-Jan14(1).csv). If this is the case, you have to change the name back.
  • Did you set the working directory to source file location? (How to do that? Read instructions on ICA #9)
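
The checks above can also be run from the R console. The following is only a minimal sketch and is not part of Descriptives.r; the helper name check_input_file is made up for illustration.

```r
# Hypothetical helper (not part of Descriptives.r): report whether a data
# file is visible from the current working directory.
check_input_file <- function(filename) {
  found <- file.exists(filename)
  if (!found) {
    message("Cannot find '", filename, "' in: ", getwd())
    # List the CSV files R can see, to spot renamed copies
    # such as OnTimeAirport-Jan14(1).csv.
    message("CSV files here: ",
            paste(list.files(pattern = "\\.csv$"), collapse = ", "))
  }
  found
}

check_input_file("OnTimeAirport-Jan14.csv")
```

If the function returns FALSE, the folder printed in the message is where R is looking, so either move the CSV file there or set the working directory to the source file location.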

2. Changing Variable Values. Under the VARIABLES section, did you change the following variable values accordingly for the new dataset?

  • INPUT_FILENAME
  • HISTLABEL
  • HIST_TITLE

3. Package Installation. Were you able to install the “psych” package?
If not, check the solutions here: Problems Installing Packages?

4. Modifying R Script. In the original Descriptives.r file, our analysis was based on the NBA data. Specifically, we looked at two columns, dataSet$Salary and dataSet$Position. For the assignment, because we are using a different dataset called OnTimeAirport-Jan14.csv, you need to make the following changes accordingly:

  • Search your R script thoroughly and make sure that you replaced every occurrence of dataSet$Salary and dataSet$Position with the columns we are interested in from the new dataset.
  • Make sure that you changed line 87:
subset <- dataSet[ which(dataSet$Position=='PG' |  dataSet$Position=='SF'), ];

into:

subset <- dataSet[ which(dataSet$Origin=='PHL' |  dataSet$Origin=='PIT'), ];

The reason is that now we are interested in looking at Philadelphia and Pittsburgh as origin airports (as asked in Question 5 of the answer sheet).

  • Make sure that “dataSet” has the right cases (i.e., the letter “S” should be in upper case). This is because R is case-sensitive.
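
To see what the modified line does, here is a small sketch on made-up data. The data frame below is a toy stand-in for OnTimeAirport-Jan14.csv, and the ArrDelay column name is an assumption used only for illustration.

```r
# Toy data standing in for OnTimeAirport-Jan14.csv (values are made up;
# the ArrDelay column is assumed for illustration).
dataSet <- data.frame(
  Origin   = c("PHL", "PIT", "JFK", "PHL"),
  ArrDelay = c(5, -3, 12, 0),
  stringsAsFactors = FALSE
)

# Keep only the rows whose origin airport is Philadelphia or Pittsburgh.
subset <- dataSet[ which(dataSet$Origin=='PHL' | dataSet$Origin=='PIT'), ];
print(subset)

# R is case-sensitive: writing "dataset$Origin" (lower-case s) would refer
# to a different, undefined object and stop with an "object not found" error.
```

Only the JFK row is dropped; the two PHL rows and the PIT row remain.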

If you have checked everything listed above, you should be able to complete at least the first six steps of the assignment.

5. Adding More Commands. To complete Steps 7-9, you should add a few more lines between the t.test() and sink() function calls. More specifically, I recommend that you add the lines right before the following line (line 95 of the original Descriptives.r file):

# This stops R from writing any more to the text output file.
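
Why the position matters: sink() with a file name starts redirecting printed output to that file, and sink() with no arguments stops it. Anything you want in the output file has to be printed in between. A minimal sketch, using a temporary file and made-up numbers rather than the real script:

```r
# Send console output to a text file (a temporary file in this sketch).
out_file <- tempfile(fileext = ".txt")
sink(out_file)

# Made-up values standing in for a column of the real dataset.
delays <- c(5, -3, 12, 0, 7)
print(summary(delays))

# Extra commands go here, while output is still being captured.
print(mean(delays))

# This stops R from writing any more to the text output file.
sink()

# Output printed from now on appears in the console again.
readLines(out_file)
```

Commands added after the final sink() would still run, but their results would not appear in the output file.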

That’s all I can think of so far. Good luck!

Problems Installing Packages?

If you are unable to install packages in RStudio (for example, you get a “not installed” or “cannot find dependency” error), some common solutions are outlined below:

Try a different CRAN mirror

It is possible your default CRAN Mirror is down or currently unavailable. You can switch to a different CRAN mirror from the RStudio Options Menu.

  1. In RStudio, go to Tools/Global Options…
  2. Click on Packages (on the left).
  3. Click on Change… under CRAN Mirror.
  4. Choose one of the USA sites at the end (e.g., try the one in CA1 or in Dallas, TX).
  5. Take a 5-minute break while the package is being installed.
  6. Rerun the script. It should work.
  7. If it does not work, repeat Steps 1-6 with a different CRAN Mirror.
    If you tried several different CRAN Mirrors but it still gives an error, I recommend that you try the next approach.
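
The mirror can also be chosen in code through the repos argument of install.packages(). The sketch below is one possible way to do that, not the method used in the course script; the helper name install_if_missing is made up, and https://cloud.r-project.org is CRAN's general-purpose cloud mirror.

```r
# Hypothetical helper: install a package only if it is missing,
# using an explicitly chosen CRAN mirror.
install_if_missing <- function(pkg, repo = "https://cloud.r-project.org") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = repo)
  }
  # TRUE if the package can now be loaded.
  requireNamespace(pkg, quietly = TRUE)
}

# "stats" ships with R, so this call does not download anything.
install_if_missing("stats")
```

For the assignment you would call install_if_missing("psych"); to try another mirror, pass its URL as the second argument.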

Are you able to install packages in R?

If the above does not work, try to install packages in R (outside of RStudio) and see if you’re able to do that.

  1. Open R
  2. Copy the install.packages() function call into the R console. For example, if you are trying to install the psych package, copy the following code into the R console: install.packages("psych")
  3. It will ask you to select a CRAN mirror. Choose one of the USA sites at the end (e.g., try the one in CA1 or in Dallas, TX).
  4. Take a 5-minute break while the package is being installed.
  5. Rerun the script. It should work.
  6. If it does not work, repeat Steps 1-5 with a different CRAN Mirror.
    If you tried several different CRAN Mirrors but it still gives an error, I recommend that you try the next approach.

Are you able to connect to the Internet, or does your internet use a proxy?

If you’re not able to connect to the Internet via R, you may not be able to download and install packages. If your networking environment requires outbound network connections to go through an HTTP proxy, see the following Knowledge Base article: Configuring R to Use an HTTP Proxy.
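
As a rough illustration of what such a configuration can look like, R's libcurl-based download methods read the http_proxy and https_proxy environment variables. This sketch is not taken from the Knowledge Base article, and the proxy address is a placeholder. Run it only if your network really uses a proxy, and replace the address with the real one.

```r
# Placeholder proxy address: replace it with the one your network uses.
proxy_url <- "http://proxy.example.com:8080"

# Set the environment variables for the current R session only.
Sys.setenv(http_proxy = proxy_url, https_proxy = proxy_url)

# Show the values R will now use.
Sys.getenv(c("http_proxy", "https_proxy"))
```

The setting lasts until you close R; restarting R clears it.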

Try a different computer

If none of the above solutions work, use a different computer, for example, a lab computer.

I recommend you use the computer labs at Fox, which do have R/RStudio. Here is the list of computer labs and hours within Fox: http://www.fox.temple.edu/technology/it/resources/computer-labs/

NOTE that the Tech Center is NOT recommended. The computers in the Tech Center may not have R/RStudio installed.

Want extra practice with R?

If you’re looking for an additional tutorial on R syntax, try this CodeSchool site: http://tryr.codeschool.com/.

It is not required that you do this, but some of you might find it helpful. Remember, most of the information you need for the course is in the in-class exercises, slides, notes, and assignments. But sometimes people want something extra, and this is a pretty good interactive walk-through.

The most relevant chapters are 1, 2, 4, and 7. However, it looks like you have to do them in order (you have to complete chapters 1 and 2 to get to chapter 3, etc.). So my suggestion would be to at least go through Chapter 1. If you find it useful, keep going!

Assignment #5 – Introduction to Working with R and RStudio (Due Monday, November 13, 2017 at 11:59 pm)

Here are the assignment instructions and an answer sheet to submit (in Word format, same as the last page of the assignment instructions).

Here is the data file you’ll need: OnTimeAirport-Jan14.csv. (Right-click to download the file. Make sure that the file name doesn’t change.)

This assignment is due Monday, November 13, 2017 at 11:59 pm.

For this assignment, you’ll need to modify the R script you used in ICA #9 (Descriptives.r). To do this, you should finish the related in-class exercise first.

Agenda for Week 11 (Week of 11/6)

Class Schedule:

  • Monday — ICA #8 (Descriptive Statistics Using R and RStudio)
  • Wednesday — ICA #8 (Descriptive Statistics Using R and RStudio) continued
  • Friday — Classification using Decision Trees

Deadlines (unless otherwise mentioned, the due time is 11:59 pm):

  • ICA #8 (Descriptive Statistics Using R and RStudio): 11/8 (Wednesday).
  • Assignment #5 (Introduction to working with R and RStudio): 11/13 (Monday).

Review for Exam 2: Online Practice Problems on Subselects and Joins

Some of you have requested more practice problems for SQL. Here are some exercises on w3resource.com, in case you need additional practice.
http://www.w3resource.com/mysql-exercises/subquery-exercises/
Q1, Q6, Q8, Q9

http://www.w3resource.com/sql-exercises/subqueries/index.php
Q1, Q2, Q3, Q7, Q8, Q12, Q13, Q14, Q25

http://www.w3resource.com/sql-exercises/sql-exercises-quering-on-multiple-table.php
Q1, Q2, Q3, Q4, Q7

Note that the solutions provided in these resources do not specify the schema name. In the exam, however, you should include the schema name.

For example, in these resources, they may have SQL queries like this:

SELECT first_name, last_name, salary
FROM employees;

But as we learned in class, we should specify the schema name, which is “hr”, as follows:

SELECT first_name, last_name, salary
FROM hr.employees;

Other than this key difference, these practice questions should be a good resource for review.

 
