The dataset I picked was the Titanic dataset from Kaggle. The outcome variable I used was “survived”, a binary variable indicating whether or not a passenger survived. The features I chose for prediction were passenger class, sex, age, and fare. Using these features and a decision tree built in a Jupyter notebook, I looked for patterns in the features that predict whether a passenger survived or died. I chose 15 as the best value for the minimum split, because the smaller the minimum split is, the more complex the tree becomes and the more likely it is to overfit. The goal is for accuracy on both the training set and the validation set to be high, so the tree predicts the outcome variable correctly on data it has not seen.
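The setup above can be sketched with scikit-learn's `DecisionTreeClassifier`, whose `min_samples_split` parameter corresponds to the minimum split described. This is a minimal sketch, not the original notebook: the Kaggle file isn't bundled here, so a small synthetic stand-in with the same columns (`pclass`, `sex`, `age`, `fare`, `survived`) is generated for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the Kaggle Titanic data (illustrative values only).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "pclass": rng.integers(1, 4, n),
    "sex": rng.integers(0, 2, n),          # 0 = male, 1 = female (already encoded)
    "age": rng.uniform(1, 80, n).round(1),
    "fare": rng.uniform(5, 100, n).round(2),
})
# Inject a simple pattern so the tree has signal to find.
df["survived"] = ((df["sex"] == 1) | ((df["age"] < 18) & (df["fare"] > 20))).astype(int)

X = df[["pclass", "sex", "age", "fare"]]
y = df["survived"]
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

# min_samples_split=15: a node must contain at least 15 samples before it may
# split, which limits tree complexity compared with the default of 2.
tree = DecisionTreeClassifier(min_samples_split=15, random_state=0)
tree.fit(X_train, y_train)
print(round(tree.score(X_train, y_train), 3), round(tree.score(X_val, y_val), 3))
```

Comparing the two printed accuracies is how the trade-off in the paragraph shows up in practice: a very small `min_samples_split` tends to push training accuracy up while validation accuracy lags behind.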
A few insights that the decision tree provided were:
- Being female could increase your chance of survival
- Paying more for your fare could increase your chance of survival
- Younger males had a better chance of survival
The nodes with the highest survival probability are nodes 16 and 5, each with a 100% probability of survival. This means the passengers most likely to survive are:
- Females who paid more than $33.85 for their fare
- Males younger than 36.25 who paid between $7.85 and $27.07 for their fare
The nodes with the lowest survival probability are nodes 3, 13, and 10, each with a 0% probability of survival. This means the passengers least likely to survive are:
- Males younger than 36.25 who paid less than $7.85 for their fare
- Males older than 60.5
- Males between the ages of 36.85 and 43
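Leaf-level rules like the ones listed above can be read directly off a fitted tree with scikit-learn's `export_text`, which prints every split threshold and the class reached at each leaf. This sketch uses a tiny made-up dataset (columns `sex`, `age`, `fare`), not the actual Titanic tree, so the printed thresholds will differ from the node values quoted in the text.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Tiny illustrative dataset: columns are [sex, age, fare]; sex: 0 = male, 1 = female.
X = np.array([
    [1, 29.0, 40.0], [1, 50.0, 80.0], [0, 20.0, 10.0], [0, 20.0, 5.0],
    [0, 65.0, 30.0], [0, 40.0, 15.0], [1, 8.0, 20.0], [0, 12.0, 9.0],
])
y = np.array([1, 1, 1, 0, 0, 0, 1, 1])

tree = DecisionTreeClassifier(random_state=0).fit(X, y)
# Each leaf line shows the predicted class; walking from the root to a leaf
# recovers a rule such as "sex = female AND fare > 33.85 -> survived".
print(export_text(tree, feature_names=["sex", "age", "fare"]))
```

Leaves whose samples all share one class are the 100% and 0% nodes described above; `tree.predict_proba` gives the same per-node class probabilities numerically.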