Extra Credit Assignment

In this project we had to find a dataset and use the decision tree model that we used in an ICA. Then we had to describe the dataset, its features, and what the outcome variable was going to be. Next, I had to find the best minimum split for the dataset based on the validation and training sets and on whether the nodes were readable. Then I had to pick the node that is most likely to get the outcome variable and the node that is least likely to get it.
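The minimum-split search described above can be sketched in a few lines. This is only an illustration, not the tool or data used in the assignment: the tree builder, the helper names (`build_tree`, `count_leaves`, `accuracy`), and the small synthetic dataset are all assumptions made for the example. It shows the general pattern of growing a tree at several minimum-split values and comparing tree size with training and validation accuracy.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def build_tree(rows, min_split):
    """Grow a tree; a node is only split if it holds at least min_split rows."""
    labels = [y for _, y in rows]
    leaf = {"p": sum(labels) / len(labels)}  # share of 1s in this node
    if len(rows) < min_split or gini(labels) == 0.0:
        return leaf
    best = None
    for f in range(len(rows[0][0])):
        # Try every threshold except the largest value, so both sides are non-empty.
        for t in sorted({x[f] for x, _ in rows})[:-1]:
            left = [r for r in rows if r[0][f] <= t]
            right = [r for r in rows if r[0][f] > t]
            score = (len(left) * gini([y for _, y in left])
                     + len(right) * gini([y for _, y in right])) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None or best[0] >= gini(labels):
        return leaf  # no split improves purity
    _, f, t, left, right = best
    return {"f": f, "t": t,
            "left": build_tree(left, min_split),
            "right": build_tree(right, min_split)}

def count_leaves(node):
    if "p" in node:
        return 1
    return count_leaves(node["left"]) + count_leaves(node["right"])

def predict(node, x):
    while "p" not in node:
        node = node["left"] if x[node["f"]] <= node["t"] else node["right"]
    return 1 if node["p"] >= 0.5 else 0

def accuracy(tree, rows):
    return sum(predict(tree, x) == y for x, y in rows) / len(rows)

# Synthetic stand-in data (NOT the income dataset): two integer features, noisy label.
random.seed(0)
data = []
for _ in range(300):
    a, b = random.randint(0, 9), random.randint(0, 9)
    y = 1 if a + b + random.randint(-3, 3) > 9 else 0
    data.append(((a, b), y))
train, valid = data[:200], data[200:]

results = []
for min_split in (2, 20, 80, 500):
    tree = build_tree(train, min_split)
    results.append((min_split, count_leaves(tree),
                    accuracy(tree, train), accuracy(tree, valid)))

for min_split, leaves, train_acc, valid_acc in results:
    print(f"min_split={min_split:>3}  leaves={leaves:>3}  "
          f"train={train_acc:.3f}  valid={valid_acc:.3f}")
```

A larger minimum split can only stop splitting earlier, so the tree gets smaller and easier to read as the value goes up. The usual goal is the largest value that still keeps validation accuracy close to its best.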

These are my answers to the assignment questions:

  1. The dataset describes adults’ income based on features such as education, relationship status, gender, and race. The outcome variable is whether a person’s income is more than 50,000 a year. The features relate to the outcome because they all affect a person’s knowledge and which industries they can get into. For example, if you don’t have an education past high school, you are limited in the jobs that you can get. The insight that can be drawn from the data is which features decide, and with what probability, whether a group of people will make more than 50,000 or not.
  2. The best minimum split is 1,000. When we go higher than 1,000, the results on the validation set and the training set don’t change at all, but when we go lower than 1,000 the decision tree is hard to read because there are too many nodes.

  3. The node whose group has the highest probability of making more than 50,000 is node #19; the probability is higher than 96%. The people in this node are married and have an education level higher than high school, such as an associate degree or bachelor’s degree. They have less than 5,095 in capital gains, and their capital loss is more than 1,782. The node whose group has the lowest probability of making more than 50,000 a year is node #4, where the probability is 2.1%. The people in this node aren’t married, have capital gains less than or equal to 7,073.5, have a high school diploma or dropped out of high school, and have a capital loss of less than 2,391.50.
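The two nodes in answer 3 can be read as plain if/then rules. The sketch below writes them out that way, using the thresholds and probabilities quoted above. The `Person` class, its field names, and the yes/no encoding of marital status and education are assumptions made for the example. The write-up only describes these two leaves, so anyone who matches neither gets `None`.

```python
from dataclasses import dataclass

@dataclass
class Person:
    # Hypothetical encoding of the features mentioned in the answer.
    married: bool
    beyond_high_school: bool  # e.g. associate or bachelor's degree
    capital_gain: float
    capital_loss: float

def in_node_19(p):
    """Rule for node #19 as described in the answer."""
    return (p.married and p.beyond_high_school
            and p.capital_gain < 5095 and p.capital_loss > 1782)

def in_node_4(p):
    """Rule for node #4 as described in the answer."""
    return (not p.married and not p.beyond_high_school
            and p.capital_gain <= 7073.5 and p.capital_loss < 2391.5)

# (name, rule, probability of income above 50,000)
# 0.96 stands in for "higher than 96%"; 0.021 is the reported 2.1%.
LEAVES = [
    ("node 19", in_node_19, 0.96),
    ("node 4", in_node_4, 0.021),
]

def lookup(p):
    """Return (node name, probability) for the first matching leaf, else None."""
    for name, rule, prob in LEAVES:
        if rule(p):
            return name, prob
    return None

high = lookup(Person(married=True, beyond_high_school=True,
                     capital_gain=0, capital_loss=1900))
low = lookup(Person(married=False, beyond_high_school=False,
                    capital_gain=0, capital_loss=0))
other = lookup(Person(married=True, beyond_high_school=False,
                      capital_gain=0, capital_loss=0))
print(high, low, other)
```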

