ATTRIBUTE INFORMATION:
- There are 30,000 observations.
- The binary variable default_payment_next_month (Yes = 2, No = 1) is the response variable.
- There are 23 explanatory variables:
- LIMIT_BAL: Amount of the given credit (NT dollar): it includes both the individual consumer credit and his/her family (supplementary) credit.
- SEX: (1 = male; 2 = female).
- EDUCATION: (1 = graduate school; 2 = university; 3 = high school; 4 = others).
- MARRIAGE: (1 = married; 2 = single; 3 = others).
- AGE: (year).
- History of past payment: REPAY_SEP, REPAY_AUG, REPAY_JUL, REPAY_JUN, REPAY_MAY, REPAY_APR. The measurement scale for the repayment status is: -1 = pay duly; 1 = payment delay for one month; 2 = payment delay for two months; . . .; 8 = payment delay for eight months; 9 = payment delay for nine months and above.
- Amount of bill statement: BILL_SEP, BILL_AUG, BILL_JUL, BILL_JUN, BILL_MAY, BILL_APR.
- Amount of previous payment: PAY_AMT_SEP, PAY_AMT_AUG, PAY_AMT_JUL, PAY_AMT_JUN, PAY_AMT_MAY, PAY_AMT_APR.
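Once the import step in the SAS code section has created WORK.IMPORT2, the coding listed above can be checked against the data. The following is a minimal sketch, not part of the original program. It assumes the imported column names match the names in this list and that the columns named in the VAR statement were imported as numeric.

```sas
/* Sketch: tabulate the response and the coded demographic variables. */
/* The MISSING option shows missing values as their own category.     */
proc freq data=WORK.IMPORT2;
    tables default_payment_next_month SEX EDUCATION MARRIAGE / missing;
run;

/* N and NMISS give the non-missing and missing counts per column;    */
/* MIN and MAX show whether the values stay within the documented     */
/* coding.                                                            */
proc means data=WORK.IMPORT2 n nmiss min max;
    var LIMIT_BAL AGE REPAY_SEP REPAY_AUG REPAY_JUL
        REPAY_JUN REPAY_MAY REPAY_APR;
run;
```

The frequency table for default_payment_next_month also gives the class proportions that the baseline statistics in the results depend on.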
SAS CODE
/* Source File: default of credit card clients.xls */
/* Source Path: /home/mst07221 */
%web_drop_table(WORK.IMPORT2);
FILENAME REFFILE "/home/mst07221/default of credit card clients.xls" TERMSTR=CR;
PROC IMPORT DATAFILE=REFFILE
DBMS=XLS
OUT=WORK.IMPORT2;
GETNAMES=YES;
RUN;
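/* Sort the imported data by credit limit */
PROC SORT DATA=WORK.IMPORT2;
BY LIMIT_BAL;
RUN;
ods graphics on;
/* Fit a random forest with default settings; name the input data set explicitly */
PROC HPFOREST DATA=WORK.IMPORT2;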
target default_payment_next_month/level=nominal;
input
SEX EDUCATION MARRIAGE REPAY_SEP REPAY_AUG REPAY_JUL REPAY_JUN REPAY_MAY REPAY_APR /level=nominal;
input
LIMIT_BAL AGE BILL_SEP BILL_AUG BILL_JUL BILL_JUN BILL_APR BILL_MAY PAY_AMT_SEP PAY_AMT_AUG PAY_AMT_JUL PAY_AMT_JUN PAY_AMT_MAY PAY_AMT_APR /level=interval;
RUN;
%web_open_table(WORK.IMPORT2);
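To work with the result tables as data (for example, to plot OOB error against the number of trees or to count negative importance values), the same model can be rerun with ODS OUTPUT. This is a sketch under stated assumptions. The ODS table names FitStatistics and VariableImportance are assumed here, and ODS TRACE writes the actual names to the log so they can be confirmed. The SEED value is arbitrary and only makes reruns repeatable. WORK.FITSTATS and WORK.VARIMP are hypothetical output names.

```sas
/* Sketch only: same model as above, with two result tables saved as data sets. */
ods trace on;                                  /* log the name of every output table */
ods output FitStatistics=WORK.FITSTATS         /* assumed ODS table name */
           VariableImportance=WORK.VARIMP;     /* assumed ODS table name */

PROC HPFOREST DATA=WORK.IMPORT2 SEED=12345;    /* arbitrary seed for repeatability */
    target default_payment_next_month / level=nominal;
    input SEX EDUCATION MARRIAGE REPAY_SEP REPAY_AUG REPAY_JUL
          REPAY_JUN REPAY_MAY REPAY_APR / level=nominal;
    input LIMIT_BAL AGE BILL_SEP BILL_AUG BILL_JUL BILL_JUN BILL_APR BILL_MAY
          PAY_AMT_SEP PAY_AMT_AUG PAY_AMT_JUL PAY_AMT_JUN PAY_AMT_MAY
          PAY_AMT_APR / level=interval;
RUN;

ods trace off;

/* List the saved columns before using them; their names are not assumed here. */
proc contents data=WORK.FITSTATS;
run;
```

Because the forest is built from random samples, a rerun with a different seed may not reproduce the reported figures exactly.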
RESULTS
The HPFOREST Procedure
Performance Information | |
---|---|
Execution Mode | Single-Machine |
Number of Threads | 2 |
Data Access Information | |||
---|---|---|---|
Data | Engine | Role | Path |
WORK.IMPORT2 | V9 | Input | On Client |
Model Information | ||
---|---|---|
Parameter | Value | |
Variables to Try | 5 | (Default) |
Maximum Trees | 100 | (Default) |
Inbag Fraction | 0.6 | (Default) |
Prune Fraction | 0 | (Default) |
Prune Threshold | 0.1 | (Default) |
Leaf Fraction | 0.00001 | (Default) |
Leaf Size Setting | 1 | (Default) |
Leaf Size Used | 1 | |
Category Bins | 30 | (Default) |
Interval Bins | 100 | |
Minimum Category Size | 5 | (Default) |
Node Size | 100000 | (Default) |
Maximum Depth | 20 | (Default) |
Alpha | 1 | (Default) |
Exhaustive | 5000 | (Default) |
Rows of Sequence to Skip | 5 | (Default) |
Split Criterion | . | Gini |
Preselection Method | . | Loh |
Missing Value Handling | . | Valid value |
Number of Observations | |
---|---|
Type | N |
Number of Observations Read | 30000 |
Number of Observations Used | 30000 |
Baseline Fit Statistics | |
---|---|
Statistic | Value |
Average Square Error | 0.172 |
Misclassification Rate | 0.221 |
Log Loss | 0.528 |
Fit Statistics | |||||||
---|---|---|---|---|---|---|---|
Number of Trees | Number of Leaves | Average Square Error (Train) | Average Square Error (OOB) | Misclassification Rate (Train) | Misclassification Rate (OOB) | Log Loss (Train) | Log Loss (OOB) |
1 | 3294 | 0.1139 | 0.237 | 0.1269 | 0.249 | 1.966 | 4.767 |
2 | 6520 | 0.0798 | 0.225 | 0.1135 | 0.244 | 0.513 | 4.096 |
3 | 9603 | 0.0687 | 0.212 | 0.0879 | 0.239 | 0.280 | 3.443 |
4 | 12768 | 0.0632 | 0.202 | 0.0822 | 0.234 | 0.227 | 2.954 |
5 | 15775 | 0.0603 | 0.193 | 0.0758 | 0.229 | 0.213 | 2.520 |
6 | 18789 | 0.0584 | 0.186 | 0.0736 | 0.223 | 0.208 | 2.191 |
7 | 21727 | 0.0570 | 0.179 | 0.0713 | 0.219 | 0.206 | 1.893 |
8 | 24756 | 0.0559 | 0.174 | 0.0692 | 0.215 | 0.205 | 1.644 |
9 | 27565 | 0.0553 | 0.169 | 0.0672 | 0.210 | 0.203 | 1.445 |
10 | 30525 | 0.0548 | 0.165 | 0.0660 | 0.208 | 0.203 | 1.275 |
11 | 33342 | 0.0545 | 0.162 | 0.0652 | 0.206 | 0.203 | 1.142 |
12 | 36112 | 0.0544 | 0.160 | 0.0654 | 0.205 | 0.203 | 1.037 |
13 | 39169 | 0.0540 | 0.157 | 0.0657 | 0.202 | 0.203 | 0.942 |
14 | 42015 | 0.0538 | 0.155 | 0.0648 | 0.201 | 0.203 | 0.862 |
15 | 44959 | 0.0536 | 0.154 | 0.0642 | 0.200 | 0.203 | 0.811 |
16 | 47860 | 0.0534 | 0.152 | 0.0641 | 0.199 | 0.203 | 0.754 |
17 | 50907 | 0.0531 | 0.151 | 0.0636 | 0.196 | 0.202 | 0.714 |
18 | 53923 | 0.0528 | 0.150 | 0.0636 | 0.196 | 0.202 | 0.674 |
19 | 56981 | 0.0526 | 0.149 | 0.0628 | 0.194 | 0.202 | 0.645 |
20 | 60094 | 0.0523 | 0.148 | 0.0630 | 0.195 | 0.201 | 0.618 |
21 | 63098 | 0.0521 | 0.148 | 0.0624 | 0.194 | 0.201 | 0.596 |
22 | 66296 | 0.0517 | 0.147 | 0.0618 | 0.194 | 0.200 | 0.580 |
23 | 69211 | 0.0516 | 0.146 | 0.0621 | 0.193 | 0.200 | 0.559 |
24 | 72437 | 0.0512 | 0.146 | 0.0614 | 0.192 | 0.199 | 0.550 |
25 | 75734 | 0.0509 | 0.145 | 0.0606 | 0.192 | 0.198 | 0.540 |
26 | 78841 | 0.0507 | 0.145 | 0.0604 | 0.192 | 0.198 | 0.530 |
27 | 81858 | 0.0505 | 0.145 | 0.0601 | 0.191 | 0.197 | 0.521 |
28 | 84911 | 0.0505 | 0.144 | 0.0599 | 0.192 | 0.197 | 0.515 |
29 | 87702 | 0.0505 | 0.144 | 0.0604 | 0.191 | 0.198 | 0.509 |
30 | 90681 | 0.0505 | 0.144 | 0.0603 | 0.191 | 0.198 | 0.505 |
31 | 93747 | 0.0504 | 0.143 | 0.0600 | 0.191 | 0.197 | 0.497 |
32 | 96588 | 0.0504 | 0.143 | 0.0602 | 0.189 | 0.198 | 0.492 |
33 | 99778 | 0.0502 | 0.143 | 0.0592 | 0.190 | 0.197 | 0.489 |
34 | 102728 | 0.0502 | 0.143 | 0.0595 | 0.189 | 0.197 | 0.486 |
35 | 105948 | 0.0500 | 0.142 | 0.0595 | 0.188 | 0.197 | 0.482 |
36 | 108908 | 0.0500 | 0.142 | 0.0590 | 0.189 | 0.197 | 0.478 |
37 | 111980 | 0.0499 | 0.142 | 0.0589 | 0.189 | 0.196 | 0.475 |
38 | 114907 | 0.0499 | 0.142 | 0.0592 | 0.188 | 0.196 | 0.473 |
39 | 117971 | 0.0498 | 0.142 | 0.0588 | 0.188 | 0.196 | 0.471 |
40 | 120864 | 0.0498 | 0.141 | 0.0590 | 0.188 | 0.197 | 0.470 |
41 | 123726 | 0.0499 | 0.141 | 0.0584 | 0.187 | 0.197 | 0.468 |
42 | 126662 | 0.0498 | 0.141 | 0.0586 | 0.187 | 0.197 | 0.466 |
43 | 129557 | 0.0498 | 0.141 | 0.0591 | 0.187 | 0.197 | 0.466 |
44 | 132304 | 0.0499 | 0.141 | 0.0593 | 0.188 | 0.197 | 0.461 |
45 | 135209 | 0.0499 | 0.141 | 0.0589 | 0.187 | 0.197 | 0.460 |
46 | 138021 | 0.0499 | 0.140 | 0.0592 | 0.188 | 0.197 | 0.458 |
47 | 141315 | 0.0498 | 0.140 | 0.0591 | 0.187 | 0.197 | 0.457 |
48 | 144402 | 0.0497 | 0.140 | 0.0588 | 0.187 | 0.197 | 0.455 |
49 | 147425 | 0.0497 | 0.140 | 0.0587 | 0.187 | 0.196 | 0.453 |
50 | 150491 | 0.0496 | 0.140 | 0.0585 | 0.187 | 0.196 | 0.452 |
51 | 153679 | 0.0495 | 0.140 | 0.0585 | 0.187 | 0.196 | 0.451 |
52 | 156685 | 0.0495 | 0.140 | 0.0586 | 0.186 | 0.196 | 0.450 |
53 | 159472 | 0.0496 | 0.140 | 0.0588 | 0.187 | 0.197 | 0.450 |
54 | 162672 | 0.0495 | 0.140 | 0.0584 | 0.187 | 0.196 | 0.450 |
55 | 165562 | 0.0495 | 0.139 | 0.0579 | 0.186 | 0.196 | 0.449 |
56 | 168637 | 0.0495 | 0.139 | 0.0580 | 0.187 | 0.196 | 0.449 |
57 | 171394 | 0.0495 | 0.139 | 0.0583 | 0.186 | 0.196 | 0.448 |
58 | 174267 | 0.0495 | 0.139 | 0.0584 | 0.186 | 0.196 | 0.448 |
59 | 177350 | 0.0495 | 0.139 | 0.0583 | 0.186 | 0.196 | 0.447 |
60 | 180405 | 0.0494 | 0.139 | 0.0583 | 0.186 | 0.196 | 0.447 |
61 | 183345 | 0.0494 | 0.139 | 0.0580 | 0.186 | 0.196 | 0.447 |
62 | 186120 | 0.0494 | 0.139 | 0.0585 | 0.186 | 0.196 | 0.446 |
63 | 188734 | 0.0495 | 0.139 | 0.0585 | 0.186 | 0.196 | 0.446 |
64 | 191563 | 0.0495 | 0.139 | 0.0587 | 0.186 | 0.196 | 0.446 |
65 | 194619 | 0.0495 | 0.139 | 0.0586 | 0.186 | 0.196 | 0.446 |
66 | 197498 | 0.0495 | 0.139 | 0.0588 | 0.185 | 0.196 | 0.445 |
67 | 200432 | 0.0495 | 0.138 | 0.0587 | 0.185 | 0.196 | 0.444 |
68 | 203318 | 0.0495 | 0.138 | 0.0592 | 0.186 | 0.197 | 0.444 |
69 | 206392 | 0.0495 | 0.138 | 0.0588 | 0.186 | 0.196 | 0.444 |
70 | 209681 | 0.0494 | 0.138 | 0.0585 | 0.186 | 0.196 | 0.444 |
71 | 212692 | 0.0494 | 0.138 | 0.0583 | 0.186 | 0.196 | 0.443 |
72 | 215727 | 0.0494 | 0.138 | 0.0580 | 0.185 | 0.196 | 0.442 |
73 | 218663 | 0.0494 | 0.138 | 0.0579 | 0.185 | 0.196 | 0.442 |
74 | 221696 | 0.0493 | 0.138 | 0.0580 | 0.185 | 0.196 | 0.442 |
75 | 224667 | 0.0493 | 0.138 | 0.0581 | 0.185 | 0.196 | 0.442 |
76 | 227649 | 0.0493 | 0.138 | 0.0582 | 0.184 | 0.196 | 0.442 |
77 | 230608 | 0.0493 | 0.138 | 0.0583 | 0.184 | 0.196 | 0.441 |
78 | 233755 | 0.0493 | 0.138 | 0.0579 | 0.184 | 0.196 | 0.441 |
79 | 236390 | 0.0493 | 0.138 | 0.0580 | 0.184 | 0.196 | 0.441 |
80 | 239439 | 0.0493 | 0.138 | 0.0577 | 0.184 | 0.196 | 0.441 |
81 | 242146 | 0.0493 | 0.138 | 0.0580 | 0.184 | 0.196 | 0.441 |
82 | 245104 | 0.0493 | 0.138 | 0.0577 | 0.184 | 0.196 | 0.440 |
83 | 248185 | 0.0493 | 0.138 | 0.0580 | 0.184 | 0.196 | 0.440 |
84 | 251073 | 0.0493 | 0.138 | 0.0580 | 0.184 | 0.196 | 0.440 |
85 | 253984 | 0.0493 | 0.138 | 0.0577 | 0.184 | 0.196 | 0.440 |
86 | 256912 | 0.0493 | 0.138 | 0.0578 | 0.184 | 0.196 | 0.440 |
87 | 260062 | 0.0492 | 0.138 | 0.0576 | 0.183 | 0.196 | 0.440 |
88 | 263180 | 0.0492 | 0.138 | 0.0577 | 0.184 | 0.196 | 0.440 |
89 | 266373 | 0.0491 | 0.138 | 0.0576 | 0.184 | 0.196 | 0.440 |
90 | 269247 | 0.0491 | 0.138 | 0.0576 | 0.184 | 0.196 | 0.440 |
91 | 272239 | 0.0491 | 0.138 | 0.0576 | 0.184 | 0.196 | 0.440 |
92 | 275375 | 0.0490 | 0.138 | 0.0574 | 0.184 | 0.195 | 0.439 |
93 | 278332 | 0.0491 | 0.138 | 0.0571 | 0.185 | 0.196 | 0.439 |
94 | 281349 | 0.0491 | 0.137 | 0.0573 | 0.184 | 0.196 | 0.439 |
95 | 284273 | 0.0490 | 0.137 | 0.0574 | 0.185 | 0.196 | 0.439 |
96 | 287059 | 0.0491 | 0.137 | 0.0578 | 0.184 | 0.196 | 0.439 |
97 | 289819 | 0.0491 | 0.137 | 0.0577 | 0.184 | 0.196 | 0.439 |
98 | 292997 | 0.0491 | 0.137 | 0.0577 | 0.184 | 0.196 | 0.439 |
99 | 295907 | 0.0491 | 0.137 | 0.0576 | 0.184 | 0.196 | 0.439 |
100 | 298931 | 0.0491 | 0.137 | 0.0577 | 0.184 | 0.196 | 0.439 |
Loss Reduction Variable Importance | |||||
---|---|---|---|---|---|
Variable | Number of Rules | Gini | OOB Gini | Margin | OOB Margin |
REPAY_SEP | 4520 | 0.033755 | 0.02978 | 0.067509 | 0.06421 |
REPAY_AUG | 3243 | 0.012217 | 0.01002 | 0.024434 | 0.02243 |
REPAY_JUL | 3164 | 0.008982 | 0.00674 | 0.017965 | 0.01581 |
REPAY_JUN | 3369 | 0.005476 | 0.00311 | 0.010953 | 0.00893 |
REPAY_MAY | 3577 | 0.004900 | 0.00244 | 0.009800 | 0.00777 |
REPAY_APR | 3738 | 0.004577 | 0.00182 | 0.009154 | 0.00685 |
SEX | 6255 | 0.002262 | -0.00184 | 0.004524 | 0.00045 |
MARRIAGE | 6520 | 0.002317 | -0.00221 | 0.004635 | 0.00038 |
EDUCATION | 6383 | 0.002914 | -0.00266 | 0.005829 | 0.00053 |
PAY_AMT_SEP | 16773 | 0.013924 | -0.00935 | 0.027848 | 0.00424 |
BILL_APR | 14566 | 0.010728 | -0.00981 | 0.021456 | 0.00069 |
BILL_JUN | 15011 | 0.011076 | -0.01002 | 0.022153 | 0.00076 |
LIMIT_BAL | 19721 | 0.014944 | -0.01024 | 0.029889 | 0.00483 |
PAY_AMT_AUG | 18266 | 0.014662 | -0.01062 | 0.029323 | 0.00383 |
BILL_JUL | 15948 | 0.011740 | -0.01064 | 0.023479 | 0.00094 |
BILL_SEP | 16241 | 0.012512 | -0.01081 | 0.025024 | 0.00137 |
BILL_AUG | 16691 | 0.012304 | -0.01088 | 0.024608 | 0.00121 |
BILL_MAY | 15693 | 0.011328 | -0.01111 | 0.022657 | 0.00011 |
PAY_AMT_JUL | 19324 | 0.013843 | -0.01221 | 0.027686 | 0.00152 |
PAY_AMT_JUN | 21109 | 0.014101 | -0.01352 | 0.028202 | 0.00043 |
AGE | 19789 | 0.014175 | -0.01394 | 0.028349 | 0.00049 |
PAY_AMT_MAY | 22755 | 0.014860 | -0.01474 | 0.029721 | -0.00008 |
PAY_AMT_APR | 26175 | 0.016937 | -0.01669 | 0.033874 | 0.00018 |
- All 30,000 observations were used in the model because there were no missing values in the predictor variables.
- The ‘Variables to Try’ parameter indicates that, at each node, 5 of the 23 explanatory variables were randomly selected as candidates for the splitting rule. The default of 5 is consistent with the square root of the number of inputs (√23 ≈ 4.8).
- PROC HPFOREST first computes baseline statistics without using a model. The Baseline Fit Statistics table shows a baseline misclassification rate of 0.221 because that is the proportion of observations for which default_payment_next_month = 2 (Yes): predicting No for every observation misclassifies exactly those cases.
- The Fit Statistics table shows that as the number of trees increases, the fit statistics improve (decrease) at first and then level off and fluctuate in a small range. For example, the training average square error falls from 0.1139 with one tree to 0.0491 with 100 trees.
- The table also provides an alternative estimate of average square error (ASE) and misclassification rate: the out-of-bag (OOB) estimate. This is a convenient substitute for an estimate that is based on test data and is a less biased estimate of how the model will perform on future data. The OOB ASE is worse (larger) than the training estimate, which evaluates all observations on all trees (0.137 versus 0.0491 at 100 trees). The OOB misclassification rate decreases to 0.184, which is below the baseline misclassification rate of 0.221, so the model improves on the no-model baseline.
- The Loss Reduction Variable Importance table reports each measure twice: once on training data and once on out-of-bag data. As with fit statistics, the out-of-bag estimates are less biased. The rows are sorted by the OOB Gini measure, which is a more stringent measure than the OOB margin measure. The OOB Gini column is negative for 17 of the 23 variables (SEX through PAY_AMT_APR), and the OOB margin column is negative for one of the 23 variables (PAY_AMT_MAY). A negative OOB value suggests that splits on that variable did not improve the fit on data the trees had not seen.
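- We can conclude that REPAY_SEP is the most important predictor of whether a client will default on next month's credit card payment: its OOB Gini value (0.02978) is about three times that of the next variable, REPAY_AUG (0.01002). Only the six REPAY variables have a positive OOB Gini value.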
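The three baseline values discussed above are consistent with a no-model prediction that assigns every observation the overall default proportion. The following DATA step is a minimal sketch of that arithmetic. It assumes this is how the baseline is formed, which matches the reported numbers but is not confirmed by the output itself. The variable names are illustrative.

```sas
/* Sketch: baseline statistics implied by the overall default proportion. */
data _null_;
    p = 0.221;                    /* proportion with default = Yes (Baseline table) */
    misc = min(p, 1 - p);         /* error rate when always predicting the majority class */
    ase = p * (1 - p);            /* squared error when every predicted probability is p */
    logloss = -(p * log(p) + (1 - p) * log(1 - p));   /* LOG is the natural logarithm */
    put misc= 5.3 ase= 5.3 logloss= 5.3;              /* write the three values to the log */
run;
```

The values written to the log should round to 0.221, 0.172 and 0.528, the same figures as the Baseline Fit Statistics table. The final OOB values (0.184, 0.137 and 0.439) are all lower than these.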