Construction of a Credit Card Scoring Model

Posted by djpic on Wed, 31 Jul 2019 10:41:06 +0200


Background description

We currently have information about each borrower such as age, the total balance on credit cards and personal lines of credit, the number of times the borrower has been overdue in the past two years, monthly income, debt ratio, and number of dependents. Using this information we build a risk-control (credit scoring) model to predict whether the borrower will default.

I. Importing data and libraries

Import the corresponding library

import datetime
import pandas as pd
import numpy as np
import os
import seaborn as sns
import re
import matplotlib.pyplot as plt
import warnings

warnings.filterwarnings('always')
warnings.filterwarnings('ignore')
sns.set(style="darkgrid")
plt.rcParams['font.sans-serif'] = ['SimHei']  # Used for normal display of Chinese labels
plt.rcParams['axes.unicode_minus'] = False  # Used for normal display



time: 1.57 s

Import data

train = pd.read_csv('/home/kesci/input/kaggle4396/cs-training.csv')
test = pd.read_csv('/home/kesci/input/kaggle4396/cs-test.csv')
time: 248 ms
train.drop(columns=["Unnamed: 0"], inplace=True)
test.drop(columns=["Unnamed: 0"], inplace=True)
time: 9.83 ms

Data dimension

train.shape
(150000, 11)



time: 3.97 ms

Are there missing values?

train.isnull().sum()
SeriousDlqin2yrs                            0
RevolvingUtilizationOfUnsecuredLines        0
age                                         0
NumberOfTime30-59DaysPastDueNotWorse        0
DebtRatio                                   0
MonthlyIncome                           29731
NumberOfOpenCreditLinesAndLoans             0
NumberOfTimes90DaysLate                     0
NumberRealEstateLoansOrLines                0
NumberOfTime60-89DaysPastDueNotWorse        0
NumberOfDependents                       3924
dtype: int64



time: 30.9 ms

Are there duplicate rows?

train.duplicated().sum()
609



time: 61.2 ms
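
Before dropping them, it can help to look at which rows actually repeat. A minimal pandas sketch (not part of the original notebook); keep=False marks every copy of a duplicated row:

# Collect all copies of duplicated rows and sort so identical rows sit next to each other
dup_rows = train[train.duplicated(keep=False)]
print(dup_rows.shape)
dup_rows.sort_values(list(train.columns)).head()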

Overall overview of the data

train.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150000 entries, 0 to 149999
Data columns (total 11 columns):
SeriousDlqin2yrs                        150000 non-null int64
RevolvingUtilizationOfUnsecuredLines    150000 non-null float64
age                                     150000 non-null int64
NumberOfTime30-59DaysPastDueNotWorse    150000 non-null int64
DebtRatio                               150000 non-null float64
MonthlyIncome                           120269 non-null float64
NumberOfOpenCreditLinesAndLoans         150000 non-null int64
NumberOfTimes90DaysLate                 150000 non-null int64
NumberRealEstateLoansOrLines            150000 non-null int64
NumberOfTime60-89DaysPastDueNotWorse    150000 non-null int64
NumberOfDependents                      146076 non-null float64
dtypes: float64(4), int64(7)
memory usage: 12.6 MB
time: 32.2 ms

Look at the data.

train.head()
SeriousDlqin2yrs RevolvingUtilizationOfUnsecuredLines age NumberOfTime30-59DaysPastDueNotWorse DebtRatio MonthlyIncome NumberOfOpenCreditLinesAndLoans NumberOfTimes90DaysLate NumberRealEstateLoansOrLines NumberOfTime60-89DaysPastDueNotWorse NumberOfDependents
0 1 0.766127 45 2 0.802982 9120.0 13 0 6 0 2.0
1 0 0.957151 40 0 0.121876 2600.0 4 0 0 0 1.0
2 0 0.658180 38 1 0.085113 3042.0 2 1 0 0 0.0
3 0 0.233810 30 0 0.036050 3300.0 5 0 0 0 0.0
4 0 0.907239 49 1 0.024926 63588.0 7 0 1 0 0.0
time: 12.1 ms
cor=train.corr()
fig, ax = plt.subplots(figsize=(10, 10))
sns.heatmap(cor, xticklabels=cor.columns, yticklabels=cor.columns, annot=True, ax=ax);
time: 1.2 s

II. Data Preprocessing

train_clean = train.copy()
time: 6.31 ms

Duplicate removal

train_clean.drop_duplicates(inplace=True)
time: 198 ms

Missing Value Processing

Fill missing values with the column median

def fill_na(df):
    # Columns that still contain missing values
    na_list = [col for col in df.columns if df[col].isnull().sum() > 0]
    for col in na_list:
        # median() skips NaN by default, so each gap is filled with the column median
        df[col].fillna(df[col].median(), inplace=True)
time: 1.13 ms
fill_na(train_clean)
train_clean.isnull().sum()
SeriousDlqin2yrs                        0
RevolvingUtilizationOfUnsecuredLines    0
age                                     0
NumberOfTime30-59DaysPastDueNotWorse    0
DebtRatio                               0
MonthlyIncome                           0
NumberOfOpenCreditLinesAndLoans         0
NumberOfTimes90DaysLate                 0
NumberRealEstateLoansOrLines            0
NumberOfTime60-89DaysPastDueNotWorse    0
NumberOfDependents                      0
dtype: int64



time: 360 ms
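
The same median fill can also be written with scikit-learn's SimpleImputer (available from scikit-learn 0.20), which makes it easy to reuse the medians learned on the training set when the test set is processed later. A sketch under that assumption, working on copies so the cleaned frame above is untouched:

from sklearn.impute import SimpleImputer

cols = ['MonthlyIncome', 'NumberOfDependents']
imputer = SimpleImputer(strategy='median')
train_imputed = train.copy()
test_imputed = test.copy()
# Learn the medians on the training data and apply the same values to the test data
train_imputed[cols] = imputer.fit_transform(train_imputed[cols])
test_imputed[cols] = imputer.transform(test_imputed[cols])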

Age Distribution of Borrowers

plt.figure(figsize=(16, 6))
sns.distplot(train_clean["age"], color = "black");
time: 665 ms
train_clean["age_label"] = pd.cut(train_clean["age"], np.arange(20, 110, 10))
time: 9.82 ms
# Merge groups that have too few samples or very similar default rates
bins = [0, 30, 40, 50, 60, 70, 110]
labels = ['0-29', '30-39', '40-49', '50-59', '60-69', '70+']
train_clean['age_grouped'] = pd.cut(train_clean['age'], bins, right=0, labels=labels)
train_clean.drop(columns="age", inplace=True)
time: 13.2 ms
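
One way to decide which adjacent bins to merge is to compare their sample counts and default rates directly. A quick check on the grouped column just created (a sketch, not in the original post):

# Default rate and sample size for each age group
age_stats = train_clean.groupby('age_grouped')['SeriousDlqin2yrs'].agg(['mean', 'count'])
age_stats.columns = ['default_rate', 'n']
print(age_stats)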
def plot_age(col, fun):
    data = pd.concat([train_clean[col], train_clean["age_label"]], axis = 1)
    if fun == "s":
        df = data.groupby("age_label")[col].sum()
    elif fun == "m":
        df = data.groupby("age_label")[col].mean()

    df.plot(kind="bar", figsize=(16, 6))

time: 1.14 ms

Total Balance of Credit Cards and Personal Credit Lines (Revolving Utilization) by Age

plot_age("RevolvingUtilizationOfUnsecuredLines", "m");
time: 294 ms
# Discretize RevolvingUtilizationOfUnsecuredLines into bins
bins = [0, 0.15, 0.30, 0.45, 0.60, 0.75, 0.90, 1.05,
        train_clean['RevolvingUtilizationOfUnsecuredLines'].max()*1.05]
labels = [
    '0-0.15',
    '0.15-0.30',
    '0.30-0.45',
    '0.45-0.60',
    '0.60-0.75',
    '0.75-0.90',
    '0.90-1.05',
    '1.05+']

train_clean['ru_grouped'] = pd.cut(train_clean['RevolvingUtilizationOfUnsecuredLines'],
                                   bins, right=0, labels=labels)
# The binned column is dropped again here, so the raw
# RevolvingUtilizationOfUnsecuredLines value is what stays in the frame
train_clean.drop(columns='ru_grouped', inplace=True)
time: 12.8 ms

Checking the Debt Ratio for Outliers

plt.figure(figsize=(16, 6))
sns.distplot(train_clean['DebtRatio'].apply(np.log1p), color="r");
time: 748 ms
train_clean["dr_log"] = train_clean["DebtRatio"].apply(np.log1p)
train_clean.drop(columns="DebtRatio", inplace=True)
plot_age("dr_log", "m")
time: 452 ms
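
The long right tail of DebtRatio is easier to quantify with a few upper quantiles than from the plot alone. A quick check on the raw column in the untouched train frame (a sketch; train_clean has already replaced DebtRatio with dr_log):

# Upper quantiles of the raw DebtRatio
print(train['DebtRatio'].describe(percentiles=[0.5, 0.75, 0.9, 0.99]))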
# Grouping NumberOfOpenCreditLinesAndLoans
bins = [0, 2, 4, 6, 10, 14,
       train_clean['NumberOfOpenCreditLinesAndLoans'].max()*1.05]
labels = ['0-1', '2-3', '4-5', '6-9', '10-13', '14+']
train_clean['num_oc_grouped'] = pd.cut(train_clean['NumberOfOpenCreditLinesAndLoans'], \
                                       bins, right=0, labels=labels)
train_clean.drop(columns='NumberOfOpenCreditLinesAndLoans', inplace=True)
time: 13.2 ms
# Grouping NumberOfDependents
bins = [0, 1, 2, 4, 
       train_clean['NumberOfDependents'].max()*1.05]
labels = ['0', '1', '2-3', '4+']
train_clean['num_dep_grouped'] = pd.cut(train_clean['NumberOfDependents'], \
                                        bins, right=0, labels=labels)
# The binned column is dropped again here, so the raw NumberOfDependents
# value is what stays in the frame
train_clean.drop(columns='num_dep_grouped', inplace=True)
time: 10.6 ms

Number of Times the Borrower Was Overdue in the Past Two Years

PastDueNotWorse = [i for i in train_clean.columns if "NumberOfTime" in i]
plot_age(PastDueNotWorse, fun = "m")
time: 566 ms
cor = train_clean[PastDueNotWorse].corr()
cor
NumberOfTime30-59DaysPastDueNotWorse NumberOfTimes90DaysLate NumberOfTime60-89DaysPastDueNotWorse
NumberOfTime30-59DaysPastDueNotWorse 1.000000 0.980489 0.984535
NumberOfTimes90DaysLate 0.980489 1.000000 0.991409
NumberOfTime60-89DaysPastDueNotWorse 0.984535 0.991409 1.000000
time: 12 ms
# The 30-59 day, 60-89 day and 90+ day overdue columns are highly correlated, so keep only one of them for modeling
train_clean.drop(columns=["NumberOfTime30-59DaysPastDueNotWorse", \
                          "NumberOfTime60-89DaysPastDueNotWorse"], inplace=True)
time: 3.13 ms
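
Since only one of the three overdue counters is kept, it is worth checking how each of them relates to the target. A sketch on the untouched train frame (train_clean has already had two of the columns dropped):

overdue_cols = ['NumberOfTime30-59DaysPastDueNotWorse',
                'NumberOfTimes90DaysLate',
                'NumberOfTime60-89DaysPastDueNotWorse']
# Correlation of each overdue counter with the delinquency label
print(train[overdue_cols].corrwith(train['SeriousDlqin2yrs']))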

Relationship Between Delinquency and Age

plt.figure(figsize=(16, 6))
sns.countplot(data=train_clean, x="age_label", hue="SeriousDlqin2yrs");
time: 376 ms

Log transformation of monthly income

train_clean['income_log'] = (train_clean['MonthlyIncome']/10000).apply(np.log1p)
train_clean.drop(columns=['MonthlyIncome'], inplace=True)
time: 8.29 ms

III. Training the Model

from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import f1_score, roc_auc_score, confusion_matrix, accuracy_score, fbeta_score
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
time: 776 µs

First, a logistic regression model is used.

attributes = train_clean.columns.drop(['SeriousDlqin2yrs'])
sol = ['SeriousDlqin2yrs']
df = pd.get_dummies(train_clean, drop_first=True)
X = pd.get_dummies(train_clean[attributes], drop_first=True)
y = train_clean[sol]

X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.25, shuffle=True)

time: 77.2 ms
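
SeriousDlqin2yrs is heavily imbalanced (only roughly 6-7% of borrowers default), so plain accuracy mostly reflects the majority class. A quick baseline check before tuning (a minimal sketch):

# Share of delinquent borrowers and the accuracy of always predicting "no default"
pos_rate = y['SeriousDlqin2yrs'].mean()
print('positive rate: %.4f' % pos_rate)
print('majority-class accuracy: %.4f' % (1 - pos_rate))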
def plot_est_score(Range):
    # Sweep the regularization strength C and record train / validation accuracy
    score_list = pd.DataFrame(index=Range, columns=["train_score", "test_score"])
    for c in Range:
        lg = LogisticRegression(C=c, solver='lbfgs')
        lg.fit(X_train, y_train)
        score_list.loc[c, "train_score"] = lg.score(X_train, y_train)
        score_list.loc[c, "test_score"] = lg.score(X_valid, y_valid)
    score_list.dropna(inplace=True)
    score_max = score_list.max()
    score_max_index = score_list[score_list == score_list.max()].dropna().index[0]
    print("C={}\nmax =\n{}".format(score_max_index, score_max))
    score_list.plot(figsize=(16, 4))
time: 1.95 ms
plot_est_score(np.array([0.01, 0.03, 0.1, 0.3, 1, 3, 10]))
C=0.01
max =
train_score    0.933534
test_score     0.932660
dtype: float64
time: 32.7 s

Next, grid search parameter tuning

params_LR = {'C': [0.01, 0.03, 0.1, 0.3, 1, 3, 10],
            'solver': ['lbfgs', 'liblinear']}
gs = GridSearchCV(LogisticRegression(max_iter=1000), 
                  param_grid = params_LR,
                  scoring = 'f1',
                  cv=5).fit(X_train, y_train)
gs.best_params_
{'C': 0.01, 'solver': 'lbfgs'}



time: 7min 41s
model_lr = LogisticRegression(C=gs.best_params_['C'], solver=gs.best_params_['solver']).fit(X_train, y_train)
print('train Score: %.6f' % model_lr.score(X_train, y_train))
print('valid Score: %.6f' %  model_lr.score(X_valid, y_valid))
train Score: 0.933534
valid Score: 0.932660
time: 4.63 s
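
The metrics imported earlier (roc_auc_score, f1_score, confusion_matrix) say more than accuracy on such an imbalanced target. An evaluation sketch for the tuned model on the validation split (not in the original post):

# Evaluate the tuned logistic regression on the validation split
y_true = y_valid['SeriousDlqin2yrs']
valid_proba = model_lr.predict_proba(X_valid)[:, 1]   # probability of default
valid_pred = model_lr.predict(X_valid)
print('valid ROC AUC: %.4f' % roc_auc_score(y_true, valid_proba))
print('valid F1:      %.4f' % f1_score(y_true, valid_pred))
print(confusion_matrix(y_true, valid_pred))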

Modeling and Prediction Using XGBoost

import xgboost as xgb
params_xgb = {'max_depth': 6,
              'eta': 1,
              'silent': 1,
              'objective': 'binary:logistic',
              'eval_matric': 'f1'}

# Borrowing Gary Mulder's parameters:
params_xgb2 = {'max_depth': 5,
               'eta': 0.025,
               'silent':1,
               'objective': 'binary:logistic',
               'eval_matric': 'auc',
               'minchildweight': 10.0,
               'maxdeltastep': 1.8,
               'colsample_bytree': 0.4,
               'subsample': 0.8,
               'gamma': 0.65,
               'numboostround' : 391}
# Keys that XGBoost does not recognize ('eval_matric', 'minchildweight', 'maxdeltastep',
# 'numboostround') are silently ignored, so the run below falls back to the defaults:
# the log reports the default 'error' metric and the round count comes from xgb.train.
time: 91.4 ms
# XGBoost does not allow '[', ']' or '<' in feature names, so replace them with '_'
regex = re.compile(r"\[|\]|<", re.IGNORECASE)
feature_name = [regex.sub("_", col) if any(x in str(col) for x in set(('[', ']', '<'))) else col for col in X.columns]
time: 1.16 ms
dtrain = xgb.DMatrix(X_train, y_train, feature_names=feature_name)
dvalid = xgb.DMatrix(X_valid, y_valid, feature_names=feature_name)
evals = [(dtrain, 'train'), (dvalid, 'valid')]
model_xgb = xgb.train(params_xgb2, dtrain, 1000, evals, early_stopping_rounds=100);
[0]	train-error:0.066403	valid-error:0.068973
Multiple eval metrics have been passed: 'valid-error' will be used for early stopping.

Will train until valid-error hasn't improved in 100 rounds.
[1]	train-error:0.065118	valid-error:0.066858
[2]	train-error:0.065725	valid-error:0.067259
[3]	train-error:0.066742	valid-error:0.067714
[4]	train-error:0.066751	valid-error:0.067741
[5]	train-error:0.066751	valid-error:0.067741
[6]	train-error:0.066751	valid-error:0.067741
[7]	train-error:0.066751	valid-error:0.067741
[8]	train-error:0.066733	valid-error:0.067768
[9]	train-error:0.066751	valid-error:0.067741
[10]	train-error:0.066751	valid-error:0.067741
[11]	train-error:0.066742	valid-error:0.067741
[12]	train-error:0.066671	valid-error:0.067741
[13]	train-error:0.066742	valid-error:0.067741
[14]	train-error:0.066751	valid-error:0.067741
[15]	train-error:0.066751	valid-error:0.067741
[16]	train-error:0.066751	valid-error:0.067741
[17]	train-error:0.066751	valid-error:0.067741
[18]	train-error:0.066751	valid-error:0.067741
[19]	train-error:0.066751	valid-error:0.067741
[20]	train-error:0.066751	valid-error:0.067741
[21]	train-error:0.066751	valid-error:0.067741
[22]	train-error:0.066751	valid-error:0.067741
[23]	train-error:0.066751	valid-error:0.067741
[24]	train-error:0.066751	valid-error:0.067741
[25]	train-error:0.066751	valid-error:0.067741
[26]	train-error:0.066751	valid-error:0.067741
[27]	train-error:0.066751	valid-error:0.067741
[28]	train-error:0.066751	valid-error:0.067741
[29]	train-error:0.066742	valid-error:0.067741
[30]	train-error:0.066733	valid-error:0.067741
[31]	train-error:0.066671	valid-error:0.067741
[32]	train-error:0.066635	valid-error:0.067714
[33]	train-error:0.066689	valid-error:0.067741
[34]	train-error:0.066742	valid-error:0.067741
[35]	train-error:0.066742	valid-error:0.067741
[36]	train-error:0.066742	valid-error:0.067741
[37]	train-error:0.066698	valid-error:0.067741
[38]	train-error:0.066635	valid-error:0.067741
[39]	train-error:0.066599	valid-error:0.067634
[40]	train-error:0.066617	valid-error:0.067688
[41]	train-error:0.066608	valid-error:0.067634
[42]	train-error:0.066635	valid-error:0.067714
[43]	train-error:0.066653	valid-error:0.067741
[44]	train-error:0.066689	valid-error:0.067741
[45]	train-error:0.066644	valid-error:0.067714
[46]	train-error:0.066689	valid-error:0.067741
[47]	train-error:0.066644	valid-error:0.067714
[48]	train-error:0.066635	valid-error:0.067714
[49]	train-error:0.066617	valid-error:0.067661
[50]	train-error:0.066582	valid-error:0.067607
[51]	train-error:0.06651	valid-error:0.067527
[52]	train-error:0.066457	valid-error:0.067527
[53]	train-error:0.066341	valid-error:0.067473
[54]	train-error:0.066323	valid-error:0.067447
[55]	train-error:0.06626	valid-error:0.067473
[56]	train-error:0.066332	valid-error:0.067473
[57]	train-error:0.06626	valid-error:0.0675
[58]	train-error:0.066189	valid-error:0.067447
[59]	train-error:0.066135	valid-error:0.067447
[60]	train-error:0.066189	valid-error:0.067473
[61]	train-error:0.066144	valid-error:0.067447
[62]	train-error:0.066117	valid-error:0.067366
[63]	train-error:0.066082	valid-error:0.067313
[64]	train-error:0.066028	valid-error:0.067206
[65]	train-error:0.0661	valid-error:0.06734
[66]	train-error:0.066001	valid-error:0.067259
[67]	train-error:0.065984	valid-error:0.067152
[68]	train-error:0.065903	valid-error:0.067018
[69]	train-error:0.065796	valid-error:0.066938
[70]	train-error:0.065876	valid-error:0.066965
[71]	train-error:0.065939	valid-error:0.067045
[72]	train-error:0.065984	valid-error:0.067152
[73]	train-error:0.065894	valid-error:0.066992
[74]	train-error:0.065805	valid-error:0.066965
[75]	train-error:0.065868	valid-error:0.067018
[76]	train-error:0.065912	valid-error:0.067018
[77]	train-error:0.065796	valid-error:0.066965
[78]	train-error:0.065662	valid-error:0.066965
[79]	train-error:0.065725	valid-error:0.066992
[80]	train-error:0.065796	valid-error:0.067018
[81]	train-error:0.065778	valid-error:0.066938
[82]	train-error:0.065752	valid-error:0.066831
[83]	train-error:0.065832	valid-error:0.066938
[84]	train-error:0.065725	valid-error:0.066911
[85]	train-error:0.065609	valid-error:0.066884
[86]	train-error:0.065689	valid-error:0.066938
[87]	train-error:0.065653	valid-error:0.066858
[88]	train-error:0.065618	valid-error:0.066751
[89]	train-error:0.065627	valid-error:0.066751
[90]	train-error:0.065591	valid-error:0.066697
[91]	train-error:0.065636	valid-error:0.066777
[92]	train-error:0.065636	valid-error:0.066804
[93]	train-error:0.065582	valid-error:0.066751
[94]	train-error:0.065582	valid-error:0.06667
[95]	train-error:0.065618	valid-error:0.066751
[96]	train-error:0.065573	valid-error:0.066617
[97]	train-error:0.065484	valid-error:0.066563
[98]	train-error:0.065395	valid-error:0.06659
[99]	train-error:0.065359	valid-error:0.066563
[100]	train-error:0.065421	valid-error:0.066563
[101]	train-error:0.065484	valid-error:0.066617
[102]	train-error:0.065368	valid-error:0.066563
[103]	train-error:0.06527	valid-error:0.066349
[104]	train-error:0.065225	valid-error:0.066269
[105]	train-error:0.065073	valid-error:0.066242
[106]	train-error:0.064984	valid-error:0.066188
[107]	train-error:0.064913	valid-error:0.066162
[108]	train-error:0.064797	valid-error:0.065921
[109]	train-error:0.064868	valid-error:0.066001
[110]	train-error:0.064761	valid-error:0.065813
[111]	train-error:0.064805	valid-error:0.06584
[112]	train-error:0.064743	valid-error:0.065867
[113]	train-error:0.064672	valid-error:0.065813
[114]	train-error:0.064582	valid-error:0.065653
[115]	train-error:0.064475	valid-error:0.065572
[116]	train-error:0.06444	valid-error:0.065626
[117]	train-error:0.06444	valid-error:0.065439
[118]	train-error:0.064404	valid-error:0.065385
[119]	train-error:0.064359	valid-error:0.065385
[120]	train-error:0.06435	valid-error:0.065412
[121]	train-error:0.064368	valid-error:0.065385
[122]	train-error:0.064359	valid-error:0.065465
[123]	train-error:0.06435	valid-error:0.065412
[124]	train-error:0.064359	valid-error:0.065385
[125]	train-error:0.064377	valid-error:0.065546
[126]	train-error:0.064332	valid-error:0.065385
[127]	train-error:0.064341	valid-error:0.065465
[128]	train-error:0.064288	valid-error:0.065465
[129]	train-error:0.064288	valid-error:0.065492
[130]	train-error:0.064216	valid-error:0.065439
[131]	train-error:0.064252	valid-error:0.065385
[132]	train-error:0.064181	valid-error:0.065358
[133]	train-error:0.064047	valid-error:0.065385
[134]	train-error:0.064083	valid-error:0.065358
[135]	train-error:0.064127	valid-error:0.065385
[136]	train-error:0.064091	valid-error:0.065385
[137]	train-error:0.064047	valid-error:0.065412
[138]	train-error:0.06402	valid-error:0.065358
[139]	train-error:0.064002	valid-error:0.065331
[140]	train-error:0.06402	valid-error:0.065358
[141]	train-error:0.063931	valid-error:0.065412
[142]	train-error:0.063993	valid-error:0.065385
[143]	train-error:0.06385	valid-error:0.065358
[144]	train-error:0.063859	valid-error:0.065358
[145]	train-error:0.063868	valid-error:0.065305
[146]	train-error:0.063833	valid-error:0.065251
[147]	train-error:0.063779	valid-error:0.065251
[148]	train-error:0.063681	valid-error:0.065198
[149]	train-error:0.063645	valid-error:0.065198
[150]	train-error:0.06361	valid-error:0.065171
[151]	train-error:0.06361	valid-error:0.06509
[152]	train-error:0.06361	valid-error:0.065144
[153]	train-error:0.063565	valid-error:0.06509
[154]	train-error:0.063547	valid-error:0.065117
[155]	train-error:0.063529	valid-error:0.065117
[156]	train-error:0.063467	valid-error:0.065064
[157]	train-error:0.06352	valid-error:0.065171
[158]	train-error:0.063529	valid-error:0.065224
[159]	train-error:0.063422	valid-error:0.06509
[160]	train-error:0.063413	valid-error:0.065117
[161]	train-error:0.063476	valid-error:0.065171
[162]	train-error:0.063395	valid-error:0.065144
[163]	train-error:0.063422	valid-error:0.065144
[164]	train-error:0.063395	valid-error:0.065144
[165]	train-error:0.06336	valid-error:0.065144
[166]	train-error:0.063369	valid-error:0.065171
[167]	train-error:0.063324	valid-error:0.065117
[168]	train-error:0.06336	valid-error:0.065064
[169]	train-error:0.063315	valid-error:0.065064
[170]	train-error:0.063333	valid-error:0.065037
[171]	train-error:0.063315	valid-error:0.06509
[172]	train-error:0.063297	valid-error:0.065117
[173]	train-error:0.063315	valid-error:0.065144
[174]	train-error:0.063306	valid-error:0.065117
[175]	train-error:0.063253	valid-error:0.065117
[176]	train-error:0.063279	valid-error:0.065117
[177]	train-error:0.063324	valid-error:0.065117
[178]	train-error:0.063288	valid-error:0.06509
[179]	train-error:0.063297	valid-error:0.065198
[180]	train-error:0.063288	valid-error:0.06509
[181]	train-error:0.063297	valid-error:0.065117
[182]	train-error:0.063288	valid-error:0.065144
[183]	train-error:0.06327	valid-error:0.065037
[184]	train-error:0.063217	valid-error:0.064876
[185]	train-error:0.063244	valid-error:0.06493
[186]	train-error:0.063181	valid-error:0.064876
[187]	train-error:0.063181	valid-error:0.064876
[188]	train-error:0.063145	valid-error:0.06485
[189]	train-error:0.063128	valid-error:0.06485
[190]	train-error:0.06319	valid-error:0.064876
[191]	train-error:0.063172	valid-error:0.064796
[192]	train-error:0.063154	valid-error:0.064823
[193]	train-error:0.063181	valid-error:0.06485
[194]	train-error:0.063172	valid-error:0.06485
[195]	train-error:0.063181	valid-error:0.064823
[196]	train-error:0.06319	valid-error:0.064823
[197]	train-error:0.063128	valid-error:0.06485
[198]	train-error:0.063092	valid-error:0.06485
[199]	train-error:0.063029	valid-error:0.064823
[200]	train-error:0.063065	valid-error:0.064823
[201]	train-error:0.06302	valid-error:0.06485
[202]	train-error:0.063012	valid-error:0.064823
[203]	train-error:0.062976	valid-error:0.06485
[204]	train-error:0.063012	valid-error:0.06485
[205]	train-error:0.062958	valid-error:0.064957
[206]	train-error:0.062931	valid-error:0.064903
[207]	train-error:0.062922	valid-error:0.064903
[208]	train-error:0.06294	valid-error:0.06493
[209]	train-error:0.062904	valid-error:0.064876
[210]	train-error:0.062869	valid-error:0.064903
[211]	train-error:0.062895	valid-error:0.06493
[212]	train-error:0.062869	valid-error:0.064957
[213]	train-error:0.062895	valid-error:0.06493
[214]	train-error:0.062851	valid-error:0.06493
[215]	train-error:0.062851	valid-error:0.06493
[216]	train-error:0.062824	valid-error:0.064876
[217]	train-error:0.062806	valid-error:0.064796
[218]	train-error:0.062753	valid-error:0.064796
[219]	train-error:0.062762	valid-error:0.064823
[220]	train-error:0.062735	valid-error:0.064769
[221]	train-error:0.062699	valid-error:0.064823
[222]	train-error:0.062717	valid-error:0.06485
[223]	train-error:0.06269	valid-error:0.064742
[224]	train-error:0.06269	valid-error:0.064742
[225]	train-error:0.062672	valid-error:0.064769
[226]	train-error:0.062646	valid-error:0.064769
[227]	train-error:0.062646	valid-error:0.064796
[228]	train-error:0.062637	valid-error:0.064769
[229]	train-error:0.062646	valid-error:0.064769
[230]	train-error:0.062646	valid-error:0.064769
[231]	train-error:0.062646	valid-error:0.064742
[232]	train-error:0.062655	valid-error:0.064742
[233]	train-error:0.062646	valid-error:0.064769
[234]	train-error:0.062655	valid-error:0.064769
[235]	train-error:0.062663	valid-error:0.064796
[236]	train-error:0.062637	valid-error:0.064796
[237]	train-error:0.06261	valid-error:0.064823
[238]	train-error:0.062619	valid-error:0.06485
[239]	train-error:0.062583	valid-error:0.064823
[240]	train-error:0.062574	valid-error:0.064716
[241]	train-error:0.062547	valid-error:0.064769
[242]	train-error:0.062574	valid-error:0.064742
[243]	train-error:0.062565	valid-error:0.064689
[244]	train-error:0.062583	valid-error:0.064689
[245]	train-error:0.062574	valid-error:0.064689
[246]	train-error:0.062565	valid-error:0.064716
[247]	train-error:0.062574	valid-error:0.064716
[248]	train-error:0.062538	valid-error:0.064689
[249]	train-error:0.062521	valid-error:0.064716
[250]	train-error:0.06253	valid-error:0.064662
[251]	train-error:0.06253	valid-error:0.064689
[252]	train-error:0.062476	valid-error:0.064662
[253]	train-error:0.062476	valid-error:0.064716
[254]	train-error:0.062503	valid-error:0.064716
[255]	train-error:0.062503	valid-error:0.064716
[256]	train-error:0.062521	valid-error:0.064635
[257]	train-error:0.062476	valid-error:0.064635
[258]	train-error:0.062485	valid-error:0.064635
[259]	train-error:0.062503	valid-error:0.064609
[260]	train-error:0.062449	valid-error:0.064475
[261]	train-error:0.062414	valid-error:0.064421
[262]	train-error:0.062414	valid-error:0.064421
[263]	train-error:0.062396	valid-error:0.064421
[264]	train-error:0.062378	valid-error:0.064448
[265]	train-error:0.062351	valid-error:0.064475
[266]	train-error:0.062342	valid-error:0.064448
[267]	train-error:0.062342	valid-error:0.064528
[268]	train-error:0.062333	valid-error:0.064528
[269]	train-error:0.062324	valid-error:0.064528
[270]	train-error:0.062306	valid-error:0.064501
[271]	train-error:0.062298	valid-error:0.064475
[272]	train-error:0.062306	valid-error:0.064528
[273]	train-error:0.06228	valid-error:0.064555
[274]	train-error:0.062289	valid-error:0.064609
[275]	train-error:0.062253	valid-error:0.064662
[276]	train-error:0.062271	valid-error:0.064609
[277]	train-error:0.062253	valid-error:0.064609
[278]	train-error:0.062235	valid-error:0.064609
[279]	train-error:0.062217	valid-error:0.064501
[280]	train-error:0.062226	valid-error:0.064555
[281]	train-error:0.062235	valid-error:0.064501
[282]	train-error:0.062226	valid-error:0.064448
[283]	train-error:0.062181	valid-error:0.064394
[284]	train-error:0.062199	valid-error:0.064448
[285]	train-error:0.062173	valid-error:0.064448
[286]	train-error:0.062146	valid-error:0.064421
[287]	train-error:0.062137	valid-error:0.064394
[288]	train-error:0.062155	valid-error:0.064394
[289]	train-error:0.062173	valid-error:0.064394
[290]	train-error:0.062164	valid-error:0.064421
[291]	train-error:0.062137	valid-error:0.064501
[292]	train-error:0.062146	valid-error:0.064555
[293]	train-error:0.062137	valid-error:0.064501
[294]	train-error:0.06211	valid-error:0.064528
[295]	train-error:0.062101	valid-error:0.064528
[296]	train-error:0.062092	valid-error:0.064475
[297]	train-error:0.062092	valid-error:0.064475
[298]	train-error:0.062083	valid-error:0.064448
[299]	train-error:0.062092	valid-error:0.064394
[300]	train-error:0.06203	valid-error:0.064528
[301]	train-error:0.061994	valid-error:0.064501
[302]	train-error:0.061994	valid-error:0.064475
[303]	train-error:0.062012	valid-error:0.064475
[304]	train-error:0.061985	valid-error:0.064475
[305]	train-error:0.062003	valid-error:0.064475
[306]	train-error:0.061941	valid-error:0.064421
[307]	train-error:0.061932	valid-error:0.064421
[308]	train-error:0.061923	valid-error:0.064421
[309]	train-error:0.061878	valid-error:0.064421
[310]	train-error:0.061869	valid-error:0.064421
[311]	train-error:0.061869	valid-error:0.064394
[312]	train-error:0.061878	valid-error:0.064368
[313]	train-error:0.061869	valid-error:0.064394
[314]	train-error:0.061869	valid-error:0.064421
[315]	train-error:0.061878	valid-error:0.064475
[316]	train-error:0.061851	valid-error:0.064475
[317]	train-error:0.061878	valid-error:0.064448
[318]	train-error:0.061869	valid-error:0.064394
[319]	train-error:0.061833	valid-error:0.064394
[320]	train-error:0.061789	valid-error:0.064314
[321]	train-error:0.061807	valid-error:0.064314
[322]	train-error:0.061807	valid-error:0.064314
[323]	train-error:0.061789	valid-error:0.064287
[324]	train-error:0.061789	valid-error:0.064314
[325]	train-error:0.061789	valid-error:0.064287
[326]	train-error:0.061798	valid-error:0.06426
[327]	train-error:0.061798	valid-error:0.06426
[328]	train-error:0.061798	valid-error:0.06426
[329]	train-error:0.061798	valid-error:0.064234
[330]	train-error:0.061789	valid-error:0.064234
[331]	train-error:0.061789	valid-error:0.06426
[332]	train-error:0.061798	valid-error:0.064314
[333]	train-error:0.061807	valid-error:0.064314
[334]	train-error:0.061816	valid-error:0.064341
[335]	train-error:0.061816	valid-error:0.064314
[336]	train-error:0.061824	valid-error:0.064287
[337]	train-error:0.061824	valid-error:0.064314
[338]	train-error:0.061833	valid-error:0.064314
[339]	train-error:0.061816	valid-error:0.064314
[340]	train-error:0.061816	valid-error:0.064234
[341]	train-error:0.061789	valid-error:0.06426
[342]	train-error:0.061771	valid-error:0.06426
[343]	train-error:0.06178	valid-error:0.064314
[344]	train-error:0.061798	valid-error:0.064287
[345]	train-error:0.061798	valid-error:0.06418
[346]	train-error:0.061744	valid-error:0.064207
[347]	train-error:0.061762	valid-error:0.064153
[348]	train-error:0.061762	valid-error:0.064153
[349]	train-error:0.061762	valid-error:0.064153
[350]	train-error:0.061771	valid-error:0.064234
[351]	train-error:0.061762	valid-error:0.064234
[352]	train-error:0.061744	valid-error:0.064234
[353]	train-error:0.06178	valid-error:0.064234
[354]	train-error:0.061744	valid-error:0.064234
[355]	train-error:0.061744	valid-error:0.06426
[356]	train-error:0.061753	valid-error:0.064287
[357]	train-error:0.061735	valid-error:0.064234
[358]	train-error:0.061744	valid-error:0.06426
[359]	train-error:0.061726	valid-error:0.06426
[360]	train-error:0.061691	valid-error:0.06426
[361]	train-error:0.0617	valid-error:0.06426
[362]	train-error:0.061691	valid-error:0.064287
[363]	train-error:0.061691	valid-error:0.064234
[364]	train-error:0.061691	valid-error:0.064234
[365]	train-error:0.061664	valid-error:0.064287
[366]	train-error:0.061673	valid-error:0.064287
[367]	train-error:0.061646	valid-error:0.064314
[368]	train-error:0.061646	valid-error:0.064314
[369]	train-error:0.061655	valid-error:0.064287
[370]	train-error:0.061646	valid-error:0.064314
[371]	train-error:0.061673	valid-error:0.064314
[372]	train-error:0.061682	valid-error:0.064314
[373]	train-error:0.061664	valid-error:0.064341
[374]	train-error:0.061682	valid-error:0.064368
[375]	train-error:0.061655	valid-error:0.064368
[376]	train-error:0.061637	valid-error:0.064368
[377]	train-error:0.061619	valid-error:0.064341
[378]	train-error:0.06161	valid-error:0.064368
[379]	train-error:0.061628	valid-error:0.064368
[380]	train-error:0.061619	valid-error:0.064368
[381]	train-error:0.061619	valid-error:0.064368
[382]	train-error:0.061637	valid-error:0.064341
[383]	train-error:0.061592	valid-error:0.064341
[384]	train-error:0.061592	valid-error:0.064341
[385]	train-error:0.061575	valid-error:0.06426
[386]	train-error:0.061584	valid-error:0.064287
[387]	train-error:0.061584	valid-error:0.064287
[388]	train-error:0.061592	valid-error:0.064234
[389]	train-error:0.061575	valid-error:0.06426
[390]	train-error:0.061539	valid-error:0.064234
[391]	train-error:0.061521	valid-error:0.06426
[392]	train-error:0.061521	valid-error:0.064234
[393]	train-error:0.06153	valid-error:0.064207
[394]	train-error:0.061539	valid-error:0.064207
[395]	train-error:0.061521	valid-error:0.064234
[396]	train-error:0.061485	valid-error:0.064287
[397]	train-error:0.061485	valid-error:0.064287
[398]	train-error:0.061485	valid-error:0.064287
[399]	train-error:0.061494	valid-error:0.064287
[400]	train-error:0.061485	valid-error:0.064287
[401]	train-error:0.061503	valid-error:0.064287
[402]	train-error:0.061494	valid-error:0.064287
[403]	train-error:0.061494	valid-error:0.064314
[404]	train-error:0.061512	valid-error:0.064314
[405]	train-error:0.061521	valid-error:0.064314
[406]	train-error:0.061503	valid-error:0.064341
[407]	train-error:0.061494	valid-error:0.064368
[408]	train-error:0.061476	valid-error:0.064368
[409]	train-error:0.061476	valid-error:0.064341
[410]	train-error:0.061476	valid-error:0.064341
[411]	train-error:0.061459	valid-error:0.064314
[412]	train-error:0.061423	valid-error:0.06426
[413]	train-error:0.061432	valid-error:0.064207
[414]	train-error:0.06145	valid-error:0.064207
[415]	train-error:0.061467	valid-error:0.064207
[416]	train-error:0.061459	valid-error:0.064207
[417]	train-error:0.061467	valid-error:0.064234
[418]	train-error:0.061459	valid-error:0.064234
[419]	train-error:0.061423	valid-error:0.064234
[420]	train-error:0.061432	valid-error:0.064234
[421]	train-error:0.06145	valid-error:0.06426
[422]	train-error:0.061441	valid-error:0.06426
[423]	train-error:0.061423	valid-error:0.06426
[424]	train-error:0.061441	valid-error:0.06426
[425]	train-error:0.061432	valid-error:0.064234
[426]	train-error:0.061432	valid-error:0.064234
[427]	train-error:0.061414	valid-error:0.064234
[428]	train-error:0.061432	valid-error:0.064234
[429]	train-error:0.061396	valid-error:0.064234
[430]	train-error:0.061423	valid-error:0.064234
[431]	train-error:0.061405	valid-error:0.06426
[432]	train-error:0.06136	valid-error:0.06426
[433]	train-error:0.061369	valid-error:0.06426
[434]	train-error:0.061396	valid-error:0.06426
[435]	train-error:0.061405	valid-error:0.06426
[436]	train-error:0.061405	valid-error:0.06426
[437]	train-error:0.061378	valid-error:0.064287
[438]	train-error:0.061369	valid-error:0.064314
[439]	train-error:0.061378	valid-error:0.064314
[440]	train-error:0.06136	valid-error:0.064314
[441]	train-error:0.061343	valid-error:0.064314
[442]	train-error:0.061325	valid-error:0.064287
[443]	train-error:0.061325	valid-error:0.064341
[444]	train-error:0.061307	valid-error:0.064314
[445]	train-error:0.061325	valid-error:0.064314
[446]	train-error:0.061325	valid-error:0.064314
[447]	train-error:0.061307	valid-error:0.064341
Stopping. Best iteration:
[347]	train-error:0.061762	valid-error:0.064153

time: 1min 59s

Save the model

model_xgb.dump_model('xgb_v1')
time: 206 ms
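
dump_model only writes a human-readable text dump of the trees. If the booster needs to be reloaded for scoring later, save_model together with xgb.Booster(model_file=...) is the pair to use; a sketch (the file name is illustrative):

# Persist the booster in XGBoost's binary format and load it back
model_xgb.save_model('xgb_v1.model')
model_xgb_loaded = xgb.Booster(model_file='xgb_v1.model')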

Feature Importance

xgb.plot_importance(model_xgb);
time: 559 ms

Visualization of the XGBoost Trees

xgb.to_graphviz(model_xgb)
time: 159 ms

Predicting Whether the Borrower Will Default

# Predictions are generated on the held-out validation split (the Kaggle test file
# was loaded earlier but has not been put through the same preprocessing)
dtest = xgb.DMatrix(X_valid, feature_names=feature_name)
y_test = model_xgb.predict(dtest)
entry = pd.DataFrame()
entry['ID'] = np.arange(1, len(y_test)+1)
entry['Probability'] = y_test
time: 1.17 s
entry.to_csv('pred.csv', header=True, index=False)
time: 258 ms
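
Because the predictions above were made on the validation split, y_valid is available and the probabilities can be scored before submission. A minimal check (not in the original post):

# ROC AUC of the XGBoost probabilities on the validation split
print('valid ROC AUC: %.4f' % roc_auc_score(y_valid['SeriousDlqin2yrs'], y_test))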

IV. Summary

  1. The age distribution of borrowers is roughly normal, and the 30-40 age group accounts for the largest share of loans.
  2. Borrowers aged 20-30 are at the highest risk of credit card delinquency.
  3. The total balance of credit cards and personal credit lines, the debt ratio and monthly income are the three most important factors in whether a borrower will default.
  4. Because the data are sparse, discretizing the features before modeling helps build a more robust model.
