## 17.3 8th place solution with GitHub

The eighth-placed team, a team of eight, presented their solution in a Kaggle discussion post.

The team consisted of:

• Eight people
• Private group
• Organised via the internet

### 17.3.1 Overall architecture

A variety of models were combined:

• LightGBM (gbm)
• xgboost (xgb)
• Random Forest (rf)
• Neural Networks (their predictions did not get picked up at level 2, so they were removed)

### 17.3.2 Input data sets

The team created different data sets and used them with different models. The numbers in parentheses below are the cross-validation scores obtained by each model on the given data set.

Level 1 data sets:

• Data set 1 (0.477 gbm): order, raw numeric, date, categorical
• Data set 2 (0.482 gbm, 0.477 xgb, 0.473 rf): order, path, raw numeric, date
• Data set 3 (0.479 gbm, 0.473 xgb): order, path, numeric, date, refined categorical
• Data set 4 (0.469 xgb, 0.442 rf): features sorted by numeric values + date features + path, plus unsupervised nearest neighbors (L1 = Manhattan / L2 = Euclidean distances) per label; a sketch of such neighbor features follows this list
• Data set 5 (0.43 xgb): path, unsupervised nearest neighbors
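
The post does not include the code for this step; a minimal sketch of per-label nearest-neighbor features might look as follows. The function name, inputs, and the use of the FNN package are assumptions, and FNN computes Euclidean (L2) distances only, so the Manhattan (L1) variant would need a different distance routine:

```r
# Minimal sketch (not the team's code): unsupervised nearest-neighbor
# features per label. num_train is an assumed numeric feature matrix,
# label an assumed binary response vector.
library(FNN)

nn_features <- function(num_train, label, k = 5) {
  pos <- num_train[label == 1, , drop = FALSE]  # rows of the positive class
  neg <- num_train[label == 0, , drop = FALSE]  # rows of the negative class
  # Euclidean (L2) distance to the k nearest rows of each class
  d_pos <- get.knnx(pos, num_train, k = k)$nn.dist
  d_neg <- get.knnx(neg, num_train, k = k)$nn.dist
  data.frame(mean_dist_pos = rowMeans(d_pos),
             mean_dist_neg = rowMeans(d_neg))
}
```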

The model was two-staged; the second stage used the data set given below.

Level 2 data set:

• Level 1 predictions (the team had 12 predictions from level 1)
• Data set 5
• Duplicate feature (count and position; sketched below)
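
How the duplicate count and position were computed is not shown in the post; one way to derive such features with data.table could be the following (the column names are assumptions; `Response` is the target column used later in the level 2 script):

```r
# Minimal sketch (assumed semantics): for every row, count how many
# identical feature vectors exist and record the row's position
# within its duplicate group.
library(data.table)

dt <- as.data.table(train)                    # train: assumed feature table
cols <- setdiff(names(dt), "Response")        # compare on all feature columns
dt[, dup_count := .N, by = cols]              # size of each duplicate group
dt[, dup_position := seq_len(.N), by = cols]  # 1, 2, ... within the group
```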

### 17.3.3 Ensembling

Better performance can often be achieved by ensembling several models. It is good practice to combine dissimilar models, because their errors are less correlated, which improves the overall performance. The team's final level 2 blend used the weights listed below; a minimal blending sketch follows the list.

• 30% weighted xgboost gbtree (~0.488 CV)
• 70% weighted Random Forest (~0.485 CV)
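
As a minimal sketch (the vectors `pred_xgb` and `pred_rf` are assumed names for the two level 2 model predictions, not the team's identifiers), the blend is a simple weighted average:

```r
# Weighted average of the two level 2 predictions, using the
# weights reported above (30% xgboost gbtree, 70% Random Forest).
final_pred <- 0.3 * pred_xgb + 0.7 * pred_rf
```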

### 17.3.4 Features

#### 17.3.4.1 Features used

Features were created using several methods (a sketch follows the list):

• Maximum
• Minimum
• Kurtosis
• Lag
• One-hot encoded
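
The exact feature code is in the team's repository; as an illustration only, a minimal data.table sketch of the listed methods, with hypothetical columns `Id`, `value`, and `category`, could look like this:

```r
# Minimal sketch (hypothetical columns): per-Id aggregates, a lag
# feature, and one-hot encoding of a categorical column.
library(data.table)
library(e1071)  # provides kurtosis()

dt[, feat_max  := max(value, na.rm = TRUE), by = Id]       # Maximum
dt[, feat_min  := min(value, na.rm = TRUE), by = Id]       # Minimum
dt[, feat_kurt := kurtosis(value, na.rm = TRUE), by = Id]  # Kurtosis
dt[, feat_lag  := shift(value, 1L), by = Id]               # Lag (previous value)
onehot <- model.matrix(~ category - 1, data = dt)          # One-hot (category assumed to be a factor)
```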

### 17.3.5 Validation method

The validation method used was 5-fold cross-validation.
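
The post does not show how the folds were built; one common way in R, which could also produce the `folds` list passed to `lgbm.cv` in the script below, is caret's createFolds (the seed is an assumption):

```r
# Minimal sketch: indices for 5-fold cross-validation.
library(caret)
set.seed(1)                         # assumed seed, for reproducibility
folds <- createFolds(label, k = 5)  # list of 5 held-out index vectors
```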

### 17.3.6 Software

The team used a variety of programming languages and tools; as the scripts below show, the models were written in R and the hyperparameter optimization in Python.

### 17.3.7 Code on GitHub

A detailed explanation of the code is given on GitHub. The repository contains scripts for:

• Pre-processing
• Feature engineering
• Modeling
• Hyperparameter optimization using HyperOpt

#### 17.3.7.1 Level 1 model scripts

Let's look at some of the model scripts.

##### 17.3.7.1.1 GBM Model

```r
temp_model <- lgbm.cv(y_train = label,
                      x_train = train,
                      x_test = test,
                      data_has_label = TRUE,
                      NA_value = "nan",
                      lgbm_path = my_lgbm_is_at,
                      workingdir = my_script_is_using,
                      files_exist = TRUE,
                      save_binary = FALSE,
                      validation = TRUE,
                      folds = folds,
                      predictions = TRUE,
                      importance = TRUE,
                      full_quiet = FALSE,
                      verbose = FALSE,
                      application = "binary",
                      learning_rate = eta, # The shrinkage rate applied to each iteration
                      num_iterations = 5000, # The maximum number of boosting iterations
                      early_stopping_rounds = 700, # Stop when the validation metric has not improved on the best value for this many iterations
                      num_leaves = leaves, # The number of leaves in one tree
                      min_data_in_leaf = min_sample, # Minimum number of observations in one leaf
                      min_sum_hessian_in_leaf = min_hess, # Minimum sum of Hessians in one leaf to allow a split
                      max_bin = 255, # The maximum number of bins created per feature
                      feature_fraction = colsample, # Column subsampling percentage; e.g. 0.5 selects 50% of features randomly for each iteration
                      bagging_fraction = subsample, # Row subsampling percentage; e.g. 0.5 selects 50% of rows randomly for each iteration
                      bagging_freq = sampling_freq, # The frequency of row subsampling
                      is_unbalance = FALSE, # For binary classification, setting this to TRUE can help when the training data is unbalanced
                      metric = "auc",
                      is_training_metric = TRUE, # Whether to report the training metric in addition to the validation metric
                      is_sparse = FALSE) # Whether sparse optimization is enabled
```

##### 17.3.7.1.2 XGBoost model

```r
temp_model <- xgb.train(data = dtrain,
                        nrounds = floor(best_iter * 1.1), # Maximum number of boosting iterations
                        eta = 0.05, # Learning rate: each tree's contribution is scaled by a factor of 0 < eta < 1 before it is added to the current approximation
                        max_depth = 7, # Maximum depth of a tree
                        #gamma = 20, # Minimum loss reduction required to make a further partition on a leaf node of the tree
                        subsample = 0.9, # e.g. 0.5 means xgboost randomly samples half of the data instances to grow trees
                        colsample_bytree = 0.7, # Subsample ratio of columns when constructing each tree
                        min_child_weight = 50, # Minimum sum of instance weights (Hessian) needed in a child
                        booster = "gbtree", # Which booster to use, can be gbtree or gblinear
                        #feval = mcc_eval_nofail,
                        eval_metric = "auc",
                        maximize = TRUE,
                        objective = "binary:logistic",
                        verbose = TRUE,
                        prediction = TRUE,
                        watchlist = list(test = dtrain))
```


#### 17.3.7.2 Level 2 model scripts

##### 17.3.7.2.1 70% weighted Random Forest (~0.485 CV)

First, read in the results of the level 1 models, which now serve as the features for the level 2 model:

```r
library(feather)     # read_feather()
library(data.table)  # fread()

train <- read_feather("Shubin/retrain_material/train.feather")
# test is loaded analogously in the team's script

train[, "xgb_jay_joost_v2"] <- fread("Laurae/20161110_xgb_jayjoost_fix2/aaa_stacker_preds_train_headerY_scale.csv")$x
test[, "xgb_jay_joost_v2"] <- fread("Laurae/20161110_xgb_jayjoost_fix2/aaa_stacker_preds_test_headerY_scale.csv")$x
train[, "gbm_jay_joost_v2"] <- fread("Laurae/20161111_lgbm_jayjoost/aaa_stacker_preds_train_headerY_scale.csv")$x
test[, "gbm_jay_joost_v2"] <- fread("Laurae/20161111_lgbm_jayjoost/aaa_stacker_preds_test_headerY_scale.csv")$x
train[, "gbm_jay"] <- fread("Laurae/20161111_lgbm_jay/aaa_stacker_preds_train_headerY_scale.csv")$x
test[, "gbm_jay"] <- fread("Laurae/20161111_lgbm_jay/aaa_stacker_preds_test_headerY_scale.csv")$x
train[, "gbm_mike"] <- fread("Laurae/20161110_lgbm_mike/aaa_stacker_preds_train_headerY_scale.csv")$x
test[, "gbm_mike"] <- fread("Laurae/20161110_lgbm_mike/aaa_stacker_preds_test_headerY_scale.csv")$x
train[, "xgb_mike"] <- fread("Laurae/20161110_xgb_mike/aaa_stacker_preds_train_headerY_scale.csv")$x
test[, "xgb_mike"] <- fread("Laurae/20161110_xgb_mike/aaa_stacker_preds_test_headerY_scale.csv")$x
```

Then train the level 2 model:

```r
library(h2o)
h2o.init()  # start the local H2O cluster (added for completeness)

temp_model <- h2o.randomForest(x = 1:12, # the 12 level 1 prediction columns
                               y = "Response",
                               training_frame = my_train[[i]],
                               ntrees = 200, # Number of trees
                               max_depth = 12, # Maximum tree depth
                               min_rows = 20, # Fewest allowed (weighted) observations in a leaf
                               seed = 11111)
```

##### 17.3.7.2.2 Hyperparameter optimization using HyperOpt

The models were implemented in R, while the hyperparameter optimization was implemented in Python.

Define the parameters to be optimized:

```python
from hyperopt import hp

# Random Forest params
params = {'n_estimators': 100}
params['random_state'] = 100
params['max_features'] = hp.choice('max_features', range(10, 199))
params['max_depth'] = hp.choice('max_depth', range(7, 30))
params['verbose'] = 10
params['n_jobs'] = -1
```


Run the optimizer from the Hyperopt library:


```python
from hyperopt import fmin, tpe, Trials

# Hyperopt
trials = Trials()
counter = 0
# score_rf is the objective function defined in the team's script: it
# trains a random forest with the sampled params and returns the CV loss
best = fmin(score_rf,
            params,
            algo=tpe.suggest,  # search algorithm
            max_evals=200,
            trials=trials)
```

Passing the trials object to fmin gives access to:

• trials.trials - a list of dictionaries representing everything about the search
• trials.results - a list of dictionaries returned by ‘objective’ during the search
• trials.losses() - a list of losses (float for each ‘ok’ trial)
• trials.statuses() - a list of status strings