18.1 1st place solution
The first-place team was a two-person effort; they presented their solution in the Kaggle discussion forum.
The team consisted of:
- Two ML experts
- Private team
18.1.1 Data exploration
The team spent two weeks exploring the data, looking at:
- Statistics
- Correlation
18.1.2 Hand-crafted features
The team created their own features.
Time features are:
- StartStationTimes
- StartTime, EndTime, Duration
- StationTimeDiff
- Start/End part of week (mod 1680)
- Number of records in next/last 2.5h, 24h, 168h for each station
- Number of records in the same time (6 mins)
- MeanTimeDiff since last 1/5/10 failure(s)
- MeanTimeDiff till next 1/5/10 failure(s)
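As a rough illustration of a few of the time features above, the following Python sketch computes StartTime, EndTime, Duration, the part-of-week value, and the neighbouring record counts. It is not the team's code. It assumes timestamps are integers in units of 6 minutes, so that 1680 units make up one week (168 h) and 2.5 h corresponds to 25 units; the function names and the input layout are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Assumption: one time unit = 6 minutes, so 1680 units = 168 h = one week.
WEEK = 1680
WINDOWS = {"2.5h": 25, "24h": 240, "168h": 1680}


def basic_time_features(station_times):
    """Per-part features from a dict {station name: timestamp in units}."""
    times = sorted(station_times.values())
    start, end = times[0], times[-1]
    return {
        "StartTime": start,
        "EndTime": end,
        "Duration": end - start,
        "StartPartOfWeek": start % WEEK,  # position inside the week
        "EndPartOfWeek": end % WEEK,
    }


def count_next(sorted_times, t, window):
    """Number of records with timestamp in (t, t + window]."""
    return bisect_right(sorted_times, t + window) - bisect_right(sorted_times, t)


def count_last(sorted_times, t, window):
    """Number of records with timestamp in [t - window, t)."""
    return bisect_left(sorted_times, t) - bisect_left(sorted_times, t - window)


def count_same(sorted_times, t):
    """Number of records sharing timestamp t (one 6-minute unit), t included."""
    return bisect_right(sorted_times, t) - bisect_left(sorted_times, t)
```

The count functions expect the timestamps of one station sorted in ascending order, so each lookup is a binary search rather than a scan over all records. For example, count_next(times, t, WINDOWS["24h"]) gives the number of records in the 24 h after t.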
Numeric features are:
- Raw numeric features (most of the time the team used the raw numeric features, or simple subsets chosen by xgb feature importance)
- Z-scaled features for each week
- Count encoding for each value
- Feature combinations (f1 + f2, f1 - f2, f1 * f2)
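The numeric transformations listed above can be sketched in a few lines of plain Python. This is a minimal illustration under assumptions, not the team's implementation: the function names are hypothetical, each feature is a plain list of floats without missing values, and the week index of each record is assumed to be given.

```python
from collections import Counter, defaultdict
from statistics import mean, pstdev


def zscale_by_week(values, weeks):
    """Z-scale each value using the mean and std of its own week."""
    groups = defaultdict(list)
    for v, w in zip(values, weeks):
        groups[w].append(v)
    stats = {w: (mean(vs), pstdev(vs)) for w, vs in groups.items()}
    scaled = []
    for v, w in zip(values, weeks):
        mu, sd = stats[w]
        # A constant week has zero spread; map it to 0.0 instead of dividing by 0.
        scaled.append((v - mu) / sd if sd > 0 else 0.0)
    return scaled


def count_encode(values):
    """Replace each value by how often it occurs in the column."""
    counts = Counter(values)
    return [counts[v] for v in values]


def combine(f1, f2):
    """Pairwise sum, difference and product of two feature columns."""
    return {
        "sum": [a + b for a, b in zip(f1, f2)],
        "diff": [a - b for a, b in zip(f1, f2)],
        "prod": [a * b for a, b in zip(f1, f2)],
    }
```

Scaling within each week removes level shifts between weeks, and count encoding turns a raw measurement into a measure of how common that exact value is.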