18.1 1st place solution

The first-placed team consisted of two people; they presented their solution in a Kaggle discussion post.

The team consisted of

  • Two ML experts
  • Private team

18.1.1 Data exploration

The team spent two weeks exploring the data, looking at:

  • Statistics
  • Correlation

18.1.2 Hand-crafted features

The team created their own features, grouped into time features and numeric features.

Time features are:

  • StartStationTimes
  • StartTime, EndTime, Duration
  • StationTimeDiff
  • Start/End part of week (mod 1680)
  • Number of records in next/last 2.5h, 24h, 168h for each station
  • Number of records in the same time (6 mins)
  • MeanTimeDiff since last 1/5/10 failure(s)
  • MeanTimeDiff till next 1/5/10 failure(s)
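
A few of the time features above can be sketched in plain Python. This is only an illustration, not the team's code: the function names, the input layout (one list of station timestamps per part, with None for stations not visited) and the assumption that timestamps are integer ticks in the same unit as the 1680 modulus are all hypothetical.

```python
from bisect import bisect_left, bisect_right

# Modulus quoted in the feature list; the time unit is an assumption.
WEEK_LENGTH = 1680


def basic_time_features(station_times):
    """Start/end/duration features for one part.

    station_times: timestamps at which the part passed each station,
    with None for stations it did not visit (hypothetical layout).
    """
    seen = [t for t in station_times if t is not None]
    start, end = min(seen), max(seen)
    return {
        "StartTime": start,
        "EndTime": end,
        "Duration": end - start,
        "StartPartOfWeek": start % WEEK_LENGTH,
        "EndPartOfWeek": end % WEEK_LENGTH,
    }


def window_counts(start_times, window):
    """For each record, count other records in the next and last window.

    Returns a list of (next_count, last_count) pairs, where next_count
    covers start times in (t, t + window] and last_count covers
    start times in [t - window, t).
    """
    ordered = sorted(start_times)
    counts = []
    for t in start_times:
        nxt = bisect_right(ordered, t + window) - bisect_right(ordered, t)
        last = bisect_left(ordered, t) - bisect_left(ordered, t - window)
        counts.append((nxt, last))
    return counts
```

Sorting once and using binary search keeps the window counts at O(n log n), which matters when the same count is repeated for several window lengths (2.5h, 24h, 168h) and for each station.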

Numeric features are:

  • Raw numeric features (most of the time the team used the raw numeric features or simple subsets chosen by XGBoost feature importance)

  • Z-scaled features for each week

  • Count encoding for each value

  • Feature combinations (f1 + f2, f1 - f2, f1 * f2)
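
The three derived numeric feature types can be sketched as follows. This is a minimal standard-library illustration under stated assumptions, not the team's implementation: the function names are hypothetical, the week label for each row is assumed to be given (for example as start time divided by 1680, rounded down), and the population standard deviation is an arbitrary choice.

```python
from collections import Counter, defaultdict
from statistics import mean, pstdev


def count_encode(values):
    """Replace each value by how often it occurs in the column."""
    counts = Counter(values)
    return [counts[v] for v in values]


def zscale_by_group(values, groups):
    """Z-scale values separately within each group (e.g. each week)."""
    by_group = defaultdict(list)
    for v, g in zip(values, groups):
        by_group[g].append(v)
    stats = {g: (mean(vs), pstdev(vs)) for g, vs in by_group.items()}
    scaled = []
    for v, g in zip(values, groups):
        mu, sigma = stats[g]
        # A constant group has sigma == 0; map it to 0.0 instead of dividing.
        scaled.append((v - mu) / sigma if sigma > 0 else 0.0)
    return scaled


def combine(f1, f2):
    """Element-wise sum, difference and product of two feature columns."""
    return {
        "sum": [a + b for a, b in zip(f1, f2)],
        "diff": [a - b for a, b in zip(f1, f2)],
        "prod": [a * b for a, b in zip(f1, f2)],
    }
```

For example, zscale_by_group(values, weeks) removes week-to-week shifts in a measurement, so that a tree model sees how unusual a value is relative to other parts produced in the same week.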

18.1.3 Hardware

Since no neural networks (NN) were used, the hardware could be rather modest:

  • Desktop machine (16GB RAM)