23.2 #2 solution: Quantum Uncertainty

The second-placed team, Quantum Uncertainty, was a two-person team; they presented their solution in the Kaggle discussion forum.

Team profile:

  • No domain experts
  • ML experts
  • Private team

23.2.1 Overall architecture

Since the team had no domain knowledge, and trying to become quantum-chemistry experts within one month would obviously have put them at a disadvantage, they let the model build the features itself:

  • Deep learning
    • Model dimension (d_model) from 512 to 2048
    • 6 to 24 layers
    • Parameters from ~12M to ~100M
  • Letting the model build the features
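To see how these dimension and layer ranges translate into parameter counts, here is a rough estimate for a stack of standard transformer encoder layers. It is a sketch that assumes a textbook layer layout (four attention projections, a two-layer feed-forward block, two LayerNorms); the team's own Transformer class may differ in detail, and embeddings and output heads are ignored. The helper name `approx_transformer_params` is hypothetical.

```python
def approx_transformer_params(d_model: int, n_layers: int, d_inner: int) -> int:
    """Rough parameter count of n_layers standard transformer encoder layers.

    Assumed layer layout: Q, K, V and output projections (d_model x d_model each,
    with biases), a feed-forward block d_model -> d_inner -> d_model (with biases)
    and two LayerNorms (weight + bias each). Embeddings and heads are ignored.
    """
    attention = 4 * (d_model * d_model + d_model)
    feed_forward = d_model * d_inner + d_inner + d_inner * d_model + d_model
    layer_norms = 2 * (2 * d_model)
    return n_layers * (attention + feed_forward + layer_norms)


# Configuration instantiated in the notebook: 6 layers, d_model=1024, d_inner=2048*2.
n = approx_transformer_params(d_model=1024, n_layers=6, d_inner=2048 * 2)
print(f"{n / 1e6:.1f}M")  # 75.6M
```

For the configuration instantiated at the end of this section, the estimate falls inside the quoted ~12M to ~100M range; the feed-forward block accounts for about two thirds of it.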

23.2.2 Input features and embeddings

  • Three input arrays covering up to 29 atoms (the maximum number of atoms per molecule)
    • x, y, z position of each atom
    • atom type index (C=0, H=1, etc.)
    • J-coupling type index (1JHC=0, 2JHH=1, etc.)
  • No manually engineered features
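A minimal sketch of how one molecule could be packed into these fixed-size arrays, padded to 29 slots. The index orders beyond C=0, H=1 and 1JHC=0, 2JHH=1 are assumptions, and so is the way coupling types are assigned to atom slots (here: the coupling type between a chosen source atom and each other atom); the function `encode_molecule` is hypothetical. Empty slots get -1 because the model code adds 1 to both indices before its embedding lookups, which reserves index 0 for "nothing here". The shapes match the per-sample `xyz`, `atom` and `type` inputs of the model's `forward`.

```python
import numpy as np

MAX_ATOMS = 29  # maximum number of atoms per molecule, as stated above

# C=0 and H=1 are from the text; the order of N, O, F is an assumption.
ATOM_INDEX = {"C": 0, "H": 1, "N": 2, "O": 3, "F": 4}
# 1JHC=0 and 2JHH=1 are from the text; the remaining order is an assumption.
TYPE_INDEX = {"1JHC": 0, "2JHH": 1, "1JHN": 2, "2JHN": 3,
              "2JHC": 4, "3JHH": 5, "3JHC": 6, "3JHN": 7}


def encode_molecule(symbols, positions, coupling_types):
    """Pack one molecule into the three fixed-size input arrays (hypothetical helper).

    symbols:        element symbol per atom
    positions:      (n_atoms, 3) x, y, z coordinates
    coupling_types: J-coupling type string per atom slot, or None (assignment assumed)
    Empty slots get index -1; the model adds 1 before its embedding lookups.
    """
    n = len(symbols)
    if n > MAX_ATOMS:
        raise ValueError(f"molecule has {n} atoms, more than {MAX_ATOMS}")
    xyz = np.zeros((3, MAX_ATOMS), dtype=np.float32)
    xyz[:, :n] = np.asarray(positions, dtype=np.float32).T
    atom = np.full((1, MAX_ATOMS), -1, dtype=np.int64)
    atom[0, :n] = [ATOM_INDEX[s] for s in symbols]
    jtype = np.full((1, MAX_ATOMS), -1, dtype=np.int64)
    jtype[0, :n] = [-1 if t is None else TYPE_INDEX[t] for t in coupling_types]
    return xyz, atom, jtype


# Roughly methane, seen from hydrogen atom 1; coordinates are illustrative only.
xyz, atom, jtype = encode_molecule(
    ["C", "H", "H", "H", "H"],
    [[0.0, 0.0, 0.0], [0.63, 0.63, 0.63], [-0.63, -0.63, 0.63],
     [-0.63, 0.63, -0.63], [0.63, -0.63, -0.63]],
    ["1JHC", None, "2JHH", "2JHH", "2JHH"],
)
print(xyz.shape, atom.shape, jtype.shape)  # (3, 29) (1, 29) (1, 29)
```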

23.2.3 Data augmentation

Data augmentation enlarges the training set by producing new samples from existing ones. Depending on how it is done, it can also make the model more robust to disturbances; for example, adding artificial shadows to images makes a model less sensitive to lighting conditions.

  • Rotations (not used in the final model)
  • J-coupling symmetry, as described here
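Both augmentations can be sketched with NumPy. A rigid rotation changes the coordinates but not the interatomic distances, so the coupling targets stay the same; and the J-coupling constant is symmetric in its two atoms (J_ab = J_ba), so a labelled pair can also be added with its atoms swapped. The helper names, the (3, n_atoms) coordinate layout and the pair/target arrays are assumptions for illustration; the team's exact symmetry augmentation is the one their write-up refers to and may differ.

```python
import numpy as np


def random_rotation(rng):
    """Random 3x3 rotation matrix (orthogonal, det = +1) via QR of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))   # scale column j by sign(r[j, j]) to remove the QR sign ambiguity
    if np.linalg.det(q) < 0:      # a reflection, not a rotation: flip one axis
        q[:, 0] = -q[:, 0]
    return q


def rotate(xyz, rng):
    """Rotate coordinates laid out as (3, n_atoms); zero-padded slots stay at the origin."""
    return random_rotation(rng) @ xyz


def pairwise_distances(xyz):
    """Distances between all atoms for coordinates of shape (3, n_atoms)."""
    diff = xyz[:, :, None] - xyz[:, None, :]
    return np.linalg.norm(diff, axis=0)


def symmetric_pairs(pairs, targets):
    """J(a, b) == J(b, a): append every (a, b) pair as (b, a) with the same target."""
    return np.concatenate([pairs, pairs[:, ::-1]]), np.concatenate([targets, targets])


rng = np.random.default_rng(0)
xyz = rng.normal(size=(3, 5))
rotated = rotate(xyz, rng)
print(np.allclose(pairwise_distances(xyz), pairwise_distances(rotated)))  # True
```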

23.2.4 Ensembling

Ensembling several models often performs better than any single one of them. Good practice is to combine models that are dissimilar: their errors are less correlated, so they partly cancel out when the predictions are averaged.

  • Trained 14 models
    • iterations and versions of the same basic architecture
  • Best single model: -3.16234 (competition score: log of the MAE per coupling type, averaged over types; lower is better)
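A minimal sketch of ensembling by (weighted) averaging, with made-up numbers chosen to show the effect: the two hypothetical models make errors of opposite sign, so averaging cancels much of them. How the team actually combined its 14 models is not described here, so plain averaging is an assumption.

```python
import numpy as np


def ensemble(predictions, weights=None):
    """Weighted average over models; predictions has shape (n_models, n_samples)."""
    predictions = np.asarray(predictions, dtype=float)
    weights = np.ones(len(predictions)) if weights is None else np.asarray(weights, dtype=float)
    return weights @ predictions / weights.sum()


def mae(pred, target):
    """Mean absolute error."""
    return float(np.abs(pred - target).mean())


# Made-up targets and two hypothetical models whose errors have opposite signs.
y = np.array([10.0, 20.0, 30.0, 40.0])
model_a = y + np.array([0.4, -0.2, 0.3, -0.1])
model_b = y + np.array([-0.2, 0.4, -0.1, 0.3])

print(mae(model_a, y), mae(model_b, y))      # both about 0.25
print(mae(ensemble([model_a, model_b]), y))  # about 0.1
```

If both models made identical errors, the average would reproduce them exactly, which is why dissimilar models are preferred.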

23.2.5 Hardware

The team used both on-premises and rented hardware:

  • 3 x 2080 Ti + 128 GB RAM + 16-core/32-thread CPU
  • 2 x 1080 Ti + 64 GB RAM + 8-core/16-thread CPU
  • Rented: 8+ x 2080 Ti + 64 GB RAM + 16-core/32-thread CPU (multiple machines rented as needed)

23.2.6 Software

Rather than relying on off-the-shelf model implementations, the team coded their models from scratch on top of:

  • PyTorch
  • FastAi

23.2.7 Code on GitHub

The code is shared at https://github.com/antorsae/champs-scalar-coupling. The Jupyter notebook using FastAi is at https://github.com/antorsae/champs-scalar-coupling/blob/master/atom-transfomer.ipynb

In the notebook's Model section, the transformer is defined as follows:

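# Imports assume a fastai v1 environment; Transformer, types, atoms and n_types
# are defined elsewhere in the notebook.
import math, itertools
import numpy as np
import torch
import torch.nn as nn
from fastai.basics import *  # Module, ifnone, Learner (fastai v1)

class AtomTransformer(Module):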
    def __init__(self,n_layers,n_heads,d_model,embed_p:float=0,final_p:float=0,d_head=None,deep_decoder=False,
                 dense_out=False, **kwargs):
        
        self.d_model = d_model
        d_head = ifnone(d_head, d_model//n_heads)
        self.transformer = Transformer(n_layers=n_layers,n_heads=n_heads,d_model=d_model,d_head=d_head,
                                       final_p=final_p,dense_out=dense_out,**kwargs)
        
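        channels_out = d_model*n_layers if dense_out else d_model
        # The scalar head also receives a one-hot coupling type (n_types + 1 slots, the extra one for
        # "no coupling") and predicts 4 values per atom, presumably the contributions fc, sd, pso and dso.
        channels_out_scalar = channels_out + n_types + 1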
        if deep_decoder:
            sl = [int(channels_out_scalar/(2**d)) for d in range(int(math.ceil(np.log2(channels_out_scalar/4)-1)))]
            self.scalar = nn.Sequential(*(list(itertools.chain.from_iterable(
                [[nn.Conv1d(sl[i],sl[i+1],1),nn.ReLU(),nn.BatchNorm1d(sl[i+1])] for i in range(len(sl)-1)])) + 
                [nn.Conv1d(sl[-1], 4, 1)]))
        else:
            self.scalar = nn.Conv1d(channels_out_scalar, 4, 1)

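        # Auxiliary heads, presumably for the competition's extra training targets:
        self.magnetic  = nn.Conv1d(channels_out, 9, 1)   # per atom: 3x3 magnetic shielding tensor
        self.dipole    = nn.Linear(channels_out, 3)      # per molecule: dipole moment (x, y, z)
        self.potential = nn.Linear(channels_out, 1)      # per molecule: potential energy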
        
        self.pool = nn.AdaptiveAvgPool1d(1)
        
        n_atom_embedding = d_model//2
        n_type_embedding = d_model - n_atom_embedding - 3 #- 1 - 1 
        self.type_embedding = nn.Embedding(len(types)+1,n_type_embedding)
        self.atom_embedding = nn.Embedding(len(atoms)+1,n_atom_embedding)
        self.drop_type, self.drop_atom = nn.Dropout(embed_p), nn.Dropout(embed_p)
            
    def forward(self,xyz,type,ext,atom,mulliken,coulomb,mask_atoms,n_atoms):
        bs, _, n_pts = xyz.shape        
        t = self.drop_type(self.type_embedding((type+1).squeeze(1)))
        a = self.drop_atom(self.atom_embedding((atom+1).squeeze(1)))
        
#        x = torch.cat([xyz, mulliken, ext, mask_atoms.type_as(xyz)], dim=1)
        #x = torch.cat([xyz, mask_atoms.type_as(xyz)], dim=1)
        x = xyz
        x = torch.cat([x.transpose(1,2), t, a], dim=-1) * math.sqrt(self.d_model) # B,N(29),d_model

        mask = (coulomb == 0).unsqueeze(1)
        x = self.transformer(x, mask).transpose(1,2).contiguous()
        
        t_one_hot = torch.zeros(bs,n_types+1,n_pts,device=type.device,dtype=x.dtype).scatter_(1,type+1, 1.)
        
        scalar    = self.scalar(torch.cat([x, t_one_hot], dim=1))
        magnetic  = self.magnetic(x) 
        px = self.pool(x).squeeze(-1)
        dipole    = self.dipole(px)
        potential = self.potential(px)
                
        return type,ext,scalar,magnetic,dipole,potential
    
    def reset(self): pass
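Working through the dimension bookkeeping in `AtomTransformer` for the notebook's d_model = 1024 may help: the coordinate channels and the two embeddings are sized so that their concatenation is exactly d_model wide, and the optional deep decoder halves the width layer by layer before a final convolution down to the 4 scalar outputs. `n_types = 8` (the number of J-coupling types in the competition) is an assumption here, since the notebook defines it elsewhere.

```python
import math

import numpy as np

d_model = 1024   # value used in the notebook
n_types = 8      # assumption: the 8 J-coupling types of the competition

# Input width: xyz (3) + coupling-type embedding + atom embedding == d_model.
n_atom_embedding = d_model // 2
n_type_embedding = d_model - n_atom_embedding - 3
assert 3 + n_type_embedding + n_atom_embedding == d_model

# Widths of the optional deep decoder (same expression as in the class):
# Conv1d(sl[i], sl[i+1]) blocks, then a final Conv1d(sl[-1], 4).
channels_out_scalar = d_model + n_types + 1
sl = [int(channels_out_scalar / (2 ** d))
      for d in range(int(math.ceil(np.log2(channels_out_scalar / 4) - 1)))]
print(n_type_embedding, n_atom_embedding)  # 509 512
print(sl)  # [1033, 516, 258, 129, 64, 32, 16, 8]
```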

The model is then instantiated (6 layers, 16 heads, d_model = 1024) and wrapped in a fastai Learner:

# release any model from a previous run before building a new one
net, learner = None,None
gc.collect()
torch.cuda.empty_cache()

n_layers=6
n_heads=16
d_model=1024
d_inner=2048*2

deep_decoder = False
dense_out = False

net = AtomTransformer(n_layers=n_layers, n_heads=n_heads,d_model=d_model,d_inner=d_inner,
                      resid_p=0., attn_p=0., ff_p=0., embed_p=0, final_p=0.,
                      deep_decoder=deep_decoder, dense_out=dense_out)

learner = Learner(data,net, loss_func=LMAEMaskedLoss(),)
learner.callbacks.extend([
    SaveModelCallback(learner, monitor='LMAE', mode='min'),
    LMAEMetric(learner)])
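`LMAEMaskedLoss` and `LMAEMetric` are defined elsewhere in the notebook. As a hedged illustration of what they measure, the sketch below implements the competition score (the mean over coupling types of the log of the MAE) in NumPy, masking out padded atom slots marked with type -1. The function name, the masking convention and the 1e-9 floor (to avoid log(0)) are assumptions; the notebook's versions operate on PyTorch tensors and may treat the model's other outputs differently.

```python
import numpy as np


def masked_log_mae(pred, target, jtype, n_types=8, floor=1e-9):
    """Mean over coupling types of log(MAE), ignoring slots with jtype == -1."""
    scores = []
    for t in range(n_types):
        sel = jtype == t                  # slots holding a coupling of type t
        if sel.any():                     # types absent from the batch are skipped
            mae = np.abs(pred[sel] - target[sel]).mean()
            scores.append(np.log(max(mae, floor)))  # floor guards against log(0)
    return float(np.mean(scores))


pred = np.array([[123.0, 81.0, -10.5, 5.5]])
target = np.array([[0.0, 80.0, -10.0, 5.0]])
jtype = np.array([[-1, 0, 1, 1]])          # slot 0 is padding and is ignored
score = masked_log_mae(pred, target, jtype)
print(round(score, 4))  # -0.3466
```

Because each type contributes the log of its own MAE, a 1% improvement on a rare coupling type moves the score as much as a 1% improvement on a common one.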