24.2 #2 solution 🤖 Quantum Uncertainty 🤖

The second-placed team was a two-man team; they presented their solution in a Kaggle discussion post.

The team consisted of

No domain experts
ML experts
Private team

24.2.1 Overall architecture

The team had no domain knowledge, and “obviously we were at a disadvantage if we tried to become quantum experts in 1 month”, so they needed the model to build the features itself.

Deep learning
- Dimension 512 to 2048
- Layers 6 to 24
- Parameters from ~12M to ~100M
Letting the model build the features
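
To get a feel for how width and depth translate into the parameter counts listed above, a back-of-the-envelope estimate can help. The sketch below assumes a standard transformer encoder layer (four d_model x d_model attention projections plus a two-layer feed-forward block of width d_inner) and ignores biases, layer norms, embeddings and output heads. The function name is hypothetical and the team's actual layers may differ.

```python
def approx_transformer_params(n_layers, d_model, d_inner=None):
    """Rough weight count for a stack of standard transformer encoder layers."""
    # Assumption: the feed-forward width defaults to 4 * d_model.
    if d_inner is None:
        d_inner = 4 * d_model
    attention = 4 * d_model * d_model      # Q, K, V and output projections
    feed_forward = 2 * d_model * d_inner   # two linear maps
    return n_layers * (attention + feed_forward)

# Print the estimate for a few width/depth combinations.
for n_layers, d_model in [(6, 512), (6, 1024), (24, 512)]:
    millions = approx_transformer_params(n_layers, d_model) / 1e6
    print(f"layers={n_layers:2d} d_model={d_model:4d} -> ~{millions:.0f}M weights")
```

With 6 layers, d_model=1024 and d_inner=4096, the estimate is about 75M weights, which lies inside the range quoted above.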

24.2.2 Input features and embeddings

Three input arrays of dimension 29 (maximum number of atoms)
- x,y,z position of each atom
- atom type index (C=0, H=1, etc…)
- j-coupling type index (1JHC=0, 2JHH=1, etc.)
No manually engineered features
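
The three arrays could be assembled roughly as follows. This is a minimal sketch, not the team's preprocessing: the padding value of -1, the helper names, and the way coupling types are assigned to slots are all assumptions, and the vocabularies contain only the entries given in the text.

```python
MAX_ATOMS = 29                        # maximum number of atoms, as stated above
PAD = -1                              # assumed padding index for unused slots
ATOM_INDEX = {"C": 0, "H": 1}         # only these two are given in the text
TYPE_INDEX = {"1JHC": 0, "2JHH": 1}   # likewise

def pad_to_max(values, fill):
    """Right-pad a per-atom list to MAX_ATOMS entries."""
    if len(values) > MAX_ATOMS:
        raise ValueError("molecule has more atoms than MAX_ATOMS")
    return list(values) + [fill] * (MAX_ATOMS - len(values))

def build_inputs(symbols, coords, type_names):
    """Return the three padded arrays: xyz, atom index, coupling-type index.

    type_names holds one coupling-type name per slot, or None where no
    type applies; how types are assigned to slots is left to the caller.
    """
    xyz = pad_to_max([tuple(c) for c in coords], (0.0, 0.0, 0.0))
    atom = pad_to_max([ATOM_INDEX[s] for s in symbols], PAD)
    jtype = pad_to_max(
        [PAD if t is None else TYPE_INDEX[t] for t in type_names], PAD)
    return xyz, atom, jtype

# Toy example: one carbon and two hydrogens with made-up coordinates.
xyz, atom, jtype = build_inputs(
    ["C", "H", "H"],
    [(0.0, 0.0, 0.0), (1.09, 0.0, 0.0), (-0.36, 1.03, 0.0)],
    [None, "1JHC", "1JHC"],
)
```

Every array has exactly 29 entries regardless of molecule size, so molecules can be stacked into fixed-shape batches and the padded slots masked out later.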

24.2.3 Data augmentation

Data augmentation enlarges the training set by producing new samples from existing ones. Depending on how the augmentation is done, it can also make the model more robust to disturbances, e.g. artificially adding shadows to images makes a model less susceptible to lighting conditions.

Rotations (though not used in final model)
J-coupling symmetry as described here
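
A rotation augmentation of the kind listed above can be sketched in a few lines. Scalar coupling constants do not depend on how the molecule is oriented in space, so the coordinates can be rotated while the labels stay unchanged. The sketch assumes coordinates are plain (x, y, z) tuples; the function names are hypothetical and this is not the team's implementation.

```python
import math
import random

def random_rotation_matrix(rng=random):
    """Rotation by a random angle about a random axis (Rodrigues' formula).

    The result is a valid rotation but not exactly uniformly distributed
    over all rotations, which is acceptable for a sketch.
    """
    while True:  # draw a random axis and normalise it
        axis = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(a * a for a in axis))
        if norm > 1e-12:
            break
    ux, uy, uz = (a / norm for a in axis)
    theta = rng.uniform(0.0, 2.0 * math.pi)
    c, s = math.cos(theta), math.sin(theta)
    k = 1.0 - c
    return [
        [c + ux * ux * k,      ux * uy * k - uz * s, ux * uz * k + uy * s],
        [uy * ux * k + uz * s, c + uy * uy * k,      uy * uz * k - ux * s],
        [uz * ux * k - uy * s, uz * uy * k + ux * s, c + uz * uz * k],
    ]

def rotate(coords, R):
    """Apply the 3x3 matrix R to every (x, y, z) point."""
    return [tuple(sum(R[i][j] * p[j] for j in range(3)) for i in range(3))
            for p in coords]

# Toy molecule with made-up coordinates; interatomic distances are preserved.
points = [(0.0, 0.0, 0.0), (1.09, 0.0, 0.0), (-0.36, 1.03, 0.0)]
rotated = rotate(points, random_rotation_matrix(random.Random(0)))
```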

24.2.4 Ensembling

Ensembling several models often gives better performance than any single model. It is good practice to combine models that are dissimilar, because the diversity of their errors helps to improve the overall result.

Trained 14 models
- iterations and versions of same basic structure
Best single model: -3.16234
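
How the 14 models were blended is not described here. As an illustration only, the sketch below averages predictions with equal weights and evaluates them with the competition's log-MAE metric (the mean over coupling types of the log of the per-type mean absolute error), assuming that is the score reported above. All data and function names are made up.

```python
import math
from collections import defaultdict

def mean_ensemble(predictions):
    """Average per-sample predictions from several models (equal weights)."""
    n_models = len(predictions)
    return [sum(column) / n_models for column in zip(*predictions)]

def log_mae_score(y_true, y_pred, types, floor=1e-9):
    """Mean over coupling types of log(MAE within that type); lower is better."""
    abs_err = defaultdict(list)
    for t, yt, yp in zip(types, y_true, y_pred):
        abs_err[t].append(abs(yt - yp))
    # The small floor guards against log(0) for perfect predictions.
    logs = [math.log(max(sum(e) / len(e), floor)) for e in abs_err.values()]
    return sum(logs) / len(logs)

# Toy data: four couplings of two types, predicted by two hypothetical models.
y_true = [1.0, 2.0, 3.0, 4.0]
types = ["1JHC", "1JHC", "2JHH", "2JHH"]
model_a = [2.0, 3.0, 4.0, 5.0]
model_b = [1.0, 2.0, 3.0, 4.0]

blend = mean_ensemble([model_a, model_b])
score = log_mae_score(y_true, blend, types)
```

Because the metric is a logarithm, a score of about -3.16 corresponds to a typical per-type mean absolute error of roughly exp(-3.16), i.e. about 0.04.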

24.2.5 Hardware

The team used both on-premises and rented hardware.

3 x 2080 Ti + 128 GB RAM + 16c32t processor
2 x 1080 Ti + 64 GB RAM + 8c16t processor
Rented 8+ 2080 Ti + 64 GB RAM + 16c32t processor (multiple machines rented as needed)

24.2.6 Software

The team did not use an off-the-shelf model architecture but coded their models from scratch, building on:

PyTorch
FastAi

24.2.7 Code on GitHub

The code is shared at https://github.com/antorsae/champs-scalar-coupling. The Jupyter notebook using FastAi is at https://github.com/antorsae/champs-scalar-coupling/blob/master/atom-transfomer.ipynb

In the “Model” section the transformer is defined as follows:

class AtomTransformer(Module):
    def __init__(self,n_layers,n_heads,d_model,embed_p:float=0,final_p:float=0,d_head=None,deep_decoder=False,
                 dense_out=False, **kwargs):
        
        self.d_model = d_model
        d_head = ifnone(d_head, d_model//n_heads)
        self.transformer = Transformer(n_layers=n_layers,n_heads=n_heads,d_model=d_model,d_head=d_head,
                                       final_p=final_p,dense_out=dense_out,**kwargs)
        
        channels_out = d_model*n_layers if dense_out else d_model
        channels_out_scalar = channels_out + n_types + 1
        if deep_decoder:
            sl = [int(channels_out_scalar/(2**d)) for d in range(int(math.ceil(np.log2(channels_out_scalar/4)-1)))]
            self.scalar = nn.Sequential(*(list(itertools.chain.from_iterable(
                [[nn.Conv1d(sl[i],sl[i+1],1),nn.ReLU(),nn.BatchNorm1d(sl[i+1])] for i in range(len(sl)-1)])) + 
                [nn.Conv1d(sl[-1], 4, 1)]))
        else:
            self.scalar = nn.Conv1d(channels_out_scalar, 4, 1)

        self.magnetic  = nn.Conv1d(channels_out, 9, 1)
        self.dipole    = nn.Linear(channels_out, 3)
        self.potential = nn.Linear(channels_out, 1)
        
        self.pool = nn.AdaptiveAvgPool1d(1)
        
        n_atom_embedding = d_model//2
        n_type_embedding = d_model - n_atom_embedding - 3 #- 1 - 1 
        self.type_embedding = nn.Embedding(len(types)+1,n_type_embedding)
        self.atom_embedding = nn.Embedding(len(atoms)+1,n_atom_embedding)
        self.drop_type, self.drop_atom = nn.Dropout(embed_p), nn.Dropout(embed_p)
            
    def forward(self,xyz,type,ext,atom,mulliken,coulomb,mask_atoms,n_atoms):
        bs, _, n_pts = xyz.shape        
        t = self.drop_type(self.type_embedding((type+1).squeeze(1)))
        a = self.drop_atom(self.atom_embedding((atom+1).squeeze(1)))
        
#        x = torch.cat([xyz, mulliken, ext, mask_atoms.type_as(xyz)], dim=1)
        #x = torch.cat([xyz, mask_atoms.type_as(xyz)], dim=1)
        x = xyz
        x = torch.cat([x.transpose(1,2), t, a], dim=-1) * math.sqrt(self.d_model) # B,N(29),d_model

        mask = (coulomb == 0).unsqueeze(1)
        x = self.transformer(x, mask).transpose(1,2).contiguous()
        
        t_one_hot = torch.zeros(bs,n_types+1,n_pts,device=type.device,dtype=x.dtype).scatter_(1,type+1, 1.)
        
        scalar    = self.scalar(torch.cat([x, t_one_hot], dim=1))
        magnetic  = self.magnetic(x) 
        px = self.pool(x).squeeze(-1)
        dipole    = self.dipole(px)
        potential = self.potential(px)
                
        return type,ext,scalar,magnetic,dipole,potential
    
    def reset(self): pass
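
Two pieces of arithmetic in the constructor are easy to miss: how d_model is split between the raw coordinates and the two embeddings, and how the optional deep decoder shrinks its channels. The sketch below reproduces both calculations in plain Python (using math.log2 in place of np.log2). The helper names are hypothetical, and the example value of 8 coupling types is an assumption, since n_types is defined elsewhere in the notebook.

```python
import math

def embedding_widths(d_model, n_coords=3):
    """Split d_model between xyz coordinates, type embedding and atom embedding."""
    n_atom = d_model // 2
    n_type = d_model - n_atom - n_coords
    return n_coords, n_type, n_atom

def deep_decoder_widths(channels_in):
    """Channel sizes used when deep_decoder=True: halve the width at each step."""
    depth = int(math.ceil(math.log2(channels_in / 4) - 1))
    return [int(channels_in / 2 ** d) for d in range(depth)]

d_model = 1024
n_types = 8  # assumed number of coupling types

widths = embedding_widths(d_model)
assert sum(widths) == d_model  # the concatenated input is exactly d_model wide

# Scalar head input: transformer output + one-hot type (n_types + 1 channels).
channels_out_scalar = d_model + n_types + 1
print(widths, deep_decoder_widths(channels_out_scalar))
```

The three widths always sum to d_model, which is why the forward pass can concatenate the coordinates with the two embeddings and feed the result straight into the transformer. In the deep decoder, a final 1x1 convolution then maps the last width in the list to the 4 scalar outputs.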

The model is then instantiated and wrapped in a Learner:

net, learner = None,None
gc.collect()
torch.cuda.empty_cache()

n_layers=6
n_heads=16
d_model=1024
d_inner=2048*2

deep_decoder = False
dense_out = False

net = AtomTransformer(n_layers=n_layers, n_heads=n_heads,d_model=d_model,d_inner=d_inner,
                      resid_p=0., attn_p=0., ff_p=0., embed_p=0, final_p=0.,
                      deep_decoder=deep_decoder, dense_out=dense_out)

learner = Learner(data,net, loss_func=LMAEMaskedLoss(),)
learner.callbacks.extend([
    SaveModelCallback(learner, monitor='👉🏻LMAE👈🏻', mode='min'),
    LMAEMetric(learner)])