## 23.2 #2 solution 🤖 Quantum Uncertainty 🤖

The second-placed team was a two-man show; they presented their solution in a Kaggle discussion.

The team consisted of

- No domain experts
- ML experts
- Private team

### 23.2.1 Overall architecture

Since the team had no domain knowledge and, in their own words, *"obviously we were at a disadvantage if we tried to become quantum experts in 1 month"*, they needed the model to build the features.

- Deep learning
- Dimension 512 to 2048
- Layers 6 to 24
- Parameters from ~12M to ~100M

- Letting the model build the features
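As a rough sanity check on how those dimensions and layer counts map to parameter counts, a standard transformer layer carries about `4·d_model²` weights in attention plus `2·d_model·d_inner` in the feed-forward block. The helper below is a hypothetical back-of-envelope sketch, not the team's code; it ignores embeddings and output heads.

```python
def approx_transformer_params(d_model, n_layers, d_inner=None):
    """Rough transformer parameter count: attention (4*d^2) plus
    feed-forward (2*d*d_inner) per layer; embeddings/heads ignored."""
    d_inner = d_inner or 4 * d_model  # common 4x inner dimension
    per_layer = 4 * d_model**2 + 2 * d_model * d_inner
    return n_layers * per_layer

small = approx_transformer_params(512, 6)    # ~19M, the low end quoted above
large = approx_transformer_params(2048, 24)  # well above 100M, the high end
```

This reproduces the order of magnitude of the ~12M to ~100M range quoted above.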

### 23.2.2 Input features and embeddings

- Three input arrays of dimension 29 (maximum number of atoms)
- x,y,z position of each atom
- atom type index (C=0, H=1, etc…)
- J-coupling type index (1JHC=0, 2JHH=1, etc.)

- No manually engineered features
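Because molecules have varying atom counts, the per-atom arrays have to be padded to the fixed dimension of 29 before batching. A minimal sketch of such padding (hypothetical helper and pad index, not the team's code):

```python
import numpy as np

MAX_ATOMS = 29  # largest molecule in the dataset

# Hypothetical raw molecule: methane, 5 atoms.
xyz = np.array([[ 0.00,  0.00,  0.00],
                [ 0.63,  0.63,  0.63],
                [-0.63, -0.63,  0.63],
                [-0.63,  0.63, -0.63],
                [ 0.63, -0.63, -0.63]], dtype=np.float32)
atom_type = np.array([0, 1, 1, 1, 1])  # C=0, H=1, as in the index scheme above

def pad_molecule(xyz, atom_type, max_atoms=MAX_ATOMS, pad_idx=-1):
    """Pad per-atom arrays to a fixed length so molecules can be batched."""
    n = len(atom_type)
    xyz_p = np.zeros((max_atoms, 3), dtype=np.float32)
    xyz_p[:n] = xyz
    type_p = np.full(max_atoms, pad_idx, dtype=np.int64)  # pad_idx marks empty slots
    type_p[:n] = atom_type
    return xyz_p, type_p

xyz_p, type_p = pad_molecule(xyz, atom_type)
```

The actual notebook masks padded positions inside the transformer (via the Coulomb-matrix mask) rather than relying on a sentinel value alone.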

### 23.2.3 Data augmentation

Data augmentation increases the data basis by producing new samples. Depending on how it is done, it can also make the model more robust to disturbances; e.g., artificially adding shadows to images makes a model less susceptible to lighting conditions.

- Rotations (though not used in final model)
- J-coupling symmetry as described here
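Rotation augmentation works for this task because J-couplings depend only on molecular geometry, not orientation, so rotating all atom positions yields a valid new sample. A minimal sketch (my own illustration; the team tried rotations but dropped them from the final model):

```python
import numpy as np

def random_rotation_matrix(rng):
    """Uniform random 3D rotation via QR decomposition of a Gaussian matrix."""
    m = rng.normal(size=(3, 3))
    q, r = np.linalg.qr(m)
    q *= np.sign(np.diag(r))   # fix column signs so the distribution is uniform
    if np.linalg.det(q) < 0:   # ensure a proper rotation (det = +1)
        q[:, 0] *= -1
    return q

def augment(xyz, rng):
    """Rotate all atom positions; interatomic distances are unchanged."""
    return xyz @ random_rotation_matrix(rng).T

rng = np.random.default_rng(0)
xyz = rng.normal(size=(29, 3))
xyz_rot = augment(xyz, rng)

# Rotation preserves the full distance matrix, so the targets stay valid:
d0 = np.linalg.norm(xyz[:, None] - xyz[None, :], axis=-1)
d1 = np.linalg.norm(xyz_rot[:, None] - xyz_rot[None, :], axis=-1)
assert np.allclose(d0, d1)
```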

### 23.2.4 Ensembling

Often better performance can be achieved by ensembling several models together; it is good practice to use dissimilar models, because their diversity helps to improve the overall performance.

- Trained 14 models
- iterations and versions of same basic structure

- Best single model: -3.16234
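A minimal sketch of how such an ensemble and its score could be computed (hypothetical numbers, not the team's predictions): the simplest ensemble is a per-row mean of the models' predictions, and the competition metric is the log of the MAE averaged over coupling types, which is why more negative scores such as -3.16234 are better.

```python
import numpy as np

# Hypothetical predictions from three models on five test pairs of one coupling type.
preds = np.array([
    [84.1, 2.3, -0.4, 91.0, 3.2],
    [83.7, 2.1, -0.6, 90.4, 3.0],
    [84.5, 2.4, -0.3, 91.2, 3.3],
])
target = np.array([84.0, 2.2, -0.5, 90.8, 3.1])  # made-up ground truth

# Simple mean ensemble; weighting models by validation score is a common variant.
ensemble = preds.mean(axis=0)

# Log-MAE for this single coupling type (the real metric averages over all types).
lmae = np.log(np.mean(np.abs(ensemble - target)))
```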

### 23.2.5 Hardware

The team used on-premise as well as rented hardware.

- 3 x 2080 Ti + 128 Gb RAM + 16c32t processor
- 2 x 1080 Ti + 64 Gb RAM + 8c16t processor
- Rented 8+ 2080 Ti + 64 Gb RAM + 16c32t processor (multiple machines rented as needed)

### 23.2.6 Software

The team did not use off-the-shelf model implementations but coded their models from scratch on top of popular ML frameworks:

- PyTorch
- FastAi

### 23.2.7 Code on GitHub

The code is shared at https://github.com/antorsae/champs-scalar-coupling. The Jupyter notebook using FastAi is at https://github.com/antorsae/champs-scalar-coupling/blob/master/atom-transfomer.ipynb

In the “Model” section the transformer is defined as follows:

```
class AtomTransformer(Module):
    def __init__(self, n_layers, n_heads, d_model, embed_p:float=0, final_p:float=0, d_head=None,
                 deep_decoder=False, dense_out=False, **kwargs):
        self.d_model = d_model
        d_head = ifnone(d_head, d_model//n_heads)
        self.transformer = Transformer(n_layers=n_layers, n_heads=n_heads, d_model=d_model, d_head=d_head,
                                       final_p=final_p, dense_out=dense_out, **kwargs)
        channels_out = d_model*n_layers if dense_out else d_model
        channels_out_scalar = channels_out + n_types + 1
        if deep_decoder:
            sl = [int(channels_out_scalar/(2**d)) for d in range(int(math.ceil(np.log2(channels_out_scalar/4)-1)))]
            self.scalar = nn.Sequential(*(list(itertools.chain.from_iterable(
                [[nn.Conv1d(sl[i], sl[i+1], 1), nn.ReLU(), nn.BatchNorm1d(sl[i+1])] for i in range(len(sl)-1)])) +
                [nn.Conv1d(sl[-1], 4, 1)]))
        else:
            self.scalar = nn.Conv1d(channels_out_scalar, 4, 1)
        self.magnetic = nn.Conv1d(channels_out, 9, 1)
        self.dipole = nn.Linear(channels_out, 3)
        self.potential = nn.Linear(channels_out, 1)
        self.pool = nn.AdaptiveAvgPool1d(1)
        n_atom_embedding = d_model//2
        n_type_embedding = d_model - n_atom_embedding - 3 #- 1 - 1
        self.type_embedding = nn.Embedding(len(types)+1, n_type_embedding)
        self.atom_embedding = nn.Embedding(len(atoms)+1, n_atom_embedding)
        self.drop_type, self.drop_atom = nn.Dropout(embed_p), nn.Dropout(embed_p)

    def forward(self, xyz, type, ext, atom, mulliken, coulomb, mask_atoms, n_atoms):
        bs, _, n_pts = xyz.shape
        t = self.drop_type(self.type_embedding((type+1).squeeze(1)))
        a = self.drop_atom(self.atom_embedding((atom+1).squeeze(1)))

        # x = torch.cat([xyz, mulliken, ext, mask_atoms.type_as(xyz)], dim=1)
        # x = torch.cat([xyz, mask_atoms.type_as(xyz)], dim=1)
        x = xyz
        x = torch.cat([x.transpose(1,2), t, a], dim=-1) * math.sqrt(self.d_model)  # B,N(29),d_model

        mask = (coulomb == 0).unsqueeze(1)
        x = self.transformer(x, mask).transpose(1,2).contiguous()

        t_one_hot = torch.zeros(bs, n_types+1, n_pts, device=type.device, dtype=x.dtype).scatter_(1, type+1, 1.)

        scalar = self.scalar(torch.cat([x, t_one_hot], dim=1))
        magnetic = self.magnetic(x)
        px = self.pool(x).squeeze(-1)
        dipole = self.dipole(px)
        potential = self.potential(px)

        return type, ext, scalar, magnetic, dipole, potential

    def reset(self): pass
```

The model is instantiated as follows:

```
net, learner = None, None
gc.collect()
torch.cuda.empty_cache()

n_layers = 6
n_heads = 16
d_model = 1024
d_inner = 2048*2

deep_decoder = False
dense_out = False

net = AtomTransformer(n_layers=n_layers, n_heads=n_heads, d_model=d_model, d_inner=d_inner,
                      resid_p=0., attn_p=0., ff_p=0., embed_p=0, final_p=0.,
                      deep_decoder=deep_decoder, dense_out=dense_out)

learner = Learner(data, net, loss_func=LMAEMaskedLoss(),)

learner.callbacks.extend([SaveModelCallback(learner, monitor='👉🏻LMAE👈🏻', mode='min'),
                          LMAEMetric(learner)])
```