9.5 Neural networks

Neural networks stack layers of units whose outputs are passed through non-linear activation functions. For classification, the final layer typically uses a softmax activation, which turns raw scores into class probabilities. For an overview of the types of layers available (fully connected, convolutional, pooling, ...), see the Keras documentation.
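As a minimal illustration (the raw scores below are made up), the snippet applies a ReLU activation and a softmax to a vector of logits using PyTorch, which is also used in the code examples later in this section:

import torch
import torch.nn.functional as F

# made-up raw scores (logits) for a 3-class problem
logits = torch.tensor([2.0, -1.0, 0.5])

# non-linear activation: ReLU sets negative values to zero
hidden = F.relu(logits)                # tensor([2.0000, 0.0000, 0.5000])

# softmax turns the raw scores into probabilities that sum to 1
probs = F.softmax(logits, dim=0)       # approx. tensor([0.786, 0.039, 0.175])
print(probs.sum())                     # tensor(1.)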

9.5.1 Geometric Intuition for Training Neural Networks

The YouTube video by Leo Dirac gives a very intuitive explanation of how a loss function is calculated in a 2-D example and also gives an introduction to Stochastic Weight Averaging (SWA). Leo also discusses minima in high-dimensional space.

Stochastic Weight Averaging (SWA):

  • Details at https://pytorch.org/blog/stochastic-weight-averaging-in-pytorch/
  • High-dimensional loss functions have
    • plenty of saddle points
    • essentially only one minimum
      • the minimum is wide
      • the train and test error surfaces are not perfectly aligned in the weight space
      • therefore a solution in the middle of the minimum generalizes better
    • what appear to be separate minima are connected via one of the many dimensions
  • SWA has been shown to significantly improve generalization in computer vision tasks
  • SWA is shown to improve the stability of training as well as the final average rewards of policy-gradient methods in deep reinforcement learning
  • SWA is suited for low precision training

Leo has also shared his experience with SWA on Twitter.
An implementation in PyTorch is shown below


import torch
from torchcontrib.optim import SWA

...
...

# training loop
base_opt = torch.optim.SGD(model.parameters(), lr=0.1)
# wrap the base optimizer: averaging starts after step 10 and a
# snapshot of the weights is added to the average every 5 steps
opt = SWA(base_opt, swa_start=10, swa_freq=5, swa_lr=0.05)
for _ in range(100):
    opt.zero_grad()
    loss_fn(model(input), target).backward()
    opt.step()
# replace the model parameters with the SWA running average
opt.swap_swa_sgd()

9.5.2 Convolutional Neural Network (CNN) TBD

A Convolutional Neural Network is a neural network that is mainly used to analyze image and audio data.

The following explanation is based on The Learning Machine tutorial “Classification Convolutional Neural Network (CNN)”, https://www.thelearningmachine.ai/cnn

A classical CNN consists of

  • one or more convolutional layers
  • one or more pooling layers
  • one or more fully connected layers

A classical CNN is depicted in the image below



An image is a 3-dimensional array where, in the case of an RGB image, the third dimension holds the colors red, green and blue. An image can therefore be represented as shown below
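As a small sketch of this representation (the pixel values and the 5 by 5 size are made up), an RGB image can be stored as a tensor with the color channels as one of its dimensions:

import torch

# a made-up 5 x 5 RGB image: height x width x color channels
image_hwc = torch.rand(5, 5, 3)
print(image_hwc.shape)              # torch.Size([5, 5, 3])

# PyTorch layers expect channels first: channels x height x width
image_chw = image_hwc.permute(2, 0, 1)
print(image_chw.shape)              # torch.Size([3, 5, 5])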



When analyzing an image, the spatial relation between different pixels holds important information. Therefore it is beneficial to use an algorithm which looks not only at one pixel but also at its neighbouring pixels. One way of doing so is to slide a 2-dimensional array over the image array, as can be seen below



The 2-dimensional array is called a kernel and is named after its dimensions; the kernel in the graph below is a “3 by 3 kernel”. Sliding the kernel across the image gives a set of numbers as shown above. Such kernels can detect structures in images such as

  • lines
  • boxes
  • circles

and kernels in later layers of the CNN can detect more complex structures such as

  • faces
  • wheels
  • trees




A kernel has the same depth as the input; in the case of an RGB image, the depth of the kernel is 3. The outputs of the kernel across the channels are added up, so for each position of the kernel there is one value at the output, as sketched in the example below.
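A minimal sketch of this with PyTorch's conv2d (the image and kernel values are made up): a single 3 by 3 kernel of depth 3 is slid over a 3-channel 5 by 5 image, and each kernel position yields one output value.

import torch
import torch.nn.functional as F

# made-up 3-channel 5x5 image, shape: (batch, channels, height, width)
image = torch.rand(1, 3, 5, 5)

# one 3x3 kernel whose depth matches the 3 input channels,
# shape: (out_channels, in_channels, kernel_height, kernel_width)
kernel = torch.rand(1, 3, 3, 3)

# sliding the kernel over the image and summing over the channels
# gives one value per kernel position: a 3x3 output map
out = F.conv2d(image, kernel)
print(out.shape)    # torch.Size([1, 1, 3, 3])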

The movement across the image is controlled by the stride parameters for the x and y directions. In the case below the stride is as follows

  • stride x-direction = 1
  • stride y-direction = 1

Depending on the stride width, kernel size and image dimensions, it might be necessary to apply padding; for details on padding see DeepAI. A small sketch of how stride and padding affect the output size is shown below.
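Under the usual convention, the output size along one dimension is floor((input + 2*padding - kernel) / stride) + 1. The snippet below (sizes chosen purely for illustration) compares different stride and padding settings on the same made-up 5 by 5 input:

import torch
import torch.nn.functional as F

image = torch.rand(1, 3, 5, 5)      # made-up 3-channel 5x5 input
kernel = torch.rand(1, 3, 3, 3)     # one 3x3 kernel of depth 3

# stride 1, no padding: (5 - 3)/1 + 1 = 3  ->  3x3 output
print(F.conv2d(image, kernel, stride=1).shape)              # [1, 1, 3, 3]

# stride 2, no padding: (5 - 3)/2 + 1 = 2  ->  2x2 output
print(F.conv2d(image, kernel, stride=2).shape)              # [1, 1, 2, 2]

# stride 1, padding 1: (5 + 2 - 3)/1 + 1 = 5  ->  the 5x5 size is kept
print(F.conv2d(image, kernel, stride=1, padding=1).shape)   # [1, 1, 5, 5]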



9.5.2.1 Pooling layer

The pooling layer reduces the spatial dimensions of its input as shown below. There are different types of pooling layers

  • max
  • average

Below, the working mechanism of a max pooling layer is shown. With a 3 by 3 pooling window and a stride of one in the x and y directions, the 5 by 5 input is reduced to 3 by 3.
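A minimal sketch of this pooling step in PyTorch (the input values are made up):

import torch
import torch.nn.functional as F

x = torch.rand(1, 1, 5, 5)    # made-up 5x5 input with one channel

# 3x3 max pooling window, stride 1 in both directions:
# each output value is the maximum of a 3x3 neighbourhood
out = F.max_pool2d(x, kernel_size=3, stride=1)
print(out.shape)    # torch.Size([1, 1, 3, 3])

# average pooling works the same way but takes the mean instead
out_avg = F.avg_pool2d(x, kernel_size=3, stride=1)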

After one or more combinations of convolutional and pooling layers, one or more fully connected layers learn how to classify the image based on non-linear combinations of the high-level features learned by the previous layers.

Figure from https://blogs.nvidia.com/wp-content/uploads/2018/09/autos-672x378.png

The final fully connected layer with a softmax activation gives the probability of each category; often the three to five classes with the highest probability are reported as the result.

  • convolutional and pooling layers => high-level features
  • fully connected layers => combine non-linear high level features for classification
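A minimal sketch of such an architecture in PyTorch, assuming (purely for illustration) 32 by 32 RGB input images and 10 output classes:

import torch
import torch.nn as nn

# convolutional and pooling layers extract high-level features,
# the fully connected layer combines them for classification
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),    # 3x32x32  -> 16x32x32
    nn.ReLU(),
    nn.MaxPool2d(2),                               # 16x32x32 -> 16x16x16
    nn.Conv2d(16, 32, kernel_size=3, padding=1),   # 16x16x16 -> 32x16x16
    nn.ReLU(),
    nn.MaxPool2d(2),                               # 32x16x16 -> 32x8x8
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 10),                     # fully connected layer
    nn.Softmax(dim=1),                             # class probabilities
)

x = torch.rand(1, 3, 32, 32)    # one made-up RGB image
print(model(x).shape)           # torch.Size([1, 10]): one probability per class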

The operating principle of a CNN is shown below



9.5.3 RNN TBD

9.5.4 GANs

The following explanation is based on the Medium article “GANs from Scratch 1: A deep introduction. With code in PyTorch and TensorFlow”:

https://medium.com/ai-society/gans-from-scratch-1-a-deep-introduction-with-code-in-pytorch-and-tensorflow-cb03cdcdba0f



Image credit: (Zhu et al. 2017)

Generative models learn the intrinsic distribution function of the input data p(x) (or p(x,y) if there are multiple targets/classes in the dataset), allowing them to generate both synthetic inputs x’ and outputs/targets y’, typically given some hidden parameters.

GANs have proven to be really successful in modeling and generating high-dimensional data, which is why they have become so popular. Nevertheless, they are not the only type of generative model; others include Variational Autoencoders (VAEs), pixelCNN/pixelRNN and real NVP. Each model has its own tradeoffs.

Some of the most relevant pros and cons of GANs are:

  • They currently generate the sharpest images

  • They are easy to train (since no statistical inference is required), and only back-propagation is needed to obtain gradients

  • GANs are difficult to optimize due to unstable training dynamics.

  • No statistical inference can be done with them (except here): GANs belong to the class of direct implicit density models; they model p(x) without explicitly defining the p.d.f.

A neural network G(z, θ₁) is used to model the generator: it maps a noise vector z to the data space (e.g. images), with θ₁ denoting the weights of the generator network. A second network D(x, θ₂), the discriminator, outputs the probability that its input came from the real data rather than from the generator.
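A minimal sketch of this setup in PyTorch (the network sizes and the two-dimensional “data” are made up for illustration; see also the Jupyter notebook linked below):

import torch
import torch.nn as nn

# generator G(z, theta_1): maps a noise vector z to the data space
G = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))

# discriminator D(x, theta_2): outputs the probability that x is real
D = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())

loss = nn.BCELoss()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

real = torch.randn(64, 2)    # made-up batch of "real" 2-D samples
z = torch.randn(64, 16)      # noise input for the generator

# train the discriminator: real samples -> 1, generated samples -> 0
opt_d.zero_grad()
d_loss = loss(D(real), torch.ones(64, 1)) + \
         loss(D(G(z).detach()), torch.zeros(64, 1))
d_loss.backward()
opt_d.step()

# train the generator: try to make D classify generated samples as real
opt_g.zero_grad()
g_loss = loss(D(G(z)), torch.ones(64, 1))
g_loss.backward()
opt_g.step()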

A Jupyter notebook is available on GitHub: https://github.com/diegoalejogm/gans/blob/master/1.%20Vanilla%20GAN%20PyTorch.ipynb

References

Zhu, Jun-Yan, Taesung Park, Phillip Isola, and Alexei A Efros. 2017. “Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks.” In Computer Vision (ICCV), 2017 IEEE International Conference on.