27.17 Favourite specialized hardware
Favourite specialized hardware is shown in the graph below.
The last qualification, which is cut off in the legend of the plot above, reads “Some college/university study without earning a bachelor’s degree”.
- CPU => Central Processing Unit
  - Performs basic arithmetic, logic, and input/output instructions
  - The heart of every computing device
- GPU => Graphics Processing Unit
- Optimized processor for graphics
- Very fast matrix multiplication => speeds up neural network computation
- TPU => Tensor Processing Unit
  - An AI accelerator application-specific integrated circuit (ASIC) developed by Google specifically for neural network machine learning
- Edge TPU
    - Performance: 4 TOPS (Tera Operations Per Second)
    - Power consumption: 2 W
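The “fast matrix multiplication” that GPUs and TPUs accelerate is the operation at the core of every neural-network layer. A minimal NumPy sketch (shapes and variable names are illustrative, not from the source):

```python
import numpy as np

# A dense neural-network layer is one matrix multiplication plus a bias add:
# y = x @ W + b. This MatMul is the operation that GPU/TPU hardware speeds up.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))   # a batch of 4 input vectors of size 8
W = rng.standard_normal((8, 3))   # layer weights: 8 inputs -> 3 outputs
b = rng.standard_normal(3)        # layer bias

y = x @ W + b                     # the MatMul that dominates training cost
print(y.shape)                    # (4, 3): one output vector per batch item
```

Stacking many such layers (and their gradients during backpropagation) means training time is dominated by MatMuls, which is why matrix-multiply throughput is the headline figure for this hardware.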
Wei, Brooks, and others (2019) compare the three platforms:

- TPU is highly optimized for large batches and CNNs, and has the highest training throughput.
- GPU shows better flexibility and programmability for irregular computations, such as small batches and non-MatMul computations. The training of large fully connected (FC) models also benefits from its sophisticated memory system and higher bandwidth.
- CPU has the best programmability, so it achieves the highest FLOPS utilization for RNNs, and it supports the largest models because of its large memory capacity.
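The batch-size effect noted above can be sketched on any machine: larger batches keep the hardware’s MatMul units busier, raising samples-per-second throughput. A hedged micro-benchmark (the function name, sizes, and repeat count are illustrative assumptions; absolute numbers depend entirely on your hardware):

```python
import time
import numpy as np

def matmul_throughput(batch_size, dim=256, repeats=10):
    """Rough throughput (samples/sec) of one dense-layer MatMul at a given batch size."""
    rng = np.random.default_rng(0)
    x = rng.standard_normal((batch_size, dim))
    W = rng.standard_normal((dim, dim))
    start = time.perf_counter()
    for _ in range(repeats):
        _ = x @ W
    elapsed = time.perf_counter() - start
    return batch_size * repeats / elapsed

# Larger batches usually give more samples/sec per MatMul call; the effect
# is far stronger on GPUs/TPUs than on a CPU running NumPy.
for bs in (1, 32, 512):
    print(bs, round(matmul_throughput(bs)))
```

This is only a CPU-side illustration of the trend; the paper’s point is that TPUs exploit large batches best, while GPUs and CPUs degrade more gracefully on small or irregular workloads.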