Spatial Transformer Networks Tutorial, Part 2 — Bilinear Interpolation

A Self-Contained Introduction

Spatial transformer modules are a popular way to make a model more invariant to spatial transformations such as translation, scaling, rotation, and cropping, as well as non-rigid deformations. They can be inserted into existing convolutional architectures, either immediately after the input or in deeper layers. They achieve spatial invariance by adaptively transforming their input to a canonical, expected pose, which leads to better classification performance. The word "adaptive" indicates that an appropriate transformation is produced for each sample, conditional on the input itself. Spatial transformer networks can be trained end-to-end using standard backpropagation.

In this tutorial, we cover all the prerequisites needed for a deep understanding of spatial transformers. In the last post, we introduced the concepts of forward and reverse mapping. In this post, we delve into the details of bilinear interpolation. In the next post, we will introduce the building blocks that make up a spatial transformer module. Finally, in the fourth and last post, we will derive all backpropagation equations from scratch.

As a warm-up we will start with the simple one-dimensional case. Here we are dealing with a sequence of data points that lie on an equally spaced grid:

Since this tutorial mostly deals with image data, we can assume without loss of generality that the spacing between two consecutive data points is one.

Note that the discrete sequence in the diagram above is defined only at integer positions and is undefined everywhere else. Often, however, we need values at non-integer positions, such as 2.7 in the example above. Interpolation techniques provide them by estimating the unknown data values from the known ones. In the following, we refer to the undefined points that we want to estimate as sample points and denote them by the letter 𝑥.

In linear interpolation, we simply fit a straight line through only (!) the two neighboring points of 𝑥 and then look up the desired value:

We find the neighboring points of 𝑥 by applying the floor and ceil operations. Remember: floor() rounds 𝑥 to the nearest integer…
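To make the one-dimensional case concrete, here is a minimal Python sketch of linear interpolation on a unit-spaced grid. The function name `linear_interpolate` and the example sequence `samples` are our own illustrative choices and do not come from any library. We also assume that the sample point lies within the range of the sequence.

```python
import math


def linear_interpolate(values, x):
    """Estimate the value at position x from a sequence sampled at 0, 1, 2, ...

    `values` is a hypothetical unit-spaced sequence; x may be non-integer
    but must lie inside the grid (assumption of this sketch).
    """
    if not 0 <= x <= len(values) - 1:
        raise ValueError("sample point lies outside the grid")

    x0 = math.floor(x)  # left neighbor of x
    x1 = math.ceil(x)   # right neighbor of x

    if x0 == x1:
        # x is already an integer position, so no interpolation is needed
        return values[x0]

    w = x - x0  # fractional distance from the left neighbor, in (0, 1)
    # Point on the straight line through (x0, values[x0]) and (x1, values[x1])
    return (1 - w) * values[x0] + w * values[x1]


# Made-up example data, defined only at positions 0..4
samples = [1.0, 3.0, 2.0, 4.0, 5.0]
print(linear_interpolate(samples, 2.7))  # roughly 3.4, up to floating-point rounding
```

For the sample point 2.7, the two neighbors are positions 2 and 3, and the estimate is weighted more heavily toward the value at position 3 because 2.7 lies closer to it.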
