Lipschitz Continuity

Let us begin with the definition of Lipschitz continuity:

A function f : RᴹRᴺ is Lipschitz continuous if there is a constant L such thatf(x) - f(y)∥  ≦  L x - y∥ for every x, y.

Here ∥·∥ denotes the usual Euclidean distance. The smallest such L is the Lipschitz constant of f and is denoted Lip(f). Notice that this definition can be generalized to functions between arbitrary metric spaces.

In our case, f is our neural network, and we want it to be Lipschitz continuous with a small Lip(f). This will provide an upper bound for the perturbations of the outputs. Lipschitz continuity also has the following property:

Let f = gh. If g and h are Lipschitz continuous, then f is also Lipschitz continuous with Lip(f) Lip(g) Lip(h).

Therefore, as long as we make each component of a neural network Lipschitz continuous with small Lipschitz constants, the whole neural network will also be Lipschitz continuous with small Lipschitz constants.

As a concrete example, a standard 2-layer feedforward network for binary classification can be written as

f = Sigmoid ∘ FC₂ ∘ ReLU ∘ FC₁

where FCᵢ(x) = Wx + bᵢ are fully connected layers. The components of f are FC₁, ReLU, FC₂, and Sigmoid.

Continue reading:—-7f60cf5620c9—4