Let us begin with the definition of Lipschitz continuity:
A function f : Rᴹ → Rᴺ is Lipschitz continuous if there is a constant L such that ∥f(x) − f(y)∥ ≤ L ∥x − y∥ for every x, y.
Here ∥·∥ denotes the usual Euclidean norm. The smallest such L is the Lipschitz constant of f and is denoted Lip(f). Notice that this definition generalizes to functions between arbitrary metric spaces.
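As a minimal sketch of what the definition says, we can check the inequality numerically for the elementwise sine, which is Lipschitz with Lip(f) = 1 since |cos| ≤ 1. The use of NumPy, the random seed, and the dimensions are illustrative choices, not from the original text:

```python
import numpy as np

rng = np.random.default_rng(0)

# f(x) = sin(x) applied elementwise is Lipschitz with Lip(f) = 1,
# because its derivative cos(x) is bounded by 1 in absolute value.
def f(x):
    return np.sin(x)

L = 1.0
for _ in range(1000):
    x, y = rng.normal(size=3), rng.normal(size=3)
    lhs = np.linalg.norm(f(x) - f(y))   # ||f(x) - f(y)||
    rhs = L * np.linalg.norm(x - y)     # L ||x - y||
    assert lhs <= rhs + 1e-12           # Lipschitz inequality holds
```

Sampling pairs of points like this can never prove Lipschitz continuity, but it is a quick sanity check on a claimed constant.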
In our case, f is our neural network, and we want it to be Lipschitz continuous with a small Lip(f). This will provide an upper bound for the perturbations of the outputs. Lipschitz continuity also has the following property:
Let f = g ∘ h. If g and h are Lipschitz continuous, then f is also Lipschitz continuous with Lip(f) ≤ Lip(g) Lip(h).
Therefore, as long as we make each component of a neural network Lipschitz continuous with a small Lipschitz constant, the whole network is Lipschitz continuous as well, with Lip(f) bounded by the product of the componentwise constants.
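The composition bound is easy to see in the linear case: a linear map x ↦ Wx has Lipschitz constant equal to the spectral norm ∥W∥₂ (its largest singular value), so the product bound can be checked directly. The matrix shapes and seed below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)

# h(x) = W_h x and g(x) = W_g x are Lipschitz with constants
# equal to their spectral norms (largest singular values).
W_h = rng.normal(size=(4, 3))
W_g = rng.normal(size=(2, 4))

lip_h = np.linalg.norm(W_h, 2)  # spectral norm of W_h
lip_g = np.linalg.norm(W_g, 2)  # spectral norm of W_g

# f = g ∘ h is the linear map (W_g @ W_h) x, so its exact
# Lipschitz constant is the spectral norm of the product.
lip_f = np.linalg.norm(W_g @ W_h, 2)

# Composition bound: Lip(f) <= Lip(g) * Lip(h)
assert lip_f <= lip_g * lip_h + 1e-12
```

Note that the bound can be loose: the singular directions of the two maps rarely align, so Lip(f) is usually strictly smaller than the product.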
As a concrete example, a standard 2-layer feedforward network for binary classification can be written as
f = Sigmoid ∘ FC₂ ∘ ReLU ∘ FC₁
where FCᵢ(x) = Wᵢ x + bᵢ are fully connected layers. The components of f are FC₁, ReLU, FC₂, and Sigmoid.
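Putting the pieces together, the network's Lipschitz constant is bounded by the product of the componentwise constants: ∥W₁∥₂ for FC₁, 1 for ReLU, ∥W₂∥₂ for FC₂, and 1/4 for Sigmoid (its maximum slope, attained at 0). Below is a sketch with NumPy; the layer sizes, seed, and random weights are hypothetical, chosen only to make the example runnable:

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical weights for a 2-layer network mapping R^3 -> R.
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(1, 4)), rng.normal(size=1)

def f(x):
    h = np.maximum(W1 @ x + b1, 0.0)  # FC1 followed by ReLU
    return sigmoid(W2 @ h + b2)       # FC2 followed by Sigmoid

# Lip(FC_i) = ||W_i||_2 (the bias does not affect the constant),
# Lip(ReLU) = 1, Lip(Sigmoid) = 1/4, so by the composition rule:
bound = 0.25 * np.linalg.norm(W2, 2) * 1.0 * np.linalg.norm(W1, 2)

# The bound should dominate every empirical difference quotient.
for _ in range(1000):
    x, y = rng.normal(size=3), rng.normal(size=3)
    ratio = np.linalg.norm(f(x) - f(y)) / np.linalg.norm(x - y)
    assert ratio <= bound + 1e-9
```

Controlling ∥W₁∥₂ and ∥W₂∥₂, e.g. via spectral normalization, is then exactly what keeps this bound small.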