The Fenchel–Rockafellar Theorem

The Fenchel–Rockafellar Theorem (or Fenchel's duality theorem) is a really cool way of transforming a (potentially crazy complicated) optimization problem into its (often more well-behaved) dual problem. In this article, we will try to understand how this works (both intuitively and by looking closely at its proof).

The exposition of the material is strongly based on Haim Brezis' book "Functional Analysis, Sobolev Spaces and Partial Differential Equations".

The structure of this article is "tree-like": The current page is its main stem (covering only the most basic parts) but you are welcome to divert on side branches ("examples", "visualization", "rigorous proof" etc.). This is supposed to help you concentrate on the stuff you are actually interested in (so you do not need to skip large parts of the article).

Mathematical prerequisites

We assume basic familiarity with the following notions:

Banach spaces. A B.S. \((E, \|\cdot\|)\) is a complete normed vector space. In the following, all basic quantities lie in this Banach space \(E\). We will also need its dual space \(E^*\), which is the space of continuous linear functionals on \(E\). The dual pairing between functionals \(f\in E^*\) and elements \(x\in E\) is denoted by \(\langle f, x\rangle\)
Note: If you aren't on good terms with Banach spaces, you can always mentally replace:
- \(E \stackrel{\text{(read as)}}{\longmapsto} \mathbb R^n\)
- \(E^* \stackrel{\text{(read as)}}{\longmapsto} \mathbb R^n\)
- \(\langle f, x\rangle \stackrel{\text{(read as)}}{\longmapsto} f^T\cdot x\) (the scalar product of vectors in \(\mathbb R^n\))
Convexity of functions \(\phi: E\to \mathbb R\cup \{+\infty\}\) and convex sets (in vector spaces) and the following definitions:
- The domain of \(\phi: E\to \mathbb R\cup \{+\infty\}\) is \(D(\phi) = \{x\in E: \phi(x) < +\infty\}\)
- The epigraph of \(\phi\) is \(\operatorname{epi}(\phi) = \{[x, \lambda] \in E\times \mathbb R:~ \phi(x) \leq \lambda\}\)
The Hahn-Banach separation theorem in the following form: Let \(E\) be a Banach space. Given two convex sets \(A, B\subset E\) with the following properties:
1. \(A\cap B = \emptyset\)
2. One of the sets is open. (This is only necessary for infinite-dimensional \(E.\))
Then there is a closed hypersurface separating \(A\) and \(B\), i.e. there is a continuous, non-zero functional \(f\in E^*\) and a number \(\alpha \in \mathbb R\) such that \[ f(x) \begin{cases}\leq \alpha &\text{ for all} \quad x\in A\\\geq \alpha &\text{ for all} \quad x\in B\end{cases} \]

Convex conjugate

Let \(\phi: E\to \mathbb R\cup \{+\infty\}\) be a function. Then its convex conjugate is a function \(\phi^*: E^*\to\mathbb R\) defined as \[\phi^*(f) = \sup_{x\in E}\{\langle f, x\rangle - \phi(x)\}. \] In our article we will always just use its negative version \(-\phi^*(f)\) (this is not \((-\phi)^*(f)\)) so we think of \(-\phi^*\) as an inseparable unit for now.

The Fenchel–Rockafellar Theorem: Statement

Let \(\phi,\psi: E\to \mathbb R\cup \{+\infty\}\) be two convex functions. Assume that there is a \(x_0\in D(\phi)\cap D(\psi)\) such that \(\phi\) is continuous at \(x_0\). Then \[\begin{aligned}\inf_{x\in E}\{\phi(x) + \psi(x)\} &= \sup_{f\in E^*}\{-\phi^*(f)-\psi^*(-f)\} \\ &= \max_{f\in E^*}\{-\phi^*(f)-\psi^*(-f)\} \end{aligned} \]

The Fenchel–Rockafellar Theorem: Why is it awesome?

The Fenchel–Rockafellar Theorem translates a "primal" (vanilla) minimization problem into an (often more well-behaved) "dual" maximization problem. The cool part is that the maximum is guaranteed to exist.

The Fenchel–Rockafellar Theorem: Why can we expect it to hold?

A very rough idea is the following: Instead of minimizing \(\phi + \psi = \phi - (-\psi)\), i.e. the signed distance between the graphs of \(\phi\) and \(-\psi\), we can also look for hypersurfaces separating the graphs and maximize the respective signed distances between the graphs and the hypersurface.

Plot of \(\color{blue}\phi(x)\) and \(\color{red}-\psi(x)\).

Plot of \(\color{blue}-\phi^*(f)-\psi^*(-f)\) .

The Fenchel–Rockafellar Theorem: Proof

The main idea is to apply the Hahn-Banach theorem to separate the graphs of \(\phi\) and \(-\psi\).