The Fenchel-Rockafellar Theorem: Proof

You can try out the controls on the following silly example.

Recipe for scrambled eggs.

Take eggs and scramble them.

Ingredients

Take three eggs

Method: One by one.

Take one.

A really nice one.

Method: Take one egg out of the box.

Take another one.

You can do it!

Method: Do as before.

And a last one.

I hope you bought enough eggs.

Method: Do you really need more instructions for this?

Scramble and cook.

I mean, it is in the name.

You're on your own now.

We record the statement of the theorem for convenience.

Let \(\phi,\psi: E\to \mathbb R\cup \{+\infty\}\) be functions. Then the following weak duality holds: \[\inf_{x\in E}\{\phi(x) + \psi(x)\} \geq \sup_{f\in E^*}\{-\phi^*(f)-\psi^*(-f)\}.\] Assume now that \(\phi, \psi\) are convex functions and that there is a \(x_0\in D(\phi)\cap D(\psi)\) such that \(\phi\) is continuous at \(x_0\). Then

\(\inf_{x\in E}\{\phi(x) + \psi(x)\} = \sup_{f\in E^*}\{-\phi^*(f)-\psi^*(-f)\}\). This phenomenon is called strong duality.
The \(\sup\) on the right hand side is actually a \(\max\), i.e. there is a function \(g\in E^*\) such that the equality \[ \sup_{f\in E^*}\{-\phi^*(f)-\psi^*(-f)\} = -\phi^*(g)-\psi^*(-g)\] holds

Proof

Sketch of proof: We prove the theorem by applying the Hahn-Banach theorem in order to separate the epigraph of \(\phi\) and the subgraph of \(\inf_{x\in E}\{\phi(x) + \psi(x)\} - \psi\). The linear functional governing the separating hyperplane will be exactly the element in \(E^*\) we need.

Step 1: Showing weak duality.

We define \[\begin{aligned}a &= \inf_{x\in E}\{\phi(x)+\psi(x)\}\\ b &= \sup_{f\in E^*}\{-\phi^*(f)-\psi^*(-f)\}.\end{aligned}\] It holds that \(b\leq a\). This is called weak duality.

Proof: Calculation.

Step 1.1

Given arbitrary sets \(M,N\) and functions \(R:M\times N\to \mathbb R\cup \{\infty\}\), it holds that \[\sup_{m\in M}\inf_{n\in N} R(m,n)\leq \inf_{n\in N}\sup_{m\in M} R(m,n)\]

Proof: By definition of supremum and infimum.

Step 1.2

QED.

Proof: \[\begin{aligned} b &= \sup_{f\in E^*}\{{\color{blue}-\phi^*(f)}{\color{darkblue}-\psi^*(-f)}\} \\ &= \sup_{f\in E^*}\{{\color{blue}-\sup_{x\in E}\{\langle f, x\rangle - \phi(x)\}}{\color{darkblue} - \sup_{y\in E}\{\langle -f, y\rangle - \psi(y)\}} \} \\ &= \sup_{f\in E^*} \{\inf_{x\in E}\{\phi(x) - \langle f, x\rangle\} + \inf_{y\in E}\{\psi(x) - \langle f, -y\rangle\} \}\\ &= \sup_{f\in E^*} \inf_{x, y\in E} \{ \phi(x) + \psi(y) - \langle f, x-y\rangle\}\end{aligned}\] We set \(M = E^*\), \(N = E\times E\) and \[R(f, (x,y)) = \phi(x) + \psi(y) - \langle f, x-y\rangle.\] Then we can use Step 1.1 to bound \[\begin{aligned} b &\leq \inf_{x, y\in E} \sup_{f\in E^*} \{ \phi(x) + \psi(y) - \langle f, x-y\rangle\}\end{aligned}\]

Remark

The infimum runs over all pairs \(x,y \in E\). Suppose that \(x\neq y\), then the supremum over the remainder term is \(+\infty\) by choosing any \(\hat f\) such that \(-\langle \hat f, x-y\rangle > 0\), which is guaranteed to exist by (for example) the Hahn--Banach theorem. If we now set \(f = r\cdot \hat f\) and let \(r\to\infty\), we get arbitrarily high values for the term \(\phi(x) + \psi(y) -\langle f, x-y\rangle\). If on the other hand we set \(x = y\), then the term depending on \(f\) vanishes. Hence the term on the right hand side is smaller (because it is smaller than \(-\infty\)) and hence this choice "wins". More rigorously, we write:

\[\begin{aligned} b &\leq \inf \left\{\inf_{x\in E, x\neq y\in E} \sup_{f\in E^*} \{ \phi(x) + \psi(y) - \langle f, x-y\rangle\}, \right. \\ & \quad \left. \inf_{x\in E} \sup_{f\in E^*} \{ \phi(x) + \psi(x) - \langle f, x-x\rangle\} \right\}\\ &= \inf\left\{ +\infty , \inf_{x\in E}\sup_{f\in E^*}\{ \phi(x) + \psi(y) \}\right\}\\ &= \inf_{x\in E}\{ \phi(x) + \psi(y) \} \\ &= a\end{aligned},\] where we could drop the supremum over \(f\) in the penultimate step. This concludes the proof of step 1.

Step 2: What is left to prove?

The following three statements are true:

From now on we can assume that \(a > -\infty\).
In order to prove the theorem, it suffices to prove the existence of a \(g\in E^*\) such that \[a \leq -\phi^*(g) - \psi^*(-g) \qquad \text{(1)}\]

Proof: If \(a = -\infty\), the statement of the theorem holds trivially. If \(\text{(1)}\) holds, then \[\begin{aligned}a \leq -\phi^*(g) - \psi^*(-g) \stackrel{\text{Def.}}\leq b \stackrel{\text{Step 1}} \leq a\end{aligned},\] which proves that all \(\leq\) are actually \(=\).

Remark: Mental imagery and a technicality.

The main imagery to have in mind for this proof is the following: We look at \(\color{blue}\phi\) and \(\color{darkblue}-\psi\) in a joint plot.

Then \[a = \inf_x \phi(x)-(-\psi(x))\] is the infimal distance between those two graphs. Now we bring those two graphs as close together as possible by considering \(\phi\) and \(-\psi + a\).

We will use the Hahn--Banach theorem to separate the and the with some hyperplane \(H=\{[x,\lambda]:~ \langle g, x\rangle + \lambda = \alpha\}\).
Then we will prove that the functional \(g\in E^*\) actually is the element in \(E^*\) such that the supremum in the definition of \(b\) is achieved and fulfills \(a\leq b\), which concludes the proof according to Step 2.

We have a small technical problem: Neither of those two shaded sets is open and so we cannot apply Hahn-Banach directly.
That's why in Step 3 we will apply it to \(\operatorname{int}(\operatorname{epi}\phi)\) instead of \(\operatorname{epi}\phi\) first.

Step 3: Separating

We define \[A = \operatorname{int}(\operatorname{epi}(\phi))\] and \[B = a + \operatorname{sub} (-\psi) = \{[x,\lambda]\in E\times \mathbb R:~ \lambda \leq -\psi(x)+ a\}.\] There exists a closed hyperplane \(H = \{[x,\lambda]:~ \langle f, x\rangle + k\lambda = \alpha\}\) that separates \(A\) and \(B\).

Proof: By the Hahn-Banach theorem.

Step 3.1

\(A\) and \(B\) are non-empty, convex and disjoint.

Proof: Both sets are non-empty by the fact that there is a point of continuity. By convexity of \(\phi,\psi\), the sets are also convex. Assume that the sets are not disjoint, i.e. there is a point \([y,\mu]\in E\times \mathbb R\) which is in both \(B\) and \(A\). The first property implies \(-\psi(y)+a\geq \mu\) and the second property is equivalent to \(\mu > \phi(y)\), which means that \(a > \phi(y)+\psi(y)\), a contradiction to the definition of \(a\).

Step 3.2

\(A\) is open.

Proof: By definition of interior.

Step 3.3

QED.

Proof: By 3.1., 3.2., and application of Hahn-Banach.

Remark: How do we separate the correct two sets?

To fix the technical problem from above we will show in Step 4 that the separation property holds for \(\overline{\operatorname{int}(\operatorname{epi}\phi)}\) as well, then for \(\overline{\operatorname{epi}\phi}\) and finally for \(\operatorname{epi}\phi\). Why this is at all necessary is hard to grasp, as the technicality only arises in infinite dimensions. For a first understanding of the proof you can skip this step and think of the hyperplane \(H\) from Step 3 actually separating \(\operatorname{epi}\phi\) and \(a+\operatorname{sub}(-\psi)\).

Step 4: Separating the correct two sets.

The hyperplane \(H\) also separates \(\operatorname{epi}(\phi)\) and \(B\).

Proof: By slightly obscure set theory math.

Step 4.1

\(H\) separates \(\overline{\operatorname{int}(\operatorname{epi}\phi)}\) and \(B\).

Proof: By continuity of the functional \(f\) in the definition of \(H\).

Step 4.2

\(H\) separates \(\overline{(\operatorname{epi}\phi)}\) and \(B\)

Proof: By set theory magic.

Step 4.2.1

For any given set \(C\): If \(\operatorname{int}(C)\neq \emptyset\), then \(\overline{\operatorname{int}(C)}= \overline{C}\).

Proof: Lemma 1.4 in Brezis

Step 4.2.2

\(\operatorname{int}(\operatorname{epi}\phi) \neq \emptyset\).

Proof: By continuity of \(\phi\).

Step 4.2.3

QED.

Proof: By 4.2.1. and 4.2.2., we have \(\overline{\operatorname{int}(\operatorname{epi}\phi)}= \overline{\operatorname{epi}\phi}\). Then the statement follows from 4.1.

Step 4.3

QED.

Proof: By 4.2. and the fact that \(\operatorname{epi}(\phi) \subset \overline{\operatorname{epi}(\phi)}\)

Remark: How do we construct \(g\)?

Our separation hyperplane is \[H = \{[x,\lambda]:~ \langle f, x\rangle + k\lambda = \alpha\}.\] We would like to recast it in the form \[H = \{[x,\lambda:~ \langle g,x\rangle + \lambda = \beta\}\] (for some \(g\in E^*, \beta\in \mathbb R\)). In order to achieve this, we need to prove that \(k\neq 0\).

Step 5: Constructing \(g\), Pt. 1.

By the separation property of \(H\), we have \[\begin{aligned}\tag{eq1}\langle f,x\rangle + k\lambda &\geq \alpha,\quad\text{ for } [x,\lambda]\in \operatorname{epi}(\phi)\\ \langle f,x\rangle + k\lambda &\leq \alpha,\quad\text{ for } [x,\lambda]\in B. \end{aligned}\] It holds that \(k>0.\)

Proof: This is due to the fact that \(A\) lies "top" and \(B\) is "bottom".

Step 5.1

\(k\geq 0\).

Proof: We set \(x = x_0\) in (eq1) and let \(\lambda \to\infty\).

Remark: Why is \(k\neq 0\)?

If the separation was given by \(\{[x, \lambda]: \langle f, x\rangle = \alpha \}\), then the separation hyperplane would be "vertical", which is incommensurate with the assumption that there is a point \(x_0\in D(\phi)\cap D(\psi)\). This is how we will prove that \(k\neq 0\).

Step 5.2

\(k\neq 0\).

Proof: By assuming \(k = 0\), we obtain a contradiction.

Step 5.2.1

\(\|f\|\neq 0\).

Proof: As we assumed \(k = 0\), this would make the hypersurface \(H\) ill-defined otherwise.

Step 5.2.2

\(\|f\| = 0\).

Proof: Follows from (eq1) and the assumption on the point \(x0\).

Step 5.2.2.1

\(\langle f, x\rangle \leq \alpha\) for all \(x\in D(\psi)\) and \(\langle f, x\rangle \geq \alpha\) for all \(x\in D(\phi)\).

Proof: Follows from (eq1).

Step 5.2.2.2

There is a positive quantity \(\epsilon_0\) such that \(\langle f, x_0+\epsilon_0 z\rangle \geq \alpha\) for all \(z\in B(0,1)\).

Proof: By assumption, \(\phi\) is continuous at \(x_0\) and \(x_0\in D(\phi\)). By 5.2.2.1, and the fact that \(x_0+\epsilon_0 z\rangle \in D(\phi)\), the claim follows.

Step 5.2.2.3

There is a positive quantity \(\epsilon_0\) such that \(\langle f, x_0\rangle \geq \alpha + \epsilon_0 \|f\|\).

Proof: By assumption, \(\phi\) is continuous at \(x_0\) and \(x_0\in D(\phi\)). By 5.2.2.1, and the fact that \(x_0+\epsilon_0 z\rangle \in D(\phi)\), it follows that \(\langle f, x_0+\epsilon_0 z\rangle \geq \alpha\) for all \(z\in B(0,1)\). This can be written as \[\langle f, x_0\rangle \geq \alpha - \epsilon_0 \langle f, z\rangle \text{ for all } z\in B(0,1)\] Taking the supremum over all \(z\in B(0,1)\) yields the claim.

Step 5.2.2.4

QED.

From 5.2.2.1 and the fact that \(x_0\) is (also) an element of \(D(\psi)\), we have \(\langle f, x_0 \rangle \leq \alpha\). Combined with 5.2.2.3, this gives \[ \alpha + \epsilon_0\|f\| \leq \langle f, x_0\rangle \leq \alpha,\] which means that \(\|f\| \leq 0\), implying \(\|f\|=0\).

Step 5.2.3

QED.

Proof: Contradiction by 5.2.1 and 5.2.1. Hence, \(k \neq 0\).

Step 5.3

QED.

Proof: By 5.1. and 5.2.

Step 6: Constructing \(g\), Pt. 2.

We redefine the surface as \(H = \{[x,\lambda]:~ \langle g,x\rangle + \lambda = \beta\}\) with \(g = f/\lambda\) and \(\beta = \alpha/\lambda\). Then \[\begin{aligned}\tag{eq3}\langle g,x\rangle + \lambda &\geq \beta,\quad\text{ for } [x,\lambda]\in \operatorname{epi}(\phi)\\ \langle g,x\rangle + \lambda &\leq \beta,\quad\text{ for }[x,\lambda]\in B. \end{aligned}\]

Proof: From \((eq1)\) with new parametrization.

Step 7: A crucial property of \(g\).

From \((eq3)\), we obtain \[\begin{aligned}\phi^*(-g)&\leq - \beta\\ \psi^*(g)&\leq\beta-a\end{aligned}\]

Proof: \[\begin{aligned}\phi^*(-g) &= \sup \{\langle -g,x\rangle - \phi(x)\}\\&=-\inf\{\langle g,x\rangle + \underbrace{\phi(x)}_{\geq \lambda}\} \\&\leq - \inf \{ \langle g,x\rangle + \lambda\} \text{ for all }\lambda \geq \phi(x)\\&\stackrel{(eq3)}{\leq} -\beta. \end{aligned}\] The statement for \(\psi^*(g)\) is obtained similarly.

Step 8: Closing the deal.

QED

Proof: Combining.

Step 8.1

\(-\phi^*(-g) - \psi^*(g) \geq a\).

Proof: By 7.

Step 8.2

\(-\phi^*(-g) - \psi^*(g) \leq b\).

Proof: By definition of \(b\).

Step 8.3

\(a\leq b\).

Proof: By 8.1. and 8.2.

Step 8.4

QED.

Proof: By 8.3. and 2.