A book that I am reading states the following theorem:
Theorem. Let $x$ be an element in a normed linear space $X$ and let $d$ denote its distance from the subspace $M (\bar{M}\neq X)$. Then \begin{equation*} d \triangleq \inf_{m\in M} \|x-m\|_{M} = \max_{\substack{x^*\in M^\perp \\ \|x^*\|_{M^*} \leq 1}} \langle x,x^*\rangle, \end{equation*} where the maximum on the right is achieved for some $x_0^*\in M^\perp$ with $\|x_0^*\|_{M^*} = 1$. If the infimum on the left is achieved for some $m_0\in M$, then $x-m_0$ is aligned with $x_0^*$.
Proof. The dual problem is a lower bound of the primal problem since \begin{equation*} \max_{\substack{x^*\in M^\perp \\ \|x^*\|_{M^*} \leq 1}}\langle x,x^*\rangle = \max_{\substack{x^*\in M^\perp \\ \|x^*\|_{M^*} \leq 1}} \langle x-m,x^*\rangle\ \leq \ \|x-m\|_M, \qquad \forall m\in M. \end{equation*} To show equality, we have to exhibit a functional $x^* \in M^\perp$ that achieves the value $d$. Let $[x+M]$ be the subspace generated by $x$ and $M$, and consider the functional $f: [x+M]\rightarrow\mathbb{R}$ defined as follows: \begin{equation*} f(\alpha x+m) = \alpha d, \qquad \forall \alpha\in\mathbb{R},\ m\in M. \end{equation*} Then $f$ is a bounded linear functional on $[x+M]$ with induced norm ($\|f\|_M$) \begin{align*} \|f\|_M &\triangleq \sup_{\alpha\in\mathbb{R},m\in M}\frac{|f(\alpha x+m)|}{\|\alpha x + m\|_M}\\ &= \sup_{\alpha\in\mathbb{R},m\in M}\frac{|\alpha|d}{|\alpha|\|x+\frac{1}{\alpha}m\|_M}\\ &= \sup_{m\in M}\frac{d}{\|x+m\|_M} \quad \text{since $M$ is a subspace}\\ &= \frac{d}{\inf_{m\in M}\|x+m\|_M} = 1. \end{align*} Define $x_0^*$ to be the Hahn-Banach extension of $f$. It follows immediately that $\|x_0^*\|_{M^*} = 1$, $x_0^*\in M^\perp$, and $\langle x,x_0^*\rangle=d$. The alignment follows from the fact that \begin{equation*} d = \|x-m_0\|_M\|x_0^*\|_{M^*} = \langle x,x_0^*\rangle = \langle x-m_0,x_0^*\rangle \end{equation*} for any minimizing solution $m_0$.
Two questions:
Since this theorem applies to normed linear spaces (and I am subsequently using it for Banach spaces), how is it possible to take inner products such as $\langle x,x^*\rangle$? I thought that inner products are only defined on Hilbert spaces. It seems as though the whole theorem falls apart if an inner product is not defined.
Further, how is there a notion of an orthogonal complement $M^\perp$ if there is no inner product?
With normed spaces that are not Hilbert spaces, elements of the dual space are bounded linear functionals that take an element of the normed space and return a real value. The notation $\langle x, x^{*} \rangle$ means the evaluation of the bounded linear functional $x^{*}$ (an element of the dual space) at $x$ in the original normed space.
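As a concrete illustration (an example chosen here for illustration, not taken from the book being quoted), take $X = C[0,1]$ with the sup norm $\|x\| = \max_{t\in[0,1]}|x(t)|$, which is not a Hilbert space, and let $x^{*}$ be evaluation at a fixed point $t_0\in[0,1]$: \begin{equation*} \langle x, x^{*} \rangle = x(t_0), \qquad |\langle x, x^{*} \rangle| \leq \|x\|. \end{equation*} This $x^{*}$ is linear and bounded, with $\|x^{*}\| = 1$ (the bound is attained by the constant function $x\equiv 1$), so it is an element of the dual space, and no inner product is needed to make sense of $\langle x, x^{*} \rangle$.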
If it happens that the space we start with is a Hilbert space, then by the Riesz representation theorem every bounded linear functional $f(x)$ is of the form
$f(x)=\langle x, f \rangle=\mbox{inner product}(x,f)$
for some element $f$ in the Hilbert space. Thus in the Hilbert space case, the notation $\langle x, f \rangle$ corresponds exactly to the inner product of $x$ and $f$.
This choice of notation can be very confusing to students if they've previously seen $\langle x, y \rangle$ for the inner product of $x$ and $y$. Some books reserve $(x,y)$ for the inner product in Hilbert spaces to avoid that confusion.
The definition of orthogonality in a Hilbert space is that two elements of the space, $x$ and $y$, are orthogonal if $(x,y)=0$. This extends to a Banach space $S$ and its dual space $S^{*}$: an element $x$ in $S$ and an element $y$ in $S^{*}$ are called orthogonal if $\langle x,y \rangle=0$. If $M$ is a subset of a Banach space $S$, then $M^{\perp}$ is the set of $y$ in $S^{*}$ such that $\langle x, y \rangle=0$ for every $x$ in $M$. Notice that $M^{\perp}$ lives in the dual space $S^{*}$ rather than in $S$.
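To see these definitions at work without any inner product, here is a small numerical sketch of the theorem from the question. Everything concrete in it is an assumption made for illustration: the space is $\mathbb{R}^2$ with the $\ell_1$ norm (whose dual norm is the $\ell_\infty$ norm), $M$ is the line spanned by $(1,1)$, so $M^{\perp}$ is spanned by the functional represented by $(1,-1)$, and $x=(3,1)$. The helper names and the crude grid search are hypothetical choices, not anything from the book.

```python
def l1_norm(v):
    """Norm on the primal space (assumed here: the l1 norm on R^2)."""
    return sum(abs(c) for c in v)


def linf_norm(v):
    """Dual norm of the l1 norm: the l-infinity norm."""
    return max(abs(c) for c in v)


def pairing(x, y):
    """<x, y>: evaluate the functional represented by y at the point x."""
    return sum(a * b for a, b in zip(x, y))


x = (3.0, 1.0)
m_dir = (1.0, 1.0)      # M = span{(1, 1)}
perp_dir = (1.0, -1.0)  # M^perp = span{(1, -1)}, since <(1, 1), (1, -1)> = 0

# Primal side: distance from x to M, approximated on a grid of t in [-5, 5].
ts = [k / 100.0 for k in range(-500, 501)]
primal = min(
    l1_norm((x[0] - t * m_dir[0], x[1] - t * m_dir[1])) for t in ts
)

# Dual side: maximize <x, y> over y = s * perp_dir with dual norm <= 1.
# Because linf_norm(perp_dir) == 1, the constraint is simply |s| <= 1.
ss = [k / 100.0 for k in range(-100, 101)]
dual = max(
    pairing(x, (s * perp_dir[0], s * perp_dir[1])) for s in ss
)

print("annihilator check <m, y>:", pairing(m_dir, perp_dir))
print("primal distance:", primal)
print("dual maximum:   ", dual)
```

Both printed values agree (up to rounding), as the theorem predicts. The pairing is computed by the same formula as a dot product only because functionals on $\mathbb{R}^2$ can be represented by vectors; the norms on the two sides are different, which is exactly what distinguishes this setting from a Hilbert space.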
See for example Chapter 5 of David G. Luenberger's Optimization by Vector Space Methods.