Background: For $f:D\subset\mathbb{R}^d \to \mathbb{R}$ we call $$f^*:D^*\subset\mathbb{R}^d \to \mathbb{R}, ~ y\mapsto \underset{x\in D}{\sup}(\langle y, x \rangle - f(x))$$ the conjugate of $f$, where $D^*:=\{y\in\mathbb{R}^d | \underset{x\in D}{\sup}(\langle y, x \rangle - f(x)) < \infty \}$. From the definition we directly get the Fenchel inequality: $$ \langle y, x \rangle \leq f(x) + f^*(y) $$ If we set $D=\mathbb{R}^d, f(x)=\lVert x \rVert$ for some vector norm $\lVert \cdot \rVert$ and consider the dual norm $ \lVert y \rVert^* = \underset{\lVert x \rVert=1}{\sup}\langle y, x \rangle$ (which in fact is a dual norm thanks to Riesz' representation theorem) we get $$ f^*(y) = \begin{cases} 0, &\lVert y \rVert^* \leq 1 \\ \infty, &\mathrm{otherwise}, \end{cases} $$ and therefore $\langle y, x \rangle \leq \lVert x \rVert$ whenever $\lVert y \rVert^* \leq 1$.
My questions: The author of a paper I'm reading defines something he calls a matrix dual norm by $$ \lVert W \rVert^* := \underset{\lVert u \rVert = \lVert v \rVert = 1}{\sup} u^TWv $$ for some vector norm $\lVert \cdot \rVert$ and $W\in\mathbb{R}^{d\times d}$. He then claims that $$ \langle A, B \rangle_{\mathrm{F}} \leq \lVert A \rVert\qquad (*) $$ by the Fenchel-Young inequality whenever $\lVert B \rVert^* \leq 1$, where $\langle \cdot, \cdot \rangle_{\mathrm{F}}$ is the Frobenius SP and I'm guessing (this is not clearly defined) $\lVert \cdot \rVert$ is the matrix norm induced by the aforementioned vector norm $\Vert \cdot \Vert$.
- How is the defined matrix norm $\lVert \cdot \rVert^*$ a dual norm? I'm only familar with $\mathbb{R}^{d\times d}$ as a Hilbert space when equipped with the Frob. SP, but the dual norm defined here does not seem to coincide with the dual norm induced by the Frobenius SP.
- Since the author does not use the Frobenius norm as dual norm, can we still apply Fenchel-Young to achieve the inequality (*)?
Thanks a lot!
Edit: I just realized that it might be somewhat exaggerated to use Fenchel-Young here, since for all $x \neq 0, y \in \mathbb{R}^d$ with $\lVert y \Vert^* \leq 1$ we have $$ \langle x, y \rangle = \langle x/\lVert x \rVert, y \rangle \lVert x \rVert \leq \lVert y \Vert^* \lVert x \rVert \leq \lVert x \rVert. $$ My second question therefore reduces to whether the following is true: $$ \langle A, B \rangle_F = \langle A/\lVert A \rVert, B \rangle_F \lVert A \rVert \overset{\mathrm{?}}{\leq} \lVert B \Vert^* \lVert A \rVert. $$ I'm pretty sure it is not since by choosing $\lVert \cdot \lVert = \lVert \cdot \lVert_1$ we have $$ \lVert B \Vert^* = \underset{\lVert u \rVert_1 = \lVert v \rVert_1 = 1}{\sup} u^TBv = \underset{i,j}{\max}|b_{i,j}| $$ but choosing $A$ appropriately (one entry per column +/-1, otherwise zero s.t. $\lVert A \rVert_1$ = 1) $$ \langle A, B \rangle_F = \sum_{j=1}^d \underset{i=1, ..., d}{\max} |b_{ij}| > \lVert B \Vert^* $$ Could someone confirm?
If we define the dual norm on $\mathbb{R}^{m\times n}$ as $$ \lVert X \rVert^* = \sup\{\operatorname{Tr}(X^\top Y)~\vert~\lVert Y \rVert \leq 1\} = \sup\{\lvert\operatorname{Tr}(X^\top Y)\rvert~\vert~\lVert Y \rVert \leq 1\}, $$ thus $$ \langle X,Y\rangle_F = \langle X,Y/\lVert Y\rVert\rangle_F\lVert Y\rVert\leq \lVert X \rVert^*\lVert Y\rVert $$
Furthermore, we have an induced dual matrix norm $\lVert A \rVert^*$ equivalently defined as $$ \lVert A \rVert^* = \sup_{\lVert x\rVert^* = 1,\lVert y\rVert=1}x^\top Ay. $$
Proof: From the definition of dual norm on vectors: $$ \lVert z\rVert^* = \sup_{\lVert y \rVert=1}\lvert y^\top z\rvert. $$
Thus $$ \lVert A\rVert^* = \sup_{\lVert x\rVert^*=1}\lVert Ax\rVert^* = \sup_{\lVert x\rVert^*=1,\lVert y\rVert=1}\lvert y^\top Ax\rvert. $$
EDIT 1: To complement the answer a bit, we also have from Fenchel-Young on vectors $$ x^\top A^\top By\leq \lVert Ax\rVert^*\lVert By\rVert \leq \lVert A\rVert^*\lVert B\rVert \lVert x\rVert^*\lVert y\rVert. $$
The dual norm of the spectral norm $\lVert A \rVert_2$ is not itself. It is in fact the so-called nuclear norm, and a proof is available here: https://math.stackexchange.com/a/1145246/443030
$\lVert A\rVert_1$ is the dual of $\lVert A\rVert_\infty$, for matrices, and the inequality still holds.