In this problem we're optimizing over variables $X\in \text{PSD}_n$ and $Y\in\mathbb R^{d\times n}$ for some $d\le n$:
\begin{align}
&\text{Maximize}&&\langle A_0, Z\rangle\\
&\text{Subject to}&&\langle A_i, Z\rangle\le b_i\\
&&&Z=\begin{bmatrix}X & Y^T\\ Y & I \end{bmatrix}\succeq 0\\
&&&\text{Trace}(X)=\Vert Y\Vert_F^2.
\end{align}
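To make the block structure concrete, here is a minimal NumPy sketch that builds $Z$ from a pair $(X, Y)$ and checks the last two constraints at the particular point $X = Y^TY$. The helper name `build_Z`, the sizes `n`, `d` and the random seed are illustrative choices, not part of the problem; the linear constraints $\langle A_i, Z\rangle\le b_i$ are not checked, since no $A_i$, $b_i$ are specified.

```python
import numpy as np

def build_Z(X, Y):
    """Assemble Z = [[X, Y^T], [Y, I_d]] (hypothetical helper name)."""
    d = Y.shape[0]
    return np.block([[X, Y.T], [Y, np.eye(d)]])

rng = np.random.default_rng(0)   # arbitrary seed
n, d = 5, 3                      # arbitrary sizes with d <= n

Y = rng.standard_normal((d, n))
X = Y.T @ Y                      # one point satisfying the last two constraints

Z = build_Z(X, Y)

# Z = [Y^T; I] [Y, I] is a Gram matrix, so it is PSD up to rounding error.
min_eig = np.linalg.eigvalsh(Z).min()

# Gap in the trace constraint: Trace(X) - ||Y||_F^2.
trace_gap = np.trace(X) - np.linalg.norm(Y, "fro") ** 2

print("smallest eigenvalue of Z:", min_eig)
print("Trace(X) - ||Y||_F^2:   ", trace_gap)
```

Both printed numbers should be zero up to floating-point error, so for every $Y$ the pair $(Y^TY, Y)$ satisfies the PSD and trace constraints.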
The Lagrangian is $$L(X,Y; W, \alpha,\lambda)=-\sum_i\alpha_ib_i+\langle A_0-W, Z\rangle + \sum_{i}\alpha_i\langle A_i, Z\rangle + \lambda(\text{Trace}(X)-\Vert Y\Vert_F^2).$$
Then, eliminating $W\succeq 0$, I get that $$g(\alpha, \lambda)=\min_{X,Y}L(X, Y; W, \alpha, \lambda)=\begin{cases}-\sum_i\alpha_ib_i+\min_{X,Y}\lambda(\text{Trace}(X)-\Vert Y\Vert_F^2)& \text{if } A_0+\sum_i\alpha_iA_i\succeq 0\\ -\infty&\text{otherwise.} \end{cases}$$
Of course, it's the last constraint that's giving me trouble. I tried putting it into a nicer form so that I could possibly fold it into the PSD constraint in the dual: $$\text{Trace}(X)-\Vert Y\Vert_F^2=\left\langle \begin{bmatrix} X&Y^T\\ Y&0 \end{bmatrix}, \begin{bmatrix} I&-Y^T/2\\ -Y/2&0 \end{bmatrix} \right\rangle=0,$$
but this ends up not helping at all (as far as I can see). Is there a way to find a nice form of the dual of the above program?
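As a sanity check on the inner-product identity above, here is a minimal NumPy sketch comparing the two sides for a random PSD $X$ and a random $Y$. The sizes, the seed and the variable names are arbitrary choices for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)   # arbitrary seed
n, d = 4, 2                      # arbitrary sizes with d <= n

B = rng.standard_normal((n, n))
X = B @ B.T                      # random PSD matrix
Y = rng.standard_normal((d, n))

# Left factor: [[X, Y^T], [Y, 0]]
left = np.block([[X, Y.T], [Y, np.zeros((d, d))]])

# Right factor: [[I, -Y^T/2], [-Y/2, 0]]
right = np.block([[np.eye(n), -Y.T / 2], [-Y / 2, np.zeros((d, d))]])

# Frobenius (trace) inner product <left, right>.
inner = np.sum(left * right)

# Direct evaluation of Trace(X) - ||Y||_F^2.
direct = np.trace(X) - np.linalg.norm(Y, "fro") ** 2

print(inner, direct)             # the two values should agree up to rounding
```

The identity holds, but the second matrix itself depends on $Y$, so the pairing is quadratic rather than linear in $(X, Y)$. That may be why it does not fit the usual linear-constraint template.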