On page 338 of Williamson & Shmoys's The Design of Approximation Algorithms, the presentation of the ARV algorithm for the sparsest cut problem over a graph $G=(V,E)$ gives the following formulation:
$$\begin{array}{ll} \text{minimize} & \frac{1}{n^2} \sum\limits_{e=(i,j) \in E} c_e \lVert{v_i-v_j}\rVert^2\\ \text{subject to} & \sum\limits_{i,j \in V: i \neq j } \lVert{v_i-v_j}\rVert^2 = n^2 \\ & \lVert{v_i-v_j}\rVert^2 \leq \lVert{v_i-v_k}\rVert^2 + \lVert{v_k-v_j}\rVert^2, \quad \forall i,j,k \in V \\ & v_i \in {\Bbb R}^n \quad \forall i \in V\end{array}$$
where $c_e$ is the weight of each edge. The authors state that this is a semidefinite programming (SDP) relaxation of the original problem. Why is this the case?
To be an SDP relaxation, the objective must be expressible in the form
$$\operatorname{tr} (C^T X) = \sum_{i=1}^n\sum_{j=1}^nC_{ij}X_{ij} $$
where $X$ is symmetric and positive semidefinite (that is, $X \succeq 0$). In my attempts, the matrix $C$ is defined as
$C=[C_{ij}]$, $i,j=1,\dots,n$, where $C_{ij}=\begin{cases} c_e & \text{if } e=(i,j) \in E \\ 0 & \text{otherwise.} \end{cases}$
My concern is with the definition of $X$. My first attempt was to define it as $$ X = [X_{ij}] = [\lVert{v_i-v_j}\rVert^2], $$ but this matrix is not positive semidefinite in general (there is a counterexample to $z^TXz \ge 0$ using a $z$ with $z_1z_2 \lt 0$ and $z_i=0$ otherwise).
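Explicitly, since $X_{11}=X_{22}=0$, such a $z$ gives
$$ z^T X z = z_1^2 X_{11} + 2 z_1 z_2 X_{12} + z_2^2 X_{22} = 2 z_1 z_2 \lVert{v_1-v_2}\rVert^2 < 0 $$
whenever $v_1 \neq v_2$.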
My second and more promising attempt was to work a bit on the objective using $$ \lVert{v_i-v_j}\rVert^2 = \lVert{v_i}\rVert^2 + \lVert{v_j}\rVert^2 -2\langle v_i,v_j \rangle, $$ and to define $X$ as $$ X = [X_{ij}] = [ \langle v_i,v_j \rangle ]. $$ This could work if I added the constraint $\lVert{v_i}\rVert=1, \ \forall i \in V$. In that case the objective would be a linear combination of inner products, and $X$ would admit the factorization $X=V^TV$ with $V=[v_1 \cdots v_n]$.
But the authors explicitly state that "the vectors $v_i$ are not constrained to be unit vectors". Am I missing something here? Should I somehow use the first constraint $\sum_{i,j \in V: i \neq j } \lVert{v_i-v_j}\rVert^2 = n^2$?
You were on the right track with the parametrization $\lVert{v_i-v_j}\rVert^2 = \lVert{v_i}\rVert^2 + \lVert{v_j}\rVert^2 -2\langle v_i,v_j \rangle$ you proposed. Note that with your definition $X = [X_{ij}] = [ \langle v_i,v_j \rangle ]$, each term $\lVert{v_i-v_j}\rVert^2$ becomes a linear function of $X$: $\lVert{v_i-v_j}\rVert^2 = X_{ii} + X_{jj} - 2 X_{ij}$, so the objective is linear in $X$.
To express all the squared distances in matrix form, define the distance matrix $D\in \mathbb{R}^{n\times n}$ by $D_{ij} = \lVert{v_i-v_j}\rVert^2$. Then $D = e\,\operatorname{diag}(X)^T + \operatorname{diag}(X)\,e^T - 2X$, where $e$ is the vector of all ones and $\operatorname{diag}(X)\in \mathbb{R}^{n}$ is the vector formed by the diagonal of $X$, so your objective function is (up to the constant factor $1/n^2$) $\operatorname{tr}(C^T D) = \operatorname{tr}(C^T e\,\operatorname{diag}(X)^T) + \operatorname{tr}(C^T \operatorname{diag}(X)\,e^T) - 2 \operatorname{tr}(C^T X)$.
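As a quick numerical sanity check (a small NumPy sketch of my own, with random non-unit vectors; not part of the book's presentation):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
V = rng.standard_normal((n, n))      # columns v_1, ..., v_n, deliberately not unit vectors
X = V.T @ V                          # Gram matrix X_ij = <v_i, v_j>, PSD by construction

# Squared distances computed directly ...
D_direct = np.array([[np.sum((V[:, i] - V[:, j]) ** 2) for j in range(n)] for i in range(n)])

# ... and via D = e diag(X)^T + diag(X) e^T - 2X, which is linear in X.
e = np.ones((n, 1))
d = np.diag(X).reshape(-1, 1)
D_linear = e @ d.T + d @ e.T - 2 * X

assert np.allclose(D_direct, D_linear)
```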
Using the cyclic property of the trace, $\operatorname{tr}(C^T e\,\operatorname{diag}(X)^T) = \operatorname{tr}(\operatorname{diag}(X)^T C^T e) = \operatorname{tr}(e^T C\,\operatorname{diag}(X))$. Then we can use the property that for (suitably sized) matrices $A$ and $B$ and a vector $c$, $\operatorname{tr}(c^T A \,\operatorname{diag}(B)) = \operatorname{tr}(\operatorname{Diag}(c^T A)\, B)$, where $\operatorname{Diag}(c)\in \mathbb{R}^{n\times n}$ is the diagonal matrix whose diagonal is $c$. Applying this with $c=e$, $A=C$, $B=X$ (and symmetrically for the second term), and putting it all together, the objective can be reformulated as
$$\operatorname{tr}(C^T D) = \operatorname{tr}(\operatorname{Diag}(C^T e)\,X) + \operatorname{tr}(\operatorname{Diag}(C e)\,X) - 2 \operatorname{tr}(C^T X) = \operatorname{tr}(\hat{C}^T X), \qquad \hat{C} = \operatorname{Diag}(C^T e) + \operatorname{Diag}(C e) - 2 C,$$
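The reformulation can also be checked numerically (again just a sketch; the matrix $C$ here is an arbitrary stand-in for the edge-weight matrix):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
V = rng.standard_normal((n, n))
X = V.T @ V                                   # Gram matrix of the v_i
d = np.diag(X)
D = np.outer(np.ones(n), d) + np.outer(d, np.ones(n)) - 2 * X   # squared distances

C = rng.random((n, n))                        # arbitrary weights, stands in for the c_e matrix
C_hat = np.diag(C.T @ np.ones(n)) + np.diag(C @ np.ones(n)) - 2 * C

assert np.isclose(np.trace(C.T @ D), np.trace(C_hat.T @ X))
```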
which is the objective of a standard SDP. Using the same tools, you can also express the constraints as linear (in)equalities in $X$; the only remaining requirement is $X \succeq 0$, and any such $X$ factors as $X = V^T V$, which recovers the vectors $v_i$ without ever forcing them to be unit vectors.
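If it helps to see the whole relaxation in one place, here is a minimal CVXPY sketch (a toy instance I made up; the edge list, variable names, and solver choice are mine, not the book's):

```python
import itertools
import cvxpy as cp

# Hypothetical toy instance: 4 vertices, weighted edge list (i, j, c_e).
n = 4
edges = [(0, 1, 2.0), (1, 2, 1.0), (2, 3, 3.0), (0, 3, 1.5)]

# X plays the role of the Gram matrix [<v_i, v_j>]; PSD-ness is the only structural requirement.
X = cp.Variable((n, n), PSD=True)

def dist(i, j):
    """||v_i - v_j||^2 written as a linear function of X."""
    return X[i, i] + X[j, j] - 2 * X[i, j]

objective = cp.Minimize((1 / n**2) * sum(w * dist(i, j) for i, j, w in edges))

constraints = [
    # Spreading constraint, summing over all ordered pairs i != j as in the formulation above.
    sum(dist(i, j) for i in range(n) for j in range(n) if i != j) == n**2,
]
# Triangle inequalities on the squared distances, for all triples of distinct vertices.
constraints += [dist(i, j) <= dist(i, k) + dist(k, j)
                for i, j, k in itertools.permutations(range(n), 3)]

prob = cp.Problem(objective, constraints)
prob.solve()  # an SDP-capable solver such as SCS (shipped with CVXPY) handles this size easily
print(prob.value)
```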