How is $W = \frac{1}{2}(W+W^T)$ although $W$ is not necessarily symmetric?


I am currently studying Bayesian Reasoning and Machine Learning by David Barber, specifically exercise 4.3 in Chapter 4 (p. 79). The exercise is the following:

Show that for the Boltzmann machine defined on binary variables $x_i$ with $$p(\mathbf{x})= \frac{1}{Z(\mathbf{W},\mathbf{b})}\exp(\mathbf{x}^T\mathbf{W}\mathbf{x}+\mathbf{x}^T\mathbf{b})$$ one may assume, without loss of generality, $\mathbf{W} = \mathbf{W}^T$.

Here's the solution from the solution manual:

The only place $\mathbf{W}$ enters is through the expression $\phi \equiv \mathbf{x}^T\mathbf{W}\mathbf{x}$. Since this is a scalar, we can take the transpose, so that $\phi = \mathbf{x}^T\mathbf{W}^T\mathbf{x}$ as well. Averaging the two expressions gives $\phi = \frac{1}{2}(\mathbf{x}^T\mathbf{W}\mathbf{x} + \mathbf{x}^T\mathbf{W}^T\mathbf{x}) = \mathbf{x}^T\frac{1}{2}(\mathbf{W}+\mathbf{W}^T)\mathbf{x}$. Hence any non-symmetric $\mathbf{W}$ is essentially converted to a symmetric form, so that we may therefore assume this symmetry without loss of generality.
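The identity in the solution is easy to check numerically. Here is a small sketch (my own, not from the book) using NumPy, with a randomly drawn non-symmetric $\mathbf{W}$ and a binary $\mathbf{x}$ as in the Boltzmann machine:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 3))                # a generic, non-symmetric matrix
x = rng.integers(0, 2, size=3).astype(float)   # a binary vector, as in the model

q_W = x @ W @ x                       # x^T W x
q_Wt = x @ W.T @ x                    # x^T W^T x  (same scalar, transposed)
q_sym = x @ (0.5 * (W + W.T)) @ x     # x^T (1/2)(W + W^T) x

# all three quadratic forms agree
print(np.isclose(q_W, q_Wt), np.isclose(q_W, q_sym))
```

So replacing $\mathbf{W}$ by its symmetrized version $\frac{1}{2}(\mathbf{W}+\mathbf{W}^T)$ leaves every value of $\phi$, and hence the distribution $p(\mathbf{x})$, unchanged.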

I don't completely understand the solution. I have a couple of questions:

  1. Couldn't the solution just say that since $\mathbf{x}^T\mathbf{W}\mathbf{x}$ is a scalar, its transpose is the same scalar, so we can write $\mathbf{x}^T\mathbf{W}\mathbf{x}$ as $\mathbf{x}^T\mathbf{W}^T\mathbf{x}$, and we are done? Why does the solution continue?
  2. I know about the Toeplitz decomposition, where any square matrix can be written as $X = \frac{1}{2}(X+X^T) + \frac{1}{2}(X-X^T)$, where the first summand is symmetric. But I don't understand why the solution writes $\mathbf{W}$ only in terms of the symmetric part. I worked through one calculation and verified that it's true, but I still don't understand why we can drop the second (antisymmetric) part. Am I missing something about $\mathbf{W}$? Is it supposed to be symmetric?
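For what it's worth, the calculation I verified can be sketched like this (my own check, not from the book): in the Toeplitz decomposition, the quadratic form through the antisymmetric part $\mathbf{A} = \frac{1}{2}(\mathbf{W}-\mathbf{W}^T)$ is always zero, since $\mathbf{x}^T\mathbf{A}\mathbf{x} = (\mathbf{x}^T\mathbf{A}\mathbf{x})^T = \mathbf{x}^T\mathbf{A}^T\mathbf{x} = -\mathbf{x}^T\mathbf{A}\mathbf{x}$.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.standard_normal((4, 4))   # a generic, non-symmetric matrix
x = rng.standard_normal(4)

S = 0.5 * (W + W.T)   # symmetric part
A = 0.5 * (W - W.T)   # antisymmetric part

# the antisymmetric part contributes nothing to the quadratic form,
# so x^T W x = x^T S x + x^T A x = x^T S x
print(np.isclose(x @ A @ x, 0.0))
print(np.isclose(x @ W @ x, x @ S @ x))
```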