Matrices and their operations were always a bit unintuitive to me, and I was wondering why they appear in this form so often:
$x^T Q x$
where $x$ is a vector of length $n$ and $Q$ is an $n \times n$ matrix. Is there a reason why this form is so common (aside from keeping dimensions consistent with matrix multiplication rules)? I've always thought of it as a way to, in some sense, multiply a matrix by the square of a vector, but I've formed this hypothesis only because there are two $x$'s and one $Q$.
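(For concreteness, here is a small numerical sketch of my own, not claimed in the question itself: $x^TQx$ is a scalar, namely the double sum $\sum_{i,j} q_{ij} x_i x_j$, which is why it feels like a "weighted square" of $x$.)

```python
import numpy as np

# Toy illustration: x^T Q x is a scalar, equal to the double sum
# of q_ij * x_i * x_j -- a "weighted square" of the vector x.
x = np.array([1.0, 2.0])
Q = np.array([[2.0, 1.0],
              [1.0, 3.0]])

quad = x @ Q @ x                      # x^T Q x as a matrix product
double_sum = sum(Q[i, j] * x[i] * x[j]
                 for i in range(2) for j in range(2))

print(quad)        # 18.0
print(double_sum)  # 18.0
```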
If it is relevant, I am encountering this form very often while learning about optimization.
The fact that the expression $x^TQx$ occurs so frequently in a variety of fields, including optimization, is very likely due to its close connection to the Taylor expansion of a real-valued function of $n$ variables.
Let us reestablish this connection. Let $f : \mathbb{R}^n \rightarrow \mathbb{R}$ be a smooth function, let $x \in \mathbb{R}^n$ be a fixed point, let $h \in \mathbb{R}^n$ be a fixed direction, and let $\phi : \mathbb{R} \rightarrow \mathbb{R}^n$ be given by $$ \phi(t) = x + th.$$ Then the function $g : \mathbb{R} \rightarrow \mathbb{R}$ given by $$g = f \circ \phi$$ is differentiable by the chain rule and $$ g'(t) = Df(\phi(t))\phi'(t),$$ where $Df(y)$ is the gradient of $f$ at the point $y$, i.e., the row vector $$ Df(y) = \begin{pmatrix} \frac{\partial f}{\partial x_1}(y), \frac{\partial f}{\partial x_2}(y), \dotsc, \frac{\partial f}{\partial x_n}(y) \end{pmatrix}.$$ In particular, we have $$g'(0) = Df(x)\phi'(0) = Df(x)h.$$

We continue to differentiate. We have $$g'(t) = \sum_{j=1}^n \frac{\partial f}{\partial x_j}(x+th)h_j$$ and by the chain rule $$g''(t) = \sum_{j=1}^n \sum_{i=1}^n \frac{\partial^2 f}{\partial x_i \partial x_j}(x+th)h_jh_i.$$ In particular, we have $$ g''(0) = \sum_{j=1}^n \sum_{i=1}^n \frac{\partial^2 f}{\partial x_i \partial x_j}(x)h_jh_i = h^T Q h,$$ where $Q = [q_{ij}] \in \mathbb{R}^{n \times n}$ is the matrix (the Hessian of $f$ at $x$) given by $$ q_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}(x).$$

Now suppose that we seek information about the function $f$ in the vicinity of a point $x$. It is natural to consider all possible directions $h$ and the associated functions $g$. We have the usual Taylor expansion $$g(t) = g(0) + g'(0)t + \tfrac{1}{2}g''(0)t^2 + O(t^3), \quad t \rightarrow 0, \quad t \not =0,$$ which in our case translates to $$f(x+th) = f(x) + t\,Df(x)h + \tfrac{1}{2}(th)^T Q (th) + O(t^3), \quad t \rightarrow 0, \quad t \not = 0.$$ In optimization we are typically looking at the stationary points of $f$, characterized by $Df(x) = 0$, and at these points the quadratic form $h^TQh$ dictates the local behavior of $f$.
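The identity $g''(0) = h^TQh$ can be checked numerically. The following sketch (my own, with a hand-picked smooth $f$ and a hand-computed Hessian, neither of which appears in the answer above) compares a central finite-difference estimate of $g''(0)$ against the quadratic form:

```python
import numpy as np

def f(y):
    # a concrete smooth function: f(y1, y2) = y1^2 * y2 + sin(y2)
    return y[0] ** 2 * y[1] + np.sin(y[1])

def hessian(y):
    # Hessian of this particular f, computed by hand:
    # f_11 = 2*y2, f_12 = f_21 = 2*y1, f_22 = -sin(y2)
    return np.array([[2.0 * y[1], 2.0 * y[0]],
                     [2.0 * y[0], -np.sin(y[1])]])

x = np.array([1.0, 0.5])    # base point
h = np.array([0.3, -0.2])   # direction
Q = hessian(x)

# g(t) = f(x + t h); estimate g''(0) by a central finite difference
g = lambda t: f(x + t * h)
eps = 1e-5
g2 = (g(eps) - 2.0 * g(0.0) + g(-eps)) / eps ** 2

print(g2, h @ Q @ h)  # the two values agree to roughly 1e-5
```

The same scaffolding also verifies the Taylor expansion itself: replacing the finite difference by $f(x) + t\,Df(x)h + \tfrac{1}{2}t^2\,h^TQh$ reproduces $g(t)$ up to $O(t^3)$ for small $t$.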