A lot of classes show that the Harris corner detector is actually PCA (for example, page 5), but I'm having trouble with that, since Harris doesn't perform mean subtraction anywhere in the algorithm (not in the paper and not in the OpenCV implementation).
This means that if, for example, a part of the image is a plane slanted in both the $x$ and $y$ directions, then Harris can recognize it as a corner, since $\sum I_x^2$, $\sum I_y^2$ and $\sum I_xI_y$ are all large.
However, with PCA, since we do mean subtraction, every entry of the covariance matrix would be close to zero, because the gradient is essentially constant over such a patch.
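To make the scenario concrete, here is a minimal NumPy sketch (the patch size and the slopes 2 and 3 are arbitrary choices of mine): on a slanted plane the three Harris sums are all large, while the covariance of the gradient is zero.

```python
import numpy as np

# Synthetic patch: a plane slanted in both x and y, I(x, y) = 2x + 3y.
x, y = np.meshgrid(np.arange(32, dtype=float), np.arange(32, dtype=float))
I = 2 * x + 3 * y

# Image gradients (np.gradient returns d/drow, d/dcol).
Iy, Ix = np.gradient(I)

# Harris-style sums, no mean subtraction: all large on this patch.
Sxx, Syy, Sxy = (Ix**2).sum(), (Iy**2).sum(), (Ix * Iy).sum()

# Covariance of the gradient, with mean subtraction: essentially zero,
# because the gradient is constant over the patch.
cov = np.cov(np.stack([Ix.ravel(), Iy.ravel()]))

print(Sxx, Syy, Sxy)  # all large and positive
print(cov)            # all entries ~0
```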
So what am I missing in this nuance of Harris?
The structure tensor is the locally averaged outer product of the gradient with itself:
$$\sum \nabla I \, \nabla I^T = \begin{bmatrix} \sum I_x^2 & \sum I_xI_y \\ \sum I_xI_y & \sum I_y^2 \end{bmatrix}$$
When averaging the gradient directly, the opposing vectors on the two sides of a line cancel out. One way to avoid this cancellation is to double the angle of each vector, so that vectors pointing in opposite directions become identical; this is a mapping $\mathbb{R}^2 \to \mathbb{R}^2$. After this mapping, however, vectors originally 90 degrees apart become opposite, and they cancel out when averaged. The structure tensor is a different mapping, $\mathbb{R}^2 \to \mathbb{R}^3$, namely $(I_x,I_y) \to (I_x^2,I_y^2,I_xI_y)$, which also allows averaging vectors without opposing directions cancelling. The first two components accomplish approximately the same thing as doubling the angle; the third component, $I_xI_y$, preserves the information about the spread of angles within the set of averaged vectors.
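The two behaviors of angle doubling can be checked numerically. A small sketch (the `double_angle` helper is my own, not from any library):

```python
import numpy as np

def double_angle(v):
    """Map a 2D vector to the vector with the same length and doubled angle."""
    r = np.hypot(v[0], v[1])
    theta = np.arctan2(v[1], v[0])
    return r * np.array([np.cos(2 * theta), np.sin(2 * theta)])

# Opposing gradients (the two sides of a line) no longer cancel after doubling:
a, b = np.array([1.0, 0.0]), np.array([-1.0, 0.0])
assert np.allclose(a + b, 0)                          # raw vectors cancel
assert np.allclose(double_angle(a), double_angle(b))  # doubled: identical

# But gradients 90 degrees apart (as at a corner) now cancel instead:
c = np.array([0.0, 1.0])
assert np.allclose(double_angle(a) + double_angle(c), 0)
```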
In short, the structure tensor is not the covariance matrix of the gradient over a neighborhood, though it is similar to one. So the eigenanalysis of the structure tensor is not a PCA of the gradient. That said, I do understand why some people like to bring up PCA when explaining Harris; I just wish they didn't claim equality.
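The similarity can be made precise: the structure tensor equals the scatter matrix of the mean-subtracted gradients (what PCA diagonalizes, up to scale) plus $N\,\mu\mu^T$, where $\mu$ is the mean gradient over the window. A sketch, with random vectors standing in for gradient samples:

```python
import numpy as np

rng = np.random.default_rng(0)

# Random 2D samples standing in for (Ix, Iy) over a window of N pixels.
g = rng.normal(size=(100, 2))
N = g.shape[0]

# Structure tensor: sum of outer products of the raw gradients.
S = g.T @ g

# Scatter matrix of the mean-subtracted gradients (PCA's matrix, unscaled).
mu = g.mean(axis=0)
C = (g - mu).T @ (g - mu)

# The two differ exactly by N * outer(mu, mu): the structure tensor keeps
# the mean gradient, while PCA discards it.
assert np.allclose(S, C + N * np.outer(mu, mu))
```

So the two analyses coincide only when the mean gradient over the window is zero, which is exactly the term that mean subtraction removes.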