I'm trying to understand whether the 3D convolution performed in a conv layer of a CNN is associative with a subsequent matrix multiplication. Specifically, is the following true:
$X \otimes (W \cdot Q) = (X \otimes W) \cdot Q$

where $\otimes$ denotes convolution, $X$ is a 3D input to a convolution layer, $W$ is a 4D weight tensor reshaped into two dimensions, and $Q$ is a PCA transformation matrix.
To elaborate: say I take my 512 convolutional filters, each of shape (3x3x512), and flatten each filter across those three dimensions to give a (4608x512) matrix $W$ (one column per filter). I then perform PCA on that matrix, reducing it to, say, (4608x400), before reshaping back into 400 3D filters and performing the convolution. Is this the same as convolving $X$ with $W$ and then applying the same PCA transformation matrix to that output?
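In code, the filter-space version of what I'm describing looks roughly like this (shapes are illustrative; I'm using a plain SVD projection as a stand-in for the fitted PCA components, which sklearn would additionally centre on the fitted mean):

```python
import numpy as np

rng = np.random.default_rng(0)

# 512 filters, each of shape 3x3x512, flattened so each column is one filter
W = rng.standard_normal((3 * 3 * 512, 512))        # (4608, 512)

# stand-in for the PCA matrix Q: top-400 right singular vectors of W
# (sklearn's PCA would also subtract the per-feature mean before projecting)
_, _, Vt = np.linalg.svd(W, full_matrices=False)
Q = Vt[:400].T                                     # (512, 400)

W_reduced = W @ Q                                  # (4608, 400)
filters = W_reduced.T.reshape(400, 3, 3, 512)      # back to 400 3D filters
print(W_reduced.shape, filters.shape)
```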
I know that matrix multiplication is associative, i.e. $A(BC) = (AB)C$, and I have found that convolution operations can be rewritten as matrix multiplications. So my question becomes: if I rewrite the convolution as a matrix multiplication, is it associative with respect to the PCA transformation (another matrix multiplication)?
That is, does $X' \cdot (W' \cdot Q) = (X' \cdot W') \cdot Q$ hold, where $X'$ and $W'$ are the matrices needed to compute the convolution in matrix-multiplication form?
To try to figure this out, I looked at how convolutions can be represented as matrix multiplications. I've seen a few posts/sites explaining how 2D convolutions can be rewritten as matrix multiplication using Toeplitz matrices (e.g. https://github.com/alisaaalehi/convolution_as_multiplication/blob/master/Convolution_as_multiplication.ipynb, https://ai.stackexchange.com/questions/11172/how-can-the-convolution-operation-be-implemented-as-a-matrix-multiplication), but I'm having trouble extending them to my question.
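For concreteness, here's a toy sketch of the im2col-style rewrite I have in mind (my own illustrative code, not taken from the links above):

```python
import numpy as np

def im2col(X, kh, kw):
    """Stack each (kh x kw) patch of X as a row (valid padding, stride 1)."""
    H, W = X.shape
    return np.array([X[i:i + kh, j:j + kw].ravel()
                     for i in range(H - kh + 1)
                     for j in range(W - kw + 1)])

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 5))
K = rng.standard_normal((3, 3))

Xp = im2col(X, 3, 3)            # X': one flattened 3x3 patch per row -> (9, 9)
Wp = K.reshape(-1, 1)           # W': flattened kernel as a column    -> (9, 1)
out = (Xp @ Wp).reshape(3, 3)   # matrix-multiplication form of the conv

# naive sliding-window version (cross-correlation, as in CNN layers)
naive = np.array([[np.sum(X[i:i + 3, j:j + 3] * K) for j in range(3)]
                  for i in range(3)])
print(np.allclose(out, naive))  # True
```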
I've also coded up simple convolutions with a 4x3 $W$ matrix and a 4x2 $X$ matrix, using scikit-learn's PCA to reduce $W$ to 4x2. The outputs of the two orderings are not the same, which leads me to think this kind of associativity does not hold. But how can I explain this with linear algebra?
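Here's a minimal NumPy-only sketch of that check (shapes are just illustrative; the `transform` mimics sklearn's PCA, which centres the features on the mean fitted from $W'$ before projecting):

```python
import numpy as np

rng = np.random.default_rng(0)
Xp = rng.standard_normal((5, 4))   # X': stand-in for the unrolled input
Wp = rng.standard_normal((4, 3))   # W': stand-in for the unrolled filter matrix

# PCA fitted on W': centre the features, project onto the top-2
# right singular vectors of the centred matrix (as sklearn's PCA does)
mean = Wp.mean(axis=0)
_, _, Vt = np.linalg.svd(Wp - mean, full_matrices=False)
Q = Vt[:2].T                                  # (3, 2)
transform = lambda A: (A - mean) @ Q          # the same fitted transform, both times

out1 = Xp @ transform(Wp)    # X' (W' Q): reduce the filters, then multiply
out2 = transform(Xp @ Wp)    # (X' W') Q: multiply, then reduce

print(np.allclose(out1, out2))                      # the two orderings disagree
print(np.allclose(Xp @ (Wp @ Q), (Xp @ Wp) @ Q))    # bare matmuls: associative
```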
Can anyone explain, in linear-algebra terms, whether or not this holds?