What does it mean to convolve matrices of finite dimension?


If one is given two matrices $I$ and $K$, what does the notation:

$$ I * K $$

mean rigorously/precisely? I do know the definition of convolution:

$$ s[i, j] = (I * K)[i, j] = \sum_m \sum_n I[m,n] K[i - m, j - n]$$

However, I have some issues/doubts that I can't resolve. First, the definition of convolution is stated in terms of functions rather than matrices or vectors, so I have difficulty understanding what $I * K$ should mean if I am given two matrices alone. This notation seems to be common in convolutional neural networks, for example, and it's confusing me.

I am trying to come up with an example to make this concrete and not confusing.

For example, if I had: $$ I= \begin{bmatrix} 1 & 2 & 3 & 4 \\ 5 & 6 & 7 & 8 \\ 9 & 10 & 11 & 12 \\ \end{bmatrix} $$

and I had

$$ K= \begin{bmatrix} 1 & 2 \\ 5 & 6 \end{bmatrix} $$

what would a convolution between the two mean? What type of mathematical object should it return? Would it use the usual equation for convolution, as I defined above with $s[i,j]$?

Part of my difficulty is that the negative sign in the convolution is supposed to reflect the function about the y-axis, but with finite structures I don't think that makes sense. There is also a translation, and since the indices are discrete I am not sure what translating means in this context; just making cyclic shifts doesn't feel like translation to me. All of this makes sense to me for functions (and infinite-dimensional vectors), but not in this finite setting. Actually, after some thought, I think what bothers me most about convolutions with finite vectors (and is NOT an issue with inner products) is this translation. Finite vectors are not indexed by the real line, so translating them doesn't make sense to me; with inner products this issue never arises, because we only compute products of corresponding elements!

Also, if anyone can make explicit what the result of $I * K$ would be for the specific example I provided, it would be very helpful. I suspect the result would be a matrix; if someone can show what that matrix is (and possibly compute a few of its entries explicitly), it would be tremendously useful.


For reference, here is where I saw this type of operation/notation being used:

http://people.idsia.ch/~masci/papers/2011_icann.pdf

and the section that uses this type of notation was included in the original post as a screenshot (not reproduced here).


For this answer, my matrix elements will be indexed starting from $0$, and matrices are viewed as extending infinitely in all directions, padded with $0$.

An easy example is the matrix containing a single $1$, analogous to an impulse function: $$I = \begin{bmatrix}1\end{bmatrix}$$

Then the convolution $I*K$ will be an identity operation: $$(I*K)[i,j] = \sum_{m=0}^0\sum_{n=0}^0I[m,n]\ K[i-m,j-n] = K[i,j]$$
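This impulse behaviour is easy to check numerically. As a quick sketch (assuming SciPy is available; `scipy.signal.convolve2d` implements exactly the flipped, zero-padded definition used here):

```python
import numpy as np
from scipy.signal import convolve2d

I = np.array([[1]])                  # the impulse: a single 1
K = np.array([[1, 2],
              [5, 6]])

# Convolving the impulse with K reproduces K exactly.
result = convolve2d(I, K)
print(result)
# [[1 2]
#  [5 6]]
```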


Back to your more complicated example: $$ A= \begin{bmatrix} 1 & 2 & 3 & 4 \\ 5 & 6 & 7 & 8 \\ 9 & 10 & 11 & 12 \\ \end{bmatrix},\quad B= \begin{bmatrix} 1 & 2 \\ 5 & 6 \end{bmatrix} $$

$A*B$ can be viewed as applying the "effect" of $B$ to every element of $A$, or as the aggregated response from every element of $A$, if every element of $A$ undergoes the same transformation by $B$. The roles of $A$ and $B$ can be reversed, since convolution is still commutative here.

As an example, I would like to find the value of $(A*B)[1,1]$. First, look at the $A[0,0]$ element. Imagine that $B$ is placed at that location:

$$A*B = 1\cdot\begin{bmatrix} 1&2&0&0&\cdots\\ 5&\underline6&0&0&\cdots\\ 0&0&0&0&\cdots\\ \vdots&\vdots&\vdots&\vdots&\ddots \end{bmatrix}+\cdots$$

This is the response by one of the elements of $A$. Including the next few on the first row of $A$: $$= 1\cdot\begin{bmatrix} 1&2&0&0&\cdots\\ 5&\underline6&0&0&\cdots\\ 0&0&0&0&\cdots\\ \vdots&\vdots&\vdots&\vdots&\ddots \end{bmatrix}+ 2\cdot\begin{bmatrix} 0&1&2&0&\cdots\\ 0&\underline5&6&0&\cdots\\ 0&0&0&0&\cdots\\ \vdots&\vdots&\vdots&\vdots&\ddots \end{bmatrix}+ 3\cdot\begin{bmatrix} 0&0&1&2&\cdots\\ 0&\underline0&5&6&\cdots\\ 0&0&0&0&\cdots\\ \vdots&\vdots&\vdots&\vdots&\ddots \end{bmatrix}+\cdots$$

I hope you get the basic idea for the first row. Then for the second row,

$$\cdots + 5\cdot\begin{bmatrix} 0&0&0&0&\cdots\\ 1&\underline2&0&0&\cdots\\ 5&6&0&0&\cdots\\ \vdots&\vdots&\vdots&\vdots&\ddots \end{bmatrix}+ 6\cdot\begin{bmatrix} 0&0&0&0&\cdots\\ 0&\underline1&2&0&\cdots\\ 0&5&6&0&\cdots\\ \vdots&\vdots&\vdots&\vdots&\ddots \end{bmatrix}+ 7\cdot\begin{bmatrix} 0&0&0&0&\cdots\\ 0&\underline0&1&2&\cdots\\ 0&0&5&6&\cdots\\ \vdots&\vdots&\vdots&\vdots&\ddots \end{bmatrix}+\cdots$$

The only responses that will affect the resultant element $[1,1]$ are all listed, so the value at $[1,1]$ will be $$\begin{align*}(A*B)[1,1] &= \sum_{m=-\infty}^{+\infty}\sum_{n=-\infty}^{+\infty}A[1-m,1-n]\ B[m,n]\\ &= \sum_{m=0}^1\sum_{n=0}^1 A[1-m,1-n]\ B[m,n]\\ &= A[0,0]\ B[1,1] + A[0,1]\ B[1,0] + A[1,0]\ B[0,1] + A[1,1]\ B[0,0]\\ &= 1\cdot6 + 2\cdot5 + 5\cdot2 + 6\cdot1 \end{align*}$$
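The hand computation above can be confirmed with SciPy's `convolve2d` (an assumption of this sketch is that its default `'full'` mode matches the zero-padded definition used here, which it does):

```python
import numpy as np
from scipy.signal import convolve2d

A = np.array([[1,  2,  3,  4],
              [5,  6,  7,  8],
              [9, 10, 11, 12]])
B = np.array([[1, 2],
              [5, 6]])

full = convolve2d(A, B)     # 'full' convolution
# Element [1,1] matches the sum worked out above:
# 1*6 + 2*5 + 5*2 + 6*1 = 32
print(full[1, 1])           # 32
```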

And if I keep doing this, the third row of $A$ will also contribute to a fourth row of the result, and the fourth column of $A$ to a fifth column, positions that lie outside $A$ itself. Hence the resultant matrix is usually larger than the inputs. But if desired, the result can also be clipped at the matrix boundary, or wrapped around it.
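The three variants (larger result, clipped, wrapped) correspond to options of `convolve2d`; the parameter names below are SciPy's, not part of the mathematical definition:

```python
import numpy as np
from scipy.signal import convolve2d

A = np.array([[1,  2,  3,  4],
              [5,  6,  7,  8],
              [9, 10, 11, 12]])
B = np.array([[1, 2],
              [5, 6]])

full = convolve2d(A, B, mode='full')                  # larger result
same = convolve2d(A, B, mode='same')                  # clipped to A's shape
wrap = convolve2d(A, B, mode='same', boundary='wrap') # circular (wrapped)
print(full.shape, same.shape, wrap.shape)
# (4, 5) (3, 4) (3, 4)
```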


It is possible to lift the index constraint: it is perfectly fine to have a $3\times3$ matrix with its origin at the centre. As another example, this matrix: $$C=\frac19\begin{bmatrix} 1&1&1\\1&1&1\\1&1&1 \end{bmatrix}$$ considers an element together with all of its neighbours, and hence convolution with $C$ performs uniform averaging.
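A small sketch of that averaging effect (the test image `X` here is made up for illustration; a single bright pixel gets spread evenly over its $3\times3$ neighbourhood):

```python
import numpy as np
from scipy.signal import convolve2d

C = np.ones((3, 3)) / 9      # uniform 3x3 averaging kernel
X = np.zeros((5, 5))
X[2, 2] = 9.0                # a single bright pixel

# mode='same' keeps the output the same shape as X, with the
# kernel origin treated as its centre.
blurred = convolve2d(X, C, mode='same')
print(blurred[1:4, 1:4])     # a 3x3 block of ones around the pixel
```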


Either the flipped convolution or the non-flipped version (otherwise known as cross-correlation) may seem more intuitive to you, and I feel that is normal. In my own words, they represent two ways of approaching the problem:

  • Flipped: For each of the input elements, if I apply the same response function/matrix, what is the output?
  • Not flipped: For each of the output elements, how is it composed as a weighted average of its surrounding input elements?

Added: What I wrote above is how the idea of convolution may have come about: if the operation $({*}\,B)$ is viewed as applying a shifted and scaled copy of $B$ for every element of the input $A$ and taking the aggregated result, then the output is the convolution $A*B$. Notice that in this basic definition there is no flip of $B$, because it accounts for the effect of each input element.

This definition leads logically to the common definitions (which include the flip, as the OP described) that specify how each output element is calculated, whether for convolution of two continuous-domain functions, two discrete-domain functions, or two matrices.

This can be seen from the $A*B$ example above; again consider $A$ and $A*B$ as a pair of input and output. Each output element $[i,j]$ receives contributions:

  • from the northwestern input ($A[i-1,j-1]$) by the southeastern element of $B$ ($B[1,1]$)
  • from the northern input ($A[i-1, j]$) by the southern element of $B$ ($B[1,0]$)
  • from the western input ($A[i, j-1]$) by the eastern element of $B$ ($B[0,1]$)
  • from the central input ($A[i, j]$) by the central element of $B$ ($B[0,0]$)

This is what introduces the apparent flip of $B$. And for a larger matrix $B$, every pair of elements $A[m,n]$ and $B[p,q]$ contributes to $(A*B)[m+p,n+q]$. This is how the following analogous definitions are derived:

$$(A*B)[i,j] = \sum_{m}\sum_{n} A[m,n]\ B[i-m,j-n]\\ (f*g)(t) = \int_{\tau} f(\tau)\ g(t-\tau)\ d\tau$$
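The relationship between the two viewpoints can also be verified numerically: convolution equals cross-correlation with a doubly flipped kernel (a standard identity; the check below uses SciPy's `convolve2d` and `correlate2d`):

```python
import numpy as np
from scipy.signal import convolve2d, correlate2d

A = np.array([[1,  2,  3,  4],
              [5,  6,  7,  8],
              [9, 10, 11, 12]])
B = np.array([[1, 2],
              [5, 6]])

conv = convolve2d(A, B)                 # flipped definition
corr = correlate2d(A, B[::-1, ::-1])    # non-flipped, kernel pre-flipped
print(np.array_equal(conv, corr))       # True
```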