Structured efficient linear layers - ACDC - Diagonal matrices scale signal

75 Views Asked by At

I'm reading this paper:

Marcin Moczulski, Misha Denil, Jeremy Appleyard, Nando de Freitas. ACDC: A Structured Efficient Linear Layer.

http://arxiv.org/abs/1511.05946

When they write equation (6) for deep SELLs,

$$ \Phi = \prod_{k=1}^{K} \mathbf{A}_k \mathbf{F} \mathbf{D}_k \mathbf{F}^{-1} $$

then they say that $\mathbf{A}$ is diagonal and scales the signal in real domain and $\mathbf{D}$ is diagonal and scales it in the Fourier domain. Given that I have little knowledge in signal processing, what does this sentence mean?

1

There are 1 best solutions below

1
On BEST ANSWER

Here I let all vectors be row vectors as in the paper, and all vectors have size $n$.

Multiplying a row vector by a diagonal matrix is the same as componentwise multiplying the vector by another vector. So $\mathbf{xA}$ is just the vector $(x_1 A_{11}, \ldots, x_n A_{nn}).$

$\mathbf{F}$ is the Fourier transform ($F_{ij} = e^{-2\pi\sqrt{-1}ij/n}$). One of the things to know is that convolution $\mathbf{z} = \mathbf{x}*\mathbf{y}$ (defined by $z_i = \sum_{j=0}^{n-1} x_j y_{i-j\operatorname{mod}n}$) is the same as componentwise multiplication of the respective Fourier transforms: $$(\mathbf{x}*\mathbf{y})\mathbf{F} = \mathbf{xF} \odot \mathbf{yF},$$ where $\odot$ is componentwise multiplication. Another way to write it is $$\mathbf{x}*\mathbf{y} = (\mathbf{xF} \odot \mathbf{yF})\mathbf{F}^{-1}.$$ Now let $\mathbf{D}$ be a diagonal matrix, with entries taken from $\mathbf{yF}$. Then as I said first, componentwise multiplication is the same as multiplication by a diagonal matrix, so we can rewrite the convolution as $$\mathbf{x}*\mathbf{y} = \mathbf{xFDF}^{-1}.$$

So to summarise, when you write $\mathbf{AFDF}^{-1}$ with $\mathbf{A}$ and $\mathbf{D}$ diagonal, what it's really saying is that it's just componentwise multiplication ($\mathbf{A}$) followed by a convolution ($\mathbf{FDF}^{-1}$).