On page 9 of "Intriguing properties of neural networks", the authors write:
If $ W $ denotes a generic 4-tensor, implementing a convolutional layer with $ C $ input features, $ D $ output features, support $ N \times N $ and spatial stride $ \Delta $:

$$ Wx = \left\{ \sum_{c=1}^C x_c \star w_{c,d}(n_1 \Delta, n_2 \Delta); d = 1, \ldots, D \right\} \, , $$

where $ x_c $ denotes the $ c $-th input feature, and $ w_{c,d} $ is the spatial kernel corresponding to input feature $ c $ and output feature $ d $, by applying Parseval's formula we obtain that its operator norm is given by

$$ \| W \| = \sup_{\xi \in [0, N \Delta^{-1})^2} \| A(\xi) \| \, , $$

where $ A(\xi) $ is a $ D \times (C \cdot \Delta^2) $ matrix whose rows are

$$ A(\xi)_d = \left( \Delta^{-2} \widehat{w_{c,d}} (\xi + l \cdot N \cdot \Delta^{-1}); c = 1 \ldots C, l = (0 \ldots \Delta-1)^2 \right) \, , $$

and $ \widehat{w_{c,d}} $ is the 2-D Fourier transform of $ w_{c,d} $:

$$ \widehat{w_{c,d}} = \sum_{u \in [0, N)^2} w_{c,d}(u) e^{- 2 \pi i (u \cdot \xi) / N^2} \, . $$
My question is: which Parseval's formula is being referred to here, and how is it used to derive the expression for $ \| W \| $? I know about Parseval's identity and Parseval's theorem, but I do not see how either of them could be applied.
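For what it is worth, the simplest special case can be checked numerically. The sketch below is not the paper's setting in full: it assumes stride $ \Delta = 1 $ and periodic (circular) boundary conditions on an $ n \times n $ grid, and the helper names `conv_operator_matrix` and `fourier_norm` are made up for this illustration. Under those assumptions $ A(\xi) $ reduces to the $ D \times C $ matrix of kernel DFT values at frequency $ \xi $, and the largest singular value of the dense convolution matrix should agree with the maximum over frequencies of $ \| A(\xi) \| $.

```python
import numpy as np

def conv_operator_matrix(w, n):
    """Dense matrix of a multi-channel circular convolution on an n x n grid.

    w has shape (D, C, k, k) with k <= n. Stride 1 and periodic boundaries
    are assumptions of this sketch, not the general setting of the quote.
    """
    D, C, k, _ = w.shape
    W = np.zeros((D * n * n, C * n * n))
    for d in range(D):
        for c in range(C):
            for i in range(n):
                for j in range(n):
                    row = (d * n + i) * n + j
                    for a in range(k):
                        for b in range(k):
                            # y_d[i, j] += w[d, c, a, b] * x_c[i - a, j - b]
                            col = (c * n + (i - a) % n) * n + (j - b) % n
                            W[row, col] += w[d, c, a, b]
    return W

def fourier_norm(w, n):
    """Max over DFT frequencies of the spectral norm of the D x C matrix A(xi)."""
    # Zero-pad each kernel to n x n and take its 2-D DFT.
    w_hat = np.fft.fft2(w, s=(n, n), axes=(-2, -1))   # shape (D, C, n, n)
    # Rearrange so that each frequency holds one D x C matrix.
    A = np.moveaxis(w_hat, (0, 1), (-2, -1))          # shape (n, n, D, C)
    # Singular values of every A(xi) at once; the largest one is the norm.
    return np.linalg.svd(A, compute_uv=False).max()

rng = np.random.default_rng(0)
w = rng.standard_normal((2, 3, 3, 3))   # D = 2, C = 3, 3 x 3 kernels
n = 6
direct = np.linalg.norm(conv_operator_matrix(w, n), 2)
spectral = fourier_norm(w, n)
print(direct, spectral)
```

The two printed numbers should coincide up to floating-point error, which is consistent with the DFT being unitary up to scaling (the discrete Parseval identity) and turning the convolution into a block-diagonal operator with blocks $ A(\xi) $. This check says nothing about $ \Delta > 1 $, where the extra shifted terms indexed by $ l $ appear, and that is the part I cannot derive.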