Casting convolution to cosine similarity

I am trying to cast convolution as cosine similarity for an experiment, and I am having trouble implementing it efficiently. It makes sense in my head, but I can't seem to make the math work. The idea is to take the cosine similarity between each pixel of the image and each kernel.

I have an image tensor, X, with dimensions [B, D, H, W], and a weight/kernel/filter tensor, Y, with dimensions [N, D, 1, 1].

The convolution of the two gives an output with dimensions [B, N, H, W].

Cosine similarity is computed as: $\frac{X \cdot Y}{\|X\| \, \|Y\|}$
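
Written out per pixel, the quantity being asked for can be sketched as follows. The index names $b, n, d, h, w$ and the output symbol $S$ are introduced here for illustration only; $Y_{n,d}$ stands for the single spatial entry of the 1x1 filter.

$$
S_{b,n,h,w} = \frac{\sum_{d} X_{b,d,h,w} \, Y_{n,d}}{\sqrt{\sum_{d} X_{b,d,h,w}^{2}} \; \sqrt{\sum_{d} Y_{n,d}^{2}}}
$$

The numerator is exactly what a 1x1 convolution computes, and the denominator is a product of one factor that depends only on $(b, h, w)$ and one that depends only on $n$.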

The convolution takes care of the numerator ($X \cdot Y$), since all N filters, each of size 1x1xD, are slid across the input.

But how do I do the normalization, i.e. the denominator? If I take the norms of the X and Y tensors manually over the D dimension, then ||X|| has shape [B, 1, H, W] and ||Y|| has shape [N, 1, 1, 1].

I can't multiply the two directly since N != B. Is this even possible?

There are 1 best solutions below

Never mind, I figured it out. I needed to convolve the two norms as well.

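Since ||X|| has shape [B, 1, H, W] and ||Y|| has shape [N, 1, 1, 1], convolving the two (with ||Y|| acting as N filters of size 1x1 over a single input channel) gives a tensor of shape [B, N, H, W] whose entries are the products ||X|| ||Y|| for every pixel and filter. The dot product output, which also has shape [B, N, H, W], can then be divided by it elementwise.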
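
A minimal sketch of this idea, written in NumPy rather than any particular deep learning framework. It assumes the norms are L2 norms over the D axis. The helper names `conv1x1` and `cosine_conv1x1` and the `eps` guard against division by zero are illustrative choices, not part of the original post. A 1x1 convolution is expressed here as a contraction over the channel axis; in a framework, that helper would be the framework's own 2D convolution call.

```python
import numpy as np

def conv1x1(x, w):
    """1x1 convolution. x: [B, C, H, W], w: [N, C, 1, 1] -> [B, N, H, W]."""
    # With a 1x1 kernel, convolution reduces to a dot product over channels.
    return np.einsum("bchw,nc->bnhw", x, w[:, :, 0, 0])

def cosine_conv1x1(x, w, eps=1e-8):
    # Numerator: dot product of every pixel vector with every filter.
    dot = conv1x1(x, w)                                 # [B, N, H, W]
    # L2 norms over the channel axis, keeping that axis with size 1.
    x_norm = np.linalg.norm(x, axis=1, keepdims=True)   # [B, 1, H, W]
    w_norm = np.linalg.norm(w, axis=1, keepdims=True)   # [N, 1, 1, 1]
    # Convolving the norms gives ||x|| * ||w|| per pixel and per filter.
    denom = conv1x1(x_norm, w_norm)                     # [B, N, H, W]
    # eps (an assumption of this sketch) avoids dividing by zero.
    return dot / (denom + eps)

rng = np.random.default_rng(0)
B, D, H, W, N = 2, 5, 3, 4, 6
X = rng.standard_normal((B, D, H, W))
Y = rng.standard_normal((N, D, 1, 1))
S = cosine_conv1x1(X, Y)
print(S.shape)  # (2, 6, 3, 4)
```

The same denominator can also be obtained without a second convolution by broadcasting, for example `x_norm * w_norm.reshape(1, N, 1, 1)`, which sidesteps the N != B mismatch because the two norms then vary along different axes.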