Below is the schematic diagram from a very popular paper in the field of retinotopy.
In the diagram above, $g$ is a bivariate Gaussian function of the form
$$ g (x,y) = \exp \left( - \left( \frac{ (x - x_0)^2 + (y - y_0)^2}{2 \sigma^2} \right) \right) $$
First, we multiply the functions $g$ and $s$ pointwise and then sum over $x$ and $y$,
$$ r (t) = \sum_{x,y} s (x, y, t) g (x,y) $$
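To make the summation concrete, here is a minimal NumPy sketch of the formula exactly as written. The grid size, the Gaussian parameters `x0`, `y0`, `sigma`, and the bar-shaped `s` are made-up values for illustration only; none of them are taken from the paper.

```python
import numpy as np

# Hypothetical sizes: a 5x5 pixel grid and 4 time points.
nx, ny, nt = 5, 5, 4
x, y = np.meshgrid(np.arange(nx), np.arange(ny), indexing="ij")

# Gaussian g(x, y) with an assumed centre (x0, y0) and width sigma.
x0, y0, sigma = 2.0, 2.0, 1.0
g = np.exp(-((x - x0) ** 2 + (y - y0) ** 2) / (2 * sigma ** 2))

# Made-up binary s(x, y, t): a bar covering the row x = t at time t.
s = np.zeros((nx, ny, nt))
for t in range(nt):
    s[t, :, t] = 1.0

# r(t) = sum over x and y of s(x, y, t) * g(x, y).
r = np.einsum("xyt,xy->t", s, g)
print(r.shape)
```

In this sketch the sum runs over both spatial axes, so the array `r` is indexed by `t` only.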
Then, we compute the convolution of $r$ and $h$ (the hemodynamic response function (HRF)),
$$ p (t) = (r \ast h) (t) $$

Question:
My main problem is in the following part of the above diagram:
As per the above snippet, at any time $t$, the function $r(t)$ represents a $2$-dimensional slice in the spatial domain, though I may be wrong. But if I am right, how can we convolve $r$ with $h$?
Or could it be that the function $r(t)$ is representing a single value which is basically the area under the pizza-slice?
I think that $r(t)$ can be written as $r(x,y,t)$. Is that so? If so, it simply means that at any time $t$, the function $r(x,y,t)$ represents a $2$-dimensional matrix and the HRF represents a single scalar value. But the result of convolving a 2D matrix with a single scalar value would not be a single value.
I need some explanation of how we construct the function $p(t)$ by convolving $r(t)$ with the HRF.
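For concreteness, here is a minimal sketch of what the convolution would look like under the second reading, assuming $r(t)$ is a single number per time point and $h$ is a function of time rather than a single scalar. The sample values of `r` and `h` below are placeholders I made up; in particular, `h` is not the HRF from the paper.

```python
import numpy as np

# Made-up scalar time series r(t), one value per time point (assumption).
r = np.array([0.0, 1.0, 1.0, 0.0, 0.0, 0.0])

# Placeholder samples of h(t); NOT the HRF used in the paper.
h = np.array([0.0, 0.5, 1.0, 0.5])

# Discrete 1D convolution over time: p(t) = sum_tau r(tau) * h(t - tau),
# truncated to the length of r so that p and r share the same time axis.
p = np.convolve(r, h)[: len(r)]
print(p)
```

Under these assumptions `p` is again a one-dimensional time series with the same length as `r`. Whether this matches what the diagram intends is exactly what I am unsure about.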

