I am reading this paper and came across an equation I don't really understand. I learned the maximum likelihood method last semester.
However, the likelihood functions I saw always contained the parameter $\Theta$ explicitly. I would take the log of the function, differentiate it, set the derivative to zero, and solve for $\Theta$.
Here I don't understand which parameter they are trying to find. This is the equation (it is on page 5 of the paper):

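According to the paper, ${\hat y}_i$ and ${\hat \sigma}$ belong to a class of functions parameterized by $\theta$, so $\theta$ is the vector of all parameters in ${\hat y}_i$ and ${\hat \sigma}$. For instance, if you consider a deep NN, $\theta$ contains all of the elements of $W_i$ and $b_i$ ($i \in \{1,2,\cdots,L\}$) in the expression above your equation. The maximum likelihood estimate is still the $\theta$ that maximizes the likelihood; for a neural network there is generally no closed-form solution, so it is found numerically, e.g. by gradient descent on the negative log-likelihood.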
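
To make this concrete, here is a minimal sketch in Python. It is not the paper's model: since the equation is not reproduced here, I am assuming a Gaussian likelihood with mean ${\hat y}_i$ and standard deviation ${\hat \sigma}$, and I replace the deep NN with a one-dimensional linear model so that $\theta = (w, b, \log\sigma)$ has only three entries. All names (`negative_log_likelihood`, `gradient`, `fit`) and the synthetic data are made up for illustration.

```python
import numpy as np


def negative_log_likelihood(theta, x, y):
    """Mean Gaussian negative log-likelihood, dropping the constant 0.5*log(2*pi)."""
    w, b, log_sigma = theta
    y_hat = w * x + b                      # predicted mean, depends on theta
    sigma2 = np.exp(2.0 * log_sigma)       # predicted variance, depends on theta
    return np.mean(0.5 * (y - y_hat) ** 2 / sigma2 + log_sigma)


def gradient(theta, x, y):
    """Analytic gradient of the mean negative log-likelihood w.r.t. theta."""
    w, b, log_sigma = theta
    residual = y - (w * x + b)
    sigma2 = np.exp(2.0 * log_sigma)
    d_w = np.mean(-residual * x / sigma2)
    d_b = np.mean(-residual / sigma2)
    d_log_sigma = np.mean(1.0 - residual ** 2 / sigma2)
    return np.array([d_w, d_b, d_log_sigma])


def fit(x, y, steps=5000, learning_rate=0.05):
    """Plain gradient descent on the negative log-likelihood, starting from theta = 0."""
    theta = np.zeros(3)
    for _ in range(steps):
        theta -= learning_rate * gradient(theta, x, y)
    return theta


# Synthetic data (assumed for illustration): y = 2x - 1 plus Gaussian noise with std 0.3.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=2000)
y = 2.0 * x - 1.0 + 0.3 * rng.normal(size=2000)

theta_hat = fit(x, y)
w_hat, b_hat, sigma_hat = theta_hat[0], theta_hat[1], np.exp(theta_hat[2])
print("w:", w_hat, "b:", b_hat, "sigma:", sigma_hat)
```

In this toy case you could still solve for $\theta$ by hand, as in your course: the maximum likelihood $w$ and $b$ are the least-squares fit, and ${\hat \sigma}^2$ is the mean squared residual. With a deep NN the idea is identical, only $\theta$ is much larger and the gradient is computed by backpropagation instead of by hand.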