Some low-hanging fruit here for you, dear reader:
Can someone please help shed light on the following snippet, boxed in red?
I don't see how the Gaussian distribution is monotonic in $\mu$ or $\sigma^2$. But it seems like a pivotal idea in the derivation of the Maximum Likelihood Estimator. Thanks in advance!
^snippet from Optimal Estimation of Dynamic Systems, 2nd Ed. Crassidis & Junkins. $\S 2.5$

This excerpt is, in my opinion, poorly written, confusingly presented, and lacking in rigor.
First, we do not need any specific distributional assumptions to motivate the maximization of the log-likelihood. Rather, we can appeal to more general regularity requirements. To keep the discussion simple, if $f : S \to \mathbb R^+$ is a differentiable function with support $S \subseteq \mathbb R$, then the critical points of $f$ satisfying $f'(x) = 0$ are also critical points of $g = \log f$, since $$g'(x) = \frac{f'(x)}{f(x)},$$ and whenever $f > 0$ (which is the case on the support), the RHS is well-defined and vanishes exactly when $f'(x) = 0$. Moreover, since $\log$ is strictly increasing, $f$ and $g = \log f$ attain their maxima at the same points.
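As a quick numerical sanity check of this invariance (data and grid below are made up for illustration), one can maximize a Gaussian likelihood in $\mu$ and its logarithm over the same grid and confirm they pick out the same point, namely the sample mean:

```python
import math

# Illustrative data; the sample mean (the analytic MLE for mu) is 1.3.
data = [1.2, 0.7, 1.9, 1.4]

def likelihood(mu, sigma2=1.0):
    # Product of N(y_i; mu, sigma2) densities.
    out = 1.0
    for y in data:
        out *= math.exp(-(y - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)
    return out

def log_likelihood(mu, sigma2=1.0):
    # Sum of log-densities: same maximizer, numerically better behaved.
    return sum(-(y - mu) ** 2 / (2 * sigma2)
               - 0.5 * math.log(2 * math.pi * sigma2) for y in data)

grid = [i / 1000 for i in range(-2000, 4001)]   # mu in [-2, 4], step 0.001
mu_hat_f = max(grid, key=likelihood)
mu_hat_g = max(grid, key=log_likelihood)

assert mu_hat_f == mu_hat_g                        # same maximizer
assert abs(mu_hat_f - sum(data) / len(data)) < 1e-9  # equals the sample mean
```

No distributional facts about the Gaussian are used beyond positivity of the density; the agreement is purely a consequence of $\log$ being strictly increasing.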
The text in the red box really makes no sense; it reads as if someone is regurgitating something they heard, with no justification. The only way I can think of to interpret the statement that a Gaussian density is "monotone" in the variance, for instance, is that the likelihood $$\mathcal L(\sigma^2 \mid \tilde{\bf y}, \mu) = \frac{1}{(2\pi \sigma^2)^{n/2}} \exp\!\left(-\frac{\sum_{i=1}^n (y_i - \mu)^2}{2\sigma^2}\right)$$ is nondecreasing in $\sigma^2$; i.e., $\mathcal L(\sigma_1^2) \le \mathcal L(\sigma_2^2)$ whenever $0 < \sigma_1^2 \le \sigma_2^2$. And this is simply false: treat $\tilde{\bf y}$ and $\mu$ as constants, let $v = \sigma^2$, and write $C = \tfrac12 \sum_{i=1}^n (y_i - \mu)^2$, which is constant with respect to $v$. Then, up to the constant factor $(2\pi)^{-n/2}$, the likelihood is $$h(v) = v^{-n/2} e^{-C/v},$$ with derivative $$h'(v) = \frac{2C - nv}{2v^{n/2+2}}\, e^{-C/v}.$$ This has an obvious critical point at $v = 2C/n$, with $h'(v) > 0$ for $0 < v < 2C/n$ and $h'(v) < 0$ for $v > 2C/n$; the likelihood increases and then decreases, so it is not monotone. (Indeed, $v = 2C/n = \frac{1}{n}\sum_{i=1}^n (y_i - \mu)^2$ is precisely the maximum likelihood estimate of the variance.) Even if the claim in the red box were true, it is not at all related to the aforementioned justification for taking the logarithm of the likelihood when locating local maxima.
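The increase-then-decrease behavior of $h(v) = v^{-n/2} e^{-C/v}$ is easy to verify numerically; the values of $n$ and $C$ below are arbitrary illustrative choices:

```python
import math

# Illustrative constants: n observations, C = (1/2) * sum of squared residuals.
n, C = 4, 3.0

def h(v):
    # Profile of the Gaussian likelihood in v = sigma^2, up to (2*pi)**(-n/2).
    return v ** (-n / 2) * math.exp(-C / v)

v_star = 2 * C / n            # analytic critical point, here 1.5

# Secant slopes just below and just above the critical point.
left  = (h(v_star) - h(v_star - 1e-3)) / 1e-3
right = (h(v_star + 1e-3) - h(v_star)) / 1e-3

assert left > 0 and right < 0             # increasing, then decreasing
assert h(v_star) > h(0.1)                 # beats a small variance
assert h(v_star) > h(100.0)               # and a large one: interior maximum
```

So the likelihood in the variance rises to an interior maximum and falls off on either side, which is exactly what "monotone in $\sigma^2$" would forbid.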
Finally, I would also like to point out that the notation in the excerpt is very sloppy. The likelihood function is generally written with the unknown parameters first, then the vertical line, then the data and any known parameters, as I illustrated above. This is because the symbol $\mid$ denotes conditioning: the quantities preceding it are conditioned upon the quantities following it. Writing $p(\tilde{\bf y} \mid {\bf x})$ means that ${\bf x}$, the parameters, are known or given, hence $p$ is a (probability) function of the data $\tilde{\bf y}$ given those parameters. The same mistake is repeated in Equation $(2.135)$. Moreover, that equation is incorrect if the data have a nontrivial correlation structure; that is to say, if the $y_i$ are not independent, the joint density does not factor into a product of marginals.
I believe the above provides sufficient rationale for my claim that the excerpt is poorly written.