Does $f(x|y)$ always represent a probability function, or can it be any function? How should I interpret this symbol, $f(x|y)$, when I see it?
For example, I was reading about the MM algorithm, and the text said:
"Let $\theta(m)$ represent a fixed value of the parameter $\theta$, and let $f(\theta|\theta(m))$ denote a real-valued function of $\theta$ whose form depends on $\theta(m)$."
Because of my weak math background, I can't tell from this context what kind of function I should infer here.
Judging by the Wikipedia article, $f(x|x_0)$ represents a member of a family of functions, selected by the parameter $x_0$, where $x_0$ comes from the same domain as $x$.
In that case, the particular function $f(\theta|\theta(m))$ is the member of the family selected after the $m$th step of an algorithm. The procedure by which you derive $\theta(m)$ depends on the algorithm.
To use this notation for a simpler algorithm, suppose $g(x) = x^2 + x$ and let $T(x|x_0)$ be the line tangent to $g$ at $x_0$. Then we can set $\theta(0) = 5$ and define $\theta(n+1)$ to be the $x$-intercept of $T(\theta|\theta(n))$.
$$\theta(0)=5 \textrm{ so that } T(\theta|5) = 11\theta - 25$$
Solving for the root,
$$\theta(1)=\frac{25}{11} \textrm{ so that } T\left(\theta\middle|\tfrac{25}{11}\right) = \frac{61}{11}\theta - \frac{625}{121}$$
Solving for the root, $\theta(2) = \frac{625}{671} \approx 0.9314$. As you can see, as $n$ gets larger, $\theta(n)$ approaches a root of $g(x)$ (here the root $0$). This is the Newton–Raphson method.
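The iteration above can be reproduced with exact arithmetic using Python's `fractions` module. This is a minimal sketch; the helper names `g`, `g_prime`, and `tangent_root` are my own, and `theta[n]` stands in for $\theta(n)$.

```python
from fractions import Fraction

def g(x):
    # g(x) = x^2 + x
    return x * x + x

def g_prime(x):
    # g'(x) = 2x + 1
    return 2 * x + 1

def tangent_root(x0):
    # x-intercept of the tangent line T(x|x0) = g(x0) + g'(x0) * (x - x0),
    # i.e. one Newton-Raphson step starting from x0.
    return x0 - g(x0) / g_prime(x0)

# theta[n] plays the role of theta(n); start from theta(0) = 5.
theta = [Fraction(5)]
for n in range(6):
    theta.append(tangent_root(theta[-1]))

print(theta[:3])  # [Fraction(5, 1), Fraction(25, 11), Fraction(625, 671)]
print([round(float(t), 4) for t in theta])  # values shrink toward the root 0
```

For this particular $g$, the step simplifies to $\theta(n+1) = \theta(n)^2 / (2\theta(n) + 1)$, which is why the sequence heads to $0$ so quickly once it gets close.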
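Note that this has nothing to do with MM, but it shares the quality from the quote above: $T(\theta|\theta(n))$ is a real-valued function of $\theta$ whose form depends on $\theta(n)$.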
I'm having a hard time explaining things in the comments, so I'm going to do it here. Since your StackOverflow account suggests you know your way around Python, I'm going to use it for syntax. Don't take this as condescension; I just feel code is sometimes more precise.
As you know, functions can have multiple input values. Suppose you have a function `blah(option, x)` whose exact calculation depends on both an `x` and an `option`. (Having 2 arguments is sometimes called having arity 2; I will use this later.)

A technique known as currying takes an expression `F(a, b)` and creates a function `FCurry` such that for all `a` and `b`, `F(a, b) == FCurry(a)(b)`. Applying this to `blah` gives a curried function `bCurry`.

The function `bCurry` encodes a collection of 1-arity functions, selected by `option`. Instead of writing `blah(option, x)` you can write `bCurry(option)(x)`.

You might want to think of currying as just an ugly way of hiding multiple arguments, but it has theoretical value (I'm told, but can't provide an example), and like many useful theoretical constructs it has plenty of metaphorical equivalents elsewhere. One of my favorite examples of currying turning up naturally is
`LinearSolve[m]` in Mathematica, where it cuts down on wasted computation. (Mathematica uses square brackets to call a function.) Computing `solve = LinearSolve[m]` once and then evaluating `solve[b1]`, `solve[b2]`, and so on is vastly more efficient than evaluating `LinearSolve[m, b1]`, `LinearSolve[m, b2]`, and so on, because the work done on `m` can be reused.

Even in parts of programming that have no interest in showing off theoretical swagger, currying finds use. On the Linux command line, many complex applications pack multiple binaries behind a single command-line tool.
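A toy Python model of this pattern, loosely inspired by `git clone` and `git push`: the first argument selects a subcommand, and the chosen function is then applied to the remaining arguments. The names `tool`, `tool_uncurried`, and `SUBCOMMANDS` are hypothetical, the subcommand bodies just return strings, and this is not how git itself is implemented.

```python
def clone(args):
    # Stand-in for real "clone" logic: just describe what would happen.
    return "cloning " + " ".join(args)

def push(args):
    # Stand-in for real "push" logic, completely different from clone.
    return "pushing " + " ".join(args)

# The first argument selects one "sub-function" from this family.
SUBCOMMANDS = {"clone": clone, "push": push}

def tool(subcommand):
    # Curried form: tool(subcommand) returns a function of the remaining args.
    return SUBCOMMANDS[subcommand]

def tool_uncurried(subcommand, args):
    # The same behaviour viewed as an ordinary 2-argument function.
    return tool(subcommand)(args)

print(tool("clone")(["repo"]))               # cloning repo
print(tool_uncurried("push", ["origin"]))    # pushing origin
```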
Consider `git clone` and `git push`: these two commands behave completely differently, using different logic to execute. The first argument isn't so much an argument to `git` as an option that chooses the correct subcommand, which is then applied to the remaining arguments. The mental model I am pushing is that a function of arity 2 is sometimes encoding a family of 1-arity functions.

And that is what is happening with $f(x|x_0)$. Yes, you can think of it as simply a function of two arguments, but in some sense $x_0$ is an option that selects the correct "sub-function" from a family of them. In the case of tangent lines of $g$, let $T(x|x_0)$ be the line tangent to the curve $g$ at $x$-coordinate $x_0$; the function would be
$$T(x|x_0) = g(x_0) + g'(x_0)\,(x - x_0) = (2x_0 + 1)\,x - x_0^2$$
So $T(x|x_0)$ is the same idea as `t(x0)(x)`, where `t(x0)` returns the tangent line at `x0` as a function of `x` alone.

With regard to the issue with $\theta$, the symbol is used in two different ways. Sometimes it is the argument of the sub-function, and sometimes it selects the sub-function at the $m$th step of the algorithm, as in $\theta(m)$. This is an abuse of notation: it uses $\theta$ both as a plain variable and as a sequence. Its only value is that it drives home the fact (as I mentioned before) that the argument variable and the option variable come from the same set.
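To make the correspondence concrete, here is a minimal Python sketch. The body of `blah` is invented purely for illustration, and `curry` is a hypothetical helper rather than a standard-library function; `t` builds the tangent line $T(x|x_0)$ of $g(x) = x^2 + x$ and returns it as a 1-arity function of `x`.

```python
def curry(F):
    # Turn a 2-argument function F(a, b) into FCurry with FCurry(a)(b) == F(a, b).
    return lambda a: (lambda b: F(a, b))

def blah(option, x):
    # Hypothetical example: the option selects which calculation is done on x.
    if option == "square":
        return x * x
    return x + 1

bCurry = curry(blah)

def g(x):
    # g(x) = x^2 + x
    return x * x + x

def g_prime(x):
    # g'(x) = 2x + 1
    return 2 * x + 1

def t(x0):
    # The option x0 selects one member of the family of tangent lines:
    # T(x|x0) = g(x0) + g'(x0) * (x - x0)
    return lambda x: g(x0) + g_prime(x0) * (x - x0)

T5 = t(5)    # the sub-function selected by x0 = 5, i.e. 11x - 25
print(T5(0))                                     # -25
print(bCurry("square")(3) == blah("square", 3))  # True
```

The standard library's `functools.partial(blah, "square")` gives a similar effect: it fixes the option and returns a function of `x` alone.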
Could someone chime in here? I'm not sure what the conceptual gap is. Sorry for the rant.