Does the notation f(x|y) always refer to the probability function p(x|y) (x given y)?


Is $f(x|y)$ always a probability function, or can it be any function? How should I interpret the symbol $f(x|y)$ when I see it?

For example, I read about MM algorithm and it said...

"Let $\theta(m)$ represent a fixed value of the parameter $\theta$, and let $f(\theta$ | $\theta(m)$) denote a real-valued function of $\theta$ whose form depends on $\theta(m)$."

From this context, because of my weak math background, I don't know what kind of function I should infer here.


Judging by the Wikipedia article, $f(x|x_0)$ represents a member of a family of functions, selected by the parameter $x_0$, where $x_0$ comes from the same domain that $x$ belongs to.

In that particular case, $f(\theta|\theta(m))$ is the function after the $m$th step of an algorithm. The procedure by which you derive $\theta(m)$ is algorithm-dependent.

To use this notation for a simpler algorithm: suppose $g(x) = x^2 + x$ and let $T(x|x_0)$ be the line tangent to $g$ at $x_0$. Then we can set $\theta(0) = 5$ and define $\theta(n+1)$ to be the $x$-intercept of $T(\theta|\theta(n))$.

$$\theta(0)=5 \textrm{ so that } T(\theta|5) = 11\theta - 25$$ Solving for the root $$\theta(1)=\frac{25}{11} \textrm{ so that }T(\theta|\frac{25}{11}) = \frac{61}{11}\theta - \frac{625}{121}$$

Solving for the root, $\theta(2) = \frac{625}{671} \approx 0.9314$. As you can see, as $m$ gets larger we approach a root of $g(x)$. This is the Newton–Raphson method.
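The iteration above can be sketched in Python (a minimal sketch; the name `next_theta` is mine, not standard notation):

```python
def g(x):
    # the example curve g(x) = x^2 + x
    return x * x + x

def g_prime(x):
    # derivative of g, which gives the slope of the tangent T(x | x0)
    return 2 * x + 1

def next_theta(theta_m):
    # x-intercept of the tangent line T(theta | theta_m):
    # solve g'(t) * (x - t) + g(t) = 0 for x
    return theta_m - g(theta_m) / g_prime(theta_m)

theta = 5.0
for m in range(1, 4):
    theta = next_theta(theta)
    print(m, theta)   # theta(1) = 25/11, theta(2) = 625/671, ...
```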

Note this has nothing to do with MM, but it shares these qualities:

  • There is a family of functions on some set $X$ (in our case, the real numbers).
  • You select a member of this family using a value in $X$ (in our case, the tangent line at a given real number).
  • You incrementally select new members of $X$; in our case, $\theta(m)$.

I'm having a hard time explaining things in the comments, so I'm going to do it here. Since your StackOverflow account suggests you know your way around Python, I'll use it for syntax. Don't take this as condescension; I feel code is sometimes more precise.

As you know, functions can have multiple input values. Suppose you have a function blah(option, x) whose exact calculation depends on both an $x$ and an option. (Having 2 arguments is sometimes called having arity 2; I will use this term later.)

A technique known as currying takes an expression F(a, b) and creates a function FCurry such that for all a and b, F(a, b) == FCurry(a)(b). Let's do that to blah:

def bCurry(option):
    # fix `option`, returning a one-argument version of blah
    def blahHelp(x):
        return blah(option, x)
    return blahHelp

The function bCurry encodes a collection of 1-arity functions, selected by option. Instead of writing blah(option, x) you can type bCurry(option)(x).
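Incidentally, Python's standard library offers `functools.partial` for the same trick (strictly speaking, partial application rather than full currying); here `blah` is just a stand-in arity-2 function for illustration:

```python
from functools import partial

def blah(option, x):
    # stand-in arity-2 function
    return option * x + option

bCurried = partial(blah, 3)          # fixes option = 3
print(bCurried(4) == blah(3, 4))     # True
```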

You might want to think of currying as just an ugly way of hiding multiple arguments, but it has theoretical value (so I'm told, though I can't provide an example), and like many useful theoretical constructs it has metaphorically equivalent structures that occur elsewhere. One of my favorite examples of currying turning up naturally is LinearSolve[m] in Mathematica, where it avoids wasteful recalculation. Noting that Mathematica uses square brackets to call a function, doing

myLinSolver = LinearSolve[m]
myLinSolver[b1]
myLinSolver[b2]
myLinSolver[b3]

is vastly more efficient than doing

LinearSolve[m, b1]
LinearSolve[m, b2]
LinearSolve[m, b3]

because the work done on m (the matrix factorization) can be reused. Even in parts of programming that have no interest in showing off theoretical swagger, currying finds use. On Linux's command line, many complex applications pack multiple binaries behind a single command-line tool:

$ git clone BLAH
$ git push WHATEVER

These two git commands (clone and push) behave completely differently, using different logic to execute. The first argument isn't so much an argument to git as an option that chooses the correct subcommand, which is then applied to the remaining arguments. The mental model I am pushing is that an arity-2 function sometimes encodes a family of arity-1 functions.
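That mental model can be written out directly (a toy sketch of the dispatch idea, not how git actually works internally):

```python
def git(subcommand):
    # each subcommand is its own arity-1 function;
    # the first argument merely selects which one to use
    def clone(url):
        return "cloning " + url
    def push(remote):
        return "pushing to " + remote
    return {"clone": clone, "push": push}[subcommand]

print(git("clone")("BLAH"))      # cloning BLAH
print(git("push")("WHATEVER"))   # pushing to WHATEVER
```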

And that is what is happening with $f(x|x_0)$. Yes, you can think of it as simply a function of two arguments, but in some sense $x_0$ is an option which selects the correct "sub-function" from a family of them. In the case of tangent lines of $g$, letting $T(x|x_0)$ be the line tangent to the curve $g$ at $x$-coordinate $x_0$, the function would be

def g(x):
    return x*x + x

def t(x0):
    m = 2*x0 + 1        # slope: g'(x0)
    b = g(x0) - m*x0    # intercept chosen so the line touches g at x0
    def line(x):
        return m*x + b
    return line

So $T(x|x_0)$ is the same idea as t(x0)(x).
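As a quick sanity check (repeating the definitions so the snippet runs on its own), `t(5)` reproduces $T(\theta|5) = 11\theta - 25$ from the worked example:

```python
def g(x):
    return x * x + x

def t(x0):
    m = 2 * x0 + 1
    b = g(x0) - m * x0
    def line(x):
        return m * x + b
    return line

line5 = t(5)
print(line5(0), line5(1))   # -25 -14, i.e. the line 11*theta - 25
```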

With regard to the issue with $\theta$, the symbol is used in two different ways: sometimes as the argument of the sub-function, and sometimes, as $\theta(m)$, to select the sub-function at the $m$th step of the algorithm. This is an abuse of notation--it uses $\theta$ as both a variable and a sequence. The only value this has is that it drives home the fact (as I mentioned before) that the argument variable and the option variable come from the same set.

Could someone chime in here? I'm not sure what the conceptual gap is. Sorry for the rant.