What is a Local Kernel?


I'm reading this paper on Machine Learning explainability, which talks about the LIME algorithm on page 2.

To explain the LIME algorithm, it gives an overview of its objective function:

$$ \xi = \underset{g \in \mathcal{G}}{\arg\min} \; L(f, g, \pi_{x'}) + \Omega(g) $$

In their description of the function, they say that $\pi_{x'}$ is a "local kernel".

If you read the LIME paper, on page 3, in section 3.2, they say that $\pi_{x'}$ is a proximity measure, which tells you how close a point $z'$ is to the point $x'$.

This makes intuitive sense to me, as $\pi_{x'}$ is used as a weighting for the objective function. But I don't understand how that makes $\pi_{x'}$ a "local kernel".
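To make the "proximity measure" idea concrete, here is a minimal sketch of the kind of kernel the LIME paper uses: an exponential kernel over a distance function, so that samples near $x'$ get weight close to 1 and distant samples get weight close to 0. The function name and the choice of Euclidean distance are illustrative; the paper leaves both the distance $D$ and the width $\sigma$ as choices.

```python
import math

def exponential_kernel(x, z, sigma=1.0):
    """Proximity weight pi_x(z): close to 1 when z is near x,
    close to 0 when z is far away. Euclidean distance and the
    width sigma are illustrative choices, not fixed by the paper."""
    d = math.dist(x, z)                       # D(x, z)
    return math.exp(-(d ** 2) / sigma ** 2)

x = [0.0, 0.0]
print(exponential_kernel(x, [0.0, 0.0]))      # z == x gets full weight
print(exponential_kernel(x, [3.0, 4.0]))      # a distant z barely counts
```

Because these weights multiply each sample's term in the objective, the fit is dominated by samples in the neighbourhood of $x'$, which is exactly what "local" refers to.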

I understand the kernel of a matrix to be its null space, but that doesn't seem appropriate here. What does a local kernel mean in this situation?

Best Answer

If you continue reading the first paper you linked, on page 5, they discuss $\pi_{x'}$ again and describe it as a weighting kernel. Keep this in mind.

Next consider the first desirable property that they establish for SHAP-like models:

The first desirable property is local accuracy. When approximating the original model $f$ for a specific input $x$, local accuracy requires the explanation model to at least match the output of $f$ for the simplified input $x'$ (which corresponds to the original input x).
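In symbols, the local accuracy property from that paper reads (with $\phi_i$ the attribution assigned to feature $i$, and $x = h_x(x')$ the mapping from the simplified input back to the original input):

$$ f(x) = g(x') = \phi_0 + \sum_{i=1}^{M} \phi_i x_i' $$

That is, the explanation model $g$ must reproduce the original model's output exactly at the one point being explained.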

Remember that $\pi_{x'}$ is fed directly into a loss function $L$, which is given by:

$$ L(f, g, \pi_{x'})$$

where $f$ is your original model, and $g$ is your "explanation model". $L$ calculates the difference between the predictions of $f$ and $g$ and records it as a loss.
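A minimal sketch of such a loss, assuming the weighted squared difference used in these papers (the toy models $f$, $g$ and the kernel below are illustrative, not from the paper):

```python
import math

def weighted_loss(f, g, pi, samples):
    """L(f, g, pi_x'): proximity-weighted squared difference between
    the original model f and the explanation model g over samples z."""
    return sum(pi(z) * (f(z) - g(z)) ** 2 for z in samples)

# Toy setup: f is the "black box", g is a linear explanation of f
# fit at x' = 2.0 (here, the tangent line), pi weights samples near x'.
f = lambda z: z ** 2
g = lambda z: 4 * z - 4
pi = lambda z: math.exp(-(z - 2.0) ** 2)

samples = [1.0, 2.0, 3.0]
print(weighted_loss(f, g, pi, samples))
```

Note that the sample $z = 2.0$ contributes nothing to the loss, because $g$ matches $f$ exactly there; the remaining loss comes from samples away from $x'$, down-weighted by $\pi_{x'}$.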

This is where $\pi_{x'}$ comes in. $\pi_{x'}$ plays a role loosely analogous to a matrix's kernel: it is a weighting function over samples that, together with the local accuracy requirement, forces the loss $L$ to zero at the sample $x'$.

That's where the term "local kernel" comes from. The function $\pi_{x'}$ is local to the sample $x'$: it ensures that the difference at $x'$ is zero, but it does not guarantee that the difference is zero at any other sample.

For example, if I have two samples $x_1'$ and $x_2'$, and the corresponding kernels $\pi_{x_1'}$, $\pi_{x_2'}$, then $L(f, g, \pi_{x_1'}) = 0$ at $x_1'$, but it is not necessarily equal to zero at $x_2'$.
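A toy numeric version of this point (all models and kernel widths here are illustrative): fit $g$ locally at $x_1' = 2$, then measure the loss under a kernel centred at $x_1'$ versus one centred at $x_2' = -2$.

```python
import math

f = lambda z: z ** 2
g = lambda z: 4 * z - 4                     # explanation fit locally at x1' = 2
pi1 = lambda z: math.exp(-(z - 2.0) ** 2)   # kernel centred on x1' = 2
pi2 = lambda z: math.exp(-(z + 2.0) ** 2)   # kernel centred on x2' = -2

loss = lambda pi, zs: sum(pi(z) * (f(z) - g(z)) ** 2 for z in zs)

samples = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]
print(loss(pi1, samples))   # small: g matches f well near x1'
print(loss(pi2, samples))   # large: g is a poor fit near x2'
```

The same explanation model $g$ gives a near-zero loss under $\pi_{x_1'}$ but a large one under $\pi_{x_2'}$: each kernel only vouches for its own locality.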

A general definition could be derived from the kernel of a function, which is the set of inputs that the function sends to $0$. A function's kernel collects all such inputs globally, whereas a "local kernel" only enforces this around a specific locality.