Looking for (overkill) uses of indicator functions


I am going to give a presentation about indicator functions, and I am looking for interesting examples to include. The examples may even be overkill solutions, since I am mainly interested in demonstrating creative ways of using them.

I would be grateful if you could share your examples. A diversity of answers is appreciated.

To give you an idea, here are my own examples. Most of them are in probability and combinatorics, so examples from other fields would be even better.

  1. Calculating the expected value of a random variable using linearity of expectations. Most famously the number of fixed points in a random permutation.

  2. Showing how $|A \Delta B| = |A|+|B|-2|A \cap B|$ and $(A-B)^2 = A^2+B^2-2AB$ are related.

  3. An overkill proof for $\sum \deg(v) = 2|E|$.
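As a quick illustration of example 1, here is a small Python sketch (the function name is my own) that verifies by brute force that the expected number of fixed points of a random permutation is exactly 1, just as the indicator/linearity argument predicts:

```python
from itertools import permutations

def expected_fixed_points(n):
    # Average number of fixed points over all permutations of {0, ..., n-1},
    # i.e. the expectation of the sum of indicators [pi(i) == i].
    perms = list(permutations(range(n)))
    total = sum(sum(1 for i, x in enumerate(p) if x == i) for p in perms)
    return total / len(perms)

# Linearity of expectation predicts exactly 1 for every n >= 1:
# E[#fixed points] = sum_i P(pi(i) = i) = n * (1/n) = 1.
print(expected_fixed_points(4))  # → 1.0
```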


There are 9 best solutions below


Whether it's overkill is open to debate, but I feel that the inclusion-exclusion principle is best seen through the prism of indicator functions.

Basically, the classical formula is just what you get numerically from the (clear) identity: $$ 1 - 1_{\bigcup_{i=1}^n A_i} = 1_{\bigcap_{i=1}^n \overline A_i} = \prod_{i=1}^n (1-1_{A_i}) = \sum_{J \subseteq [\![ 1,n ]\!]} (-1)^{|J|} 1_{\bigcap_{j\in J} A_j}.$$
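A numerical sanity check of this identity (a sketch; the function name is mine): evaluate $1 - \prod_i (1 - 1_{A_i}(x))$ pointwise and compare with the size of the union computed directly.

```python
def union_size_via_indicators(sets):
    # |A_1 ∪ ... ∪ A_n| computed pointwise from the identity
    # 1_{∪ A_i}(x) = 1 − ∏_i (1 − 1_{A_i}(x)).
    universe = set().union(*sets)
    size = 0
    for x in universe:
        prod = 1
        for A in sets:
            prod *= 1 - (1 if x in A else 0)
        size += 1 - prod
    return size

sets = [{1, 2, 3}, {2, 3, 4}, {4, 5}]
assert union_size_via_indicators(sets) == len({1, 2, 3, 4, 5})
```

Expanding the product instead of evaluating it recovers the classical alternating-sign formula term by term.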


The Levi-Civita epsilon symbol and its identities with the Kronecker delta $\delta_{ij}=\mathbb{1}_{i=j}$ are frequently used in vector calculus.

Formally, in $n$ dimensions, for the $n$-tuple $\sigma= (\sigma_1,\dots,\sigma_n)$ with elements $\sigma_i\in\{1,\dots,n\}$, we define

$$ \epsilon_\sigma = \begin{cases} \operatorname{sgn}(\sigma) & \text{if } \sigma\in S_n \\ 0 & \text{otherwise} \end{cases} $$

For example, in $3$ dimensions, $\epsilon_{123}=\epsilon_{231}=1$ because $\sigma=231$ is an even permutation (and $123$ is the identity), while $\epsilon_{213}=-1$ and $\epsilon_{112}=0$.

To give an example of its versatility, in $3$ dimensions, the simple identity $\epsilon_{ijk}\epsilon^{pqk}=\delta_i^p\delta_j^q - \delta_i^q\delta_j^p$ can be used to summarise $3^4=81$ identities, because each of $i,j,p,q$ can take values in $\{1,2,3\}$. (Note: the Einstein summation convention is used to sum over $k\in \{1,2,3\}$.)
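That identity is easy to verify exhaustively. A short Python sketch (helper names are mine) computes $\epsilon_\sigma$ from the inversion count and checks all 81 cases:

```python
from itertools import product

def levi_civita(*sigma):
    # 0 if any index repeats; otherwise the sign of the permutation,
    # computed from the number of inversions.
    if len(set(sigma)) != len(sigma):
        return 0
    sign = 1
    for a in range(len(sigma)):
        for b in range(a + 1, len(sigma)):
            if sigma[a] > sigma[b]:
                sign = -sign
    return sign

def delta(i, j):
    # Kronecker delta as an indicator.
    return 1 if i == j else 0

# epsilon_{ijk} epsilon_{pqk} = delta_ip delta_jq − delta_iq delta_jp,
# checked for all 81 choices of i, j, p, q in {1, 2, 3}.
for i, j, p, q in product(range(1, 4), repeat=4):
    lhs = sum(levi_civita(i, j, k) * levi_civita(p, q, k) for k in range(1, 4))
    assert lhs == delta(i, p) * delta(j, q) - delta(i, q) * delta(j, p)
```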


Indicator functions are often very useful in conjunction with Fubini’s theorem.

Suppose you want to show: $$\newcommand\dif{\mathop{}\!\mathrm{d}} \int_Y \int_{X_y} f(x, y) \dif x \dif y = \int_X \int_{Y_x} f(x,y) \dif y \dif x$$ where the two subsets $X_y \subseteq X$ and $Y_x \subseteq Y$ describe the same relation $x \in X_y \iff y \in Y_x$.

Because the inner integral’s domain depends on the outer variable, you cannot apply Fubini right away to swap the two integrals.

But you can do it if you use an indicator function to describe the set $$Z = \left\{ (x,y) \in X \times Y \mid x \in X_y \right\} = \left\{ (x,y) \in X \times Y \mid y \in Y_x \right\}.$$

Finally: \begin{align*} \int_Y \int_{X_y} f(x, y) \dif x \dif y & = \int_Y \int_X 1_Z(x,y) f(x,y) \dif x \dif y \\ & = \int_X \int_Y 1_Z(x,y) f(x,y) \dif y \dif x \\ & = \int_X \int_{Y_x} f(x,y) \dif y \dif x. \end{align*}
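The same trick works in a discrete setting; here is a runnable sketch (names and the function $f$ are mine) where the variable inner range over the triangle $\{(x,y) : x \le y\}$ is replaced by an indicator over the full rectangle so the sums can be swapped:

```python
# Discrete Fubini via an indicator of Z = {(x, y) : x <= y}.
N = 20

def f(x, y):
    return 3 * x + y * y

one_Z = lambda x, y: 1 if x <= y else 0  # indicator of Z

# Inner range depends on the outer variable y ...
lhs = sum(sum(f(x, y) for x in range(y + 1)) for y in range(N))
# ... versus an indicator over the full grid, with the order swapped.
rhs = sum(sum(one_Z(x, y) * f(x, y) for y in range(N)) for x in range(N))
assert lhs == rhs
```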


This example is not really overkill, but since it is an interesting use of the indicator function, I think it is worth sharing here. Here I will use $\chi_A$ to denote the indicator function of the set $A$.

Let $E \subset \Bbb R^n$ be a measurable set, and $f\colon \Bbb R^n \to \Bbb R^m$ be a sufficiently nice function (e.g. Lipschitz, or just continuous, say) that maps $\mathscr L^n$-measurable (Lebesgue measurable) sets to $\mathscr H^n$-measurable (Hausdorff measurable) sets. We are interested in the function $g \colon \Bbb R^m \to \Bbb N \cup \{0\}$ defined as $$ g(y) = \#\left(E \cap f^{-1}(\{y\})\right), $$ which is the number of points in $E$ that are mapped to the point $y \in\Bbb R^m$. We can think of $g$ as the indicator function of $f(E)$, but with multiplicity (indeed, if $f$ is injective, then $g=\chi_{f(E)}$).

Now, how do we prove that $g$ has nice properties (e.g. that it is measurable, or that its integral on $\Bbb R^m$ can be controlled by the size of $\mathscr L^n(E)$, etc.)? The idea is that we can subdivide $\Bbb R^n$ into small (disjoint) $n$-cubes, and inspect the behaviour of the indicator functions associated with the images of those cubes. In particular, let $$ M^{k} := \left\{ Q \,\middle|\, Q = (a_1,b_1] \times \cdots \times (a_n,b_n]\text{ where } a_i = \frac{j}{2^k},\ b_i = \frac{j+1}{2^k} \text{ for some } j \in \Bbb Z \right\} $$ be the family of dyadic cubes of side length $\frac1{2^k}$ in $\Bbb R^n$. It is easy to see that $\Bbb R^n$ is the (disjoint) union of the cubes in $M^k$, i.e. $$ \Bbb R^n = \bigcup_{Q \in M^k} Q. $$

While $\chi_{f(E)}(y)$ tells you if there is any point in $E$ that got sent to $y\in\Bbb R^m$ or not, it cannot "see" how many points. However, if we consider $$ g_k := \sum_{Q\in M^k} \chi_{f(E\cap Q)}, $$ this function $g_k$ can register if distinct parts of $E$ got mapped by $f$ to the same point to a certain extent. It can "stack up" contributions from different cubes, but it still cannot tell if points in the same cube got mapped to the same point in $\Bbb R^m$. However, as $k\to\infty$, the cubes in $M^k$ get smaller and smaller, so the corresponding $g_k$ gives us "better resolution", so to speak.

It is not hard to show that $g_k \to g$ monotonically as $k\to\infty$, hence we can deduce many properties of $g$ from its approximation $g_k$. Fatou's lemma also applies very nicely, so we can "just integrate". This strategy of splitting up our set into finer and finer pieces is essential in proving many fundamental results in measure theory, such as the area/coarea formulas and the change of variables formula. This is because when $f$ is (approximately) differentiable, we understand the behaviour of $\chi_{f(E\cap Q)}$ very well when $Q$ is very small since $f(E\cap Q)$ is essentially an affine distortion of $E \cap Q$, determined by $\nabla f$ on that set.
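A crude 1-D numerical sketch of this approximation (everything here, including the sampling shortcut for membership in $f(E\cap Q)$, is my own simplification): take $f(x)=x^2$ and $E=[-1,1]$, and approximate $g_k(y)$ by summing, over dyadic intervals $Q$, an indicator of whether $y$ appears to lie in $f(E\cap Q)$.

```python
def g_k(y, k, f=lambda x: x * x, lo=-1.0, hi=1.0):
    # Approximate g_k(y) = Σ_Q 1_{f(E ∩ Q)}(y) over dyadic intervals
    # Q = (a, a + 2^-k] covering E = [lo, hi].  Membership of y in f(Q)
    # is estimated by sampling f on (a, b] -- crude, but enough for a demo.
    count = 0
    n = int((hi - lo) * 2 ** k)
    for i in range(n):
        a = lo + i / 2 ** k
        b = a + 1 / 2 ** k
        samples = [f(a + (b - a) * t / 50) for t in range(1, 51)]
        if min(samples) <= y <= max(samples):
            count += 1
    return count

# f(x) = x^2 sends exactly two points of E to y = 0.25 (x = ±0.5),
# and g_k registers both once the cubes are small enough.
assert g_k(0.25, 4) == 2
```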


One can explicitly compute the distributional derivative of an indicator function to prove the divergence theorem. In particular, if $1_A$ is the indicator function of a bounded, open set $A$ with sufficiently smooth boundary, then $$\langle \partial_j 1_A, \varphi\rangle = -\int_{\partial A} \nu_j(x) \varphi(x) \,\mathrm dS(x)$$ where $\nu$ is the outward unit normal.
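A numerical sanity check of the underlying boundary formula (the discretization is my own, for $A$ the unit disk and $\varphi(x,y)=x$): the bulk integral $\int_A \partial_x \varphi \,dx\,dy$ and the boundary integral $\int_{\partial A} \nu_x \varphi \,\mathrm dS$ with the outward normal should both equal $\pi$.

```python
import math

# Both sides of ∫_A ∂_x φ dx dy = ∫_{∂A} ν_x φ dS for the unit disk A
# and φ(x, y) = x should equal π (∂_x φ ≡ 1, ν_x = cos θ on the circle).
n = 1000
h = 2.0 / n
volume_side = 0.0
for i in range(n):
    for j in range(n):
        x = -1 + (i + 0.5) * h
        y = -1 + (j + 0.5) * h
        if x * x + y * y <= 1:       # indicator 1_A at the cell centre
            volume_side += h * h      # integrand ∂_x φ = 1

m = 20000
dt = 2 * math.pi / m
boundary_side = sum(math.cos(k * dt) ** 2 for k in range(m)) * dt

assert abs(volume_side - math.pi) < 2e-2
assert abs(boundary_side - math.pi) < 1e-9
```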


Convoluted overkill proof for the vertex degree sum formula

Take the adjacency matrix $A$ of a simple undirected graph $G=(V,E)$: $$A := \big[ 1_E (\{v,w\}) \big]_{v,w=1}^n.$$ Let $x$, $y\in\mathbb R^n$. Then, compute $$y^{\rm T}\!Ax = \sum_{v\in V} \, \sum_{w\in V} y_v A_{v,w} x_w \tag{1}$$ and note that $$y^{\rm T}\!Ax = \sum_{v\in V} y_v \Big( \sum_{w\in V} 1_E (\{v,w\}) \, x_w \Big). \tag{2}$$ But also note that, since $A$ is symmetric and has exactly $2|E|$ non-zero entries, $(1)$ can be rearranged as $$y^{\rm T}\!Ax = \sum_{\{v,w\}\in E} (y_v x_w + y_w x_v). \tag{3}$$ Plugging $y = x = \vec 1$ into $(1)$ has the effect of counting the non-zero entries of $A$: substituting into $(2)$ and $(3)$ and equating them gives $$\sum_{v\in V} \color{blue}{\Big( \sum_{w\in V} 1_E (\{v,w\}) \Big)} = \color{red}{\sum_{\{v,w\}\in E} (\color{green}{1+1})}.$$ Or equivalently, $$\sum_{v\in V} \color{blue}{\deg(v)} = \color{green}{2} \color{red}{|E|}. \tag*{$\square$}$$
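For what it's worth, the counting argument can be replayed numerically on a small graph (the example graph and names are mine):

```python
# 1^T A 1 counts the non-zero entries of the adjacency matrix, which is
# simultaneously Σ deg(v) and 2|E|.
edges = {frozenset(e) for e in [(0, 1), (0, 2), (1, 2), (2, 3)]}
n = 4
indicator = lambda v, w: 1 if frozenset((v, w)) in edges else 0
A = [[indicator(v, w) for w in range(n)] for v in range(n)]

ones = [1] * n
quad_form = sum(ones[v] * A[v][w] * ones[w] for v in range(n) for w in range(n))
degrees = [sum(A[v]) for v in range(n)]
assert quad_form == sum(degrees) == 2 * len(edges)
```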


Alternative phrasing that doesn’t require the adjacency matrix

Let $f:V\times V\to\mathbb R$, then the double sum over the vertices can be split as $$\sum_{v\in V} \sum_{w\in V} f(v,w) = \sum_{v\in V} \sum_{\substack{w\in V \\ w<v}} \big(f(v,w)+f(w,v)\big) +\sum_{v\in V} f(v,v).$$ Then, note that a sum over the edges can be seen as $$\sum_{\{v,w\}\in E} g(v,w) = \sum_{v\in V} \sum_{\substack{w\in V \\ w<v}} 1_{\Gamma(v)} (w) \cdot g(v,w)$$ for a suitable $g:V\times V\to\mathbb R$. Here, $\Gamma(v) \subsetneq V$ denotes the set of vertices that are neighbours of $v$, but $1_{\Gamma(v)} (w)$ may just as well be thought of as $1_E (\{v,w\})$. Now, choosing $f(v,w) := 1_{\Gamma(v)} (w)$ (so that $1_{\Gamma(v)} (w) = 1_{\Gamma(w)} (v)$ and $f(v,v) = 0$) leads to the choice $g(v,w) = 2$ to relate all three sums: \begin{align} \sum_{v\in V} \color{blue}{ \sum_{w\in V} 1_{\Gamma(v)} (w) } &= \sum_{v\in V} \sum_{\substack{w\in V \\ w<v}} \big(1_{\Gamma(v)} (w) + 1_{\Gamma(w)} (v)\big) \\ &= \sum_{v\in V} \sum_{\substack{w\in V \\ w<v}} 1_{\Gamma(v)} (w) \cdot 2 \\ &= \color{red}{\sum_{\{v,w\}\in E} } \color{green}{2}.\end{align} Or equivalently, $$\sum_{v\in V} \color{blue}{\deg(v)} = \color{green}{2} \color{red}{|E|}. \tag*{$\square$}$$


Edit: I remembered another one

Let $D := \big[ \delta_{v,w} \cdot \deg(v) \big]^n_{v,w=1}$ be the degree matrix and $A := \big[ 1_E (\{v,w\}) \big]^n_{v,w=1}$ the adjacency matrix for a simple, undirected graph $G=(V,E)$, so that $L:=D-A$ is its Laplacian matrix. Then $$x^{\rm T}\!Lx = \sum_{\{v,w\}\in E} (x_v-x_w)^2.$$

Proof using indicator functions

Note that the $v$th entry of $Lx$ is \begin{align} (Lx)_v &= \deg(v) \, x_v - \sum_{w\in\Gamma(v)} x_w \\ &= \sum_{w\in\Gamma(v)} (x_v-x_w) \\ &= \sum_{w\in V} 1_{\Gamma(v)} (w) \cdot (x_v-x_w). \end{align} Then, \begin{align} y^{\rm T}\!Lx &= \sum_{v\in V} y_v (Lx)_v \\ &= \sum_{v\in V} \sum_{w\in V} 1_{\Gamma(v)} (w) \cdot y_v(x_v-x_w). \end{align} Pick $f(v,w) := 1_{\Gamma(v)} (w) \cdot y_v(x_v-x_w)$ so that $1_{\Gamma(v)} (w) = 1_{\Gamma(w)} (v) = 1_E (\{v,w\})$ and $f(v,v)=0$, then split the sum over half the indices: \begin{align} y^{\rm T}\!Lx &= \sum_{v\in V} \sum_{\substack{w\in V \\ w<v}} \big( 1_{\Gamma(v)} (w) \cdot y_v(x_v-x_w) + 1_{\Gamma(w)} (v) \cdot y_w(x_w-x_v) \big) \\ &= \sum_{v\in V} \sum_{\substack{w\in V \\ w<v}} 1_E (\{v,w\}) \cdot \big( y_v(x_v-x_w) + y_w(x_w-x_v) \big). \end{align} Pick $g(v,w) = y_v(x_v-x_w) + y_w(x_w-x_v) = (y_v-y_w)(x_v-x_w)$ to relate the halved double sum with a single sum over the edges: $$y^{\rm T}\!Lx = \sum_{\{v,w\}\in E} (y_v-y_w)(x_v-x_w).$$ Letting $y=x$ yields the result. $\ \square$
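A quick numerical check of the Laplacian identity on a small graph (the example data are mine):

```python
# Check x^T L x = Σ_{{v,w}∈E} (x_v − x_w)^2 for L = D − A.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
n = 4
A = [[0] * n for _ in range(n)]
for v, w in edges:
    A[v][w] = A[w][v] = 1          # adjacency matrix via edge indicators
deg = [sum(row) for row in A]
L = [[(deg[v] if v == w else 0) - A[v][w] for w in range(n)] for v in range(n)]

x = [3.0, -1.0, 4.0, 1.5]
quad = sum(x[v] * L[v][w] * x[w] for v in range(n) for w in range(n))
edge_sum = sum((x[v] - x[w]) ** 2 for v, w in edges)
assert abs(quad - edge_sum) < 1e-9
```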


Here we use the indicator function with a technique introduced in section 3.2 Floor/Ceiling Applications of Concrete Mathematics by R. L. Graham, D. E. Knuth and O. Patashnik. We show the following is valid for $n\in\mathbb{Z}, n>0$: \begin{align*} \color{blue}{\sum_{k=1}^\infty\left\lfloor\frac{n}{2^k}+\frac{1}{2}\right\rfloor=n}\tag{1} \end{align*}

Let $n=\sum_{j=0}^Na_j2^j$ be the binary representation of $n$ with $a_j\in\{0,1\}, 0\leq j\leq N$. We obtain \begin{align*} \color{blue}{\sum_{k=1}^\infty}\color{blue}{\left\lfloor\frac{n}{2^k}+\frac{1}{2}\right\rfloor} &=\sum_{k=1}^\infty\sum_{m=1}^\infty m\cdot1_{\{m\}}\left(\left\lfloor\frac{n}{2^k}+\frac{1}{2}\right\rfloor\right)\tag{2}\\ &=\sum_{k=1}^\infty\sum_{m=1}^\infty m\cdot1_{[m,m+1)}\left(\frac{n}{2^k}+\frac{1}{2}\right)\\ &=\sum_{k=1}^\infty\sum_{m=1}^\infty m\cdot1_{\left[m-\frac{1}{2},m+\frac{1}{2}\right)}\left(\frac{1}{2^k}\sum_{j=0}^Na_j2^j\right)\tag{3}\\ &=\sum_{j=0}^Na_j\sum_{k=1}^{j+1}\sum_{m=1}^\infty m\cdot1_{\left[m-\frac{1}{2},m+\frac{1}{2}\right)}\left(2^{j-k}\right)\tag{4}\\ &=\sum_{j=0}^Na_j\left(\sum_{k=1}^j2^{j-k}+1\right)\tag{5}\\ &=\sum_{j=0}^N a_j\left(\sum_{k=0}^{j-1}2^k+1\right)\tag{6}\\ &=\sum_{j=0}^Na_j 2^j\tag{7}\\ &\,\,\color{blue}{=n} \end{align*} and the claim (1) follows.

Comment:

  • In (2) we introduce a series summing over $m$ and use the Indicator function to get rid of the floor-function.

  • In (3) we use the binary representation of $n$.

  • In (4) we use the linearity of the $\sum$ operator. We also cap the inner sum at $k=j+1$, since larger values of $k$ do not contribute.

  • In (5) we observe that the indicator selects $m=2^{j-k}$ when $1\leq k\leq j$, and $m=1$ when $k=j+1$.

  • In (6) we re-index via $k\to j-k$ so that the sum runs from $k=0$ to $k=j-1$.

  • In (7) we use the finite geometric series formula $\sum_{k=0}^{j-1}2^k=2^j-1$ and get the binary representation of $n$.

Note: In the book mentioned, Don Knuth favours the use of Iverson brackets, which can be conveniently used instead of the indicator function.
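Identity (1) is also easy to confirm by machine; here is a sketch in exact integer arithmetic (the helper name is mine), using $\left\lfloor \frac{n}{2^k}+\frac12 \right\rfloor = \left\lfloor \frac{2n+2^k}{2^{k+1}} \right\rfloor$ to avoid floating point:

```python
def rounded_halving_sum(n):
    # Σ_{k≥1} ⌊n/2^k + 1/2⌋, with each floor computed exactly as
    # ⌊(2n + 2^k) / 2^(k+1)⌋; terms vanish once 2^k > 2n.
    total, k = 0, 1
    while 2 ** k <= 2 * n:
        total += (2 * n + 2 ** k) // 2 ** (k + 1)
        k += 1
    return total

assert all(rounded_halving_sum(n) == n for n in range(1, 1000))
```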


The optimization problem $$\min_{x \in S} f(x)$$ where $S \subset \mathbb R^n$ and $f : \mathbb R^n \to \mathbb R \cup \{+\infty\}$, is equivalent to the unconstrained problem $$\min_{x \in \mathbb R^n} f(x) + (1-1_S(x)) \cdot(+ \infty),$$ where we use the convention $(+\infty) \cdot 0 = 0$. This reduction allows one to build duality theory only for unconstrained problems; the constrained case then comes "for free".

Also, approximating the $(1-1_S(x)) \cdot(+ \infty)$ term by a better-behaved family of functions is the main idea of penalty methods, namely the exterior penalty method and the interior penalty method.
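A tiny exterior-penalty sketch (all names and numbers are mine): minimize $f(x)=(x-3)^2$ over $S=(-\infty,1]$ by replacing the $\infty\cdot(1-1_S(x))$ term with $\mu\cdot\max(0,x-1)^2$; the penalized minimizer $\frac{\mu+3}{\mu+1}$ approaches the constrained solution $x=1$ as $\mu\to\infty$.

```python
def penalized_argmin(mu, step=0.001):
    # Exterior penalty: replace ∞·(1 − 1_S(x)) by μ·max(0, x − 1)²
    # and minimize over a grid (crude but dependency-free).
    xs = (i * step for i in range(-2000, 5001))
    return min(xs, key=lambda x: (x - 3) ** 2 + mu * max(0.0, x - 1) ** 2)

# The penalized minimizer (μ + 3)/(μ + 1) slides toward the feasible set.
assert abs(penalized_argmin(9) - 1.2) < 1e-2
assert abs(penalized_argmin(99) - 1.02) < 1e-2
```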


Given an integer $n\ge 2$, prove that $$\lfloor \sqrt n \rfloor + \lfloor \sqrt[3]n\rfloor + \cdots +\lfloor \sqrt[n]n\rfloor = \lfloor \log_2n\rfloor + \lfloor \log_3n\rfloor + \cdots +\lfloor \log_nn\rfloor$$

Proof: For any $0\le x\le n$, we have $\lfloor x\rfloor=\sum_{i=1}^n\chi(x\ge i)$, where $\chi(\cdot)$ is the indicator function.

\begin{align} LHS&=\sum_{j=2}^n\sum_{i=1}^n\chi(\sqrt[j]{n}\ge i)\\ &=\sum_{i=1}^n\sum_{j=2}^n\chi(\sqrt[j]{n}\ge i)\\ &=n-1+\sum_{i=2}^n\sum_{j=2}^n\chi(n\ge i^j)\\ &=\sum_{i=2}^n\sum_{j=1}^n\chi(n\ge i^j)\\ &=\sum_{i=2}^n\sum_{j=1}^n\chi(\log_i n\ge j)\\ &=\sum_{i=2}^n\lfloor\log_i n\rfloor=RHS. \end{align}
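And a brute-force confirmation in Python (helper names are mine), computing both floors in exact integer arithmetic to avoid float error:

```python
def floor_root(n, j):
    # ⌊n^(1/j)⌋ via a float guess corrected by integer comparison.
    r = max(1, int(round(n ** (1.0 / j))))
    while r ** j > n:
        r -= 1
    while (r + 1) ** j <= n:
        r += 1
    return r

def floor_log(i, n):
    # ⌊log_i n⌋ by repeated multiplication.
    r, p = 0, 1
    while p * i <= n:
        p *= i
        r += 1
    return r

# Both sides of the identity, for a range of n.
for n in range(2, 200):
    lhs = sum(floor_root(n, j) for j in range(2, n + 1))
    rhs = sum(floor_log(i, n) for i in range(2, n + 1))
    assert lhs == rhs
```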