There exists a neighborhood of the identity matrix such that every matrix admits an $n$-th root


Prove that there exists a neighborhood $V$ of the identity matrix such that every matrix $Y \in V$ admits an $n$-th root; that is, there exists a matrix $X$ such that $X^n = Y$.

For the question above, I know there exists a neighborhood $V$ of the identity matrix such that every matrix $Y \in V$ is non-singular, using the continuity of the function $\det$ (here: Neighborhood of the identity matrix). But I cannot make the connection between a matrix being non-singular and admitting an $n$-th root.

To solve the question in general, I am trying to define a function to which I can apply the Implicit Function Theorem. Any suggestions, please?


2 Answers

BEST ANSWER

I will show this using the inverse function theorem. Let $f:M_n(\mathbb{R})\to M_n(\mathbb{R})$ be given by $f(X)=X^k$. Then, noting that every matrix commutes with the identity, you can apply the binomial formula: $$ f(I+H)=(I+H)^k=\sum_{i=0}^k\binom{k}{i} H^i=I+k\cdot H+\sum_{i=2}^k\binom{k}{i} H^i$$

By definition of the derivative (https://en.wikipedia.org/wiki/Derivative#Total_derivative,_total_differential_and_Jacobian_matrix), if $f$ is differentiable, $df_I:M_n(\mathbb{R})\to M_n(\mathbb{R})$ (or, as denoted on that page, $f'(I)$) is the (unique) linear transformation such that, for any norm $\left|\cdot \right|$ on $M_n(\mathbb{R})$: \begin{equation} \frac{\left| f(I+H)-f(I)-df_I(H)\right|}{\left|H\right|}\overset{\left|H\right|\to 0}\longrightarrow 0. \end{equation}

As a candidate for $df_I(H)$ we can choose the linear term in $H$, that is, $k\cdot H$. Using the operator norm on matrices, given by $$\left|A\right|=\sup_{\left|v\right|=1}\left|Av\right|,$$ we have $\left|AB\right|\leq \left|A\right|\left|B\right|$, which implies $\left|A^k\right|\leq \left|A\right|^k$ (for more details see https://en.wikipedia.org/wiki/Matrix_norm#Matrix_norms_induced_by_vector_norms). Then, applying the triangle inequality repeatedly, we can write: \begin{equation} \frac{\left| f(I+H)-f(I)-k\cdot H\right|}{\left|H\right|}=\frac{\left| \sum_{i=2}^k\binom{k}{i} H^i\right|}{\left|H\right|}\leq\frac{ \sum_{i=2}^k\binom{k}{i} \left|H^i\right|}{\left|H\right|}\leq \sum_{i=2}^k\binom{k}{i}\frac{\left|H\right|^i}{\left|H\right|} \end{equation} Reindexing with $i=j+1$, we have $$\sum_{i=2}^k\binom{k}{i}\left|H\right|^{i-1}=\sum_{j=1}^{k-1}\binom{k}{j+1}\left|H\right|^j,$$ which is a polynomial in $\left|H\right|$ with zero constant term, and hence converges to $0$ as $\left|H\right|\to 0$.
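As a numerical sanity check (not part of the proof), one can verify with NumPy that this quotient shrinks as $\left|H\right|\to 0$; the exponent $k$, the dimension, and the random direction $H_0$ below are arbitrary illustrative choices:

```python
import numpy as np

# Check that |f(I+H) - f(I) - k H| / |H| -> 0 as |H| -> 0,
# using the operator (spectral) norm and a fixed random direction H0.
k, dim = 3, 4
f = lambda X: np.linalg.matrix_power(X, k)
I = np.eye(dim)
H0 = np.random.default_rng(0).standard_normal((dim, dim))

ratios = []
for t in (1e-1, 1e-2, 1e-3):
    H = t * H0
    remainder = np.linalg.norm(f(I + H) - f(I) - k * H, 2)
    ratios.append(remainder / np.linalg.norm(H, 2))
print(ratios)  # decreasing roughly linearly in t
```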

Then, by uniqueness of $df_I$, we conclude that $df_I(H)=k\cdot H$, which is an isomorphism from $M_n(\mathbb{R})$ to itself. By the inverse function theorem, there are neighborhoods $U,V\subset M_n(\mathbb{R})$ of $I$ and of $f(I)=I$, respectively, and a $C^\infty$ function $g:V\to U$ such that $g$ is the inverse of $f$ restricted to $U$. Therefore, for $Y \in V$, putting $X=g(Y)$ gives $X^k=f(g(Y))=Y$.
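The theorem only gives $g$ abstractly, but near the identity the root can also be computed numerically. A minimal NumPy sketch, using the standard Newton iteration $X_{m+1} = \frac{1}{k}\big((k-1)X_m + X_m^{1-k}Y\big)$ for $X^k = Y$ started at $X_0 = I$ (this iteration and its convergence near $I$ are standard facts about the principal matrix root, not something taken from the proof above):

```python
import numpy as np

def matrix_kth_root(Y, k, iters=30):
    """Newton iteration for the principal k-th root of Y.

    Converges when Y is close enough to the identity (the setting of
    the question); a sketch, not a general-purpose root finder.
    """
    X = np.eye(Y.shape[0])
    for _ in range(iters):
        # X_{m+1} = ((k-1) X_m + X_m^{1-k} Y) / k
        X = ((k - 1) * X + np.linalg.matrix_power(np.linalg.inv(X), k - 1) @ Y) / k
    return X

# A matrix Y in a small neighborhood of the identity
Y = np.eye(3) + 0.05 * np.array([[0.0, 1, 0], [0, 0, 1], [1, 0, 0]])
X = matrix_kth_root(Y, 4)
print(np.allclose(np.linalg.matrix_power(X, 4), Y))  # True
```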


In general, the implicit function theorem states the following:

Let $E_1,E_2,E_3$ be real Banach spaces (in particular, finite-dimensional vector spaces are Banach spaces). Let $f:E_1 \times E_2 \to E_3$ be a $C^1$ map. Suppose that at a point $(\alpha, \beta) \in E_1 \times E_2$, we have $f(\alpha, \beta) = 0$, and $\partial_2f_{(\alpha,\beta)}$ is an invertible element of $L(E_2, E_3)$. Then there exist an open neighbourhood $V \subset E_1$ of $\alpha$ and an open neighbourhood $W \subset E_2$ of $\beta$ such that for every $\xi \in V$, there exists a unique $\eta \in W$ which satisfies $f(\xi,\eta) = 0$.

(There's also more which can be said, but for this question, that's all we need)

In your particular case, let $E_1=E_2=E_3$ be the vector space of $k \times k$ matrices with entries in $\Bbb{R}$; let's denote this space by $E$. Now, define the map $f: E \times E \to E$ by \begin{equation} f(Y,X) = X^n - Y \end{equation} It is clear that $f(I_k, I_k) = 0$. Also, $\partial_2f_{(I_k,I_k)}: E \to E$ is the linear map defined by the rule $A \mapsto nA$ (I'll prove this later). This is clearly invertible, with inverse defined by $B \mapsto \dfrac{1}{n}B$. Hence, all the hypotheses of the theorem are satisfied.

With this, we can conclude that there is an open set $V \subset E$ containing $I_k$, and an open set $W \subset E$ containing $I_k$, such that for each $Y \in V$, there is a unique $X \in W$ such that \begin{align} f(Y, X) &= 0, \end{align} or equivalently, $Y=X^n$. This is precisely what we wanted to prove.
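To make the existence statement concrete: near $I_k$ the root can even be written down explicitly via the binomial series $(I + H)^{1/n} = \sum_{r \ge 0} \binom{1/n}{r} H^r$ with $H = Y - I_k$, which converges for $\left\|H\right\| < 1$. This series construction is a standard alternative route, not part of the implicit-function-theorem argument; the NumPy code below is just an illustrative sketch:

```python
import numpy as np

def nth_root_series(Y, n, terms=40):
    """Approximate the principal n-th root of Y via the binomial series.

    Uses (I + H)^(1/n) = sum_r binom(1/n, r) H^r with H = Y - I,
    valid when ||H|| < 1, i.e. for Y in a ball around the identity.
    """
    dim = Y.shape[0]
    H = Y - np.eye(dim)
    X = np.zeros((dim, dim))
    coeff, H_pow, a = 1.0, np.eye(dim), 1.0 / n
    for r in range(terms):
        X += coeff * H_pow
        coeff *= (a - r) / (r + 1)  # binom(a, r+1) = binom(a, r) * (a-r)/(r+1)
        H_pow = H_pow @ H
    return X

Y = np.eye(2) + np.array([[0.0, 0.2], [0.1, 0.05]])
X = nth_root_series(Y, 3)
print(np.allclose(np.linalg.matrix_power(X, 3), Y))  # True
```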


Now, when computing $\partial_2f_{(I_k,I_k)}: E \to E$, we just have to keep in mind what this means. This is the linear transformation we get when we fix the first argument of $f$ to be $I_k$ and vary the second argument of $f$ around $I_k$. In other words, if we define $g: E \to E$, \begin{equation} g(X) := f(I_k,X) = X^n - I_k \end{equation}

then by definition, we have \begin{equation} \partial_2f_{(I_k,I_k)} = dg_{I_k} \end{equation}

So, what we have to do is compute $dg_{I_k}$. Since we know $g$ is differentiable at $I_k$, the derivative can be computed using the directional derivative formula: \begin{equation} dg_{I_k}(A) = \dfrac{d}{dt}\bigg|_{t=0} g(I_k + tA) \end{equation} So, to do this, we just need a formula for $g(I_k + tA)$. In this case, it is easy: \begin{align} g(I_k + tA) &= (I_k + tA)^n - I_k \\ &= \sum_{r=0}^n \binom{n}{r}(I_k)^{n-r} (tA)^{r} - I_k \tag{*}\\ &= \sum_{r=0}^n \binom{n}{r} t^r A^r - I_k \\ &= I_k + t(nA) + \dots - I_k \\ &= t(nA) + \dots \end{align} In (*) I made use of the fact that since $I_k$ commutes with $tA$, we can apply the standard binomial expansion formula which you learn in high school for real numbers (to prove this rigorously, use induction). Also, the $\dots$ refer to terms with $t^2$ or higher powers of $t$. We can ignore them, because after differentiating with respect to $t$ and evaluating at $t=0$, those higher-order terms vanish. Hence, it follows that \begin{equation} dg_{I_k}(A) =\dfrac{d}{dt}\bigg|_{t=0} g(I_k + tA) = \dfrac{d}{dt}\bigg|_{t=0} (t(nA) + \dots) = nA + 0 = nA \end{equation} This is precisely what I claimed above.
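This computation is easy to corroborate numerically: a central difference in $t$ should reproduce $nA$ up to floating-point error. (The exponent, dimension, and random direction below are arbitrary choices for illustration.)

```python
import numpy as np

n, dim = 5, 3           # exponent in g(X) = X^n - I, and matrix size
I = np.eye(dim)
g = lambda X: np.linalg.matrix_power(X, n) - I

A = np.random.default_rng(1).standard_normal((dim, dim))
t = 1e-6
# Central difference approximating (d/dt)|_{t=0} g(I + tA)
D = (g(I + t * A) - g(I - t * A)) / (2 * t)
print(np.allclose(D, n * A, atol=1e-4))  # True
```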


Alternatively, $X^n$ is the product of $n$ copies of $X$, so by the "generalised product rule" we have \begin{align} dg_{X}(A) = A X^{n-1} + X A X^{n-2} + X^2 A X^{n-3} + \dots + X^{n-2} A X + X^{n-1} A \tag{**} \end{align}

This is the formula for the derivative of a multilinear function of $n$ variables. It might make more sense if we abuse notation and write: \begin{align} d(X^n) = dX \cdot X^{n-1} + X \cdot dX \cdot X^{n-2} + \dots + X^{n-2} \cdot dX \cdot X + X^{n-1} \cdot dX \end{align} The formula above should look a lot like the product rule $d(uv) = du \cdot v + u \cdot dv$ from elementary calculus, except that it has $n$ terms.

Here $dX$ is interpreted to mean the linear map $A \mapsto A$, with this understanding we recover $(**)$ again. Now, notice that if in $(**)$, we substitute $X=I_k$, we get the formula above, namely \begin{align} dg_{I_k}(A) = nA. \end{align}
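The same finite-difference check works at a generic $X$, comparing $(**)$ against a numerical derivative of $X \mapsto X^n$ (again just an illustrative NumPy sketch):

```python
import numpy as np

rng = np.random.default_rng(2)
n, dim = 4, 3
X = rng.standard_normal((dim, dim))
A = rng.standard_normal((dim, dim))
mp = np.linalg.matrix_power

# Generalised product rule: d(X^n)(A) = sum_i X^i A X^{n-1-i}
exact = sum(mp(X, i) @ A @ mp(X, n - 1 - i) for i in range(n))

# Central difference of t |-> (X + tA)^n at t = 0
t = 1e-6
numeric = (mp(X + t * A, n) - mp(X - t * A, n)) / (2 * t)
print(np.allclose(exact, numeric, atol=1e-3))  # True
```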

Hopefully this alternative look at how to obtain the derivative $\partial_2f_{(I_k,I_k)} = dg_{I_k}$ has been helpful.


So the key in this problem is really to see which spaces and which function to apply the implicit function theorem to, and then to be comfortable with differential calculus on normed spaces (at first sight it may seem daunting to differentiate such a beast, but with some practice it becomes just as natural as ordinary single-variable differentiation).