I am reading through Fast Matrix Multiplication by Markus Blaser.
I am trying to prove Lemma 5.3 from page 19. It states the following: For any tensor $T\in \mathbb{F}^{n\times m\times t}$, and any permutation $\pi:\{1,2,3\}\rightarrow \{1,2,3\}$, we have: $$rank(T) = rank(\pi T)$$
In an effort to solve it, I have developed the following chain of equalities:
Let us denote $rank(T)=r$. Then, for $1\leq e_1\leq n, 1 \leq e_2 \leq m, 1 \leq e_3 \leq t$, we have:
$$T_{e_1,e_2,e_3} =_{(1)} (\sum_{j=1}^rt_j)_{e_1,e_2,e_3} =_{(2)} \sum_{j=1}^r(t_j)_{e_1,e_2,e_3} =_{(3)} \sum_{j=1}^r a_{j1e_1}\otimes a_{j2e_2}\otimes a_{j3e_3} =_{(4)} \sum_{j=1}^r a_{j1e_1}\cdot a_{j2e_2}\cdot a_{j3e_3} =_{(5)} \sum_{j=1}^r \prod_{i=1}^{3} a_{je_i} =_{(6)} \sum_{j=1}^r \prod_{i=1}^{3} a_{j\pi(i)e_{\pi(i)}} =_{(7)} T_{e_{\pi(1)},e_{\pi(2)},e_{\pi(3)}} =_{(8)} (\pi T)_{e_1,e_2,e_3}$$
With the following justifications: |transition| Explanation | |--| --- | |1 | Definition of tensor rank | |2 | Definition of tensor | |3 | Translation to a triad of tensor product | |4 | The triad has degree 1, by the definition of tensor rank. We can thus use scalar multiplication. | |5 | Notation | |6 | Scalar multiplication is associative and commutative | |7 | Back-propagation towards a tensor through transitions 1,2,3,4 in reverse | |8 | Definition of a permutation for a vector |
Using this chain, I can say that upon permutation, the triads never change, because elements are just permutating. Thus, a minimal set of $r$ triads such that $T=\sum_{j=1}^r t_j$ remains the same when considering $\pi T$.
But there's something peculiar, and I think wrong here. Let us observe the following consequence from the chain above:
For any tensor $T\in \mathbb{F}^{n\times m\times t}$, and any permutation $\pi:\{1,2,3\}\rightarrow \{1,2,3\}$, we have: $$T_{e_1,e_2,e_3} = T_{e_{\pi(1)},e_{\pi(2)},e_{\pi(3)}}$$
This allows for the following problematic scenario:
Let $n=2,m=3,t=4$, and $\pi'$ be the following permutation: $$\pi'(1)=2, \pi'(2)=3, \pi'(3)=1$$
Then, for $e_1=1,e_2=2,e_3=3$, I get: $$T_{1,2,3} = T_{2,3,1}$$ This cannot be true as $T$ is a general tensor. A permutation should have an effect on the axis, which is not represented here. What am I doing wrong? Maybe transition 7?
I read your referenced paper.
Yes, some problem with (7) and (8). Applying permutation on a tensor defined there is like a kind of generalized transpose, then it would be more intuitive to work with matrix.
For matrix transpose, the permutation $\pi = (12)$ transposes a matrix $\boldsymbol{\mathrm{A}}_{m\times n}$ to be $\boldsymbol{\mathrm{A}}^T,$ i.e. $\pi \boldsymbol{\mathrm{A}} = \boldsymbol{\mathrm{A}}^T.$
Then, the derivation to (7) and (8) should give $\boldsymbol{\mathrm{A}}_{e_1e_2} = (\boldsymbol{\mathrm{A}}^T)_{e_2e_1} = (\pi\boldsymbol{\mathrm{A}})_{e_2e_1}.$
So, in (7), by forcing $e_{\pi(1)}$ to be the first tensor index, $e_{\pi(2)}$ the 2nd, $e_{\pi(3)}$ the 3rd, the tensor is either effectively changed to its "transpose" corresponding to the permutation $\pi$ or if it remains as $T$ there, the permutation is fixed to be this specific $\pi = (1)(2)(3).$
If it is written between (6) and (7) similar steps from (3) to (6) in the reverse order, it would be clearer to see that different orders of the terms in the products $a_{j1}\otimes a_{j2}\otimes a_{j3}$ generally give different tensors, which have the same shape but oriented differently, e.g., it could be $k\times m\times n,$ $k\times n\times m,$ or $n\times n\times k,$ so on. The paper illustrates this using a figure of a vertical brick and its horizontal copy.
Scalar product is abelian but tensor/matrix multiplication is generally not.
$$ \begin{aligned} a_{j1e_1}a_{j2e_2}a_{j3e_3} = (a_{j1}\otimes a_{j2}\otimes a_{j3})_{e_1e_2e_3} &= (a_{j2}\otimes a_{j1}\otimes a_{j3})_{e_2e_1e_3} = a_{j2e_2}a_{j1e_1}a_{j3e_3}\\ a_{j1}\otimes a_{j2}\otimes a_{j3} &\neq a_{j2}\otimes a_{j1}\otimes a_{j3}\\ (k\times m\times n) &\neq (m\times k\times n) \end{aligned} $$
I'd like to give a proof for Lemma 5.3 on p. 20. The author said that the proof is obvious but as I didn't see it obvious, rephrasing a detailed proof shows that he's right.
A permutation is a bijective mapping.
To each triad $t_j = a_{j1}\otimes a_{j2}\otimes a_{j3}$ is a triad $s_j = \pi t_j= a_{j\pi^{-1}(1)}\otimes a_{j\pi^{-1}(2)}\otimes a_{j\pi^{-1}(3)}$ under a given permutation $\pi.$ And for the inverse permutation, $\pi^{-1},$ as $\pi^{-1}(\pi) = (1)(2)(3),$ to each $s_j$ is a $\pi^{-1}(s_j) = \pi^{-1}(\pi t_j) = (\pi^{-1}\pi)t_j = t_j.$ So, as there are at least $R(t)$ triads $t_j\text{'s}$ making up $t,$ there are also $R(t)$ triads $s_j\text{'s}$ making up $\pi t.$ Thus, $R(\pi t)\le R(t).$ As there are at least $R(\pi t)$ triads $s_j\text{'s}$ making up $\pi t,$ there are also $R(\pi t)$ triads $t_j\text{'s}$ making up $t.$ Hence, $R(t)\le R(\pi t).$
Therefore, $R(t) = R(\pi t).$
By definition, $t_j$ is a triad, so is $\pi t_j.$ So, as $\pi t_j$ is a triad, so is $$\pi^{-1}(\pi t_j) = a_{j\pi(\pi^{-1}(1))}\otimes a_{j\pi(\pi^{-1}(2))}\otimes a_{j\pi(\pi^{-1}(3))} = a_{j1}\otimes a_{j2}\otimes a_{j3}= t_j.$$
$$ \begin{aligned} t &= \sum_{j = 1}^{R(t)}t_j\Rightarrow \pi t = \sum_{j = 1}^{R(t)}\pi t_j\Rightarrow R(\pi t)\le R(t).\\ s &= \pi t = \sum_{j = 1}^{R(\pi t)}\pi t_j\Rightarrow t = \pi^{-1}s = \pi^{-1}(\pi t) = \sum_{j = 1}^{R(\pi t)}\pi^{-1}(\pi t_j)\\ &= \sum_{j = 1}^{R(\pi t)}t_j\Rightarrow R(t)\le R(\pi t). \end{aligned} $$
So, $R(t) = R(\pi t).$
For convenience, define the dimension function $$ \begin{aligned} d:\,\{1, 2, 3\}&\rightarrow \{k, m, n\}\\ 1&\mapsto k\\ 2&\mapsto m\\ 3&\mapsto n \end{aligned} $$
From the definition of tensor permutation there, as there are at least $R(t)$ triads $t_j = a_{j1}\otimes a_{j2}\otimes a_{j3},\,1\le j\le R(t),$ for tensor $t,$ then there also exist $R(t)$ triads making up $(\pi t)\in K^{d(\pi^{-1}(1))\times d(\pi^{-1}(2))\times d(\pi^{-1}(3))}$ as $\pi t$ is defined to be the sum of $\pi t_j = a_{j\pi^{-1}(1)}\otimes a_{j\pi^{-1}(2)}\otimes a_{j\pi^{-1}(3)},\,1\le j\le R(t).$ So, $R(\pi t)\le R(t).$
There are also at least $R(\pi t)$ triads making up $s :=\pi t\in K^{d(\pi^{-1}(1))\times d(\pi^{-1}(2))\times d(\pi^{-1}(3))},$ $s = \sum_{j = 1}^{R(\pi t)}s_j,$ where $s_j = b_{j\pi^{-1}(1)}\otimes b_{j\pi^{-1}(2)}\otimes b_{j\pi^{-1}(3)},$ $b_{j\pi^{-1}(1)}\in K^{d(\pi^{-1}(1))},\,b_{j\pi^{-1}(2)}\in K^{d(\pi^{-1}(2))},\,b_{j\pi^{-1}(3)}\in K^{d(\pi^{-1}(3))},\,1\le j\le R(\pi t).$
Then $t = \pi^{-1}(s) = \sum_{j = 1}^{R(\pi t)}\pi^{-1}s_j,$ where $\pi^{-1}s_j = b_{j\pi(\pi^{-1}(1))}\otimes b_{j\pi(\pi^{-1}(2))}\otimes b_{j\pi(\pi^{-1}(3))} = b_{j1}\otimes b_{j2}\otimes b_{j3},$ where $b_{j1}\in K^{d(1) = k},\,b_{j2}\in K^{d(2) = m},\,b_{j3}\in K^{d(3) = n},$ which make up $K^{k\times m\times n}.$ So, $R(t)\le R(\pi t).$
Finally, $R(t) = R(\pi t).$