How exactly do tensor products correspond with "tensors" as understood in tensor calculus & mathematical physics?


This is a mathematical construct that I've found rather hard to give a more intuitive description of than what is usually offered - heck, with enough digging I sooner found what I'd consider a more intuitive description of topological spaces, one that explains the formalism behind them in terms of "rulers" - see Dan Piponi's answer to this MathOverflow question:

https://mathoverflow.net/questions/19152/why-is-a-topology-made-up-of-open-sets

but the tensor product is one of those things that still seems to resist my attempts. You see, what I'd like to say is that the tensor product of two vector spaces "is the space of all tensors on the two spaces" which, of course, begs the question of just what a "tensor" is. And the definition I have in mind is that a tensor is a bilinear functional on the two vector spaces, which gels with the usual ideas from how they are applied in physics and engineering contexts.

However, I find that the mathematical tensor product - which, despite what I've said, feels in other regards like a more "elegant" construction - is hard to relate to this. In particular, I've found that a tensor product is actually not isomorphic in general to the space of bilinear functionals, but rather sits naturally inside the dual of that space, which means that for infinite-dimensional vector spaces there is a real difference - the bilinear functionals then strictly outnumber the tensors. Moreover, I have not found anything to suggest whether or not the isomorphism in the finite-dimensional case is canonical (my hunch is that it isn't, but I could be wrong).

Indeed, looking at it more closely, it appears that the mathematical tensor product is, in a way, built "inside-out" compared to how one thinks of it in a physics context. What I mean by this is related to its characterization by the universal property. In a physics context, we consider a tensor as something we apply to vectors, to get a result such as a metric length or angle in general relativity. But in the case of the mathematical tensor product, tensors become something that maps are applied to in order to recover bilinear maps. Namely, any given bilinear map $f: V \times W \rightarrow Z$ to an arbitrary vector space $Z$ can be written as

$$f(v, w) = f_\mathrm{tens}(v \otimes w)$$

where $f_\mathrm{tens}: V \otimes W \rightarrow Z$ is a unique linear map (this is the universal property). Here it seems the "tensor" plays no role beyond being a "package" combining the operands $v$ and $w$, which leaves us at a loss to make sense of something like what is done in general relativity when we consider, say, the line element

$$ds^2 = g_{\mu \nu} dx^\mu dx^\nu$$

where the metric tensor $g$ now takes a much more active role, operating on (here infinitesimal) vectors to produce an (infinitesimal) arc length $ds$. What can we do to make sense of the latter if we take $g \in \mathbb{R}^4 \otimes \mathbb{R}^4$ (technically it should be a tensor product of tangent spaces $T_x M$ on the space-time manifold in the full-blown physics context, but I'm keeping this simple)? How do we make sense of "applying" a tensor to the two vectors in this way (ignore the infinitesimal stuff; just think about it in terms of regular vectors)?
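To see the factorization above concretely in finite dimensions, here is a minimal sketch (the function names are just illustrative) in which $v \otimes w$ is modelled as the outer-product matrix and $f$ is taken to be the ordinary dot product on $\mathbb{R}^3$:

```python
import numpy as np

# One concrete model of R^3 (x) R^3: represent v (x) w as the outer-product matrix.
def tensor(v, w):
    return np.outer(v, w)              # (v (x) w)_{ij} = v_i * w_j

# The bilinear map f we want to factor: the ordinary dot product.
def f(v, w):
    return np.dot(v, w)

# The induced linear map f_tens is determined by the values f(e_i, e_j)
# on the basis tensors e_i (x) e_j.
M = np.array([[f(ei, ej) for ej in np.eye(3)] for ei in np.eye(3)])

def f_tens(T):
    return np.sum(M * T)               # linear in the "packaged" tensor T

v = np.array([1.0, 2.0, 3.0])
w = np.array([-1.0, 0.5, 4.0])

# The universal-property factorization: f(v, w) = f_tens(v (x) w).
assert np.isclose(f(v, w), f_tens(tensor(v, w)))
```

Here the tensor really does look like nothing more than a "package" of the operands, which is exactly the discomfort described above.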

After thinking about this for a while, it seems that what we're really after here is a form of tensor action upon vector spaces, in the same way that we can talk of a group action upon a set - note that a group abstracts the idea of a set of bijective self-maps under composition, but only cares about their composing structure - a group action in effect is a way to realize this composing structure in terms of actual maps. Here, the tensor product seems to do something similar with tensors. If the missing notion is tensor action, then what I'd want to know is three things:

  1. what is the most satisfactory definition of such an action,
  2. given that such a thing is quite likely non-unique, what are the implied choices made in physics?
  3. conversely, what does the (presumed) freedom in choosing a tensor action correspond to?

What would you say?

There are 5 answers below.

BEST ANSWER

I am going to now actually answer my own question because I feel I may have just come up with a candidate solution. I also just noted, after coming up with this, an answer by @asdq that seems like it might be similar - let's see how this goes. I'd appreciate feedback if any of these steps are problematic.

The relevant insight was to see if there is some way we can relate the tensor product back to things we are already familiar with that are broadly related to the idea of tensors we have in mind - bilinear functionals - and things similar to them, i.e. linear functionals more generally.

And one thing that comes to mind is to think about the following. Forget about a tensor product $V \otimes W$ of two vector spaces - up your game to a tensor product of three - $V \otimes W \otimes X$. We should expect the elements of this 3-fold tensor product to relate somehow to trilinear functionals taking elements from all three of $V$, $W$, and $X$. So hold that - and now go the other way - what should the elements of a 1-fold tensor product be? Well, they should then be linear functionals ... but a 1-fold tensor product has only one input, $V$. The space of linear functionals though is the dual space $V^{*}$, and $V^{*}$ is not canonically isomorphic to $V$ (and for infinite-dimensional $V$ not isomorphic at all), which mirrors what I mentioned in the question details about possibly not having a unique tensor action.

So we can think of this in two ways: one is to violate the convention that a unary operation should just leave its input alone, and instead have the 1-fold tensor product somehow turn $V$ into its dual; the other is to say that we should only interpret tensor products this way when their inputs are dual spaces. I.e. suppose that instead of tensoring $V$, we tensor $V^{*}$. If $V^{*}$ goes into the unary tensor product, then the "unary tensor product" of $V^{*}$ is just $V^{*}$, and the map to unilinear functionals is immediate.

So what happens in the first nontrivial case? Well, think about how a simple tensor acts in the finite-dimensional case, when we can write it as a Kronecker (outer) product. This is not often mentioned, but if $\mathbf{u}$ and $\mathbf{v}$ are vectors in the ordinary finite-dimensional space $\mathbb{R}^n$ under its standard basis, and we identify $\mathbf{u} \otimes \mathbf{v}$ with the matrix $\mathbf{u} \mathbf{v}^T$, then we have a bilinear functional

$$B(\mathbf{a}, \mathbf{b}) := \mathbf{a}^T (\mathbf{u} \otimes \mathbf{v}) \mathbf{b}$$

which becomes

$$B(\mathbf{a}, \mathbf{b}) = (\mathbf{a}^T \mathbf{u}) (\mathbf{b}^T \mathbf{v})$$

and note where the transposes - which reflect some mapping of vectors to their dual vectors - are! This means that the better way to generalize this is as a bilinear on the dual spaces, and generally we should have that the simple tensor

$$\mathbf{u} \otimes \mathbf{v}$$

is actually a bilinear functional on $V^{*} \times W^{*}$; namely, we now define the tensor action

$$(\mathbf{u} \otimes \mathbf{v})(\mathbf{a}^T, \mathbf{b}^T) := (\mathbf{a}^T \mathbf{u})(\mathbf{b}^T \mathbf{v})$$

And then the combining rules entail the action for all tensors. Or, in other words, each element of the tensor product $V \otimes W$ acts naturally, not on $V \times W$ itself, but on the product of dual spaces, $V^{*} \times W^{*}$.
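Here is a minimal numerical check of this action (the values are arbitrary), identifying the simple tensor $\mathbf{u} \otimes \mathbf{v}$ with the outer-product matrix $\mathbf{u} \mathbf{v}^T$:

```python
import numpy as np

u = np.array([1.0, 2.0, 0.0])
v = np.array([3.0, -1.0, 2.0])
a = np.array([0.5, 1.0, -2.0])    # a^T and b^T play the role of dual vectors
b = np.array([4.0, 0.0, 1.0])

simple_tensor = np.outer(u, v)    # matrix model of the simple tensor u (x) v

# The tensor action on dual vectors: (u (x) v)(a^T, b^T) := (a^T u)(b^T v)
lhs = a @ simple_tensor @ b       # a^T (u v^T) b
rhs = (a @ u) * (b @ v)

assert np.isclose(lhs, rhs)
```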

This is, then, exactly what @asdq mentioned.

Alternatively, though, we may "flip the Ts around", and take

$$(\mathbf{u}^T \otimes \mathbf{v}^T)(\mathbf{a}, \mathbf{b}) := (\mathbf{u}^T \mathbf{a})(\mathbf{v}^T \mathbf{b})$$

which gives the same, and shows that the tensor product on dual spaces, i.e. $V^{*} \otimes W^{*}$, should likewise have a natural relation to bilinears on $V \times W$. (Presumably, we should more strictly suggest a relation to the double dual product $V^{**} \times W^{**}$, but $V$ and $W$ are naturally contained within their double duals, and what that would show is there can be bilinears out of the double-dual that are not within the tensor product of the original spaces.)

(Note that these two agree so long as we have a map to the duals, as we do with a fixed basis like we've been using here.)

Alternatively, if we want to define a tensor action generically, we can - the most general tensor action is just this:

Def: A general tensor action is a linear map from a tensor product $V \otimes W$ to the space of bilinear maps $X \times Y \rightarrow Z$ on vector spaces $X$, $Y$, and $Z$, where all spaces involved here are over the same field $F$.

Then it is a theorem that there is a canonical tensor action when $X = V^{*}$, $Y = W^{*}$, and $Z = F$, the base field of the spaces (regarded as a one-dimensional vector space over itself) - which is what we just showed above. Alternatively, when we consider $V^{*} \otimes W^{*}$, the canonical action arises with $X = V$ and $Y = W$.
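As a quick sanity check that the canonical action really does extend linearly from simple tensors, here is a sketch assuming the usual coefficient-matrix model of $V \otimes W$ in finite dimensions (all values made up):

```python
import numpy as np

def act(T, alpha, beta):
    """Canonical tensor action: apply the coefficient matrix T of an element of
    V (x) W to the covectors alpha and beta, extending the simple-tensor rule
    by linearity."""
    return np.einsum('i,ij,j->', alpha, T, beta)

u1, v1 = np.array([1.0, 0.0]), np.array([2.0, 3.0])
u2, v2 = np.array([0.0, 5.0]), np.array([1.0, -1.0])
alpha  = np.array([1.0, 2.0])
beta   = np.array([-1.0, 4.0])

# A general tensor is a sum of simple tensors; its action is the sum of theirs.
T = np.outer(u1, v1) + np.outer(u2, v2)
expected = (alpha @ u1) * (beta @ v1) + (alpha @ u2) * (beta @ v2)

assert np.isclose(act(T, alpha, beta), expected)
```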

So how does this relate to physics?

The answer is to pay more heed to the indices. There's a reason it's written as

$$ds^2 = g_{\mu \nu} dx^\mu dx^\nu$$

where the indices on $g$ are on the bottom and $dx$ on the top. Lower indices mean, in physics notation, dual vectors, while upper indices mean ordinary vectors. The $g_{\mu \nu}$ here is transforming like a tensor on the dual space. Hence, it lives in $V^{*} \otimes W^{*}$, and indeed you'll see

$$g = g_{\mu \nu} (\mathbf{e}^\mu \otimes \mathbf{e}^{\nu})$$

(where we've used the Einstein summation convention) and the upper indices on basis vectors mean the basis vectors are in the dual space, viz. that a "normal" (as in "ordinary") vector is written

$$\mathbf{v}\ \text{(i.e. $\in V$, not $V^{*}$)} = v^i \mathbf{e}_i$$

with the basis now lower-indexed. So indeed, tensors are elements of tensor products, tensors relate to bilinear maps in a manner analogous to how group elements relate to bijective self-maps, and there is a natural correspondence when there is suitable play between a tensor product, vector spaces, and their dual spaces.
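As a final concrete illustration (the flat Minkowski metric below is just an assumed example, not anything forced by the discussion), the line element is computed by storing the coefficients $g_{\mu \nu}$ and contracting both lower slots against the components $dx^\mu$ of a displacement vector:

```python
import numpy as np

# Assumed illustrative metric: flat Minkowski metric with signature (-, +, +, +).
g = np.diag([-1.0, 1.0, 1.0, 1.0])    # the coefficients g_{mu nu} in the dual basis

dx = np.array([2.0, 1.0, 0.0, 0.0])   # components dx^mu of a displacement vector

# ds^2 = g_{mu nu} dx^mu dx^nu: contract both lower slots of g against dx.
ds2 = np.einsum('mn,m,n->', g, dx, dx)

assert np.isclose(ds2, -(2.0 ** 2) + 1.0 ** 2)
```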

ANSWER

I should preface this by pointing out that just about all of my experience actually using tensors has been with finite dimensional ones. Four dimensions of spacetime, and so forth. I never quite mastered all the subtleties in the later parts of Korevaar's Mathematical Methods, for example.

Tensors are a general category, with both covariant and contravariant ranks. A tensor of rank $(2,3)$ takes $2$ linear functionals and $3$ vectors as inputs, and outputs a scalar. But that is only one of its uses. You can combine tensors of any rank to make an outer product, and some combine (contract) to make an inner product.

The general definition of a tensor is "a thing whose components transform, when switching to another coordinate system, in a specific way." I'd have to go flipping through books a while to find the formula and get all my subscripts and superscripts right.

A rank $2$ tensor (a matrix) can be rank $(0,2)$ or $(1,1)$ or $(2,0)$. Correspondingly, it can take in two contravariant vectors (column vectors, rank $(1,0)$), one of each, or two covariant vectors (row vectors, rank $(0,1)$). These versions are, I believe, transformed into each other by the metric, which can be used to raise or lower indices.

For example, if you take a tensor that inputs two vectors to give a scalar, and input only one vector, you have a linear functional, an object that will take one vector as input and return a scalar. Two actions of the metric on a rank $(0,2)$ tensor can produce a rank $(2,0)$ tensor, which takes two linear functionals as inputs instead of two vectors. I believe the "dual space" of the space of vectors is the space of linear functionals and vice versa.

A matrix can multiply a vector, "act on" it, to produce a new vector. That is contraction. A matrix can also multiply another matrix, so it can act on that as well. So a matrix can be acted on as well as act.

An outer product of a matrix (representing a rank $(1,1)$ tensor, say) with a vector would produce a tensor of rank $(2,1)$, as it would require a functional, a vector, and another functional to contract all the way to a scalar.

Taking the simplest case, the inner product of a $(0,1)$ tensor and a $(1,0)$ tensor is a number, while the outer product is a rank $(1,1)$ tensor. I think of that as multiplying a row times a column, versus a column times a row.
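A short sketch of these operations with made-up arrays, showing contraction and outer products numerically:

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])   # a rank-2 tensor stored as a matrix
v = np.array([1.0, -1.0])                # a column vector
f = np.array([2.0, 5.0])                 # a row vector / linear functional

# Contraction: the matrix acts on the vector to produce a new vector.
Av = A @ v

# Outer product: matrix (x) vector gives a rank-3 array (one extra index).
T = np.einsum('ij,k->ijk', A, v)         # shape (2, 2, 2)

# Supplying one input per remaining index contracts it all the way to a scalar.
s = np.einsum('ijk,i,j,k->', T, f, v, f)

# Row times column versus column times row:
inner = f @ v                            # inner product: a number
outer = np.outer(v, f)                   # outer product: a rank-2 array
```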

ANSWER

If $V$ and $W$ are $k$-vector spaces, then their tensor product $V\otimes W$ admits a natural map $V\otimes W\to \operatorname{Bil}(V^\ast\times W^\ast, k)$, so tensors act on the corresponding dual spaces. I think this is also how one should interpret the physical formula you wrote down, keeping in mind that $dx$ is actually a differential form.

ANSWER

Between physics and math, the term tensor is used in a myriad of ways, referring to objects with a wide range of structure, from just elements of a tensor product of modules, to sections of certain vector bundles, to "something that transforms like a tensor". I think what you're getting mixed up in, however, is just exactly where duals are being taken.

A tensor product of modules (say, vector spaces $V,W$) is yet another module that satisfies a universal property. Since the universal property makes the resulting object unique, it is completely fine to think of this in terms of a particular construction: namely, the free module over the base ring generated by the set $V \times W$, modded out by the appropriate relations. This allows one to concretely consider elements of a tensor product as objects of the form $\sum_i v_i \otimes w_i$, where $v_i \in V$ and $w_i \in W$, with some rules for when two such expressions are the same. Given bases $\{e^V_i\}$ and $\{e^W_j\}$ of $V$ and $W$, the explicit construction even allows us to describe a basis of the tensor product space: the set $\{e^V_i \otimes e^W_j\}$. No duals or maps acting on the tensor product space need be considered at this point from this explicit perspective.

Given a vector space $V$ over a field, say the reals, we give a special name to elements of a particular class of tensor products, namely the repeated tensor products of $V$ with its dual $V^* = \{ \mathbb{R} \text{-linear maps } V \to \mathbb{R}\}$. That is, we give the name $(k,q)$-tensors to elements of $V^{\otimes k} \otimes (V^*)^{\otimes q}$. In some contexts where these are used pretty much exclusively, the term "tensor" is reserved for objects of this form. If $V$ has a basis $\{e_i\}$ and $V^*$ is given the corresponding dual basis $\{e_j^*\}$, the $(k,q)$-tensors are just linear combinations of the basis elements

$$e_{i_1}\otimes e_{i_2} \otimes \cdots \otimes e_{i_k} \otimes e_{j_1}^* \otimes e_{j_2}^* \otimes \cdots \otimes e_{j_q}^* .$$

Notice that, as an element of the dual space $V^*$, $e_j^*$ is an $\mathbb{R}$-linear map $V \to \mathbb{R}$. That is, $e_j^*$ takes in a vector and outputs a number. In the same way, $e_{j_1}^* \otimes e_{j_2}^* \otimes \cdots \otimes e_{j_q}^*$ takes in a collection of $q$ vectors and outputs a number, and so the above basis element can be said to take in $q$ vectors and output a number times $e_{i_1}\otimes e_{i_2} \otimes \cdots \otimes e_{i_k}$ (i.e. it leaves a collection of $k$ vectors behind).
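A quick einsum sketch of this "eats $q$ vectors, leaves $k$ free slots" behaviour, using a made-up $(1,2)$-tensor with components $T^i{}_{jk}$:

```python
import numpy as np

# Components T^i_{jk} of a made-up (1,2)-tensor: one contravariant index i,
# two covariant indices j, k.
T = np.arange(8.0).reshape(2, 2, 2)

v = np.array([1.0, 2.0])
w = np.array([0.0, -1.0])

# Feeding the two covariant slots a pair of vectors leaves one free
# (contravariant) index: the result is "a number times e_i", i.e. a vector.
result = np.einsum('ijk,j,k->i', T, v, w)
print(result.shape)   # (2,)
```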

In the case of differential geometry, the vector space $V$ in question when speaking of $(k,q)$-tensors is the tangent space $T_p M$ to a manifold $M$ at some point $p \in M$ (here, the terms "$(k,q)$-tensor" or just "tensor" very often actually refer to smooth sections of the appropriate vector bundle built out of the tangent bundle as with $V$ above, but that's not too important of a distinction for us). In some coordinate system $(x^i)$ in a neighborhood of $p \in M$, the tangent space has natural basis $\{\frac{\partial}{\partial x^i}\}$, to which the dual basis is what we denote by $\{dx^j\}$. A metric tensor $g$ is just a $(0,2)$ tensor satisfying some particular properties, so it belongs specifically to $V^* \otimes V^*$ (not $V \otimes V$), and in particular it is a linear combination of the basis elements $dx^i \otimes dx^j$. Denoting by $g_{ij}$ the coefficients of this basis expansion, we then have

$$g = g_{ij} dx^i \otimes dx^j.$$

So $g$ "takes in two vectors" in the sense that any $(0,2)$-tensor does, simply because these objects are built out of two copies of the dual to the underlying vector space, and each copy of the dual takes in one vector. This structure of eating two vectors is not afforded simply by the fact that $g$ lies in a tensor product, but by what it is a tensor product of, i.e. the dual space to $V = T_p M$.

This is all somewhat muddled by the natural map given in asdq's answer, which some mathy physics books will use to define tensors without ever writing down what a tensor product of vector spaces is. That fact indicates that an element of a tensor product is automatically able to be "applied" to something, but importantly it's to objects in the duals to your original vector spaces! So one must be careful to keep track of where your duals are and where they're not. The discrepancy between finite and infinite dimensional intuition comes from the fact that the natural map is no longer an isomorphism in infinite dimensions, so the mathy physicist definition is no longer equivalent.

ANSWER

My view of tensor products is a bit unusual in that it is slightly computational. Also, this isn't meant to be completely rigorous, just an intuition pump.

The idea is this: suppose you have a pair of vectors $u\in U$ and $v\in V$ and know that at some future point you are going to compute a bilinear product (e.g. dot product, cross product, outer product, or something else) of those vectors; what could you precompute and store now? Suppose we have finite bases for $U$ and $V$ so we can write $u$ and $v$ in terms of components $(u_i), (v_j)$. Then any of the products $u_iv_j$ might appear in the computation of a bilinear product. So we ought to store the array of all of those. It'd be wasteful to store anything else in addition.
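A small illustration of this precomputation idea (the names and values are mine): the stored array of all products $u_i v_j$ is enough to evaluate any bilinear form later, without keeping $u$ and $v$ themselves.

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 0.0, -1.0])

# Precompute and store the array of all products u_i * v_j.
stored = np.outer(u, v)

# Later, any bilinear product B(u, v) = sum_ij M_ij u_i v_j can be read off
# from the stored array alone.
def bilinear_from_stored(M, stored):
    return np.einsum('ij,ij->', M, stored)

# Example: the dot product corresponds to M = identity.
assert np.isclose(bilinear_from_stored(np.eye(3), stored), u @ v)
```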

Let's interpret the diagram and the universal property text at Wikipedia in the light of what I've said above:

$$\require{AMScd}
\begin{CD}
V\times W @>\varphi>> V\otimes W\\
@V\cong VV @VV\tilde{h}V\\
V\times W @>>h> Z
\end{CD}$$

The object $V\otimes W$ is a vector space (informally some data type).

The associated bilinear map $\varphi$ is the method you use to build the array. (And yes, it turns out it is in fact bilinear.)

"any bilinear map $h:V\times W\rightarrow Z$ ... factors through $\varphi$ uniquely" pretty much says that $\varphi$ is the correct precomputation - we can use it to get any bilinear product. But that "uniquely" is also important as it's a way of expressing "It'd be wasteful to store anything else" above. Suppose in addition to our array of $(u_iv_j)$ we also stored some other number, call it $x$. We could try to define our tensor product space as the set of arrays accompanied by an additional real. We would then define the tensor product $u\otimes v$ to be the array of values $(u_iv_j)$ with the number zero stored in $x$. But now we have multiple maps $\tilde{h}$ we could use to emulate any individual bilinear product because we could add any multiple of $x$ to $h$ and get the same bilinear product. So the "uniqueness" property is a sort of mathematical way to express "don't store any more data than you need".

So the universal property is actually a lot like a specification of some software. The "factor through" means "store enough data to be able to compute what we want", and the "unique" means "don't store too much". Importantly, the specification doesn't have to specify any of the implementation details. For example, it doesn't mention bases and makes sense even if you don't have bases. It just says what you need to be able to do with the result. Note this also has another consequence: because it's not specifying what is stored in $U\otimes V$, you don't actually have to store that table at all - as long as your implementation satisfies the spec it'll be isomorphic to any other. But it's often a convenient representation for the finite dimensional case.

Let's say this is all just an analogy for now. But we could probably make the description above rigorous with a bit of extra work.