This is a mathematical construct for which I've found it rather hard to come up with a more intuitive description than the one usually given. Heck, with enough digging I sooner found what I'd consider a more intuitive description of topological spaces, one that explains the formalism behind them better, in terms of "rulers" - see Dan Piponi's answer to this mathoverflow question:
https://mathoverflow.net/questions/19152/why-is-a-topology-made-up-of-open-sets
but the tensor product is one of those things that still seems to resist my attempts. You see, what I'd like to say is that the tensor product of two vector spaces "is the space of all tensors on the two spaces" which, of course, begs the question of just what a "tensor" is. And the definition I have in mind is that a tensor is a bilinear functional on the two vector spaces, which gels with the usual ideas from how they are applied in physics and engineering contexts.
However, I find that the mathematical tensor product which, despite what I've said, feels in other regards like a more "elegant" construction, is hard to relate to this. In particular, I've found that a tensor product is not in general isomorphic to the space of bilinear functionals; rather, the space of bilinear functionals on $V \times W$ is the dual $(V \otimes W)^{*}$ of the tensor product. For infinite-dimensional vector spaces this is a real difference: the dual is strictly larger, so there must be strictly more bilinear functionals than there are tensors. Moreover, I have not found anything to suggest whether or not the isomorphism in the finite-dimensional case can be chosen canonically (my hunch is that it can't, but I could be wrong).
Indeed, looking at it more closely, it appears that the mathematical tensor product is, in a way, built "inside-out" compared to how one thinks of it in a physics context. What I mean by this is related to its characterization by the universal property. In a physics context, we consider a tensor as something we apply to vectors, to get a result such as a metric length or angle in general relativity. But in the case of the mathematical tensor product, tensors become something that linear maps are applied to in order to recover bilinear maps. Namely, any given bilinear map $f: V \times W \rightarrow Z$ to an arbitrary vector space $Z$ can be written as
$$f(v, w) = f_\mathrm{tens}(v \otimes w)$$
where $f_\mathrm{tens}: V \otimes W \rightarrow Z$ is a unique linear map (this is the universal property). Here it seems the "tensor" doesn't play any role greater than simply being a "package" to combine the operands $v$ and $w$, which leaves us at a loss to make sense of something like what is done in general relativity when we consider, say, the line element
$$ds^2 = g_{\mu \nu} dx^\mu dx^\nu$$
where the metric tensor $g$ now takes a much more active role, operating on (here infinitesimal) vectors to produce an (infinitesimal) arc length $ds$. What can we do to make sense of the latter if we take $g \in \mathbb{R}^4 \otimes \mathbb{R}^4$ (technically it should be a tensor product of tangent spaces $T_x M$ on the space-time manifold in the full-blown physics context, but I'm keeping this simple)? How do we make sense of "applying" a tensor to the two vectors in this way (ignore the infinitesimal stuff; just think about it in terms of regular vectors)?
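To make the "package" picture concrete, here is a minimal Python sketch of the factorisation $f(v, w) = f_\mathrm{tens}(v \otimes w)$ in coordinates. It assumes $V = \mathbb{R}^2$, $W = \mathbb{R}^3$, $Z = \mathbb{R}$ with standard bases, and the coefficient table `M` and the helper names are made up purely for illustration.

```python
# Sketch only: tensors in V (x) W are stored as 2x3 nested lists of
# coefficients with respect to the standard bases (an assumption).

def simple_tensor(v, w):
    """Coefficients of v (x) w, i.e. the outer product v_i * w_j."""
    return [[vi * wj for wj in w] for vi in v]

def add_tensors(s, t):
    """Entrywise sum of two tensors of the same shape."""
    return [[sij + tij for sij, tij in zip(srow, trow)]
            for srow, trow in zip(s, t)]

# Hypothetical bilinear map f(v, w) = sum_ij M[i][j] * v_i * w_j.
M = [[1, 2, 0],
     [0, -1, 3]]

def f(v, w):
    return sum(M[i][j] * v[i] * w[j]
               for i in range(len(v)) for j in range(len(w)))

def f_tens(t):
    """The induced linear map on V (x) W: pair the coefficients with M."""
    return sum(M[i][j] * t[i][j]
               for i in range(len(t)) for j in range(len(t[0])))

v1, w1 = [1, 2], [3, 0, -1]
v2, w2 = [0, 1], [1, 1, 1]

# On simple tensors, f_tens reproduces f.
assert f(v1, w1) == f_tens(simple_tensor(v1, w1))

# On a sum of simple tensors, f_tens is linear.
t = add_tensors(simple_tensor(v1, w1), simple_tensor(v2, w2))
assert f_tens(t) == f(v1, w1) + f(v2, w2)
```

In this sketch the tensor really is passive: all the information about $f$ sits in `M`, and `simple_tensor` merely bundles the two arguments.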
After thinking about this for a while, it seems to me that what we're really after here is a form of tensor action upon vector spaces, in the same way that we can talk of a group action upon a set. Note that a group abstracts the idea of a set of bijective self-maps under composition, but only cares about their composition structure; a group action is in effect a way to realize this structure in terms of actual maps. Here, the tensor product seems to do something similar with tensors. If the missing notion is tensor action, then what I'd want to know is three things:
- what is the most satisfactory definition of such,
- given that such a thing is quite likely non-unique, what are the implied choices made in physics, and
- conversely, what does the (presumed) freedom in choosing a tensor action correspond to?
What would you say?
I am going to now actually answer my own question because I feel I may have just come up with a candidate solution. I also just noted, after coming up with this, an answer by @asdq that seems like it might be similar - let's see how this goes. I'd appreciate feedback if any of these steps are problematic.
The relevant insight was to see if there is some way we can relate the tensor product back to things we are already familiar with that are broadly related to the idea of tensors we have in mind - bilinear functionals - and things similar to them, i.e. linear functionals more generally.
And one thing that comes to mind is to think about the following. Forget about a tensor product $V \otimes W$ of two vector spaces - up your game to a tensor product of three - $V \otimes W \otimes X$. We should expect the elements of this 3-fold tensor product to relate somehow to trilinear functionals taking elements from all three of $V$, $W$, and $X$. So hold that - and now go the other way - what should the elements of a 1-fold tensor product be? Well, they should then be linear functionals ... but a 1-fold tensor product has only one input, $V$. The space of linear functionals, though, is the dual space $V^{*}$, and $V^{*}$ is not guaranteed to be isomorphic to $V$, which mirrors what I mentioned in the question details about possibly not having a unique tensor action.
So, we can think of this in one of two ways. Either we violate the convention that a unary operation should leave its input unchanged, and have it somehow turn $V$ into its dual, or we say that we should only interpret tensor products this way when their inputs are dual spaces. I.e. suppose that instead of tensoring $V$, we tensor $V^{*}$. If $V^{*}$ goes into the unary tensor product, then the "unary tensor product" of $V^{*}$ is just $V^{*}$, and the map to unilinear functionals is immediate.
So what happens, then, in the first nontrivial case? Well, think about how a simple tensor acts in the finite-dimensional case, where we can write it as a Kronecker product, arranged here as the outer-product matrix $\mathbf{u} \mathbf{v}^T$. This is not often mentioned, but if $\mathbf{u}$ and $\mathbf{v}$ are vectors in the ordinary finite-dimensional space $\mathbb{R}^n$ under its standard basis, then we have a bilinear functional
$$B(\mathbf{a}, \mathbf{b}) := \mathbf{a}^T (\mathbf{u} \otimes \mathbf{v}) \mathbf{b}$$
which becomes
$$B(\mathbf{a}, \mathbf{b}) = (\mathbf{a}^T \mathbf{u}) (\mathbf{b}^T \mathbf{v})$$
and note where the transposes - which reflect some mapping of vectors to their dual vectors - are! This means that the better way to generalize this is as a bilinear functional on the dual spaces, and generally we should have that the simple tensor
$$\mathbf{u} \otimes \mathbf{v}$$
is actually a bilinear functional on $V^{*} \times W^{*}$, namely now we define the tensor action
$$(\mathbf{u} \otimes \mathbf{v})(\mathbf{a}^T, \mathbf{b}^T) := (\mathbf{a}^T \mathbf{u})(\mathbf{b}^T \mathbf{v})$$
And then the combining rules entail the action for all tensors. Or, in other words, each element of the tensor product $V \otimes W$ acts naturally, not on $V \times W$ itself, but on the product of dual spaces, $V^{*} \times W^{*}$.
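As a quick numerical sanity check of this action, here is a small Python sketch. It assumes $V = W = \mathbb{R}^2$ with the standard basis, represents the dual vectors $\mathbf{a}^T$, $\mathbf{b}^T$ by their component lists, and uses helper names (`outer`, `act`) that are mine, not standard.

```python
# Sketch only: a tensor in V (x) W is a nested list t[i][j]; a dual vector
# is the list of its components in the dual of the standard basis.

def dot(x, y):
    """The pairing a^T u = sum_i a_i * u_i."""
    return sum(xi * yi for xi, yi in zip(x, y))

def outer(u, v):
    """The simple tensor u (x) v as the matrix u v^T."""
    return [[ui * vj for vj in v] for ui in u]

def act(t, a, b):
    """The action t(a^T, b^T) = a^T t b = sum_ij a_i * t_ij * b_j."""
    return sum(a[i] * t[i][j] * b[j]
               for i in range(len(a)) for j in range(len(b)))

u1, v1 = [1, 2], [3, -1]
u2, v2 = [0, 1], [1, 1]
a, b = [2, 1], [1, 4]  # components of the dual vectors a^T and b^T

# A simple tensor acts by (a^T u)(b^T v).
assert act(outer(u1, v1), a, b) == dot(a, u1) * dot(b, v1)

# The action on a sum of simple tensors follows by linearity.
t = [[x + y for x, y in zip(r1, r2)]
     for r1, r2 in zip(outer(u1, v1), outer(u2, v2))]
assert act(t, a, b) == dot(a, u1) * dot(b, v1) + dot(a, u2) * dot(b, v2)
```

The second assertion is the "combining rules" at work: once the action is fixed on simple tensors, bilinearity leaves no further choice for a general tensor.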
This is, then, exactly what @asdq mentioned.
Alternatively, though, we may "flip the Ts around", and take
$$(\mathbf{u}^T \otimes \mathbf{v}^T)(\mathbf{a}, \mathbf{b}) := (\mathbf{u}^T \mathbf{a})(\mathbf{v}^T \mathbf{b})$$
which gives the same value, and shows that the tensor product of the dual spaces, i.e. $V^{*} \otimes W^{*}$, should likewise have a natural relation to bilinear functionals on $V \times W$. (Presumably, we should more strictly suggest a relation to the double dual product $V^{**} \times W^{**}$, but $V$ and $W$ are naturally contained within their double duals, and what that would show is that there can be bilinear functionals on the double duals that do not come from the tensor product of the original spaces.)
(Note that these two agree so long as we have a map to the duals, as we do with a fixed basis like the one we've been using here.)
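Alternatively, if we want to define a tensor action generically, we can - the most general tensor action is just this: a linear map from $V \otimes W$ to the space of bilinear maps $X \times Y \rightarrow Z$, for some vector spaces $X$, $Y$, and $Z$ over the same field, so that each tensor is realized as an actual bilinear map, in the same way that a group action realizes each group element as an actual bijection.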
Then it is a theorem that there is a canonical tensor action when $X = V^{*}$, $Y = W^{*}$, and $Z = F$, the base field of the spaces (regarded as a one-dimensional vector space over itself), which is what we showed above. Likewise, there is a canonical tensor action of $V^{*} \otimes W^{*}$ when $X = V$, $Y = W$, and $Z = F$.
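Spelled out, and writing $\phi$ and $\psi$ for the two canonical actions (names introduced here purely for illustration), the definitions would read

$$\phi\Big(\sum_k v_k \otimes w_k\Big)(\alpha, \beta) := \sum_k \alpha(v_k)\, \beta(w_k), \qquad \alpha \in V^{*},\ \beta \in W^{*},$$

$$\psi\Big(\sum_k \alpha_k \otimes \beta_k\Big)(v, w) := \sum_k \alpha_k(v)\, \beta_k(w), \qquad v \in V,\ w \in W.$$

These should not depend on how the tensor is written as a sum of simple tensors: for fixed $\alpha$ and $\beta$, the map $(v, w) \mapsto \alpha(v)\, \beta(w)$ is bilinear, so by the universal property it factors through a unique linear map on $V \otimes W$, and the same argument applies to $\psi$.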
So how does this relate to physics?
The answer is to pay more heed to the indices. There's a reason it's written as
$$ds^2 = g_{\mu \nu} dx^\mu dx^\nu$$
where the indices on $g$ are on the bottom and those on $dx$ are on the top. Lower indices mean, in physics notation, components of dual vectors, while upper indices mean components of ordinary vectors. The $g_{\mu \nu}$ here is transforming like a tensor on the dual space. Hence, it lives in $V^{*} \otimes W^{*}$, and indeed you'll see
$$g = g_{\mu \nu} (\mathbf{e}^\mu \otimes \mathbf{e}^{\nu})$$
(where we've used the Einstein summation convention) and the upper indices on basis vectors mean the basis vectors are in the dual space, viz. that a "normal" (as in "ordinary") vector is written
$$\mathbf{v}\ \text{(i.e. $\in V$, not $V^{*}$)} = v^i \mathbf{e}_i$$
with the basis now lower-indexed. So indeed, tensors are elements of tensor products, tensors relate to bilinear maps in a manner analogous to how group elements relate to bijective self-maps, and there is a natural correspondence when there is suitable play between a tensor product, vector spaces, and their dual spaces.
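As a small illustration of this reading, here is a Python sketch in which $g \in V^{*} \otimes V^{*}$ acts on two ordinary vectors. The flat metric $\mathrm{diag}(-1, 1, 1, 1)$ and its sign convention are assumptions chosen for the example, as are the function names.

```python
# Sketch only: components g_{mu nu} of a flat metric with signature
# (-, +, +, +). Both the metric and the sign convention are assumptions.
g = [[-1, 0, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 1]]

def metric_apply(g, u, v):
    """g(u, v) = g_{mu nu} u^mu v^nu, with the sums written out."""
    n = len(u)
    return sum(g[m][k] * u[m] * v[k] for m in range(n) for k in range(n))

def lower(g, u):
    """Lower an index: u_mu = g_{mu nu} u^nu, components of a dual vector."""
    n = len(u)
    return [sum(g[m][k] * u[k] for k in range(n)) for m in range(n)]

def pair(alpha, v):
    """Pair a dual vector with a vector: alpha_mu v^mu."""
    return sum(am * vm for am, vm in zip(alpha, v))

u = [2, 1, 0, 0]
v = [1, 1, 0, 0]

# Applying g to two vectors is the same as lowering one index and pairing.
assert metric_apply(g, u, v) == pair(lower(g, u), v)

# The line element for a finite displacement dx, as ds^2 = g(dx, dx).
dx = [2, 1, 0, 0]
ds2 = metric_apply(g, dx, dx)
```

Here `g` is doing exactly what an element of $V^{*} \otimes V^{*}$ should: it eats two upper-indexed vectors, and feeding it only one leaves a lower-indexed dual vector.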