I'm reading the paper "Multilinear Factorization Machines for Multi-Task Multi-View Learning". In section 3.1 there are two equations, (7) and (8), that I find difficult to understand. The notation is:
- All vectors are column vectors.
- $\langle·,·\rangle$ denotes the inner product. The inner product of two matrices or tensors is the sum of the products of corresponding elements, just as for two vectors.
- $◦$ denotes the tensor product (outer product).
Let $e_t$ be the task indicator vector, $e_t=[0,...,0,1,0,...,0]^T$, i.e., a one-hot vector whose $t$-th element is 1 and whose other elements are all zero. So if $W$ is the weight matrix for all tasks and $w_t$ is the weight vector for task $t$, then $w_t=We_t$.
The prediction function, equation (7), is: $$f_t(x)=x^Tw_t=x^TWe_t=\langle W,x◦e_t \rangle,$$ where $x$, $w_t$, and $e_t$ are column vectors and $W$ is a matrix.
My main question is: how is the following identity derived?
$x^TWe_t=\langle W,x◦e_t \rangle$
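I did convince myself numerically that the identity holds. Here is the quick NumPy check I ran (my own sketch, not from the paper; the dimensions and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
d, T, t = 4, 3, 1                   # feature dim, number of tasks, task index (assumed values)

x = rng.standard_normal(d)          # feature vector x
W = rng.standard_normal((d, T))     # weight matrix, one column per task
e_t = np.zeros(T)
e_t[t] = 1.0                        # one-hot task indicator e_t

lhs = x @ W @ e_t                   # x^T W e_t
rhs = np.sum(W * np.outer(x, e_t))  # <W, x ∘ e_t>: elementwise product, then sum

assert np.isclose(lhs, rhs)
```

So the two sides agree numerically, but I'd like to see the algebraic derivation.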
Equation (8) looks similar: $$f_t(\{x^{(1)}, x^{(2)}\})=x^{(1)^T}W_tx^{(2)}=\langle \mathcal{W}, x^{(1)}◦x^{(2)}◦e_t \rangle$$
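This one also checks out numerically if I assume $\mathcal{W}$ is the third-order tensor whose mode-3 slices are the per-task matrices $W_t$ (my reading of the paper, not stated explicitly above):

```python
import numpy as np

rng = np.random.default_rng(1)
d1, d2, T, t = 3, 4, 2, 0                       # view dims, number of tasks, task index (assumed)

x1 = rng.standard_normal(d1)                    # x^(1)
x2 = rng.standard_normal(d2)                    # x^(2)
e_t = np.zeros(T)
e_t[t] = 1.0                                    # one-hot task indicator e_t

Wten = rng.standard_normal((d1, d2, T))         # third-order weight tensor W
W_t = Wten[:, :, t]                             # slice for task t (my assumed convention)

lhs = x1 @ W_t @ x2                             # x^(1)^T W_t x^(2)
outer3 = np.einsum('i,j,k->ijk', x1, x2, e_t)   # x^(1) ∘ x^(2) ∘ e_t
rhs = np.sum(Wten * outer3)                     # <W, x^(1) ∘ x^(2) ∘ e_t>

assert np.isclose(lhs, rhs)
```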
Also, in Eq. (8), does $f_t(\{x^{(1)}, x^{(2)}\})$ mean $f_t(x^{(1)}◦x^{(2)})$, i.e., is the outer product the input?