> Consider a neural network layer with $n$ inputs and $p$ outputs, $h = g(W^Tx + b)$. We may replace this with two layers, with one layer using weight matrix $U$ and the other using weight matrix $V$. If the first layer has no activation function, then we have essentially factored the weight matrix of the original layer based on $W$. The factored approach is to compute $h = g(V^TU^Tx + b)$. If $U$ produces $q$ outputs, then $U$ and $V$ together contain only $(n+p)q$ parameters, while $W$ contains $np$ parameters. For small $q$, this can be a considerable saving in parameters. — *Deep Learning*, Goodfellow et al., p. 192
So I suppose the original layer has a weight matrix $W$ of size $(n, p)$, so that $W^Tx$ has $p$ entries. The idea is to factor $W$ into two matrices $U$ and $V$: $U$ of size $(n, q)$ and $V$ of size $(q, p)$, so that $V^TU^T = (UV)^T$ plays the role of $W^T$, where $q$ is smaller than both $n$ and $p$.
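To check I have the shapes right, here is a quick NumPy sketch of my understanding (the sizes $n$, $p$, $q$ are arbitrary values I picked for illustration, not from the book):

```python
import numpy as np

# Hypothetical sizes, just for illustration.
n, p, q = 10, 4, 2

rng = np.random.default_rng(0)
U = rng.standard_normal((n, q))  # first (linear) layer: n inputs -> q outputs
V = rng.standard_normal((q, p))  # second layer: q inputs -> p outputs
x = rng.standard_normal(n)

# The factored layer computes V^T (U^T x), which equals (U V)^T x,
# so the implied full weight matrix is W = U V, of shape (n, p).
W = U @ V
h_factored = V.T @ (U.T @ x)
h_full = W.T @ x
print(np.allclose(h_factored, h_full))  # True

# Parameter counts from the book: (n + p) q vs n p.
print((n + p) * q, n * p)  # 28 40
```

So with these made-up sizes the factored pair uses 28 parameters instead of 40.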
For instance, suppose we have a neural network layer to predict wartime casualties as a function of several variables. We have a set of historical data including information on military deployments, manpower, available resources, etc. So we have a layer with 10 input variables ($n = 10$) and one output, the level of losses ($p = 1$). The weight matrix of this layer is $W$, of size $(10, 1)$.
But how can I model when it is worth decomposing the matrix? I tried to draw a 3D representation in Desmos, but I guess I can't, since the problem is four-dimensional? Unless I use something like point size to represent one of the parameters?
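If "worth it" just means "fewer parameters", I can at least compute the break-even point rather than drawing it: the factorization saves parameters exactly when $(n+p)q < np$, i.e. $q < np/(n+p)$. A small sketch of my own arithmetic (not from the book):

```python
def saves_parameters(n: int, p: int, q: int) -> bool:
    """True when the factored pair U, V has fewer parameters than W."""
    return (n + p) * q < n * p

def max_useful_q(n: int, p: int) -> int:
    """Largest q that still saves parameters, i.e. largest q < n*p / (n + p)."""
    return (n * p - 1) // (n + p)

print(max_useful_q(1000, 1000))  # 499: a large square layer leaves lots of room
print(max_useful_q(10, 1))       # 0: in my casualties example, no q helps
```

Which suggests that in my own example, with $p = 1$, the factorization can never save parameters — is that the right way to think about it?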
Sorry if my question is dumb; I come from a computer science background and I am a slow learner in mathematics.