While working on an algorithm-improvement problem, I wondered whether the cosine similarity and the Euclidean distance between vectors could be obtained in a way that reduces the number of squares and square roots that have to be calculated.
The idea is to store each vector in a database as a tuple containing its unit-vector components, its magnitude, and the square of its magnitude.
$$ \mathbf{V}^n = [\hat{v_1}, \hat{v_2} \dots \hat{v_n}, |V|, |V|^2]$$ So the database will contain the following data.
$$ \mathbf{V}^n_m = \begin{array}{c} (\hat{v_{11}}, \hat{v_{12}}, \dots, \hat{v_{1n}}, M_1, S_1)\\ (\hat{v_{21}}, \hat{v_{22}}, \dots, \hat{v_{2n}}, M_2, S_2)\\ \vdots \\ (\hat{v_{m1}}, \hat{v_{m2}}, \dots, \hat{v_{mn}}, M_m, S_m) \end{array}$$
Where
$$ M_m = |V_m| $$ $$ S_m = |V_m|^2$$
In this way, for any two given vectors, I can obtain the cosine similarity as the dot product of the corresponding unit vectors:
$$ \cos \theta_{V_1V_2} = \hat{v}_1 \cdot \hat{v}_2 = \hat{v_{11}}\hat{v_{21}} + \hat{v_{12}}\hat{v_{22}} + \dots + \hat{v_{1n}}\hat{v_{2n}} \\ \cos \theta_{V_1V_3} = \hat{v}_1 \cdot \hat{v}_3 = \hat{v_{11}}\hat{v_{31}} + \hat{v_{12}}\hat{v_{32}} + \dots + \hat{v_{1n}}\hat{v_{3n}} \\ \vdots \\ \cos \theta_{V_1V_m} = \hat{v}_1 \cdot \hat{v}_m = \hat{v_{11}}\hat{v_{m1}} + \hat{v_{12}}\hat{v_{m2}} + \dots + \hat{v_{1n}}\hat{v_{mn}} $$
Now, intuitively, I thought that, having the cosine, I could obtain the Euclidean distance by means of the law of cosines:
$$ \mathbf{d_{V_1V_2}} = \sqrt{S_1+S_2-2M_1M_2\cos \theta_{V_1V_2}} \\ \mathbf{d_{V_1V_3}} = \sqrt{S_1+S_3-2M_1M_3\cos \theta_{V_1V_3}} \\ \vdots \\ \mathbf{d_{V_1V_m}} = \sqrt{S_1+S_m-2M_1M_m\cos \theta_{V_1V_m}} $$
In this way, the cost of calculating both metrics between any two given vectors is reduced to a single square-root calculation. I say intuitively because I can visualize the law of cosines in two- and three-dimensional space, but I am not sure about higher dimensions.
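For concreteness, here is a minimal Python sketch of the scheme described above (the function and variable names are my own, not part of any library): `store` does the one-time preprocessing of a vector, and `cosine_and_distance` then computes both metrics with a single square root.

```python
import math

def store(v):
    """Precompute the stored tuple: unit-vector components, magnitude M, squared magnitude S."""
    s = sum(x * x for x in v)        # S = |V|^2
    m = math.sqrt(s)                 # M = |V|; one square root per stored vector
    unit = [x / m for x in v]        # unit-vector components
    return unit, m, s

def cosine_and_distance(a, b):
    """Cosine similarity and Euclidean distance from two stored tuples."""
    unit_a, m_a, s_a = a
    unit_b, m_b, s_b = b
    cos = sum(x * y for x, y in zip(unit_a, unit_b))     # dot product of unit vectors
    dist = math.sqrt(s_a + s_b - 2 * m_a * m_b * cos)    # law of cosines
    return cos, dist
```

For example, for $V_1 = (3, 4)$ and $V_2 = (6, 8)$ this gives cosine $1$ and distance $5$, matching the direct computation.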
Is there a flaw in the reasoning? Is the law of cosines valid for dimensions higher than 3?
Your reasoning about cosines is fine; in fact, in much of mathematics the angle between two vectors in an inner product space is defined as the arc cosine of their normalized inner product, in any number of dimensions.
I think that answers your question; the rest of what follows is essentially a comment and suggestion relative to your proposed method.
Let's see what happens if you store the data this way: $$ \mathbf{V} = \left( \begin{array}{c} (v_{11}, v_{12}, \ldots v_{1n}, S_1)\\ (v_{21}, v_{22}, \ldots v_{2n}, S_2)\\ \vdots \\ (v_{m1}, v_{m2}, \ldots v_{mn}, S_m) \end{array} \right),$$ where $S_i = \lVert \mathbf V_i\rVert^2 = v_{i1}^2 + v_{i2}^2 + \cdots + v_{in}^2$ for each $1 \leq i \leq m.$ Then for any $1 \leq i < j \leq m,$
$$ \lVert \mathbf V_i\rVert \lVert \mathbf V_j\rVert \cos \theta_{V_iV_j} = \mathbf V_i \cdot \mathbf V_j = v_{i1}v_{j1} + v_{i2}v_{j2} + \cdots + v_{in}v_{jn}. $$
Instead of performing $n + 2$ multiplications and $n - 1$ additions to obtain $M_i M_j \cos \theta_{V_iV_j}$ from $M_i,$ $M_j,$ and the $2n$ components of the unit vectors $\hat {\mathbf V}_i$ and $\hat {\mathbf V}_j,$ you could perform $n$ multiplications and $n - 1$ additions to obtain $\lVert \mathbf V_i\rVert \lVert \mathbf V_j\rVert \cos \theta_{V_iV_j} = \mathbf V_i \cdot \mathbf V_j$ from the $2n$ components of the original vectors ${\mathbf V}_i$ and ${\mathbf V}_j.$ Then every time you compute
$$ \mathbf{d_{V_iV_j}} = \sqrt{S_i + S_j - 2 \mathbf V_i \cdot \mathbf V_j} $$
it would cost you $n + 1$ multiplications (including the multiplication by $2$), $n$ additions, one subtraction, and one square root, rather than $n + 3$ multiplications, $n$ additions, one subtraction, and one square root. Moreover, you save the cost of computing $M_i,$ $M_j,$ and all the $\hat v_{ik}$ and $\hat v_{jk}$ components in the first place.
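Here is a Python sketch of that alternative (again, the names are mine): the stored row holds the raw components together with $S_i$, the dot product gives $\lVert \mathbf V_i\rVert \lVert \mathbf V_j\rVert \cos\theta$ directly, and the cosine itself, if you still need it, costs one extra square root of $S_i S_j$.

```python
import math

def cosine_and_distance_raw(v, s_v, w, s_w):
    """v, w: raw vector components; s_v, s_w: precomputed squared magnitudes S_i, S_j."""
    dot = sum(x * y for x, y in zip(v, w))   # n multiplications, n - 1 additions
    dist = math.sqrt(s_v + s_w - 2 * dot)    # law of cosines; dot already equals M_i M_j cos(theta)
    cos = dot / math.sqrt(s_v * s_w)         # only needed if the cosine itself is required
    return cos, dist
```

Note that the distance computation never normalizes anything: the dot product of the raw vectors already plays the role of $M_i M_j \cos\theta_{V_iV_j}$.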
Granted, if the number of vectors is much larger than the number of dimensions, the number of operations saved is a small fraction of the total cost of computing all the distances, but the point is that storing vectors in the form $$ (\hat v_{11}, \hat v_{12}, \ldots \hat v_{1n}, M_1, S_1)$$ rather than $$(v_{11}, v_{12}, \ldots v_{1n}, S_1)$$ saves you nothing (and actually incurs some added cost) for the purpose of this exercise.
That's not to say there isn't some application where precomputing all the values of $\lVert \mathbf V_i\rVert$ would lower the cost of computation. I've considered that trick myself from time to time. It just doesn't pay off for this particular application.