This question has irked me since finishing Linear Algebra.
Question I: With regard to computational runtime: given some large matrix $A$, which is the faster way to calculate the inverse?
I. Calculating the inverse of $A$ using row reduction, without using any orthonormal transformation.
II. Using Gram-Schmidt to obtain an orthonormal matrix, then calculating the inverse by transposing, i.e. $Q^T = Q^{-1}$.
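To make option (II) concrete, here is a minimal sketch of classical Gram-Schmidt and the $Q^T = Q^{-1}$ property it buys you. Using numpy is my assumption (the question doesn't name a library), and `gram_schmidt` is a hypothetical helper written for illustration:

```python
import numpy as np

def gram_schmidt(A):
    """Classical Gram-Schmidt: orthonormalize the columns of A."""
    Q = np.zeros_like(A, dtype=float)
    for j in range(A.shape[1]):
        v = A[:, j].astype(float)
        for i in range(j):
            # subtract the projection of column j onto each earlier q_i
            v -= (Q[:, i] @ A[:, j]) * Q[:, i]
        Q[:, j] = v / np.linalg.norm(v)
    return Q

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
Q = gram_schmidt(A)

# For an orthonormal Q, the transpose is the inverse:
print(np.allclose(Q.T @ Q, np.eye(4)))          # True
print(np.allclose(Q.T, np.linalg.inv(Q)))       # True
```

So once $Q$ is in hand, "inverting" it is just a transpose; the cost has been moved into the orthonormalization step.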
I am sure (II) is faster, but is it practical in applications like regression analysis to always use this shortcut? For example, we know that Ordinary Least Squares (OLS) has the closed-form solution:
$$y=(A^TA)^{-1}A^Tb$$
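As a quick sanity check of this closed form, the normal-equations solution can be compared against a library least-squares solver. Numpy is my assumption here; `A` and `b` are made-up illustrative data:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((100, 3))   # design matrix
b = rng.standard_normal(100)        # response vector

# Closed form above: y = (A^T A)^{-1} A^T b
y_normal = np.linalg.inv(A.T @ A) @ A.T @ b

# Library least-squares solver for comparison
y_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(np.allclose(y_normal, y_lstsq))   # True
```

Both give the same coefficients here, though explicitly forming and inverting $A^TA$ is exactly the expensive (and numerically riskier) step the question is asking about.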
But can we retain the accuracy of our solution for $y$ and improve the computational runtime by finding the orthonormal matrix first?
$$\mathrm{make\_matrix\_orthonormal}(A)$$ $$y=(A^TA)^{-1}A^Tb$$ $$=(A^{-1}A)^{-1}A^{-1}b$$ $$=(I)^{-1}A^{-1}b$$ $$=(I)A^{-1}b$$ $$=A^{T}b$$
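The algebra in this chain can be checked numerically for a matrix that is already orthonormal: if $Q^T = Q^{-1}$, then $(Q^TQ)^{-1}Q^Tb$ collapses to $Q^Tb$. A minimal check, assuming numpy (note this only verifies the simplification for an orthonormal $Q$; whether replacing $A$ by its orthonormalization preserves the original regression solution is exactly what the question is asking):

```python
import numpy as np

rng = np.random.default_rng(2)
# np.linalg.qr gives an orthonormal Q from a random square matrix
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))
b = rng.standard_normal(5)

full = np.linalg.inv(Q.T @ Q) @ Q.T @ b   # full closed form
shortcut = Q.T @ b                        # shortcut using Q^T = Q^{-1}

print(np.allclose(full, shortcut))        # True
```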
Question II:
a. Is there any downside to always using the orthonormal matrix in regression analysis, to avoid ever having to calculate the more computationally burdensome inverse, or do we lose something like accuracy?
b. Am I wrong in thinking that orthonormal matrices should be used for reducing runtime?
c. What are some data-science algorithms that capitalize on the $Q^T = Q^{-1}$ property and use Gram-Schmidt first to reduce runtime?