Does anybody know the memory complexity of a triangular solve?
PyTorch Documentation: https://pytorch.org/docs/stable/generated/torch.triangular_solve.html
More about Triangular solver: http://homepages.math.uic.edu/~jan/mcs572/pipetriangular.pdf
I'm trying to write some code to compute kernels for Gaussian processes and I'm running into memory issues. I'm not sure whether the problem is that I'm doing something wrong or that the triangular solver requires more than linear memory.
The pseudocode for my case is:
lambda t:f64[20000,20000] u:f64[20000,10]:
  v:f64[20000,10] = triangular_solve[
      conjugate_a=False
      left_side=True
      lower=False
      transpose_a=True
      unit_diagonal=False
  ] t u
  w:f64[20000,10] = triangular_solve[
      conjugate_a=False
      left_side=True
      lower=False
      transpose_a=False
      unit_diagonal=False
  ] t v
  v w
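If I'm reading the jaxpr correctly, it performs two solves against the same upper-triangular matrix t: first with its transpose, then without. Assuming t is the upper-triangular Cholesky factor $R$ of a kernel matrix $K = R^T R$ (the usual Gaussian-process pattern — this is my guess, not something stated in the question), the pair of solves computes $w = K^{-1} u$. A small NumPy sketch of the same math, using the generic np.linalg.solve in place of a dedicated triangular solver:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 3                        # small stand-ins for 20000 and 10
A = rng.normal(size=(n, n))
K = A @ A.T + n * np.eye(n)         # symmetric positive-definite "kernel" matrix
u = rng.normal(size=(n, k))

R = np.linalg.cholesky(K).T         # upper-triangular factor, K = R^T R
v = np.linalg.solve(R.T, u)         # first solve:  R^T v = u  (transpose_a=True)
w = np.linalg.solve(R, v)           # second solve: R w = v    (transpose_a=False)

# w should equal K^{-1} u
print(np.allclose(w, np.linalg.solve(K, u)))
```

In JAX itself the triangular structure would be exploited via jax.scipy.linalg.solve_triangular, but the memory footprint of each solve is just the factor plus the right-hand side.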
A triangular linear solve can be performed in place: the coefficients of the triangular matrix are read off and used to perform row operations on the right-hand side. The memory usage is therefore only about as large as the storage of the triangular matrix and the right-hand side, plus a small amount of additional memory for the intermediate arithmetic at each row operation.
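To make the in-place claim concrete, here is a minimal back-substitution sketch in NumPy that overwrites the right-hand side with the solution, using no temporary arrays proportional to the problem size (the function name is mine, purely for illustration):

```python
import numpy as np

def backsub_inplace(R, b):
    """Solve R x = b for upper-triangular R, overwriting b with x.

    Only O(1) extra memory is used beyond the storage of R and b:
    each row operation reads a row of R and updates one row of b.
    """
    n = R.shape[0]
    for i in range(n - 1, -1, -1):
        b[i] -= R[i, i + 1:] @ b[i + 1:]   # subtract known terms
        b[i] /= R[i, i]                    # divide by the diagonal pivot
    return b
```

Library routines (LAPACK's trsm/trtrs, which back scipy.linalg.solve_triangular) work the same way, just blocked for cache efficiency, so the asymptotic memory cost is the matrix plus the right-hand side.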
With that being said, are you performing the LU decomposition of a $20000 \times 20000$ matrix? And is that matrix dense? If you're using double precision, then just storing that matrix in memory costs $(20000)^2 \cdot 8 = 3.2 \times 10^9$ bytes, or 3.2 GB of memory. On modern machines this might not be that prohibitive, but if this is just one step out of many then that might explain your memory issues.
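The arithmetic, spelled out (here assuming a single dense float64 matrix; every extra copy — the original kernel matrix, its factor, autodiff residuals — multiplies this figure):

```python
n = 20000
bytes_per_f64 = 8
bytes_needed = n * n * bytes_per_f64   # one dense 20000 x 20000 float64 matrix
print(bytes_needed / 1e9)              # 3.2 (GB)
```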