Whenever I compute the SVD of a matrix in the following manner:
import numpy as np
import matplotlib.pyplot as plt

X = np.random.randn(100, 20)
U, S, V = np.linalg.svd(X)
plt.imshow(U)
plt.figure()
plt.imshow(V)
plt.show()
The larger of the two orthogonal matrices (in this case $U$) always exhibits this diagonally dominant behavior: it looks like a noisy perturbation of the identity. I am not sure why this appears at all. It does not happen when the two dimensions are similar; for example, if I do:
X = np.random.randn(100,95)
Then no diagonal component is observed in either $U$ or $V$. I am wondering why this is so.
Is this an artifact of the algorithm NumPy happens to use, or is there a strong mathematical reason to expect it regardless of the algorithm chosen to compute the SVD? Or is it a combination of both?

The trailing columns of $U$ (columns 21 through 100 here) may be any orthonormal basis that spans the orthogonal complement of the column space of $X$. Hence their particular values are an artifact of the algorithm used. The question then becomes: what in the algorithm causes this diagonal behavior, and why does it stop occurring when the number of rows and the number of columns are similar?
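You can check this non-uniqueness directly: only the first 20 columns of $U$ participate in reconstructing $X$ (the rest multiply the zero block of $\Sigma$), so replacing the trailing columns by any other orthonormal basis of the same subspace leaves the factorization valid. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
U, S, Vt = np.linalg.svd(X)

# Only the first 20 columns of U contribute to X; the remaining 80
# columns are multiplied by the zero block of Sigma.
X_rec = U[:, :20] @ np.diag(S) @ Vt
assert np.allclose(X, X_rec)

# Replace the trailing columns with another orthonormal basis of the
# same subspace (here: flip their signs). The factorization still holds
# and U2 is still orthogonal.
U2 = U.copy()
U2[:, 20:] *= -1
assert np.allclose(X, U2[:, :20] @ np.diag(S) @ Vt)
assert np.allclose(U2.T @ U2, np.eye(100))
```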
I did a little detective work into which algorithms NumPy and MATLAB use. In summary, the diagonal behavior is caused by a Householder QR factorization that is performed as a preprocessing step when the matrix is significantly taller than it is wide, before the main SVD computation begins.
NumPy's and MATLAB's svd both wrap the dgesdd Fortran routine in LAPACK. The comments in dgesdd.f (around lines 614-635) explain that if the matrix is significantly taller than it is wide, it is first reduced in dimension via a full QR factorization.
Mathematically, what they are doing is first factoring $$X = QR = \begin{bmatrix}Q_1 & Q_2\end{bmatrix}\begin{bmatrix}R_1 \\ 0\end{bmatrix}$$ where $Q$ is orthonormal, and $R_1$ is square and upper triangular. Then the SVD only needs to be performed on the smaller matrix $R_1$: $$R_1 = U' \Sigma' V'^T$$ from which they recover the SVD of $X$ as follows: $$X = U \Sigma V^T = \begin{bmatrix}Q_1 U' & Q_2\end{bmatrix} \begin{bmatrix}\Sigma' \\ 0\end{bmatrix} V'^T.$$ The diagonal behavior you are seeing is in the matrix $Q_2$.
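This reduction can be sketched in NumPy (a toy reconstruction of the idea, not LAPACK's actual blocked code path):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))

# Full QR: Q is 100x100 orthogonal, R is 100x20 with R1 = R[:20] square
# upper triangular and zeros below.
Q, R = np.linalg.qr(X, mode="complete")
R1 = R[:20, :]

# SVD of the small 20x20 matrix only.
Up, Sp, Vpt = np.linalg.svd(R1)

# Recover the SVD of X: U = [Q1 U', Q2], Sigma = [Sigma'; 0], V^T = V'^T.
U = Q.copy()
U[:, :20] = Q[:, :20] @ Up

Sigma = np.zeros((100, 20))
Sigma[:20, :20] = np.diag(Sp)

assert np.allclose(U @ Sigma @ Vpt, X)    # a valid SVD of X
assert np.allclose(U.T @ U, np.eye(100))  # U is orthogonal
```

The last 80 columns of `U` are exactly `Q[:, 20:]`, i.e. the $Q_2$ that carries the diagonal pattern.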
The QR routine used is dgeqrf, which does QR via Householder reflectors. This works as follows. First, define $$u := x - \text{sign}(x_0)||x||e_1, \qquad v := u / ||u||, \qquad Q^{(1)} := I - 2 v v^T$$ where $x$ is the first column of $X$, $\text{sign}(x_0)$ is the sign of the first entry of $x$, and $e_1=(1,0,0,\dots,0)$. The action of $Q^{(1)}$ then zeros out the first column of $X$ below its first entry: $$Q^{(1)} X = \begin{bmatrix}||x|| & \text{stuff} \\ 0 & X'\end{bmatrix}.$$ Because of how $Q^{(1)}$ is constructed from $x$, the first column and row of $Q^{(1)}$ are a normalized version of $x$ (up to sign), while the rest of $Q^{(1)}$ is the identity minus twice the outer product of a vector with small entries with itself — in other words, a small perturbation of the identity.
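Here is a small example implementing exactly the formulas above, verifying both the zeroing action and the structure of $Q^{(1)}$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
X = rng.standard_normal((n, 3))
x = X[:, 0]

# Householder reflector built from the first column
# (assumes x[0] != 0, which holds almost surely for Gaussian data).
e1 = np.zeros(n); e1[0] = 1.0
u = x - np.sign(x[0]) * np.linalg.norm(x) * e1
v = u / np.linalg.norm(u)
Q1 = np.eye(n) - 2.0 * np.outer(v, v)

Y = Q1 @ X
assert np.allclose(Y[1:, 0], 0.0)                    # first column zeroed below row 0
assert np.allclose(abs(Y[0, 0]), np.linalg.norm(x))  # top-left entry is +/-||x||

# The first column of Q1 is a normalized copy of x (up to sign).
assert np.allclose(np.abs(Q1[:, 0]), np.abs(x) / np.linalg.norm(x))
```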
The process then repeats on the smaller matrix $X'$, and so on, until $X$ has been transformed into $$R = Q^{(k)} Q^{(k-1)} \dots Q^{(1)} X$$ by a sequence of orthogonal matrices of the form $$Q^{(i)} = \begin{bmatrix}I & 0 \\ 0 & I - 2 v v^T\end{bmatrix},$$ where $v$ is the analogous vector for the smaller submatrix being worked on in the $i$th step. Multiplying these matrices together to form $Q = Q^{(k)} Q^{(k-1)} \dots Q^{(1)}$ amounts to repeatedly perturbing the identity in the bottom-right submatrix by small rank-one updates, which yields the diagonal behavior observed.
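A direct (unoptimized) translation of this accumulation shows the effect: after applying all the reflectors to a tall Gaussian matrix, the bottom-right block of the accumulated $Q$ is visibly diagonally dominant.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 100, 20
X = rng.standard_normal((m, n))

# Apply one Householder reflector per column, accumulating the product.
A = X.copy()
Q = np.eye(m)
for i in range(n):
    x = A[i:, i]
    e1 = np.zeros(m - i); e1[0] = 1.0
    u = x - np.sign(x[0]) * np.linalg.norm(x) * e1
    v = u / np.linalg.norm(u)
    Qi = np.eye(m)
    Qi[i:, i:] -= 2.0 * np.outer(v, v)  # reflector embedded in the identity
    A = Qi @ A      # A becomes R: upper triangular on top, zeros below
    Q = Q @ Qi      # Q = Q^(1) Q^(2) ... Q^(k), so X = Q A

assert np.allclose(Q @ A, X)            # valid factorization
assert np.allclose(Q.T @ Q, np.eye(m))  # Q is orthogonal
assert np.allclose(A[n:], 0.0)          # bottom block of R is zero

# The bottom-right (m-n)x(m-n) block of Q stays close to the identity:
# its diagonal entries dwarf the off-diagonal noise.
B = Q[n:, n:]
off = B - np.diag(np.diag(B))
assert np.abs(np.diag(B)).mean() > np.abs(off).mean()
```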