I am reading a book that takes the following matrix representation of linear operators as common knowledge. I am hoping someone can help me understand why it works, and why we can reduce it to row echelon form in the usual way.
Let $T: V \to W$ be a bounded linear operator between Hilbert spaces. Then we consider the Hilbert space decompositions $$V = V_0 \oplus V_1 \mbox{ where } V_0 = \ker T,\ V_1 =(\ker T)^\perp $$ $$W = W_0 \oplus W_1 \mbox{ where } W_1 = \mbox{im } T,\ W_0 =(\mbox{im } T)^\perp $$ and $T$ has the matrix representation $$ T = \begin{pmatrix} T_{00} & T_{01} \\ T_{10} & T_{11} \end{pmatrix} \mbox{ where } \begin{cases} T_{00} : V_0 \to W_0 \\ T_{10} : V_1 \to W_0 \\ T_{01} : V_0 \to W_1 \\ T_{11} : V_1 \to W_1 \\ \end{cases} $$ Furthermore, we may perform row reduction on this matrix as if the entries were ordinary numbers.
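As a finite-dimensional sanity check of this block structure (the matrix and the numerical setup below are my own hypothetical example, not from the book), one can build orthonormal bases of $\ker T$, $(\ker T)^\perp$, $\mbox{im } T$, and $(\mbox{im } T)^\perp$ from an SVD and verify that only the $(\ker T)^\perp \to \mbox{im } T$ block is nonzero:

```python
import numpy as np

# Hypothetical example: T : R^4 -> R^3 with a nontrivial kernel.
T = np.array([[1., 2., 0., 1.],
              [0., 1., 1., 0.],
              [1., 3., 1., 1.]])   # rank 2: third row = first + second

U, s, Vt = np.linalg.svd(T)
r = int(np.sum(s > 1e-10))         # numerical rank

V1 = Vt[:r].T                      # orthonormal basis of (ker T)^perp
V0 = Vt[r:].T                      # orthonormal basis of ker T
W1 = U[:, :r]                      # orthonormal basis of im T
W0 = U[:, r:]                      # orthonormal basis of (im T)^perp

# The four blocks, expressed in these bases:
T_00 = W0.T @ T @ V0               # V0 -> W0
T_10 = W0.T @ T @ V1               # V1 -> W0
T_01 = W1.T @ T @ V0               # V0 -> W1
T_11 = W1.T @ T @ V1               # V1 -> W1

# Only the (ker T)^perp -> im T block survives:
print(np.allclose(T_00, 0), np.allclose(T_10, 0), np.allclose(T_01, 0))  # True True True
print(np.allclose(W1 @ T_11 @ V1.T, T))                                  # True
```

The last line also checks that $T_{11}$ alone reconstructs $T$, since $T$ vanishes on $\ker T$ and lands in $\mbox{im } T$.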
This is a notational convenience. Once you understand how the notation works, you will be able to parse and prove statements involving it. I will briefly describe how to get a "matrix of operators" from a linear operator, and conversely how such a matrix defines a linear operator.
We take two Hilbert spaces $V,W$ as our data, together with two decompositions $V= V_1\oplus V_2\oplus ... \oplus V_n$ and $W= W_1\oplus ... \oplus W_m$ into mutually orthogonal subspaces of $V$ (resp. $W$).
You may write every element $v\in V$ uniquely in the form $v=v_1+...+v_n$ with $v_i\in V_i$ and $v_i\perp v_j$ for $i\neq j$. For convenience you can write $v$ as a column vector: $$v=\begin{pmatrix} v_1 \\ \vdots \\ v_n\end{pmatrix}.$$ Similarly any $w\in W$ may be expanded into its $W_i$ components and, again for convenience, written as a column vector: $$w=\begin{pmatrix} w_1 \\ \vdots \\ w_m\end{pmatrix}.$$
Now if $T:V\to W$ is a linear operator, you have that $T(v) = T(v_1) + ... + T(v_n)$. Each $T(v_i)$ is an element of $W$ and may in turn be decomposed into its $W_j$ components, $T(v_i) = T(v_i)_1+...+T(v_i)_m$. Introduce the notation $T_{ji}(v_i) := T(v_i)_j$. Then:
$$T(v)= \sum_{i=1}^nT( v_i) = \sum_{i=1}^n \sum_{j=1}^m T_{ji}(v_i)= \begin{pmatrix}T_{11}(v_1)+...+T_{1n}(v_n)\\ T_{21}(v_1)+...+T_{2n}(v_n)\\ \vdots\\ T_{m1}(v_1)+...+T_{mn}(v_n)\end{pmatrix}=\begin{pmatrix}T_{11} & ... & T_{1n}\\ T_{21}&...& T_{2n}\\ \vdots & & \vdots\\ T_{m1}&...& T_{mn}\end{pmatrix}\cdot \begin{pmatrix}v_1 \\ v_2\\ \vdots\\ v_n\end{pmatrix}$$
This is how, given an operator $T:V\to W$ and orthogonal decompositions of $V$ and $W$, the matrix of $T$ with respect to these decompositions is defined. The $T_{ij}$ are maps (a priori not necessarily linear) from $V_j$ to $W_i$. The maps $T_{ij}$ are in fact linear, which can be checked by a short calculation: if $P_i:W\to W_i$ is the orthogonal projection onto $W_i$, then $T_{ij}= P_i\circ T\lvert_{V_j}$, which is linear as a composition of linear maps.
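The defining identity $T(v)=\sum_{ij}T_{ij}(v_j)$ with $T_{ij}= P_i\circ T\lvert_{V_j}$ can be checked numerically. Below is a hypothetical finite-dimensional setup (the dimensions and the QR-based bases are my own choices): the blocks are built via orthogonal projections, and applying them to the components of $v$ reproduces $T(v)$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: T : R^5 -> R^4, with orthogonal decompositions
# R^5 = V_1 + V_2 (dims 2+3) and R^4 = W_1 + W_2 (dims 1+3),
# given by orthonormal bases from QR factorizations of random matrices.
T = rng.standard_normal((4, 5))
QV = np.linalg.qr(rng.standard_normal((5, 5)))[0]
QW = np.linalg.qr(rng.standard_normal((4, 4)))[0]
V_bases = [QV[:, :2], QV[:, 2:]]          # bases of V_1, V_2
W_bases = [QW[:, :1], QW[:, 1:]]          # bases of W_1, W_2

# P_i = orthogonal projection onto W_i; block T_ij = P_i o T restricted to V_j.
P = [B @ B.T for B in W_bases]

v = rng.standard_normal(5)
v_comp = [B @ B.T @ v for B in V_bases]   # components v_j of v in V_j

# T(v) = sum_{ij} T_ij(v_j), with T_ij(v_j) = P_i(T(v_j)):
Tv = sum(P[i] @ (T @ v_comp[j]) for i in range(2) for j in range(2))
print(np.allclose(Tv, T @ v))             # True
```

The check works because the projections $P_i$ sum to the identity on $W$ and the components $v_j$ sum to $v$.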
In the same way, if you have a matrix of linear maps $T_{ij}: V_j\to W_i$, then defining, for any $v\in V$ with $v=v_1+...+v_n$, $$T(v):= \sum_{ij}T_{ij}(v_j) =\begin{pmatrix}T_{11} & ... & T_{1n}\\ T_{21}&...& T_{2n}\\ \vdots & & \vdots\\ T_{m1}&...& T_{mn}\end{pmatrix}\cdot \begin{pmatrix}v_1 \\ v_2\\ \vdots\\ v_n\end{pmatrix}$$ gives you a linear map $T: V\to W$.
Due to the way matrix multiplication works, this can result in some conceptual or notational simplification when you are interested in linear operators between Hilbert spaces that admit orthogonal decompositions. For example, if $T: V\to W$ and $U: W\to Z$, with $V,W, Z$ orthogonally decomposed, then the blocks compose in the usual way: $(U\circ T)_{ij}= \sum_k U_{ik}T_{kj}$.
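The composition rule can also be verified numerically. Here is a hypothetical check (my own dimensions, using coordinate decompositions so the blocks are just submatrices): slicing $U$ and $T$ into $2\times 2$ grids of blocks and composing block-wise agrees with the blocks of $U\circ T$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical check of (U o T)_{ij} = sum_k U_{ik} T_{kj} with coordinate
# decompositions R^4 = R^2 + R^2, R^6 = R^3 + R^3, R^5 = R^2 + R^3.
T = rng.standard_normal((6, 4))     # T : R^4 -> R^6
Umap = rng.standard_normal((5, 6))  # U : R^6 -> R^5

# Slice each matrix into its 2x2 grid of blocks (block (i,j): W_i <- V_j).
Tb = [[T[:3, :2], T[:3, 2:]], [T[3:, :2], T[3:, 2:]]]
Ub = [[Umap[:2, :3], Umap[:2, 3:]], [Umap[2:, :3], Umap[2:, 3:]]]

# Block-wise composition, exactly as for scalar matrix entries:
UT = Umap @ T
for i in range(2):
    for j in range(2):
        block = sum(Ub[i][k] @ Tb[k][j] for k in range(2))
        rows = slice(0, 2) if i == 0 else slice(2, 5)
        cols = slice(0, 2) if j == 0 else slice(2, 4)
        print(np.allclose(block, UT[rows, cols]))   # True (four times)
```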
As a final remark, much of this also works if $V$ and $W$ are decomposed into infinitely many orthogonal subspaces, but then you need to add some summability considerations into the mix.