The PDE is $$\partial_t u=\partial_x (a(x)\partial_x u)\\ u(0)=0=u(1)$$
Using the $\theta-method$ I get the following discretization:
$$U_j^{n+1}-U_j^n=\theta\mu\{a_{j+1/2}(U_{j+1}^{n+1}-U_j^{n+1})-a_{j-1/2}(U_j^{n+1}-U_{j-1}^{n+1})\}+(1-\theta)\mu\{a_{j+1/2}(U_{j+1}^n-U_j^n)-a_{j-1/2}(U_j^n-U_{j-1}^n)\}$$
$$\implies U_j^{n+1}-\theta\mu\{a_{j+1/2}(U_{j+1}^{n+1}-U_j^{n+1})-a_{j-1/2}(U_j^{n+1}-U_{j+1}^{n+1})\}\\ =U_j^n+(1-\theta)\mu\{a_{j+1/2}(U_{j+1}^n-U_j^n)-a_{j-1/2}(U_j^n-U_{j-1}^n) \}$$
Which implies that we can only begin with $j=1$, since if we begin with $j=0$ then we would meet with $U_{-1}$, which is undefined. The first row of the resulting matrix is for some reason supposed to be $\{1, 0, \dots, 0\}$. But I don't understand why this should be the case, because we get the following from the above expression in the vector $U^n$ (in the expression $AU^{n}=U^{n+1}$):
\begin{bmatrix} U_1^n\\ U_2^n-U_1^{n}\\ U_1^n-U_0^n\\ \vdots\\ \vdots\\ \end{bmatrix}
Would someone please clarify this for me?
I assume that the spatial grid points are indexed as $j=0,1,\dots,m$. You are correct that you only use the equation that you wrote down to $j=1$ and up to $j=m-1$. At the spatial position $j=1$, the spatial differencing asks you for $U_0$, which you can substitute in using the Dirichlet BC, implementing that equation separately. Similarly, at the spatial position $j=m-1$, the spatial differencing asks you for $U_m$, which you substitute in using the other Dirichlet BC.
The way your reference and/or instructor is suggesting is a different way to proceed. This amounts to leaving $U_0$ and $U_m$ as "trivial variables", giving them an equation that just reads $U_0=0,U_m=0$ (where the $0$s are coming from the Dirichlet BC). You can do this by making the zeroth and $m$th rows be $\begin{bmatrix} 1 & 0 & \cdots & 0 \end{bmatrix}$ and $\begin{bmatrix} 0 & 0 & \cdots & 0 & 1 \end{bmatrix}$ respectively, with the corresponding entries of the right side vector coming from the Dirichlet BC (so in your case they will both be zero). This approach has the advantage that the middle $m-1$ equations will all have the same structure.
Let us look at the BTCS method from a somewhat higher level. First we have:
$$\frac{\partial u}{\partial t}=\frac{\partial^2 u}{\partial x^2},u(0)=0,u(1)=0.$$
We discretize the right side, obtaining:
$$\frac{dU_j}{dt}=(AU)_j \quad j=1,\dots,m-1 \\ U_0=0 \\ U_m=0$$
where $A$ is a $(m+1) \times (m+1)$ matrix, and rows $1,2,\dots,m-1$ of $A$ implement centered differencing, i.e. they have diagonal entries of $-2m^2$ and off-diagonal entries of $m^2$. The other two rows of $A$ are irrelevant (and, in computer implementation, can be replaced by equations defining the BCs).
This is now an $(m+1)$-dimensional ODE, but we want a difference equation. That means we need to choose a temporal discretization. In BT methods, we do that this way:
$$\frac{U^{n+1}-U^n}{\Delta t}=(AU^{n+1})_j \quad j=1,2,\dots,m-1 \\ U^n_0=0 \\ U^n_m=0.$$
This is a "BT" method because we take our spatial differencing at time $n+1$. Such a method is necessarily implicit because we need to use spatial differences in $U^{n+1}$ in order to compute $U^{n+1}$ in the first place. Accordingly, the problem gets rearranged into a matrix equation:
$$(A-\Delta t^{-1} I)U^{n+1}=-\Delta t^{-1} U^n$$
or equivalently
$$(I-\Delta t A)U^{n+1}=U^n.$$
Again this is only to be evaluated for $j=1,\dots,m-1$. For $j=0,j=m$, we just have $U^{n+1}_0=0,U^{n+1}_m=0$ from the BC.