I am watching Wolfgang Bangerth's nice videos on the finite element method, as well as reading Langtangen's book on the same subject. But I am having a little trouble understanding what the actual $\phi_j$ basis function looks like.
So in the finite element setup, you have a function $u(x)$ that you want to approximate as $u_h(x)$ with a finite set of basis functions. Since the basis functions are local, you end up with a collection of basis functions that together cover the full domain but overlap only a little. The approximation takes the form
$$ u_h(x) = \sum_{j=1}^{N} c_j\phi_j(x) $$
But I am not clear on what the $\phi_j(x)$ looks like. I imagine it is a vector, but what are the elements of that vector? Key question: How many elements are in that vector, and how is each element computed? Is this a Lagrange polynomial, where each element is a function of the interpolation points or knots (meaning $h_i$)?
I did look through the book for an explicit definition of the basis vector, but it is kinda hard to find through all of the notation. I would really appreciate it if someone could clarify.
$\phi_j$ is an actual bona fide function of the continuous variable $x$. This is a property of finite elements that finite differences do not share: finite differences inherently provide function values only at grid points, and you need a separate interpolation method to extend them to the entire domain.
Strictly speaking, these $\phi_j$ can be whatever linearly independent set of functions* you like, as long as the integrals needed for the problem can be computed when you replace the solution candidate with $\phi_j$. But usually what they look like is a piecewise polynomial supported on a small number of cells of the mesh, which moves around the mesh as you vary the index. This allows for the integrals in the method to be relatively cheap to compute, and allows for the system for the $c_j$ to be relatively sparse.
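To make the sparsity claim concrete, here is a minimal sketch. It assumes piecewise linear "hat" functions on a uniform grid of $[0,1]$ as the $\phi_j$, and uses a simple trapezoidal rule for the integrals $\int \phi_i \phi_j \, dx$. The names `hat` and `integrate`, and the value `n = 8`, are only illustrative choices.

```python
def hat(j, x, h):
    """Piecewise linear hat function centred at the node x_j = j*h."""
    return max(1.0 - abs(x - j * h) / h, 0.0)

def integrate(f, a, b, m):
    """Composite trapezoidal rule with m subintervals on [a, b]."""
    dx = (b - a) / m
    total = 0.5 * (f(a) + f(b))
    for k in range(1, m):
        total += f(a + k * dx)
    return dx * total

n = 8            # number of cells, chosen only for illustration
h = 1.0 / n      # uniform mesh width on [0, 1]
interior = range(1, n)

# M[i][j] approximates the integral of phi_i * phi_j over [0, 1].
M = [[integrate(lambda x: hat(i, x, h) * hat(j, x, h), 0.0, 1.0, 200 * n)
      for j in interior] for i in interior]

# Print the sparsity pattern: 1 where the entry is non-negligible.
pattern = [[int(abs(entry) > 1e-12) for entry in row] for row in M]
for row in pattern:
    print(*row)
```

Only neighbouring basis functions have overlapping supports, so the printed pattern should be tridiagonal. The same structure carries over to the linear system for the $c_j$.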
For example, in 1D with a uniform grid given by $x_j=hj$, $j=0,1,\dots,n$, and homogeneous Dirichlet boundary conditions, a commonly used system of Lagrange elements is $\phi_j(x)=\max \left \{ 1-\frac{|x-x_j|}{h},0 \right \}$ for $j=1,2,\dots,n-1$. These elements generate all continuous piecewise linear functions whose slope changes only at the nodes and which vanish at the endpoints. If the boundary values are not fixed at zero, you would also include the two half-hats at the boundary nodes, $\phi_0(x)=\begin{cases} 1-\frac{x}{h} & x<x_1 \\ 0 & \text{otherwise} \end{cases}$ and $\phi_n(x)=\begin{cases} \frac{x-x_{n-1}}{h} & x>x_{n-1} \\ 0 & \text{otherwise} \end{cases}$. It's worth plotting these as well as some linear combinations of them, to help you get a feel for them.
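As a rough sketch of how $u_h(x)=\sum_j c_j\phi_j(x)$ is evaluated with the interior hat functions, here is a small example on $[0,1]$. The target $u(x)=\sin(\pi x)$ and the choice $c_j=u(x_j)$ are assumptions made purely for illustration: this gives the nodal interpolant, not the solution of any particular PDE. The helper names `hat` and `u_h` are hypothetical.

```python
import math

def hat(j, x, h):
    """Interior hat function: 1 at the node x_j = j*h, 0 at all other nodes."""
    return max(1.0 - abs(x - j * h) / h, 0.0)

def u_h(x, c, h):
    """Evaluate sum_j c[j] * phi_j(x) for the interior indices j = 1, ..., n-1."""
    return sum(c_j * hat(j, x, h) for j, c_j in c.items())

n = 4                    # number of cells, chosen only for illustration
h = 1.0 / n              # uniform mesh width, so x_j = j*h lies in [0, 1]

def u(x):
    """Hypothetical target function that vanishes at both endpoints."""
    return math.sin(math.pi * x)

# Illustrative choice of coefficients: c_j = u(x_j), the nodal interpolant.
c = {j: u(j * h) for j in range(1, n)}

# u_h is defined for every x, not only at the nodes.
for x in [0.0, 0.1, 0.25, 0.4, 0.5, 1.0]:
    print(f"x = {x:4.2f}   u_h(x) = {u_h(x, c, h):.4f}   u(x) = {u(x):.4f}")
```

Note that each $\phi_j$ is a function you can evaluate at any $x$, while the vector in the problem is the list of coefficients $c_j$, with one entry per basis function.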
* Technicality: if there are so-called "essential" boundary conditions, which in most problems are the same as Dirichlet boundary conditions, then the $\phi_j$ will need to satisfy those as well.