I am having trouble understanding the relationship between the definition of the adjoint of a linear operator $\mathcal{A}$ $($given a non-degenerate bilinear form $\langle \ , \ \rangle)$ and the matrices representing $\mathcal{A}$ and $\langle \ , \ \rangle.$
Let $\mathcal{A} : \mathbb{R}^N \to \mathbb{R}^N$ and $\langle \ , \ \rangle : \mathbb{R}^N \times \mathbb{R}^N \to \mathbb{R}$ be represented by matrices $A$ and $B$, both with respect to a fixed basis of $\mathbb{R}^N$. The adjoint of $\mathcal{A}$ is defined as the unique linear operator $\mathcal{A}^*$ such that $\langle \mathcal{A}u, v \rangle = \langle u, \mathcal{A}^* v \rangle$ for every $u,v \in \mathbb{R}^N$.
On the other hand, I know that $A^* = A^T$, the transpose matrix, and that we can represent everything as matrix multiplication, i.e.,
$$\langle u,v \rangle = u^TBv $$
for every $u,v \in \mathbb{R}^N$. Shouldn't this tell us that we have
$$u^T A^T B v = (Au)^TBv = \langle \mathcal{A}u,v \rangle = \langle u, \mathcal{A}^*v \rangle = u^T BA^T v, $$
for every $u,v \in \mathbb{R}^N$? This seems to imply that $A^TB = BA^T$ for every matrix $A \in M_N(\mathbb{R})$, which seems like nonsense. Am I missing something or assuming something wrong somewhere?
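For a concrete instance, take the hypothetical choices $A=\begin{pmatrix}0&1\\0&0\end{pmatrix}$ and $B=\operatorname{diag}(1,2)$; this $B$ is symmetric positive definite, so it is a genuine inner product. The plain-Python sketch below (the helper names are illustrative, not from any library) just multiplies the matrices out and finds $A^TB\neq BA^T$, so the identity indeed fails in general.

```python
# Plain-Python matrix helpers on lists of rows; names are illustrative only.
def transpose(M):
    return [list(row) for row in zip(*M)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

A = [[0, 1],
     [0, 0]]
B = [[1, 0],
     [0, 2]]  # symmetric positive definite, so <u, v> = u^T B v is an inner product

lhs = matmul(transpose(A), B)  # A^T B
rhs = matmul(B, transpose(A))  # B A^T
print(lhs, rhs, lhs == rhs)    # [[0, 0], [1, 0]] [[0, 0], [2, 0]] False
```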
I think the crucial point here is whether our space comes with a pre-existing inner product, separate from the bilinear form used to define the adjoint. How can you speak about choosing an 'orthonormal basis' if there is no notion of an inner product?
In this answer, we will therefore consider $\langle \cdot ,\cdot \rangle$ to be an inner product on some $N$-dimensional space $V$ and $\mathcal{B}: V\times V\longrightarrow \mathbb{R}$ to be some non-degenerate bilinear form on $V$. Finally, let $\mathcal{A}:V\longrightarrow V$ be a linear map on $V$.
Choose some basis $\beta=\{\mu_1,\dots,\mu_N\}$ of $V$ and let $B \in \mathbb{R}^{N\times N}$ be given by $B_{ij}=\mathcal{B}(\mu_i,\mu_j)$. Then for any $u,v\in V$ it holds that $\mathcal{B}(u,v)=\mathbf{u}^TB\,\mathbf{v}$, where $\mathbf{u},\mathbf{v}$ are the coordinate vectors of $u,v$ in the basis $\beta$. We will henceforth denote them simply by $u,v$.
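To see the formula $\mathcal{B}(u,v)=\mathbf{u}^TB\,\mathbf{v}$ in action, here is a small sketch in plain Python. The form (given by a hypothetical matrix $M$ in the standard coordinates of $\mathbb{R}^2$), the non-orthonormal basis and the coordinate vectors are all illustrative assumptions; the point is only that building $B_{ij}=\mathcal{B}(\mu_i,\mu_j)$ and then working purely with coordinates gives the same value as evaluating $\mathcal{B}$ directly.

```python
def bilinear(x, y, M):
    # B(x, y) = x^T M y, with x, y in standard coordinates of R^n
    n = len(x)
    return sum(x[i] * M[i][j] * y[j] for i in range(n) for j in range(n))

def from_coords(c, basis):
    # vector whose coordinates in `basis` are c, i.e. sum_k c_k * mu_k
    n = len(basis[0])
    return [sum(c[k] * basis[k][i] for k in range(len(basis))) for i in range(n)]

M = [[2, 1], [0, 3]]      # hypothetical non-degenerate form (need not be symmetric)
basis = [[1, 0], [1, 1]]  # hypothetical basis mu_1, mu_2 (not orthonormal)

# Gram matrix B_ij = B(mu_i, mu_j)
B = [[bilinear(mi, mj, M) for mj in basis] for mi in basis]

cu, cv = [3, -1], [2, 5]  # coordinates of u and v in the basis
direct = bilinear(from_coords(cu, basis), from_coords(cv, basis), M)
via_gram = bilinear(cu, cv, B)  # u^T B v computed in coordinates
print(B, direct, via_gram)      # [[2, 3], [2, 6]] 23 23
```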
We can similarly obtain a matrix representation $A$ of $\mathcal{A}$ in the basis $\beta$. Assume for the moment that the adjoint $\mathcal{A}^*$ is defined with respect to $\mathcal{B}$ (rather than $\langle\cdot,\cdot\rangle$), that is, $\mathcal{B}(\mathcal{A}u,v)=\mathcal{B}(u,\mathcal{A}^*v)$ for all $u,v\in V$; non-degeneracy of $\mathcal{B}$ guarantees that such an operator exists and is unique. We then get
$$u^T A^T B v = (Au)^TBv = \mathcal{B}(\mathcal{A}u,v) = \mathcal{B}(u, \mathcal{A}^*v) = u^T BA^* v, \tag{1}$$
where $A^*$ is the matrix representation of $\mathcal{A}^*$ in the basis $\beta$. This implies that $u^T\left(A^TB-BA^*\right)v=0$ for all $u$ and all $v$; taking $u=\mu_i$ and $v=\mu_j$ shows that every entry of $A^TB-BA^*$ vanishes. In other words,
$$A^TB=BA^*.\tag{2}$$
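Because $\mathcal{B}$ is non-degenerate, $B$ is invertible, so $(2)$ can be solved as $A^*=B^{-1}A^TB$. The plain-Python sketch below checks this on a hypothetical $2\times 2$ example (the matrices are arbitrary choices, and `fractions.Fraction` keeps the arithmetic exact); note that the resulting $A^*$ differs from $A^T$.

```python
from fractions import Fraction

def transpose(M):
    return [list(row) for row in zip(*M)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def matvec(M, x):
    return [sum(M[i][k] * x[k] for k in range(len(x))) for i in range(len(M))]

def inv2(M):
    # exact inverse of an invertible 2x2 matrix
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

def form(u, v, B):
    # B(u, v) = u^T B v, with u, v given in coordinates
    return sum(u[i] * B[i][j] * v[j] for i in range(len(u)) for j in range(len(v)))

B = [[2, 3], [2, 6]]  # hypothetical Gram matrix of a non-degenerate, non-symmetric form
A = [[1, 2], [0, 1]]  # hypothetical matrix of the operator

A_star = matmul(inv2(B), matmul(transpose(A), B))  # A* = B^{-1} A^T B

print(matmul(transpose(A), B) == matmul(B, A_star))  # True: equation (2)
u, v = [1, -2], [3, 4]
print(form(matvec(A, u), v, B) == form(u, matvec(A_star, v), B))  # True: B(Au, v) = B(u, A*v)
print(A_star == transpose(A))  # False: here A* is not A^T
```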
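We can try to find $A^*$ concretely as follows. The $i$-th column of $A^*$ is simply $A^*\mu_i$. If $\beta$ is orthonormal with respect to $\langle\cdot,\cdot\rangle$, then, in the basis $\beta$,
$$A^*\mu_i=(\langle A^*\mu_i,\mu_1\rangle, \langle A^*\mu_i,\mu_2\rangle,\dots, \langle A^*\mu_i,\mu_N\rangle),$$
and in these coordinates $\langle x,y\rangle=x^Ty$, so $\langle A^*\mu_i,\mu_j\rangle={\mu_j}^TA^*\mu_i$. Since $\mathcal{B}$ is non-degenerate, $B$ is invertible, and $(2)$ gives $A^*=B^{-1}A^TB$. It follows that
$$A^*\mu_i=({\mu_1}^TB^{-1}A^TB\mu_i,\,{\mu_2}^TB^{-1}A^TB\mu_i,\,\dots,\,{\mu_N}^TB^{-1}A^TB\mu_i)=B^{-1}A^T(B\mu_i).$$
Now, $B\mu_i$ is simply the $i$-th column of $B$. In words, the previous line says that the $i$-th column of $A^*$ is obtained by applying $B^{-1}A^T$ to the $i$-th column of $B$.
It follows that $A^*=A^T$ if and only if $B^{-1}A^TB\mu_i=A^T\mu_i$ for all $i$, that is, if and only if
$$(A^TB-BA^T)\mu_i=0$$
for all $i$. In other words, $A^*=A^T$ if and only if $A^TB=BA^T$, i.e. $A^T$ commutes with $B$. This is exactly the identity reached in the question: it does not hold for every $A$, but it is precisely the condition under which the matrix of the $\mathcal{B}$-adjoint is the transpose.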
A trivial consequence of our last observations is that when $B=I$ (that is, when $\mathcal{B}$ is our pre-existing inner product and $\beta$ is orthonormal for it), then $A^*=A^T$. Observe that in this case $(2)$ does hold, as it must.
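To illustrate why orthonormality matters, the sketch below takes the standard dot product on $\mathbb{R}^2$ as the pre-existing inner product and computes the adjoint matrix as $B^{-1}A^TB$ (which solves $(2)$ because $B$ is invertible) from the Gram matrix in two bases: the standard one, where the Gram matrix is $I$, and the hypothetical non-orthonormal basis $\mu_1=(1,0)$, $\mu_2=(1,1)$. The operator matrix $A$ is an arbitrary illustrative choice, taken to be the matrix of the operator in whichever basis is in use.

```python
from fractions import Fraction

def transpose(M):
    return [list(row) for row in zip(*M)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def inv2(M):
    # exact inverse of an invertible 2x2 matrix
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

def gram(basis):
    # Gram matrix of the standard dot product in the given basis
    return [[sum(x * y for x, y in zip(mi, mj)) for mj in basis] for mi in basis]

def adjoint_matrix(A, B):
    # solves A^T B = B A* (equation (2)) for A*
    return matmul(inv2(B), matmul(transpose(A), B))

A = [[0, 1], [0, 0]]  # illustrative operator matrix

G_orth = gram([[1, 0], [0, 1]])  # orthonormal basis: Gram matrix is I
G_skew = gram([[1, 0], [1, 1]])  # hypothetical non-orthonormal basis

print(adjoint_matrix(A, G_orth) == transpose(A))  # True
print(adjoint_matrix(A, G_skew) == transpose(A))  # False
```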