I am trying to solve the linear programming problem
\begin{align*} (\text{P}):\,\text{Minimise }f({\bf x},{\bf y})&={\bf a}\cdot{\bf x}+{\bf b}\cdot{\bf y},\\ \text{subject to }A{\bf x}+B{\bf y}&={\bf c},\\ y_i&\geq0\quad\forall\,1\leq i\leq n, \end{align*}
where ${\bf a}$ and ${\bf x}$ are $m$-dimensional vectors, ${\bf b}$ and ${\bf y}$ are $n$-dimensional vectors, and ${\bf c}$ is a $k$-dimensional vector. I am considering the Lagrangian
$$\mathcal{L}={\bf a}\cdot{\bf x}+{\bf b}\cdot{\bf y}-\bf{\lambda}\cdot(A{\bf x}+B{\bf y}-{\bf c})-\bf{\mu}\cdot{\bf y}$$
and therefore want to show that the optimisation $\max_{{\bf{\lambda}},{\bf{\mu}}}\min_{{\bf x},{\bf y}}\mathcal{L}$ is equivalent to
\begin{align*} (\text{D}):\,\text{Maximise } g(\bf{\lambda})&={\bf c}\cdot\bf{\lambda},\\ \text{subject to }A^\textrm{T}\bf{\lambda}&={\bf a},\\ B^\textrm{T}\bf{\lambda}&={\bf b}. \end{align*}
I am very new to convex optimisation, so the optimisation knowledge I have is quite limited. Firstly, to solve for $\min_{{\bf x},{\bf y}}\mathcal{L}$, how would one write it formally? The hand-wavy physicist is tempted to write
$${\bf a}^\mathrm{T}{\bf x}+{\bf b}^\mathrm{T}{\bf y}-\bf{\lambda} ^\mathrm{T}(A{\bf x}+B{\bf y}-{\bf c})-\bf{\mu} ^\mathrm{T}{\bf y},$$
and "differentiate" with respect to ${\bf x}$ and ${\bf y}$ to get
$${\bf a}^\mathrm{T}-\bf{\lambda}^\mathrm{T}A={\bf 0}\,\textrm{and}\,{\bf b}^\mathrm{T}-\bf{\lambda}^\mathrm{T}B-\bf{\mu}^\mathrm{T}={\bf 0}.$$
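(As a quick numerical sanity check of these stationarity conditions — just a sketch with randomly generated data and finite differences; all the dimensions and the seed here are arbitrary choices of mine:)

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 3, 4, 2  # arbitrary dimensions for the check

# Random problem data and multipliers (mu >= 0 as required).
a, b = rng.standard_normal(m), rng.standard_normal(n)
A, B = rng.standard_normal((k, m)), rng.standard_normal((k, n))
c = rng.standard_normal(k)
lam, mu = rng.standard_normal(k), rng.random(n)

def L(x, y):
    """The Lagrangian a.x + b.y - lam.(Ax + By - c) - mu.y."""
    return a @ x + b @ y - lam @ (A @ x + B @ y - c) - mu @ y

# Central finite-difference gradients at a random point ...
x0, y0 = rng.standard_normal(m), rng.standard_normal(n)
eps = 1e-6
grad_x = np.array([(L(x0 + eps * e, y0) - L(x0 - eps * e, y0)) / (2 * eps)
                   for e in np.eye(m)])
grad_y = np.array([(L(x0, y0 + eps * e) - L(x0, y0 - eps * e)) / (2 * eps)
                   for e in np.eye(n)])

# ... agree with the "hand-wavy" closed forms a - A^T lam and b - B^T lam - mu.
assert np.allclose(grad_x, a - A.T @ lam, atol=1e-4)
assert np.allclose(grad_y, b - B.T @ lam - mu, atol=1e-4)
```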
I know this has absolutely zero rigour, so what is the formal way to write this when dealing with all the vectors and matrices, especially with the inequality? Furthermore, for the second part, how exactly do I go about it? I have no idea at all. All help is appreciated!
The primal problem is $\min_{x,y} \max_{\lambda \in \mathbb{R}^k, \mu \geq 0} L(x,y,\lambda,\mu)$ (note that the value of $\max_{\mu \geq 0} L(x,y,\lambda,\mu)$ is $\infty$ when $y_i<0$ for some $i$, so the feasible region of the primal is $y \geq 0$; similarly you can recover the equality constraint from $\max_{\lambda}$).
You can derive the Lagrange dual problem by swapping min and max and simplifying $\min_{x,y} L(x,y,\lambda,\mu)$. Taking the derivative is a valid approach, but it does not seem like you fully understand why. Since $L$ is linear in $(x,y)$, if its gradient is not $0$, then $\min_{x,y} L(x,y,\lambda,\mu) = -\infty$. So: $$\max_{\lambda \in \mathbb{R}^k, \mu \geq 0}\min_{x,y} L(x,y,\lambda,\mu) = \max_{\lambda \in \mathbb{R}^k, \mu \geq 0} \begin{cases}c^T\lambda & \text{if } A^T\lambda=a, \; B^T\lambda +\mu = b, \\ -\infty & \text{otherwise.} \end{cases}$$ It is not clear why $\mu$ is missing from your dual: since $\mu \geq 0$, you can eliminate it and write the second constraint as $B^T\lambda \leq b$, but not as an equality.
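You can also check the derived dual numerically on a tiny instance (a sketch using `scipy.optimize.linprog`; the particular data below is made up, chosen so that both problems are feasible and bounded):

```python
import numpy as np
from scipy.optimize import linprog

# Made-up instance: minimise a.x + b.y  s.t.  A x + B y = c,  y >= 0,  x free.
a = np.array([1.0])           # m = 1
b = np.array([3.0, 1.0])      # n = 2
A = np.array([[1.0], [0.0]])  # k x m
B = np.eye(2)                 # k x n
c = np.array([2.0, 3.0])      # k = 2

# Primal: stack the variables as z = (x, y); x is free, y >= 0.
primal = linprog(
    c=np.concatenate([a, b]),
    A_eq=np.hstack([A, B]), b_eq=c,
    bounds=[(None, None)] + [(0, None)] * 2,
)

# Dual: maximise c.lam  s.t.  A^T lam = a,  B^T lam <= b  (lam free).
# linprog minimises, so negate the objective and the result.
dual = linprog(
    c=-c,
    A_eq=A.T, b_eq=a,
    A_ub=B.T, b_ub=b,
    bounds=[(None, None)] * 2,
)

print(primal.fun, -dual.fun)  # the two optimal values should agree
```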
Not part of your question, but something I think you will still want, is an argument for why strong duality holds, i.e., why the dual problem has the same optimal value as the primal problem. You could use Slater's condition, which for purely affine constraints like these reduces to feasibility of the primal.