To provide some context: we have been given an assignment that goes far beyond my current mathematical abilities, so I'm struggling to find even an appropriate starting point for some of these questions. One of them is the following:
If $A\in \mathbb{R}^{m\times n}$ with $m>n$ and $B \in \mathbb{R}^{k\times n}$, where $A$ has full column rank, what is a suitable expression for the constrained minimizer $p \in \mathbb{R}^{n}$ of $\lVert b-Ap\rVert^2$ subject to $c^TBp=a$, where $b \in \mathbb{R}^{m}$ and $c \in \mathbb{R}^{k}$, and how can the Lagrange multiplier $\lambda$ be found?
Whilst this is a challenging assignment, I don't intend to give up on it. I have fairly strong knowledge of Linear Algebra 2, and I'm of course aware of the elementary concepts of multivariate calculus. However, the instructions are poor and no relevant resources were referenced. Any pointers to resources, and an icebreaker for starting the question, would really help me here and would be greatly appreciated!
I also apologise for the vagueness of this post at an earlier stage, as I'm new to the exchange :)
First, I assume by $Ap$ you mean the matrix–vector product. Note that the dimensions then agree exactly as stated: $A \in \mathbb{R}^{m\times n}$ and $p \in \mathbb{R}^n$ give $Ap \in \mathbb{R}^m$, matching $b \in \mathbb{R}^m$, and since $Bp \in \mathbb{R}^k$ and $c \in \mathbb{R}^k$, the constraint value $c^TBp$ is a scalar. I only assume $a \in \mathbb{R}$ is a given constant, otherwise the constraint doesn't make sense.

I assume, since you have this assignment, that you're familiar with the Lagrange multiplier theorem. Let $\alpha, \beta \geq 1$ be integers (I'm using these names so as not to confuse them with the $n$ and $m$ in your problem statement). In the hypothesis, you have an open set $D \subset \mathbb{R}^{\alpha} \times \mathbb{R}^{\beta}$, a function $f: D \rightarrow \mathbb{R}$, and a function $g=(g_1, \ldots, g_{\beta}) : D \rightarrow \mathbb{R}^{\beta}$ ($g_1, \ldots, g_{\beta}$ are the components of $g$). You will be minimizing $f$ on the subset of $D$ where $g=0_{\mathbb{R}^{\beta}}$, or, if you'd like, on $D' := D \cap g_1^{-1}(0) \cap \ldots \cap g_{\beta}^{-1}(0)$. So, what do our $f$ and $g$ need to be?

Well, since you want to minimise $\left\lVert b- Ap \right\rVert^2$, $f$ has to be $f: \mathbb{R}^{n} \rightarrow \mathbb{R}$, $f(p)=\left\lVert b- Ap \right\rVert^2$. What about $g$? Well, the codomain of our constraint (which takes $p$ as argument) is $\mathbb{R}$, because $c^T B p \in \mathbb{R}$. So $g=(g_1): \mathbb{R}^{n} \rightarrow \mathbb{R}$, $g_1(p)=c^T B p - a$. Now you can see that the constraint is equivalent to $g_1(p)=0$. So the $\alpha$ and $\beta$ in the theorem statement above are here $\alpha = n - 1$ and $\beta = 1$. Also, $D$ is usually given, so because no $D$ was specified, you can assume $D$ to be $\mathbb{R}^{n}$.

Now, the theorem says that if $x \in D$ is a constrained minimum point of $f|_{D'}$ (meaning that if you take any neighbourhood of $x$ and intersect it with $D'$, then $x$ is a minimum of $f$ on that intersection) and if $\operatorname{rank}(J_g(x)) = \textrm{maximum} = 1$ in our case, then there exist $\lambda_1, \ldots, \lambda_{\beta}$ (so in our case only one $\lambda$) such that, if $L(p)=f(p)+\lambda g_1(p), \forall p \in D$, then $\frac{\partial L}{\partial p_i}(x) = 0, \forall i \in \left\lbrace 1, \ldots, \alpha+\beta = n \right\rbrace$.

You would use this theorem in reverse, saying that any minimum points of $f|_{D'}$ must be among the points $x$ that satisfy $\frac{\partial L}{\partial p_i}(x) = 0, \forall i \in \left\lbrace 1, \ldots, n \right\rbrace$ for some $\lambda$, which varies with the point and which you'd find together with the point. So what you have to do now is compute the derivatives of $L$, so the derivatives of $f$ and $g_1$.

Notice you need to compute $n$ partial derivatives, one for each coordinate $p_1, \ldots, p_n$ of $p$.

Let's compute the derivatives for $f$; for $g_1$ it is similar but easier ($\frac{\partial g_1}{\partial p_i}(p) = (B^Tc)_i$, a constant). It helps for clarity if we define the function $h:\mathbb{R}^{m} \rightarrow \mathbb{R}$ as simply $h(y)=\left\lVert y \right\rVert$ and write the coordinates on $\mathbb{R}^{m}$ as $y_1, \ldots, y_m$. Then, since $f(p)=h(b-Ap)^2$, you have by the chain rule $\displaystyle \frac{\partial f}{\partial p_i}(p)=2\left\lVert b- Ap \right\rVert \cdot \sum_{k=1}^m \left[ \frac{\partial h}{\partial y_k}(b-Ap) \cdot \frac{\partial (b-Ap)_k}{\partial p_i} \right]$. I assume you know how to take the partial derivatives of $h$: $\frac{\partial h}{\partial y_k}(y) = \frac{y_k}{\left\lVert y \right\rVert}$ for $y \neq 0$.

Obviously, $\frac{\partial (b-Ap)_k}{\partial p_i} = -\frac{\partial (Ap)_k}{\partial p_i}$. For $\frac{\partial (Ap)_k}{\partial p_i}$, think where $p_i$ appears in the matrix–vector product: $(Ap)_k=\sum_s A_{ks} \cdot p_s$, so $\frac{\partial (Ap)_k}{\partial p_i}$ is the constant $A_{k,i}$. Hence the derivative simplifies to

$$ \frac{\partial f}{\partial p_i}(p)= -2\left\lVert b- Ap \right\rVert \cdot \sum_{k=1}^m \left[ \frac{(b-Ap)_{k}}{\left\lVert b-Ap \right\rVert} \cdot A_{k,i} \right] = -2\sum_{k=1}^m A_{k,i}\,(b-Ap)_k, $$

that is, the gradient is $\nabla f(p) = -2A^T(b-Ap)$.
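As a sanity check, the gradient formula $\nabla f(p) = -2A^T(b-Ap)$ can be compared against finite differences on random data. Here is a minimal sketch in NumPy; the sizes $m=5$, $n=3$ and the random data are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
p = rng.standard_normal(n)

def f(p):
    """Objective: squared residual norm ||b - Ap||^2."""
    r = b - A @ p
    return r @ r

# Analytic gradient derived above: -2 A^T (b - Ap)
grad = -2 * A.T @ (b - A @ p)

# Central finite differences, one coordinate at a time
eps = 1e-6
fd = np.zeros(n)
for i in range(n):
    e = np.zeros(n)
    e[i] = eps
    fd[i] = (f(p + e) - f(p - e)) / (2 * eps)

print(np.allclose(grad, fd, atol=1e-5))  # True
```

Since $f$ is quadratic in $p$, central differences recover the derivative essentially exactly, so the two vectors agree up to floating-point noise.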
You should be able to fill in the details from here. I hope this helps and that I haven't misread your problem conditions.
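For completeness, here is how the pieces combine into what the assignment asks for. Writing $d := B^Tc$ (so the constraint reads $d^Tp = a$), setting $\nabla L = 0$ gives the linear system $2A^TA\,p + \lambda d = 2A^Tb$, $\ d^Tp = a$. Because $A$ has full column rank, $A^TA$ is invertible, and solving gives $\lambda = \frac{2(d^T\hat p - a)}{d^T(A^TA)^{-1}d}$ and $p = \hat p - \frac{\lambda}{2}(A^TA)^{-1}d$, where $\hat p = (A^TA)^{-1}A^Tb$ is the unconstrained least-squares solution. A NumPy sketch (all sizes and data below are made up for illustration) that solves the system both ways and checks they agree:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, k = 6, 3, 2
A = rng.standard_normal((m, n))   # full column rank with probability 1
B = rng.standard_normal((k, n))
b = rng.standard_normal(m)
c = rng.standard_normal(k)
a = 1.5

d = B.T @ c                       # constraint: d^T p = a

# Way 1: solve the (n+1)x(n+1) KKT system for (p, lambda) directly:
#   [2 A^T A   d] [p  ]   [2 A^T b]
#   [  d^T     0] [lam] = [   a   ]
K = np.block([[2 * A.T @ A, d[:, None]],
              [d[None, :],  np.zeros((1, 1))]])
rhs = np.concatenate([2 * A.T @ b, [a]])
sol = np.linalg.solve(K, rhs)
p, lam = sol[:n], sol[n]

# Way 2: the closed form via the unconstrained least-squares solution.
p_hat = np.linalg.solve(A.T @ A, A.T @ b)   # (A^T A)^{-1} A^T b
u = np.linalg.solve(A.T @ A, d)             # (A^T A)^{-1} d
lam2 = 2 * (d @ p_hat - a) / (d @ u)
p2 = p_hat - (lam2 / 2) * u

print(np.isclose(d @ p, a))                        # constraint satisfied
print(np.allclose(p, p2), np.isclose(lam, lam2))   # both routes agree
```

Note that $d^T(A^TA)^{-1}d > 0$ whenever $d \neq 0$, so the closed form is well defined; if $d = 0$ the constraint is either vacuous or infeasible.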