I cannot understand how, if a Lagrange multiplier is a scalar (i.e., a constant value), you can take a partial derivative of a Lagrangian function with respect to that constant when you need to find the gradient of the Lagrangian.
If the derivative of a constant is zero, then how/why is it even possible to take the derivative of something with respect to a constant/Lagrange multiplier?
It's common to call something a "constant" in informal descriptions of a math problem, but it's important to always keep in mind: constant with respect to what?
Let's look at a concrete example: linearly constrained least squares,
$$\min_{\mathbf{x}}\ \sum_{i=1}^{n} (x_i - y_i)^2 \quad \mathrm{s.t.} \quad \sum_{i=1}^n x_i = b.$$ Here the vector $\mathbf{x}\in \mathbb{R}^n$ is the independent variable of the optimization problem, and $y_i$ and $b$ are constants with respect to $\mathbf{x}$.
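Before bringing in the multiplier, it may help to see the constrained problem solved numerically. The following is a minimal sketch assuming SciPy is available; the data values `y` and `b` are illustrative, not part of the original problem.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative problem data (constants with respect to x).
y = np.array([1.0, 2.0, 3.0])
b = 9.0

# Objective: sum_i (x_i - y_i)^2
objective = lambda x: np.sum((x - y) ** 2)

# Equality constraint: sum_i x_i - b = 0
constraint = {"type": "eq", "fun": lambda x: np.sum(x) - b}

# SciPy handles the constraint internally (via SLSQP here).
res = minimize(objective, x0=np.zeros_like(y), constraints=[constraint])
```

Running this, `res.x` satisfies the constraint to numerical tolerance, and we will recover the same answer below by solving the Lagrangian conditions by hand.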
We can reformulate this problem using the method of Lagrange multipliers: $$\underset{\mathbf{x},\lambda}{\operatorname{ext}} \ \sum_{i=1}^{n} (x_i - y_i)^2 - \lambda\left(\sum_{i=1}^n x_i - b\right).$$ In this new problem:

- both $\mathbf{x}$ and $\lambda$ are independent variables, so the Lagrangian is a function of $n+1$ variables and it makes perfect sense to differentiate it with respect to $\lambda$;
- $y_i$ and $b$ remain constants with respect to $\mathbf{x}$ and $\lambda$.
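Writing out the gradient of this Lagrangian $L(\mathbf{x},\lambda)$ makes the point concrete:

$$\frac{\partial L}{\partial x_i} = 2(x_i - y_i) - \lambda = 0, \qquad \frac{\partial L}{\partial \lambda} = -\left(\sum_{i=1}^n x_i - b\right) = 0.$$

The derivative with respect to $\lambda$ is not "the derivative of a constant": within the Lagrangian, $\lambda$ is a variable, and differentiating with respect to it simply recovers the constraint. Solving the first family of equations gives $x_i = y_i + \lambda/2$, and substituting into the second gives $\lambda = \tfrac{2}{n}\left(b - \sum_{i=1}^n y_i\right)$, so $\lambda$ only becomes a specific number *at the solution*.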
Notice that there are three different notions of being "a constant" at play here: a fixed piece of problem data (like $y_i$ and $b$), an independent variable of a function (like $\mathbf{x}$ and $\lambda$ in the Lagrangian), and the particular value a variable takes at a solution (like $\lambda^*$). Whether or not $\lambda$ is "a constant" depends on which of these you mean.
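We can verify numerically that treating $\lambda$ as a variable works: setting the gradient of the Lagrangian to zero gives $x_i = y_i + \lambda/2$ and $\lambda = \tfrac{2}{n}(b - \sum_i y_i)$. A quick check with NumPy (the data `y` and `b` are illustrative):

```python
import numpy as np

# Illustrative problem data.
y = np.array([1.0, 2.0, 3.0])
b = 9.0
n = len(y)

# Closed-form stationary point of the Lagrangian.
lam = 2.0 * (b - y.sum()) / n   # lambda is a variable; this is its value at the solution
x = y + lam / 2.0

# Both components of the gradient of L(x, lambda) should vanish here.
grad_x = 2.0 * (x - y) - lam        # dL/dx_i
grad_lam = -(x.sum() - b)           # dL/dlambda = -(constraint residual)
```

Here `grad_lam` is zero exactly because the derivative with respect to $\lambda$ *is* the constraint residual: the Lagrangian treats $\lambda$ as a variable, and only at the stationary point does it collapse to the single number stored in `lam`.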