I am given the following equation: $$RSS(B, \alpha) = \sum_{i=1}^{N} (y_{i} - B^{T}x_{i} - \alpha)^{2} $$
My steps are as follows; I have provided images of my work at the bottom, and this is just a description of my thought process.
- I define $\lambda_{i} = y_{i} - B^{T}x_{i} - \alpha$
- I then notice that $\lambda^{T}\lambda = \lambda_{1}^{2} + \lambda_{2}^{2} + \ldots + \lambda_{N}^{2}$
- Since I defined earlier that $\lambda_{i} = y_{i} - B^{T}x_{i} - \alpha$, I can substitute that back in and I am now able to represent the original summation.
- I then look at the vector $\lambda$ and notice that its elements are: $$\begin{bmatrix}y_{1} - B^{T}x_{1} - \alpha \\ y_{2} - B^{T}x_{2} - \alpha \\ \vdots \\ y_{N} - B^{T}x_{N} - \alpha\end{bmatrix}$$ which can be broken down into three column vectors $Y, B^{T}X,\overline{\alpha}$
- So $\lambda = Y - B^{T}X - \overline{\alpha}$ and I get that: $$RSS(B, \alpha) = (Y - B^{T}X - \overline{\alpha})^{T}(Y - B^{T}X - \overline{\alpha})$$
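The equivalence between the summation form and the vector form can be sanity-checked numerically. The sketch below (my own illustrative example, not from the original work) stacks the $x_i^T$ as rows of a matrix `X`, so the vector of $B^{T}x_{i}$ terms is computed as `X @ b`:

```python
import numpy as np

# Toy data (illustrative): N = 5 samples, p = 3 features.
rng = np.random.default_rng(0)
N, p = 5, 3
X = rng.normal(size=(N, p))   # row i is x_i^T
y = rng.normal(size=N)
b = rng.normal(size=p)        # a candidate B
alpha = 0.7                   # a candidate intercept

# Summation form: RSS = sum_i (y_i - B^T x_i - alpha)^2
rss_sum = sum((y[i] - b @ X[i] - alpha) ** 2 for i in range(N))

# Vector form: lam_i = y_i - B^T x_i - alpha, RSS = lam^T lam
lam = y - X @ b - alpha
rss_vec = lam @ lam

print(np.isclose(rss_sum, rss_vec))  # True
```

Any choice of `b` and `alpha` gives the same value in both forms, which confirms the substitution step.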
- After distributing the transpose and multiplying everything out, I take the partial derivatives with respect to $B$ and $\alpha$ and get the following two equations.
$$\frac{\partial RSS}{\partial B} = X^{T}Y - X^{T}XB - X^{T}\overline{\alpha}$$ $$\frac{\partial RSS}{\partial \alpha} = Y - B^{T}X - \overline{\alpha}$$
but after solving for $B$ and $\alpha$, I get $0 = 0$ and am lost on how to continue from there. I know there is something wrong with my math, but I am having trouble identifying it. I never learned how to take derivatives of matrices in school, so I am basing all my knowledge on http://www.gatsby.ucl.ac.uk/teaching/courses/sntn/sntn-2017/resources/Matrix_derivatives_cribsheet.pdf
$\def\p{\partial}$ Let's use a naming convention where an uppercase Latin letter denotes a matrix, lowercase Latin a column vector, and lowercase Greek a scalar.
Then define the variables $$\eqalign{ J &= {\tt11}^T &\quad\big({\rm All\,Ones\,Matrix}\big)\\ C &= I-\tfrac 1nJ &\quad\big({\rm Centering\,Matrix}\big) \\ M &= X^+C \\ X &= \big[\,x_1\;x_2\;\ldots\;x_n\big]^T \\ b &= B \\ w &= Xb + \alpha{\tt1} - y &\quad\big({\rm Residual\,Vector}\big) \\ }$$ where $X^+$ is the Moore-Penrose inverse of $X$.
Write the RSS function in terms of these new variables and calculate its differential.
$$\eqalign{ \rho &= w^Tw \\ d\rho &= 2w^Tdw \\ &= 2w^T(X\,db + {\tt1}\,d\alpha) \\ }$$
Holding $b$ constant (so that $db=0$) yields the gradient with respect to $\,\alpha$.
$$\eqalign{ d\rho &= 2w^T{\tt1}\,d\alpha \\ \frac{\p \rho}{\p \alpha} &= 2(w^T{\tt1}) = 2({\tt1}^Tw) \\ &= 2\left({\tt1}^TXb +n\alpha -{\tt1}^Ty\right) \\ }$$
Set this gradient to zero and solve for the optimal $\alpha$.
$$\eqalign{ \alpha &= \tfrac 1n\,{\tt1}^T\big(y-Xb\big) \\ \alpha{\tt1} &= \tfrac 1n\,J\big(y-Xb\big) \;=\; (I-C)\,(y-Xb) \\ y-\alpha{\tt1} &= Cy + (I-C)Xb \\ }$$
Similarly, holding $\alpha$ constant yields the gradient with respect to $b$.
$$\eqalign{ d\rho &= 2w^TX\,db = 2(X^Tw)^Tdb \\ \frac{\p \rho}{\p b} &= 2X^Tw = 2\big(X^TXb +\alpha X^T{\tt1} -X^Ty\big) \\ }$$
Set the gradient to zero and solve for the optimal $b$.
$$\eqalign{ &X^TXb = X^T(y-\alpha{\tt1}) \\ &b = X^+(y-\alpha{\tt1}) \\ &b = X^+\Big(Cy - (C-I)Xb\Big) \\ &\Big(I+X^+(C-I)X\Big)b = X^+Cy \\ &\Big(X^+CX\Big)b = X^+Cy \\ &b = \big(X^+CX\big)^+X^+Cy \\ }$$
The following parameter combinations will be very useful.
$$\eqalign{ Xb &= \big(X^+C\big)^+\big(X^+C\big)y &\doteq\; M^+My \\ \alpha{\tt1} &= (I-C)\,(y-Xb) &\doteq (I-C)(I-M^+M)y \\ }$$
Substituting the optimal parameter values yields
$$\eqalign{ w &= Xb +\color{red}{\alpha{\tt1}} -y \\ &= M^+My +\color{red}{(I-C)y +(C-I)M^+My} -y \\ &= C(M^+M-I)\,y \\ \rho &= w^Tw \\ &= y^T(M^+M-I)^TC^TC(M^+M-I)\,y \\ &= y^T\left(I-M^+M\right)C\left(I-M^+M\right)y \\ }$$
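A quick numerical sanity check of these closed forms (an illustrative sketch; the random data and the comparison against `np.linalg.lstsq` are my own, not part of the derivation). For full-column-rank $X$, $\,b=\big(X^+CX\big)^+X^+Cy$ and $\alpha = \tfrac 1n{\tt1}^T(y-Xb)$ agree with an ordinary least-squares fit on the design $[\,{\tt1}\;X\,]$:

```python
import numpy as np

# Illustrative data; names (X, y, C, b, alpha) follow the derivation above.
rng = np.random.default_rng(1)
n, p = 20, 3
X = rng.normal(size=(n, p))   # full column rank (almost surely)
y = rng.normal(size=n)

C = np.eye(n) - np.ones((n, n)) / n   # centering matrix I - J/n
Xp = np.linalg.pinv(X)                # Moore-Penrose inverse X^+

# Closed forms from the derivation
b = np.linalg.pinv(Xp @ C @ X) @ Xp @ C @ y
alpha = np.mean(y - X @ b)            # (1/n) 1^T (y - Xb)

# Reference: ordinary least squares on the augmented design [1 X]
A = np.hstack([np.ones((n, 1)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

print(np.allclose(coef, np.r_[alpha, b]))  # True
```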
So that's how you would solve the problem if you treat $\alpha$ as a separate variable. But what most people do instead is use augmented variables by prepending ${\tt1}$ to each $x_k$ vector and prepending $\alpha$ to the $b$ vector.
Then the algebra becomes much simpler, i.e.
$$\eqalign{ X &= \big[\,\hat x_1\;\hat x_2\;\ldots\;\hat x_n\big]^T,\qquad \hat x_k = \pmatrix{{\tt1}\\x_k},\qquad b = \pmatrix{\alpha\\B} \\ w &= Xb - y \\ \rho &= w^Tw \\ d\rho &= 2(X^Tw)^Tdb \\ \frac{\p \rho}{\p b} &= 2X^T(Xb-y) = 0 \\ b &= X^+y \\ w &= (XX^+-I)y \\ \rho &= y^T(I-XX^+)^T(I-XX^+)y \\ &= y^T(I-XX^+)y \\ }$$
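The augmented formulation is equally easy to check numerically. This sketch (illustrative random data, my own naming) computes $b = X^+y$ with the augmented design and confirms that $\rho = y^T(I-XX^+)y$ matches the explicit residual sum of squares $w^Tw$:

```python
import numpy as np

# Illustrative data for the augmented formulation.
rng = np.random.default_rng(2)
n, p = 20, 3
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

# Augmented design: prepend 1 to each x_k, i.e. a column of ones
Xhat = np.hstack([np.ones((n, 1)), X])

# b = X^+ y; the first entry is alpha, the rest is B
b = np.linalg.pinv(Xhat) @ y

# rho = y^T (I - X X^+) y, using the projector P = X X^+
P = Xhat @ np.linalg.pinv(Xhat)
rho = y @ (np.eye(n) - P) @ y

# Compare against the explicit residual w = Xb - y
w = Xhat @ b - y
print(np.isclose(rho, w @ w))  # True
```

The simplification $y^T(I-XX^+)^T(I-XX^+)y = y^T(I-XX^+)y$ works because $I-XX^+$ is a symmetric idempotent projector, which is exactly what the check above exercises.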