Suppose I have the following.
$\boldsymbol{y}=\left[\begin{array}{c} y_{1} \\ y_{2} \\ y_{3} \\ \vdots \\ y_{n} \end{array}\right],\quad \boldsymbol{X}=\left[\begin{array}{ccccc} 1 & X_{1,1} & X_{1,2} & \cdots & X_{1,p} \\ 1 & X_{2,1} & X_{2,2} & \cdots & X_{2,p} \\ 1 & X_{3,1} & X_{3,2} & \cdots & X_{3,p} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & X_{n,1} & X_{n,2} & \cdots & X_{n,p} \end{array}\right],\quad \boldsymbol{\beta}=\left[\begin{array}{c} \beta_{0} \\ \beta_{1} \\ \beta_{2} \\ \vdots \\ \beta_{p} \end{array}\right],\quad \boldsymbol{\epsilon}=\left[\begin{array}{c} \epsilon_{1} \\ \epsilon_{2} \\ \epsilon_{3} \\ \vdots \\ \epsilon_{n} \end{array}\right]$
Constructing the linear model $\boldsymbol{y}=\boldsymbol{X}\boldsymbol{\beta}+\boldsymbol{\epsilon}$:
$\left[\begin{array}{c} y_{1} \\ y_{2} \\ y_{3} \\ \vdots \\ y_{n} \end{array}\right]=\left[\begin{array}{c} \beta_{0}+\beta_{1} X_{1,1}+\beta_{2} X_{1,2}+\cdots+\beta_{p} X_{1,p}+\epsilon_{1} \\ \beta_{0}+\beta_{1} X_{2,1}+\beta_{2} X_{2,2}+\cdots+\beta_{p} X_{2,p}+\epsilon_{2} \\ \beta_{0}+\beta_{1} X_{3,1}+\beta_{2} X_{3,2}+\cdots+\beta_{p} X_{3,p}+\epsilon_{3} \\ \vdots \\ \beta_{0}+\beta_{1} X_{n,1}+\beta_{2} X_{n,2}+\cdots+\beta_{p} X_{n,p}+\epsilon_{n} \end{array}\right]$
where $\boldsymbol{e}=\boldsymbol{y}-\widehat{\boldsymbol{y}}$ is the residual vector, with $\widehat{\boldsymbol{y}}=\boldsymbol{X} \widehat{\boldsymbol{\beta}}$.
Now estimating $\widehat{\boldsymbol{\beta}}$ by minimizing the sum of squared residuals $SSR= \boldsymbol{e}^{T} \boldsymbol{e}=(\boldsymbol{y}-\boldsymbol{X} \widehat{\boldsymbol{\beta}})^{\boldsymbol{T}}(\boldsymbol{y}-\boldsymbol{X} \widehat{\boldsymbol{\beta}})$
yields: $\widehat{\boldsymbol{\beta}}=\left(\boldsymbol{X}^{T} \boldsymbol{X}\right)^{-1} \boldsymbol{X}^{T} \boldsymbol{y}$
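As a quick numerical sanity check of this closed form (a sketch on hypothetical simulated data; the variable names are mine), the estimate can be computed directly with NumPy:

```python
import numpy as np

# Hypothetical data: n = 50 observations, p = 3 predictors.
rng = np.random.default_rng(0)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # prepend intercept column
beta_true = np.array([1.0, 2.0, -0.5, 0.3])
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# Closed-form OLS: beta_hat solves (X^T X) beta_hat = X^T y.
# np.linalg.solve is preferred over forming the explicit inverse for stability.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
```

With small noise, `beta_hat` lands close to `beta_true`, and it agrees with `np.linalg.lstsq(X, y)` to machine precision.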
Now suppose I have the following model and want to estimate $\widehat{\boldsymbol{\beta}}$ and $\widehat{\boldsymbol{\alpha}}$:
$y=X \beta+ Y\alpha+\epsilon$
With
$\boldsymbol{Y}=\left[\begin{array}{cccc} Y_{1,1} & Y_{1,2} & \cdots & Y_{1,q} \\ Y_{2,1} & Y_{2,2} & \cdots & Y_{2,q} \\ Y_{3,1} & Y_{3,2} & \cdots & Y_{3,q} \\ \vdots & \vdots & \ddots & \vdots \\ Y_{n,1} & Y_{n,2} & \cdots & Y_{n,q} \end{array}\right]$ and $\boldsymbol{\alpha}=\left[\begin{array}{c} \alpha_{1} \\ \alpha_{2} \\ \vdots \\ \alpha_{q} \end{array}\right],$ where $p\neq q$. (Note that $\boldsymbol{Y}$ carries no intercept column, so $\boldsymbol{\alpha}$ has $q$ entries; the intercept $\beta_0$ already lives in $\boldsymbol{X}$.)
Constructing the SSR yields: $SSR=\boldsymbol{e}^{T} \boldsymbol{e}=(\boldsymbol{y}-\boldsymbol{X} \widehat{\boldsymbol{\beta}}-\boldsymbol{Y} \widehat{\boldsymbol{\alpha}})^{T}(\boldsymbol{y}-\boldsymbol{X} \widehat{\boldsymbol{\beta}}-\boldsymbol{Y} \widehat{\boldsymbol{\alpha}})$
How do I now obtain $\widehat{\boldsymbol{\beta}}$ and $\widehat{\boldsymbol{\alpha}}$? My initial thought is to differentiate with respect to $\widehat{\boldsymbol{\beta}}$ and $\widehat{\boldsymbol{\alpha}}$ and equate the derivatives to 0. However, how do I then solve such a system of equations?
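For reference, carrying out that differentiation and setting both gradients to zero gives a coupled pair of normal equations:
$$\begin{aligned} \boldsymbol{X}^{T}\boldsymbol{X}\,\widehat{\boldsymbol{\beta}} + \boldsymbol{X}^{T}\boldsymbol{Y}\,\widehat{\boldsymbol{\alpha}} &= \boldsymbol{X}^{T}\boldsymbol{y} \\ \boldsymbol{Y}^{T}\boldsymbol{X}\,\widehat{\boldsymbol{\beta}} + \boldsymbol{Y}^{T}\boldsymbol{Y}\,\widehat{\boldsymbol{\alpha}} &= \boldsymbol{Y}^{T}\boldsymbol{y} \end{aligned}$$
This is exactly the block form of the single-matrix normal equations $\boldsymbol{Z}^{T}\boldsymbol{Z}\,\widehat{\boldsymbol{\theta}}=\boldsymbol{Z}^{T}\boldsymbol{y}$ for the stacked matrix $\boldsymbol{Z}=\left[\begin{array}{c|c}\boldsymbol{X} & \boldsymbol{Y}\end{array}\right]$.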
Following the suggestion of agryavian, I construct a new matrix:
$\boldsymbol{Z}=\left[\begin{array}{c|c}\boldsymbol{X} & \boldsymbol{Y}\end{array}\right]=\left[\begin{array}{cccc|ccc} 1 & X_{1,1} & \cdots & X_{1,p} & Y_{1,1} & \cdots & Y_{1,q} \\ 1 & X_{2,1} & \cdots & X_{2,p} & Y_{2,1} & \cdots & Y_{2,q} \\ \vdots & \vdots & & \vdots & \vdots & & \vdots \\ 1 & X_{n,1} & \cdots & X_{n,p} & Y_{n,1} & \cdots & Y_{n,q} \end{array}\right]$ and $\boldsymbol{\theta}=\left[\begin{array}{c} \beta_{0} \\ \beta_{1} \\ \vdots \\ \beta_{p} \\ \alpha_{1} \\ \vdots \\ \alpha_{q} \end{array}\right]$
Should I now form the SSR like this:
$SSR= \boldsymbol{e}^{T} \boldsymbol{e}=(\boldsymbol{y}-\boldsymbol{Z} \widehat{\boldsymbol{\theta}})^{T}(\boldsymbol{y}-\boldsymbol{Z} \widehat{\boldsymbol{\theta}})$
yields:
$\widehat{\boldsymbol{\theta}}=\left(\boldsymbol{Z}^{T} \boldsymbol{Z}\right)^{-1} \boldsymbol{Z}^{T} \boldsymbol{y}$
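A minimal sketch of this stacked approach in NumPy (simulated data; the dimensions and names are my own assumptions), splitting $\widehat{\boldsymbol{\theta}}$ back into $\widehat{\boldsymbol{\beta}}$ and $\widehat{\boldsymbol{\alpha}}$ afterwards:

```python
import numpy as np

# Hypothetical data: X has an intercept plus p columns, Y has q columns.
rng = np.random.default_rng(1)
n, p, q = 60, 2, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # intercept lives here
Y = rng.normal(size=(n, q))                                  # no second intercept
theta_true = np.array([1.0, 0.5, -1.0, 2.0, 0.0, 0.7])       # (beta_0..beta_p, alpha_1..alpha_q)

# Stack the regressors side by side: Z = [X | Y].
Z = np.hstack([X, Y])
y = Z @ theta_true + rng.normal(scale=0.1, size=n)

# One ordinary least-squares solve gives both coefficient blocks at once.
theta_hat = np.linalg.solve(Z.T @ Z, Z.T @ y)
beta_hat, alpha_hat = theta_hat[:p + 1], theta_hat[p + 1:]
```

The split works because $\boldsymbol{\theta}$ stacks $\boldsymbol{\beta}$ on top of $\boldsymbol{\alpha}$ in the same order as the columns of $\boldsymbol{Z}$.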
The second model is the same as "combining" $X$ and $Y$ together into a wide matrix and performing linear regression with respect to it, since $$X\beta + Y \alpha = \left[\begin{array}{c|c}X & Y\end{array}\right]\left[\begin{array}{c} \beta \\ \hline \alpha \end{array}\right]$$
In general, the fitted $\hat{\beta}$ in your second model will not be the same as the $\hat{\beta}$ in your first model. (The exception is when the columns of $X$ are orthogonal to the columns of $Y$, i.e. $X^{T}Y=0$, in which case the two estimates coincide.)
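This difference is easy to see numerically. In the sketch below (simulated data of my own construction), $Y$ is correlated with a column of $X$, so the $\hat{\beta}$ from regressing on $X$ alone absorbs part of $Y$'s effect and disagrees with the $\hat{\beta}$ from the joint fit:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
# Y deliberately shares variation with X's second column (correlated regressors).
Y = (X[:, 1] + rng.normal(size=n)).reshape(-1, 1)
y = X @ np.array([1.0, 2.0]) + 0.5 * Y[:, 0] + rng.normal(scale=0.1, size=n)

# Joint fit on [X | Y] vs. fit on X alone.
beta_joint = np.linalg.lstsq(np.hstack([X, Y]), y, rcond=None)[0][:2]
beta_alone = np.linalg.lstsq(X, y, rcond=None)[0]
# beta_alone's slope absorbs part of Y's effect, so the two estimates differ.
```

If you instead generate $Y$ orthogonal to the columns of $X$, the two estimates agree up to sampling noise.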