Given a response $Y\in\mathbb{R}^n$ and design matrix $X\in\mathbb{R}^{n\times p}$, consider the regression estimator $$\hat{\beta}=\text{arg min}_{\beta\in\mathbb{R}^p}\frac{1}{2n}\|Y-X\beta\|_2^2+\lambda\|\beta\|_1+\frac{\gamma}{2}\|\beta\|_2^2$$ where $\lambda,\gamma>0$. Explain why the minimising $\hat{\beta}$, which you may assume exists, is unique. In the case where $X$ has two duplicate columns, argue that the corresponding coefficient estimates will be equal.
I know that strictly convex functions have unique minimisers, but (I think) $\|\cdot\|_1$ is convex without being strictly convex, and so it need not have a unique minimiser. I don't see why potential duplicate columns of $X$ affect the $\|\cdot\|_1$ part of the objective, so I'm not sure how to argue that the minimiser is unique. I'm asking here also because it would be good to improve my general understanding of minimisers of non-strictly convex functions. Thanks!
The sum of a convex function and a strictly convex function is strictly convex. Indeed $\|\cdot\|_1$ is not strictly convex, but it doesn't matter: the ridge term $\frac{\gamma}{2}\|\cdot\|_2^2$ is strictly convex and the other two terms are convex, so you are minimising a strictly convex function, which has at most one minimiser. Therefore $\hat\beta$ is unique.
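This isn't part of the proof, but here is a quick numerical sanity check of the strict-convexity claim (a sketch in numpy; the dimensions, penalty levels, and the two points `b1`, `b2` are arbitrary choices of mine): for any two distinct points, the objective at the midpoint lies strictly below the midpoint of the chord.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, lam, gam = 30, 5, 0.1, 0.5
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

def objective(b):
    # (1/2n)||y - Xb||^2 + lam*||b||_1 + (gam/2)||b||^2
    return (np.sum((y - X @ b) ** 2) / (2 * n)
            + lam * np.sum(np.abs(b))
            + gam / 2 * np.sum(b ** 2))

# strict convexity: midpoint value strictly below the chord midpoint
b1, b2 = rng.standard_normal(p), rng.standard_normal(p)
mid = objective((b1 + b2) / 2)
chord = (objective(b1) + objective(b2)) / 2
print(mid < chord)  # True whenever b1 != b2, since gam > 0
```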
Let $X = \big(\hspace{-.1cm}\begin{array}{c|c|c} X_1 & \ldots &X_p\end{array}\hspace{-.1cm}\big)$, where $X_1,\ldots,X_p$ are the columns of $X$. Then: $$X\beta =\sum_{i=1}^p \beta_i X_i$$
If $X_i = X_j$, then $\frac{1}{2n}\|Y-X\beta\|_2^2$ only depends on $\beta_i + \beta_j$.
On the other hand, the penalty $\lambda\|\beta\|_1 + \frac{\gamma}{2}\|\beta\|_2^2$ depends on $(\beta_i,\beta_j)$ only through $\lambda (|\beta_i| +|\beta_j|) + \frac{\gamma}{2}(\beta_i^2 + \beta_j^2)$, which, for a fixed value of $\beta_i+\beta_j$, is minimal when $\beta_i = \beta_j$.
Edit:
On that last claim: without changing the sum $\beta_i + \beta_j$, we can replace $(\beta_i,\beta_j)$ by $(\beta_i + t, \beta_j -t)$. If $\beta$ is a solution of the optimisation problem, then the function $$f(t) = \lambda (|\beta_i+t|+|\beta_j-t|) + \frac{\gamma}{2}\big((\beta_i+t)^2 + (\beta_j-t)^2\big)$$
must be minimal at $t=0$.
The absolute-value part is a continuous, piecewise affine function of $t$, with slope $0$ on the interval $I = (\min(\beta_j,-\beta_i),\max(\beta_j,-\beta_i))$, slope $-2\lambda$ for $t<\min(\beta_j,-\beta_i)$, and slope $2\lambda$ for $t>\max(\beta_j,-\beta_i)$.
The part with the squares is strictly minimal at $t_* = (\beta_j-\beta_i)/2$, which is the midpoint of $I$ (so $t_* \in \bar I$, even in the degenerate case $\beta_j = -\beta_i$ where $I$ is empty). The absolute-value part is minimal on all of $\bar I$, so $f$ attains its unique global minimum at $t_*$. Since $f$ is minimal at $t=0$, we must have $t_* = 0$, and therefore $\beta_i = \beta_j$.
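To see both claims in action, here is a small self-contained sketch: a plain proximal-gradient (ISTA) solver for this exact objective in numpy (the solver, step size, and data are my own illustration, not part of the argument). With two duplicate columns, the fitted coefficients on those columns come out equal, as the argument predicts.

```python
import numpy as np

def elastic_net_ista(X, y, lam, gam, n_iter=5000):
    """Proximal gradient (ISTA) for
    (1/2n)||y - Xb||^2 + lam*||b||_1 + (gam/2)||b||^2."""
    n, p = X.shape
    # Lipschitz constant of the gradient of the smooth part
    L = np.linalg.norm(X, 2) ** 2 / n + gam
    b = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y) / n + gam * b   # smooth-part gradient
        z = b - grad / L                          # gradient step
        b = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-threshold
    return b

rng = np.random.default_rng(0)
n = 50
x1 = rng.standard_normal(n)
X = np.column_stack([x1, x1, rng.standard_normal(n)])  # columns 0 and 1 duplicated
y = 3 * x1 + rng.standard_normal(n)
b = elastic_net_ista(X, y, lam=0.1, gam=0.5)
print(b[0], b[1])  # duplicate-column coefficients agree up to solver tolerance
```

Since the objective is strictly convex, ISTA converges to the unique minimiser, so the equality of `b[0]` and `b[1]` is not an accident of the solver.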