I somewhat understand what l1 regularization is, however, the mathematical formula and how to use it are confusing me. I'm not really sure what a regularization term is and how I could apply it to a data set. All the explanations I have found online are very mathematically dense and I was hoping someone here could provide an intuitive explanation/example. I have only taken intro level Linear Algebra and Statistics courses.
EDIT: To clarify, what I'm asking is if there is someone that could explain the problem:
Minimize $\|Ax - b\|_2 + \gamma\|x\|_1$.
I understand that the first term is the residual and that the second term is the regularization term. However, I'm not sure what the second term does or how to use it. For example, if I had a signal and wanted to reduce the noise, how would I go about selecting a $\gamma$ value? I know this is really vague, I just can't find any examples online.
Putting a constraint on a norm of $x$ encourages simpler solutions, and relaxing it allows more complicated ones. Which norm you choose determines in which sense the solution will be simple or complicated. The $l_1$ norm has become popular because good numerical methods for optimizing with respect to it have appeared over the last 10-15 years, and because it yields a sparser $x$ (a larger fraction of entries exactly $0$) than the $l_2$ norm does. Sparsity is desirable in many applications: it automatically tells us which parameters matter and which don't, and the ones that don't can simply be removed, saving space (and complexity).
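To make this concrete, here is a minimal sketch of solving a version of your problem with the residual squared, $\min_x \tfrac12\|Ax-b\|_2^2 + \gamma\|x\|_1$, by proximal gradient descent (ISTA). This is just one standard method, not the only one; the data here is synthetic, and the point is to see that most entries of the recovered $x$ come out exactly zero:

```python
import numpy as np

def soft_threshold(v, lam):
    # proximal operator of lam * ||.||_1: shrinks each entry toward 0,
    # setting entries with |v_i| <= lam exactly to 0
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def lasso_ista(A, b, gamma, n_iter=500):
    """Minimize 0.5*||Ax - b||_2^2 + gamma*||x||_1 via proximal gradient."""
    t = 1.0 / np.linalg.norm(A, 2) ** 2  # step size from the largest singular value
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)          # gradient of the smooth residual term
        x = soft_threshold(x - t * grad, t * gamma)
    return x

# synthetic example: a sparse true signal observed through A with a little noise
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))
x_true = np.zeros(20)
x_true[[2, 7, 15]] = [1.5, -2.0, 3.0]
b = A @ x_true + 0.05 * rng.standard_normal(50)

x_hat = lasso_ista(A, b, gamma=1.0)
print(np.sum(np.abs(x_hat) > 1e-6))  # count of nonzero entries; most are exactly 0
```

If you replace the soft-thresholding with a plain $l_2$ penalty (ridge regression), the recovered vector has small entries everywhere but almost none exactly zero; that is the sparsity difference in practice.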
In short: without regularization you risk overly complicated, overfitted solutions that take more space and are harder to interpret. With regularization you get smoother solutions, avoid overfitting, and obtain a simpler representation, because the penalty forces the model to become "simpler" in some sense.
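As for choosing $\gamma$: a common recipe is to hold out part of the data, solve the problem for several candidate values, and keep the $\gamma$ with the lowest error on the held-out part. A rough sketch (the data, the candidate grid, and the 60/20 split are all arbitrary choices for illustration):

```python
import numpy as np

def lasso(A, b, gamma, n_iter=300):
    # proximal gradient (ISTA) for 0.5*||Ax-b||^2 + gamma*||x||_1
    t = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = x - t * (A.T @ (A @ x - b))
        x = np.sign(x) * np.maximum(np.abs(x) - t * gamma, 0.0)
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((80, 30))
x_true = np.zeros(30)
x_true[:4] = [2.0, -1.5, 1.0, 3.0]
b = A @ x_true + 0.1 * rng.standard_normal(80)

# hold out the last 20 rows to score each candidate gamma
A_tr, b_tr = A[:60], b[:60]
A_val, b_val = A[60:], b[60:]

gammas = [0.01, 0.1, 1.0, 10.0, 100.0]
errs = [np.linalg.norm(A_val @ lasso(A_tr, b_tr, g) - b_val) for g in gammas]
best_gamma = gammas[int(np.argmin(errs))]
print(best_gamma)
```

Too small a $\gamma$ fits the noise; too large a $\gamma$ shrinks everything to zero and the residual blows up. Cross-validation (repeating this with several different splits and averaging) is the more robust version of the same idea.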