First of all, I imagine that I will get many downvotes on this since the questions are probably considered "stupid." However, please understand that I'm taking a class that is way above my head, and I'm not allowed to drop it. I'm asking these questions so that I know exactly what I need to learn without wasting time.
I'm taking a class in Big Data/Data Mining and I've been given an assignment to "do" least squares regression on a data set in Matlab. The problem is that I have never learned linear algebra before. I'm spending all my spare time catching up, but I feel like I will not make the deadline unless I focus exactly on what the assignment requires.
I took notes during the classes, but I didn't know linear algebra back then so I couldn't understand what exactly I was writing. After spending weeks learning about vectors, matrices and subspaces, I understand some of it, but it's not enough.
Therefore, I ask you to help me understand the following:
1. If $x = [x(1), x(2), \ldots, x(d)]$, then
$\bar{x} = [1, x(1), x(2), \ldots, x(d)]$
(these are column vectors, but I don't know the LaTeX code).
However, I watched a Khan Academy video on linear regression where $\bar{x}$ was described as the mean value of $x$. From what I know, 'mean value' refers to a single (scalar) value and not a vector, so that has me confused. So, what does $\bar{x}$ actually mean?
2. The dependent value (I know the difference between independent and dependent values) is defined as $y(x) = w_0 + w_1 x(1) + w_2 x(2) + \cdots + w_d x(d)$.
What on earth does $w$ mean? It's defined as a column vector with entries ranging from $w_0$ to $w_d$.
3. What does $w^*$ mean?
4. After doing the least squares regression, I'm asked to print the value of $w$ generated by the regression. Is this also a column vector or something completely different?
5. What am I actually trying to figure out with linear regression? A formula? A value? Nothing has been said about it in the class based on the notes I've taken, and I can't find any explanation in any book either. The closest I've gotten is that the objective is to "minimize the residuals." I understand what 'residual' means, but what is the output exactly?
I mean, I have training data. I use the regression formula on that data set. What am I left with that I can use on the test data?
6. Do I need to know anything about eigenvalues, or anything involving "eigen" at all, in order to solve this problem?
Any help is greatly appreciated. Thank you.
You are given a bunch of vectors $x_i \in \mathbb R^d$ and corresponding scalars $y_i \in \mathbb R$. Your goal is to find a vector $w \in \mathbb R^{d+1}$ such that $\bar x_i^T w \approx y_i$ for all $i$ (here $\bar x_i$ is $x_i$ with a $1$ prepended, as in your first question). We select $w$ to be the vector that minimizes the objective function $$ E(w) = \sum_i (\bar x_i^T w - y_i)^2. $$ But minimizing $E(w)$ is something that we learned how to do in multivariable calculus: just set the gradient of $E$ equal to $0$ and solve for $w$. Doing so yields the so-called normal equations $$ \Big(\sum_i \bar x_i \bar x_i^T\Big) w = \sum_i y_i \bar x_i, $$ a linear system you can hand to Matlab.
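To make this concrete, here is a minimal sketch in Python/NumPy (the same steps translate directly to Matlab). The data values are made up purely for illustration. It builds the matrix whose rows are the $\bar x_i^T$, solves the normal equations for $w$, and checks the answer against the built-in least-squares routine:

```python
import numpy as np

# Made-up training data: 5 samples, d = 2 features per sample.
X = np.array([[1.0, 2.0],
              [2.0, 0.5],
              [3.0, 1.5],
              [4.0, 3.0],
              [5.0, 2.5]])
y = np.array([3.1, 3.9, 6.2, 9.0, 10.4])

# Prepend a column of ones: row i is x_bar_i = [1, x_i(1), ..., x_i(d)].
X_bar = np.column_stack([np.ones(len(X)), X])

# Setting the gradient of E(w) to zero gives the normal equations
#   (X_bar^T X_bar) w = X_bar^T y,
# which we solve as an ordinary linear system.
w = np.linalg.solve(X_bar.T @ X_bar, X_bar.T @ y)

# Sanity check: the built-in least-squares solver finds the same w.
w_lstsq, *_ = np.linalg.lstsq(X_bar, y, rcond=None)

# w is a vector of d+1 numbers [w_0, w_1, ..., w_d] -- this is the
# "output" of the regression. To predict on test data, form X_bar_test
# the same way and compute X_bar_test @ w.
print(w)
```

This also answers question 5: what you are "left with" after training is just the vector $w$, which you then apply to the test data. No eigenvalues are needed for this.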