I have started learning linear regression, and the equation $h(X) = \Theta^T X$ has puzzled me.
Let's say we have a training set of $m$ examples and $n$ features, so that $X$ is an $m \times n$ matrix. $\Theta$ is an $n \times 1$ matrix, so $\Theta^T$ is a $1 \times n$ matrix.
How can we multiply a $1 \times n$ matrix by an $m \times n$ one?
I hope I explained the question clearly. Although this is a very basic question, I am confused. Any explanation will be appreciated.
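The confusion comes down to the rule that a product $AB$ is defined only when the number of columns of $A$ equals the number of rows of $B$; the result then has the rows of $A$ and the columns of $B$. The sketch below checks that rule on shapes alone. `matmul_shape` is a hypothetical helper written for illustration, and $m = 5$, $n = 3$ are arbitrary example sizes.

```python
def matmul_shape(a_shape, b_shape):
    """Return the shape of A @ B, or raise ValueError if the inner dimensions differ."""
    rows_a, cols_a = a_shape
    rows_b, cols_b = b_shape
    if cols_a != rows_b:
        raise ValueError(
            f"cannot multiply {rows_a}x{cols_a} by {rows_b}x{cols_b}: "
            f"inner dimensions {cols_a} != {rows_b}"
        )
    return (rows_a, cols_b)


m, n = 5, 3  # hypothetical sizes: 5 training examples, 3 features

# Theta^T (1 x n) times X (m x n): inner dimensions n and m differ, so this fails
try:
    matmul_shape((1, n), (m, n))
except ValueError as err:
    print(err)  # cannot multiply 1x3 by 5x3: inner dimensions 3 != 5

print(matmul_shape((m, n), (n, 1)))  # X * Theta    -> (5, 1), one prediction per example
print(matmul_shape((1, n), (n, 1)))  # Theta^T * x  -> (1, 1), a single prediction
```

So $\Theta^T X$ with $X$ of size $m \times n$ is simply undefined unless $m = n$. The products that do work are $X\Theta$ ($m \times 1$) or, equivalently up to transposition, $\Theta^T X^T$ ($1 \times m$).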
This question was about linear regression in a machine learning class. One of the mentors from my class (Tom Mosher) answered it:
> When X is the whole matrix of training examples, then `h = X * theta`.
> When x is a single training example, then `h = theta' * x`.
> Note the use of upper and lower-case letters for x and X.
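The mentor's snippets use Octave/MATLAB notation, where `theta'` is the transpose of `theta`. The two forms agree because each row of $X$ is one training example $x^{(i)}$ laid on its side (i.e. $x^{(i)T}$), so the $i$-th entry of $X\theta$ is exactly $\theta^T x^{(i)}$. A minimal pure-Python sketch of that equivalence follows; the data values and the helper names `dot` and `mat_vec` are made up for illustration.

```python
def dot(u, v):
    """theta' * x for one example: the sum of element-wise products."""
    return sum(a * b for a, b in zip(u, v))


def mat_vec(X, theta):
    """X * theta: one prediction per row of X (each row is one training example)."""
    return [dot(row, theta) for row in X]


# Hypothetical data: m = 3 examples (rows), n = 2 features (columns)
X = [
    [1.0, 2.0],
    [1.0, 3.0],
    [1.0, 5.0],
]
theta = [0.5, 2.0]

h_all = mat_vec(X, theta)            # whole-matrix form: h = X * theta
h_each = [dot(theta, x) for x in X]  # single-example form: h = theta' * x, row by row

print(h_all)  # [4.5, 6.5, 10.5]
assert h_all == h_each
```

The matrix form is just the single-example formula applied to every row at once, which is why it is preferred when computing predictions for the whole training set.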
Thanks @martini and @MPW for your time.