Calculating the maximum a posteriori (MAP) estimate for a classifier


We are given a probability model for the class $y$, together with a prior on the weights $\textbf{w}$ (samples are assumed to be i.i.d., $b$ is a positive scalar, and $Z$ ensures normalization):

$P(y \mid \textbf{x}, \textbf{w}) = \frac{1}{1+\exp(-y \textbf{w}^T \textbf{x})}$

$P(\textbf{w};b) = \frac{1}{Z}\exp\left(\frac{-\sum_{i=1}^d w_i^4}{b}\right)$

We have to find the MAP estimate for $\prod_{i=1}^n P(y_i \mid \textbf{x}_i, \textbf{w}) \cdot P(\textbf{w};b)$.

I would solve this like a maximum likelihood (ML) optimization problem: take the log of the function and differentiate. The only difference from an ML problem that I see is that we have a prior; or would I have to approach a MAP optimization differently? A brief sketch of my approach:

... = $\sum_{i=1}^n \log\left(\frac{1}{1+\exp(-y_i \textbf{w}^T \textbf{x}_i)}\right) + \log\left(\frac{1}{Z}\exp\left(\frac{-\sum_{i=1}^d w_i^4}{b}\right)\right)$

$= -\sum_{i=1}^n \log\left(1+\exp(-y_i \textbf{w}^T\textbf{x}_i)\right) + \log\frac{1}{Z} - \frac{1}{b}\sum_{i=1}^d w_i^4$

$\log \frac{1}{Z}$ is a constant and can be ignored, which would mean that we maximize $-\sum_{i=1}^n \log(1+\exp(-y_i \textbf{w}^T\textbf{x}_i)) - \frac{1}{b}\sum_{i=1}^d w_i^4$ with respect to $\textbf{w}$ by differentiating and setting the gradient to zero (note the leading minus sign on the data term, since $\log\frac{1}{1+e^{z}} = -\log(1+e^{z})$).
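As a numerical sanity check of this objective, here is a minimal sketch (my own illustration, not part of the original question) that minimizes the negative log-posterior, $\sum_i \log(1+\exp(-y_i \textbf{w}^T\textbf{x}_i)) + \frac{1}{b}\sum_j w_j^4$, by plain gradient descent on synthetic data; the data, learning rate, and iteration count are arbitrary assumptions:

```python
import numpy as np

def neg_log_posterior(w, X, y, b):
    """Negative log-posterior, dropping the constant log Z:
    sum_i log(1 + exp(-y_i w^T x_i)) + (1/b) * sum_j w_j^4."""
    margins = y * (X @ w)                      # shape (n,)
    nll = np.sum(np.log1p(np.exp(-margins)))   # data term
    prior = np.sum(w ** 4) / b                 # quartic prior penalty
    return nll + prior

def grad(w, X, y, b):
    """Gradient: -sum_i y_i x_i * sigmoid(-y_i w^T x_i) + 4 w^3 / b."""
    margins = y * (X @ w)
    s = 1.0 / (1.0 + np.exp(margins))          # sigmoid(-margin)
    return -(X.T @ (y * s)) + 4.0 * w ** 3 / b

# Synthetic data: n i.i.d. samples of dimension d, labels in {-1, +1}.
rng = np.random.default_rng(0)
n, d, b = 200, 3, 10.0
w_true = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(n, d))
y = np.sign(X @ w_true + 0.1 * rng.normal(size=n))

# Plain gradient descent on the negative log-posterior.
w = np.zeros(d)
lr = 0.01
for _ in range(2000):
    w -= lr * grad(w, X, y, b)
```

After the loop, `w` should classify most training samples correctly and achieve a lower negative log-posterior than the all-zero weight vector, which is consistent with maximizing the log-posterior derived above.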

Is this correct? Two additional questions regarding notation. First, I find it confusing that the weights $w$ are also indexed by $i$: as far as I understand, the index $i$ in the prior just runs over the $d$ dimensions of the feature space, not (like the index $i$ of the data samples) over the number of samples $n$, is that right? Secondly, in $P(y \mid \textbf{x}, \textbf{w}) = \frac{1}{1+\exp(-y \textbf{w}^T \textbf{x})}$, as I understand it $\textbf{x}$ is an $n \times d$ matrix containing the $n$ data samples of dimension $d$, is this correct?