I understand that LDA can be used both to transform the data (e.g. dimensionality reduction) and for classification, but I would like to understand the link between these two operations.
Let $W$ and $B$ be the intra-class and inter-class covariance matrices, respectively. I read that the linear transformation is computed by eigendecomposition of $W^{-1}B$, in order to maximise $$\arg\max_{a \in \mathbb{R}^p} \frac{a^T B a}{a^T W a}$$ Then I read that the decision rule for an observation $x$ can be $$\arg\min_k (x-\bar{X}_k)^T W^{-1} (x-\bar{X}_k)$$ But what is the link between these two optimizations? Is the transformed space in LDA linked to the classification? I suppose it is, and for example the sklearn class enables both transformation and prediction after training. But I can't manage to understand it clearly.
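For reference, here is a minimal sketch of what I mean with scikit-learn (`LinearDiscriminantAnalysis` from `sklearn.discriminant_analysis`; the two-class toy data is made up), where the same fitted model exposes both operations:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Made-up toy data: two Gaussian classes in 3 dimensions
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 3)), rng.normal(2, 1, (50, 3))])
y = np.array([0] * 50 + [1] * 50)

lda = LinearDiscriminantAnalysis()
lda.fit(X, y)

Z = lda.transform(X)    # dimensionality reduction: 3 features -> n_classes - 1 = 1
y_hat = lda.predict(X)  # classification from the very same fitted model
print(Z.shape)          # (100, 1)
```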
In the first case (projecting data), the transformed data is expressed by $z = \mathbf{w}_\mathrm{LDA}^T \mathbf{x}$.
In the second case (binary classification), the decision rule comes from the log-ratio of posteriors $$ h(\mathbf{x}) = \log \frac{P(y=1|\mathbf{x})}{P(y=2|\mathbf{x})} = \log \frac{P(y=1)}{P(y=2)} + \log \frac{P(\mathbf{x}|y=1)}{P(\mathbf{x}|y=2)} $$ where $P(\mathbf{x}|y=c) = \mathcal{N} \left(\mathbf{x};\mathbf{m}_c,\mathbf{\Sigma} \right)$, i.e. both classes share the same covariance $\mathbf{\Sigma}$. If we assume equiprobable classes, the prior term vanishes and the decision rule becomes $$ h(\mathbf{x}) = +\frac12 \left( \mathbf{x} - \mathbf{m}_2 \right)^T \mathbf{\Sigma}^{-1} \left( \mathbf{x} - \mathbf{m}_2 \right) -\frac12 \left( \mathbf{x} - \mathbf{m}_1 \right)^T \mathbf{\Sigma}^{-1} \left( \mathbf{x} - \mathbf{m}_1 \right) $$ and this justifies your statement about the decision rule for an observation $\mathbf{x}$. Note that your statement no longer holds in the general case of unequal class priors.
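For concreteness, this rule can be evaluated directly with NumPy (a sketch; the means, shared covariance and test point below are made-up values, not estimates from data):

```python
import numpy as np

def lda_quadratic_score(x, m1, m2, Sigma):
    """h(x) for equal priors: positive -> class 1, negative -> class 2."""
    Sinv = np.linalg.inv(Sigma)
    d2 = (x - m2) @ Sinv @ (x - m2)  # squared Mahalanobis distance to class 2
    d1 = (x - m1) @ Sinv @ (x - m1)  # squared Mahalanobis distance to class 1
    return 0.5 * d2 - 0.5 * d1

# Made-up class means and shared covariance
m1, m2 = np.array([0.0, 0.0]), np.array([2.0, 2.0])
Sigma = np.eye(2)

print(lda_quadratic_score(np.array([0.1, 0.0]), m1, m2, Sigma))  # positive: class 1
print(lda_quadratic_score(np.array([2.0, 2.0]), m1, m2, Sigma))  # negative: class 2
```

The observation is assigned to the class whose mean is closer in Mahalanobis distance, which is exactly the $\arg\min_k$ rule from the question.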
Expanding the quadratic forms, the terms $\mathbf{x}^T \mathbf{\Sigma}^{-1} \mathbf{x}$ cancel, so this expression simplifies to $$ h(\mathbf{x}) = b + \mathbf{w}_\mathrm{LDA}^T \mathbf{x} $$ with $\mathbf{w}_\mathrm{LDA} = \mathbf{\Sigma}^{-1} \left( \mathbf{m}_1 - \mathbf{m}_2 \right)$ and $b = \frac12 \left( \mathbf{m}_2^T \mathbf{\Sigma}^{-1} \mathbf{m}_2 - \mathbf{m}_1^T \mathbf{\Sigma}^{-1} \mathbf{m}_1 \right)$. Thus LDA classification amounts to thresholding the projection $\mathbf{w}_\mathrm{LDA}^T \mathbf{x}$: the transformed space from the first part is exactly the space in which the class decision is made.
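A quick numerical check (with made-up means and covariance) that the quadratic score equals the affine score $b + \mathbf{w}^T \mathbf{x}$ with $\mathbf{w} = \mathbf{\Sigma}^{-1}(\mathbf{m}_1 - \mathbf{m}_2)$:

```python
import numpy as np

# Made-up class means and shared covariance
m1, m2 = np.array([0.0, 0.0]), np.array([2.0, 2.0])
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
Sinv = np.linalg.inv(Sigma)

w = Sinv @ (m1 - m2)                         # LDA direction
b = 0.5 * (m2 @ Sinv @ m2 - m1 @ Sinv @ m1)  # threshold (intercept) term

x = np.array([1.0, -0.5])
quadratic = (0.5 * (x - m2) @ Sinv @ (x - m2)
             - 0.5 * (x - m1) @ Sinv @ (x - m1))
linear = b + w @ x

print(np.isclose(quadratic, linear))  # True: same score, so classifying x is
                                      # just thresholding the projection w^T x
```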