(This might be an easily searchable question, but I might just not have the vocabulary for it.)
My question is:
Given a fixed probability distribution $p$ over $k$ classes (that is, a non-negative $k$-vector of reals which sum to 1), is the function $$ L(q) = \sum_{i=1}^k p_i(1-q_i)^2, $$ subject to the constraint that $q$ is also a probability distribution, minimized when $q = p$?
The context is as follows: suppose we are trying to predict the class (out of $k$ possibilities), and $p$ is the true distribution of the data. Further suppose that, when the true class is $i$, a prediction $q$ is punished with loss $(1-q_i)^2$. Then $L(q)$ is the expected loss when making prediction $q$.
I would like to prove (or possibly disprove) that the best prediction you can make is $p$ itself. I thought I had a fairly easy proof of this, but there was a dumb arithmetic error in it.
If I set up a Lagrangian, using only the constraint that $\sum_i q_i = 1$ but not the constraints $q_i \geq 0$, then stationarity gives $-2p_i(1-q_i) = \lambda$ for each $i$, i.e. for all $i, j$ the condition $$ p_i(q_i-1) = p_j(q_j- 1). $$ In the case $k=2$, substituting $q_1 - 1 = -q_2$ and $q_2 - 1 = -q_1$ reduces this to $p_1q_2 = p_2q_1$, and it follows that the unique solution (subject to the constraint) is $q = p$ -- but I can't quite seem to figure it out when $k > 2$.
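For what it's worth, the stationarity conditions can be solved in closed form when every $p_i > 0$ (still ignoring the constraints that $q_i \geq 0$): writing $p_i(q_i - 1) = c$ for a common constant $c$,
$$ q_i = 1 + \frac{c}{p_i}, \qquad \sum_{i=1}^k q_i = 1 \;\implies\; c = \frac{1-k}{\sum_{j=1}^k p_j^{-1}}. $$
As a sanity check, for $k = 2$ this gives $c = -\frac{1}{p_1^{-1} + p_2^{-1}} = -p_1p_2$, hence $q_1 = 1 - p_2 = p_1$ and likewise $q_2 = p_2$, recovering $q = p$.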
Turns out it's just false, and a quick computerized search for a counterexample showed that. If we take $p = (1/5, 2/5, 2/5)$ and $q = (0, 2/5, 3/5)$ then $L(p) = 52/125$ while $L(q) = 51/125$. Thus the claim I tried to prove holds only when $k = 2$.
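The counterexample is easy to verify with exact rational arithmetic; here is a minimal sketch (the function name `L` mirrors the loss above and is my own choice):

```python
from fractions import Fraction

def L(p, q):
    # Expected loss: sum_i p_i * (1 - q_i)^2
    return sum(pi * (1 - qi) ** 2 for pi, qi in zip(p, q))

p = [Fraction(1, 5), Fraction(2, 5), Fraction(2, 5)]
q = [Fraction(0), Fraction(2, 5), Fraction(3, 5)]

print(L(p, p))  # 52/125
print(L(p, q))  # 51/125 -- strictly smaller, so predicting p is not optimal
```

Using `Fraction` instead of floats rules out the possibility that the gap $52/125 - 51/125$ is a rounding artifact.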