In logistic regression, the partial derivative of the log-likelihood $L(\theta)$ with respect to the parameter $\theta_j$ is
$\frac{\partial L(\theta)}{\partial \theta_j} = \sum_{i=0}^n \big[y^{i} - \sigma(\theta^Tx^{i}) \big]x^i_j$
where $y^i \in \{0, 1\}$ is the true label for sample $i$, $\sigma$ is the sigmoid function, $x^i$ is the $i$-th observed sample, and $x^i_j$ is the $j$-th element of $x^i$.
A maximum is attained only where the gradient is zero, thus
$\sum_{i=0}^n \big[y^{i} - \sigma(\theta^Tx^{i}) \big]x^i_j = 0$
So for every $j$ it holds that (for the sake of discussion assuming $x^i_j \geq 0$, which I believe can be done without loss of generality)

$\sum_{i=0}^n y^{i}x_j^i = \sum_{i=0}^n \sigma(\theta^Tx^{i})x_j^i$
So at the maximum, the weighted sum of estimated probabilities equals the weighted count of positive samples. It then makes sense that, given some arbitrary slice of the training set of adequate sample size, the fraction of positives will approximately equal the average estimated probability.
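This aggregate property is easy to check numerically. A minimal sketch (the intercept column and all data are my own assumptions for illustration; the post does not specify them): fit the model by Newton's method, i.e. solve the score equations $X^T(y - p) = 0$, and confirm that the average predicted probability matches the positive rate. With an intercept column, the $j = 0$ score equation reads $\sum_i y^i = \sum_i \sigma(\theta^T x^i)$ exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
# Design matrix with an intercept column (an assumption for this sketch;
# it makes the j = 0 score equation read sum(y) = sum(p)).
X = np.column_stack([np.ones(n), rng.normal(size=n)])
theta_true = np.array([-0.5, 1.2])  # made-up ground-truth parameters
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ theta_true))).astype(float)

# Fit by Newton's method: drive the gradient X^T (y - p) to zero.
theta = np.zeros(2)
for _ in range(25):
    p = 1.0 / (1.0 + np.exp(-X @ theta))
    grad = X.T @ (y - p)                      # the gradient from above
    H = X.T @ (X * (p * (1.0 - p))[:, None])  # negative Hessian
    theta += np.linalg.solve(H, grad)

p = 1.0 / (1.0 + np.exp(-X @ theta))
print(y.mean(), p.mean())  # agree to numerical precision
```

At convergence the two means coincide up to floating-point error, which is exactly the marginal-calibration property described above.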
But does this also apply to individual estimates? That is, if I get a probability of 30% for one observation, can I interpret that as the observation having a 30% probability of being positive? As far as I can see, the RHS is free to distribute the estimated probabilities almost arbitrarily while still satisfying the condition for the maximum.
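To make that concern concrete, here is a toy sketch with made-up numbers: the conditions $X^T(y - p) = 0$ impose only $d$ constraints on $n$ probabilities, so when $n > d$ distinct probability vectors satisfy all of them. (This deliberately ignores that only vectors of the form $\sigma(\theta^T x^i)$ are actually attainable, which adds structure the moment conditions alone do not capture.)

```python
import numpy as np

# Toy illustration (made-up numbers): two different probability
# assignments that both satisfy the score equations X^T (y - p) = 0.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])       # n = 3 observations, d = 2 features
y = np.array([1.0, 0.0, 1.0])

p_a = np.array([0.6, 0.8, 0.6])  # one probability assignment
p_b = np.array([0.7, 0.6, 0.7])  # a different one

for p in (p_a, p_b):
    # Both residual vectors are orthogonal to the columns of X.
    print(np.allclose(X.T @ (y - p), 0.0))  # True for both
```

So the aggregate condition alone does not pin down the individual probabilities; any per-observation calibration guarantee would have to come from elsewhere (e.g. a correctly specified model).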
Edit: It was pointed out to me that the final implication was faulty. For some bizarre reason I assumed the term in parentheses to be greater than or equal to 0, but obviously it can also take negative values. That said, the fundamental question remains. I have since updated the sums after the implication to also include the $x_j^i$ term.