Softmax function and modelling probability distributions

Asked by Bumbble Comm (https://math.techqa.club/user/bumbble-comm/detail)

Hinton, in his neural network course on Coursera, says that "Any probability distribution $P$ over discrete states ($P(x) > 0$ for all $x$) can be represented as the output of a softmax unit for some inputs." I was trying to prove it but didn't manage. Do you know of a proof of this?
Let us first define a few things. Let $p_1,\ldots,p_n$ denote the target probabilities, with $p_k>0$ for all $k$ and $\sum_{k=1}^n p_k=1$. Given weight vectors $w_k\in\mathbb{R}^m$ and an input $x\in\mathbb{R}^m$, the $k$-th output of a softmax unit is \begin{equation} g_k(x)=\frac{e^{w_k^T x}}{\sum_{j=1}^n e^{w_j^T x}}. \end{equation}
Essentially, what we are trying to do is determine weight vectors $\{w_k\}_{k=1,\ldots,n}$ and one input vector $x$ such that we have: \begin{equation} g_k(x)=p_k,\quad \forall k\in\{1,\ldots,n\} \end{equation}
Take the logarithm of the previous equality to obtain: \begin{equation} \ln p_k=w_k^T x-\ln z,\quad \forall k=1,\ldots,n \end{equation} where $z=\sum_{j=1}^n e^{w_j^T x}$.
In order to get rid of this $z$, we choose one of the possible values as our "pivot". Suppose we choose $n$ as the pivot. Subtracting the equality corresponding to $n$ from all the other equalities, we can rewrite them as: \begin{align} \ln \frac{p_1}{p_n}&=(w_1-w_n)^T x\\ \ln \frac{p_2}{p_n}&=(w_2-w_n)^T x\\ \vdots\\ \ln \frac{p_{n-1}}{p_n}&=(w_{n-1}-w_n)^T x \end{align} Since we have extra degrees of freedom, we can arbitrarily set $w_n=0$, which yields: \begin{align} \ln \frac{p_1}{p_n}&=w_1^T x\\ \ln \frac{p_2}{p_n}&=w_2^T x\\ \vdots\\ \ln \frac{p_{n-1}}{p_n}&=w_{n-1}^T x \end{align}
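As a sanity check of these equalities (an illustrative special case, assuming we are free to take $m=n-1$): choose $w_k=e_k$, the $k$-th standard basis vector, for $k=1,\ldots,n-1$, keep $w_n=0$, and set $x_k=\ln\frac{p_k}{p_n}$. Then $w_k^T x=\ln\frac{p_k}{p_n}$ as required, and for $k<n$ the softmax output is \begin{equation} g_k(x)=\frac{e^{\ln(p_k/p_n)}}{1+\sum_{j=1}^{n-1} e^{\ln(p_j/p_n)}}=\frac{p_k/p_n}{1+\frac{1-p_n}{p_n}}=p_k, \end{equation} while for $k=n$ the numerator is $e^0=1$ and the same denominator equals $1/p_n$, giving $g_n(x)=p_n$.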
Letting $q=\begin{pmatrix}\ln \frac{p_1}{p_n}\\ \vdots\\ \ln \frac{p_{n-1}}{p_n} \end{pmatrix}$ and $W=\begin{pmatrix} w_1^T \\ \vdots \\ w_{n-1}^T \end{pmatrix}$ (the $(n-1)\times m$ matrix whose rows are the weight vectors), these equations can be written in matrix form as: \begin{equation} q=Wx \end{equation} Now choose any set of $n-1$ vectors $w_k$, $k=1,\ldots,n-1$, such that $W$ has rank $n-1$ (this requires $m\ge n-1$). Without loss of generality, assume that the first $n-1$ columns of $W$ are independent and let $\tilde{W}$ be the invertible $(n-1)\times(n-1)$ matrix composed of those columns. We can then solve for $x$ by letting the first $n-1$ components of $x$ be given by $\tilde{W}^{-1} q$ and setting the remaining components to zero. By construction, $q=Wx$ is then satisfied, which implies $g_k(x)=p_k$ for all $k$, as desired.
To sum up, following this procedure we will have constructed a set of weight vectors $\{w_k\}_{k=1,\ldots,n}$, with $w_n=0$, and an input vector $x\in\mathbb{R}^m$ whose first $n-1$ components are given by $\tilde{W}^{-1}q$ (where $\tilde{W}$ is the invertible $(n-1)\times(n-1)$ submatrix of $W$) and whose remaining components are zero. These $w_k$ and $x$ are such that the outputs of the softmax unit satisfy $g_k(x)=p_k$ for all $k$.
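The construction can also be checked numerically. Below is a minimal sketch using NumPy, with an arbitrary target distribution and a random square $W$ (random matrices are full-rank with probability 1); both choices are purely for illustration:

```python
import numpy as np

# Target distribution (any p_k > 0 summing to 1); values chosen arbitrarily.
p = np.array([0.1, 0.2, 0.3, 0.4])
n = len(p)

# Log-ratios q_k = ln(p_k / p_n) for k = 1, ..., n-1.
q = np.log(p[:-1] / p[-1])

# Pick an (n-1) x (n-1) weight matrix of rank n-1; a random Gaussian
# matrix is invertible with probability 1.
rng = np.random.default_rng(0)
W = rng.standard_normal((n - 1, n - 1))

# Solve q = W x for the input vector, then append w_n = 0 as the last row.
x = np.linalg.solve(W, q)
W_full = np.vstack([W, np.zeros(n - 1)])

# Softmax of the pre-activations w_k^T x recovers the target distribution.
a = W_full @ x
g = np.exp(a) / np.exp(a).sum()
print(np.allclose(g, p))  # True
```

Any other full-rank choice of $W$ works equally well; the solution is not unique precisely because of the extra degrees of freedom noted above.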