Differentiating log likelihood with respect to data


I'm working with the probabilistic model underlying KL NMF, in which each entry of a data matrix X is sampled as $\forall_{i,j},X_{ij}\sim\text{Poisson}\left(\left[WH\right]_{ij}\right)$ with W, H non-negative parameters (note that the entries are independent but not identically distributed).

This gives the Poisson log-likelihood:

$l\left(W,H;X\right)=\sum_{i,j}\left[-\log\left[X_{ij}!\right]+X_{ij}\log\left(\left[WH\right]_{ij}\right)-\left[WH\right]_{ij}\right]$
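As a sanity check, the log-likelihood above can be evaluated directly. The sketch below is one possible NumPy implementation (the function name is mine); it uses `gammaln(X + 1)` for $\log(X_{ij}!)$, which also handles non-integer data:

```python
import numpy as np
from scipy.special import gammaln


def poisson_log_likelihood(X, W, H):
    """Evaluate l(W, H; X) = sum_ij [-log(X_ij!) + X_ij log([WH]_ij) - [WH]_ij].

    gammaln(X + 1) == log(X!) for integer X, and extends it to real X >= 0.
    """
    WH = W @ H
    return np.sum(-gammaln(X + 1) + X * np.log(WH) - WH)
```

For example, with $X_{11}=1$ and $[WH]_{11}=1$ the likelihood is $-\log(1!) + 1\cdot\log 1 - 1 = -1$.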

As this function is not jointly convex in $(W,H)$, we maximize it by block coordinate ascent (first over W with H fixed, then over H, and so on) to find a local maximum.

I'm interested in scoring the "connection" that each parameter has to each data point, i.e. how critical $X_{ij}$ is in determining the value of each parameter.

One thought was to differentiate $l$ with respect to a data point (or a continuous relaxation of it) and then with respect to a parameter (a mixed second derivative), in analogy with the observed information. However, I cannot find any mention in the literature of differentiating the log-likelihood with respect to a data point.
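For this particular model the proposed score has a closed form. Treating $X_{ij}$ as continuous, only the term $X_{ij}\log([WH]_{ij})$ couples data and parameters, so $\partial^2 l/\partial X_{ij}\,\partial W_{ik} = H_{kj}/[WH]_{ij}$ (and symmetrically $W_{ik}/[WH]_{ij}$ for $H_{kj}$). A sketch of that computation, with my own function name and array layout:

```python
import numpy as np


def mixed_sensitivity(W, H):
    """Mixed derivatives d^2 l / (dX_ij dW_ik) = H_kj / [WH]_ij.

    Only X_ij * log([WH]_ij) depends on both X and the parameters, so the
    -log(X_ij!) and -[WH]_ij terms drop out of the mixed derivative.
    Returns S with shape (n, m, r) where S[i, j, k] = H[k, j] / (W @ H)[i, j].
    """
    WH = W @ H                                 # shape (n, m)
    return H.T[None, :, :] / WH[:, :, None]    # broadcast to (n, m, r)
```

Large entries of `S[i, j, k]` flag pairs $(X_{ij}, W_{ik})$ where a unit change in the data point would most shift the local gradient with respect to that parameter.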

Would love to hear about:

  1. Literature that mentions differentiating log-likelihood by a data point.
  2. Your opinion about the proposed method.
  3. Other scoring methods to evaluate this connection.

Many thanks