I have an $N \times K$ matrix of known data $\mathbf{Y}$ and a length-$K$ vector of weights $\mathbf{x}$ constrained to the unit simplex, and want to maximize the objective function:
$$ f(\mathbf{x}) = \sum_{n=1}^{N} \log\left( (\mathbf{Y} \mathbf{x})_n \right) $$
I can do this in Python with, e.g., scipy.optimize.minimize:
```python
import scipy.optimize as opt
import numpy as np

N = 10
K = 3
logY = np.random.normal(size=N * K).reshape((N, K))
Y = np.exp(logY)

def f(x):
    return -np.sum(np.log(Y @ x))

result = opt.minimize(
    fun=f,
    x0=[1 / K] * K,
    constraints=dict(type="eq", fun=lambda x: sum(x) - 1),
    method="SLSQP",
    bounds=[(0, 1) for _ in range(K)],
)
```
This runs and gives relatively sensible results.
If I want to supply the Jacobian to this routine, I need:
$$ \frac{d f(\mathbf{x})}{d \mathbf{x}} = \frac{1}{\mathbf{x}} $$
```python
def jacobian(x):
    return -1 / x  # negated because we minimize -f
```
However, supplying this Jacobian makes the optimization fail: it returns completely uniform values of $\mathbf{x}$.
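One way to diagnose a suspect analytic gradient is scipy.optimize.check_grad, which compares it against a finite-difference approximation of the objective. A minimal sketch reproducing the setup above (the seeded RNG is my addition for reproducibility):

```python
import numpy as np
import scipy.optimize as opt

rng = np.random.default_rng(0)
N, K = 10, 3
Y = np.exp(rng.normal(size=(N, K)))

def f(x):
    # negated objective, as passed to the minimizer
    return -np.sum(np.log(Y @ x))

def jacobian(x):
    # the candidate analytic gradient under test
    return -1 / x

x0 = np.full(K, 1 / K)
# check_grad returns the norm of the difference between the analytic
# gradient and a finite-difference estimate; a large value means mismatch
err = opt.check_grad(f, jacobian, x0)
print(err)
```

A value far above finite-difference noise (roughly 1e-6 to 1e-4 here) indicates the analytic gradient is wrong, independent of whatever the optimizer does with it.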
In other code for a similar example, I have seen the Jacobian defined as:
```python
def jacobian2(x):
    deriv = lambda n, k: (Y[n, k] - Y[n, K - 1]) / (Y[n] @ x)
    partials = np.array([deriv(n, k) for n in range(N) for k in range(K)]).reshape((N, K))
    return -partials.sum(axis=0)
```
This, however, does seem to work. So why is jacobian2 the right/better answer here? Is the derivative of the objective function $\frac{Y_{nk} - Y_{nK}}{(\mathbf{Y}\mathbf{x})_n}$ for each $n$ and $k$?
I assume that the cost function is the scalar $f(\mathbf{x}) = \mathbf{1}^T \log(\mathbf{Y}\mathbf{x})$, where $\log$ is applied elementwise to a vector.
The gradient of $f$ with respect to $\mathbf{x}$ is $$ \frac{\partial f}{\partial \mathbf{x}} = \mathbf{Y}^T \frac{\mathbf{1}}{\mathbf{Y}\mathbf{x}}, $$ where again the division is taken elementwise. Note that this is not $\frac{1}{\mathbf{x}}$: by the chain rule, differentiating through $\mathbf{Y}\mathbf{x}$ brings in a factor of $\mathbf{Y}^T$.
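A sketch of that gradient in code (negated, since the question's code minimizes $-f$), reusing the question's setup with a seeded RNG added for reproducibility:

```python
import numpy as np
import scipy.optimize as opt

rng = np.random.default_rng(0)
N, K = 10, 3
Y = np.exp(rng.normal(size=(N, K)))

def f(x):
    return -np.sum(np.log(Y @ x))

def jac(x):
    # gradient of -sum(log(Y @ x)): -Y^T (1 / (Y x)), division elementwise
    return -Y.T @ (1 / (Y @ x))

x0 = np.full(K, 1 / K)
# a small check_grad value confirms the analytic gradient matches
# a finite-difference approximation
print(opt.check_grad(f, jac, x0))

result = opt.minimize(
    fun=f,
    jac=jac,
    x0=x0,
    method="SLSQP",
    constraints=dict(type="eq", fun=lambda x: np.sum(x) - 1),
    bounds=[(0, 1)] * K,
)
print(result.x)
```

With the correct `jac`, SLSQP converges to the same solution as the gradient-free run rather than stalling at the uniform starting point.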