How to find the gradient of $f(x)=-\sum_{i=1}^n \log x_i$?


I am trying to find the gradient $\nabla f(x)$ of the function $f$, defined for $x \in \mathbb{R}^n$ with every $x_i > 0$, given by $$f(x)=-\sum_{i=1}^n \log x_i,$$ but I am getting stuck when it comes time to deal with the log.

I know there are several options for taking the derivative:

  1. delta method, which basically applies the definition of differentiability: add a small perturbation $h$, then isolate the part of the result that is an inner product of the perturbation with a function of the variable, $$ f(x+h) = f(x) + \langle \nabla f(x), h \rangle + o(\|h\|), $$ where $o(\|h\|)$ denotes a remainder $r(h)$ such that $r(h)/\|h\| \to 0$ as $h \to 0$;

  2. vector calculus using chain rule.

Since $f(x)$ is not written in vector form here, I figured it would be easier to use option 1 (perturbation), so here's what I tried:

I add a perturbation $h$ to the vector $x$:

$$f(x+h)= - \sum_{i=1}^n \log (x_i+h_i)$$

But there are not many options for taking the logarithm of a sum of two numbers. So I'm wondering whether I can split this into $\log (x_i) + \log(h_i)$ and just call my perturbation $\log(h_i)$ instead of $h_i$?

Are there any other hints for finding the gradient of this function $f$?


There are 2 best solutions below

4
On

We know that $\log(1+t)=t+o(t)$ as $t \to 0$; therefore, for $x_i > 0$ and $h_i$ small,

$$\log (x_i+h_i)=\log(x_i)+\log \left(1+\frac{h_i}{x_i}\right)=\log(x_i)+\frac{h_i}{x_i}+o(h_i)$$

and then

$$\Delta \log(x_i) = \log (x_i+h_i)-\log(x_i)=\frac{h_i}{x_i}+o(h_i)$$

Summing over $i$ gives $f(x+h)-f(x) = -\sum_{i=1}^n \frac{h_i}{x_i} + o(\|h\|)$, which implies that

$$\nabla f = \left(-\frac 1 {x_1},-\frac 1 {x_2},\ldots,-\frac 1 {x_n}\right)$$
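If you want to convince yourself of this result numerically, here is a minimal sketch that compares the formula $-1/x_i$ against a central finite difference. The function names (`f`, `grad_f`, `numerical_grad`), the test point, and the step size `eps` are my own choices for illustration, not anything from the question.

```python
import math

def f(x):
    """f(x) = -sum(log(x_i)); only defined when every x_i > 0."""
    return -sum(math.log(xi) for xi in x)

def grad_f(x):
    """Analytic gradient from the derivation: component i is -1/x_i."""
    return [-1.0 / xi for xi in x]

def numerical_grad(func, x, eps=1e-6):
    """Central-difference approximation of the gradient of func at x."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += eps
        x_minus[i] -= eps
        grad.append((func(x_plus) - func(x_minus)) / (2 * eps))
    return grad

# Arbitrary test point with strictly positive entries (an assumption).
x = [0.5, 1.0, 2.0, 4.0]
analytic = grad_f(x)
numeric = numerical_grad(f, x)
max_err = max(abs(a - b) for a, b in zip(analytic, numeric))

print("analytic:", analytic)
print("numeric: ", numeric)
print("max abs difference:", max_err)
```

The two gradients should agree up to finite-difference error, which is small for a step size like this as long as the entries of `x` stay well away from zero.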

1
On

Since you are applying $\log$ component-wise, you can use its first-order Taylor expansion and collect the higher-order terms in the remainder.

$$f(x+h) = -\sum\limits_{i=1}^n\log(x_i+h_i) = -\sum\limits_{i=1}^n \left(\log(x_i) + \frac{1}{x_i}h_i +o(|h_i|) \right) = f(x) + \left\langle -\frac{1}{x},h\right\rangle + o(\|h\|)$$

$$\implies f'(x) = \left(-\frac{1}{x}\right)^T\implies \nabla f(x) = -\frac{1}{x}$$

where it's understood that $-\frac{1}{x} = \left[-\frac{1}{x_i}\right]_{i=1}^n$ is applied pointwise.
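As a quick sanity check of the expansion, take $x=(1,2)$ and $h=(0.01,0.02)$ (values picked here purely for illustration) and compare the exact change in $f$ with the linear term:

$$f(x+h)-f(x) = -\log(1.01) - \log\left(\frac{2.02}{2}\right) = -2\log(1.01) \approx -0.0199007,$$

$$\left\langle -\frac{1}{x}, h\right\rangle = -\frac{0.01}{1} - \frac{0.02}{2} = -0.02.$$

The gap is roughly $10^{-4}$, which matches the second-order Taylor term $\sum_i \frac{h_i^2}{2x_i^2} = 10^{-4}$ and is small compared with $\|h\| \approx 0.022$, as you would expect from an $o(\|h\|)$ remainder.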