Here is the definition of mutual information:
$I(X;Y) = \int_Y \int_X p(x,y) \log{ \left(\frac{p(x,y)}{p(x)\,p(y)} \right) } \; dx \,dy,$
where $X$ and $Y$ are two random variables, $p(x)$ and $p(y)$ are their marginal PDFs, and $p(x,y)$ is their joint density.
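As a concrete sanity check of this definition (a minimal sketch of my own; the bivariate-Gaussian example and its closed form $I = -\tfrac{1}{2}\log(1-\rho^2)$ nats are standard, not specific to this question), one can evaluate the double integral numerically on a grid:

```python
# Numerically evaluate I(X;Y) from the double-integral definition for a
# bivariate Gaussian with correlation rho, and compare against the
# closed form I = -0.5 * log(1 - rho^2) (in nats).
import numpy as np
from scipy.stats import multivariate_normal, norm

rho = 0.6
joint = multivariate_normal(mean=[0, 0], cov=[[1, rho], [rho, 1]])

# grid approximation of the double integral
xs = np.linspace(-6, 6, 400)
dx = xs[1] - xs[0]
X, Y = np.meshgrid(xs, xs)
pxy = joint.pdf(np.dstack([X, Y]))
px = norm.pdf(X)   # standard normal marginal of X
py = norm.pdf(Y)   # standard normal marginal of Y

mask = pxy > 0     # guard against floating-point underflow in the tails
I_num = np.sum(pxy[mask] * np.log(pxy[mask] / (px[mask] * py[mask]))) * dx * dx

print(f"numerical  : {I_num:.4f} nats")
print(f"closed form: {-0.5 * np.log(1 - rho**2):.4f} nats")
```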
I am wondering what the derivative of $I(X;Y)$ is with respect to either of the individual marginal distributions $p(x)$ or $p(y)$. Namely,
$\frac{dI(X;Y)}{dp(x)}=?$ assuming $p(y)$ is known, or $\frac{dI(X;Y)}{dp(y)}=?$ assuming $p(x)$ is known.
Intuitively, if $p(y)$ is known, then when $p(x) = p(y)$ the mutual information attains its maximum; as $p(x)$ varies, $I(X;Y)$ should exhibit some corresponding behavior.
Thanks.
Note that changing the distribution of $X$ inevitably changes the distribution of $Y$ (unless you are considering the trivial case where $p_{X,Y}(x,y)=p_X(x)\,p_Y(y)$, for which $I(X;Y)=0$ identically). Therefore, it is not meaningful to look for the "derivative" of $I$ with respect to $p_X(x)$ while assuming $p_Y(y)$ is fixed. For the same reason, the claim that, given $p_Y(y)$, the optimal distribution of $X$ is $p_X(x)=p_Y(x)$ does not make sense.
What does make sense is to ask how $I(X;Y)$ varies with $p_X(x)$ while the conditional distribution $p_{Y|X}(y|x)$ (not $p_Y(y)$!) is held fixed.
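For what it's worth, here is a sketch of that functional derivative in nats, with the channel $p_{Y|X}(y|x)$ held fixed; this is the standard computation behind the KKT conditions for channel capacity. Write

$$
I(X;Y) = \int p_X(x)\, D\!\left(p_{Y|X}(\cdot\,|x)\,\middle\|\,p_Y\right) dx,
\qquad
p_Y(y) = \int p_X(x')\, p_{Y|X}(y|x')\, dx'.
$$

Perturbing $p_X \to p_X + \epsilon\,\phi$ and collecting the first-order term in $\epsilon$ (remembering that $p_Y$ itself depends on $p_X$) gives

$$
\frac{\delta I}{\delta p_X(x)}
= \int p_{Y|X}(y|x)\, \log\frac{p_{Y|X}(y|x)}{p_Y(y)}\, dy \;-\; 1
= D\!\left(p_{Y|X}(\cdot\,|x)\,\middle\|\,p_Y\right) - 1.
$$

Imposing the normalization constraint $\int p_X(x)\,dx = 1$ via a Lagrange multiplier then yields the familiar optimality condition: $D\!\left(p_{Y|X}(\cdot\,|x)\,\middle\|\,p_Y\right)$ is constant on the support of a capacity-achieving input.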
A well-known related result that might be of interest to you: assuming $Y = X + N$, where $N$ is zero-mean Gaussian noise independent of $X$ (so that $p_{Y|X}(y|x)$ is Gaussian with mean $x$), the distribution of $X$ that maximizes $I(X;Y)$ subject to a constraint on the variance (power) of $X$ is Gaussian.
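As a quick illustrative check (my own sketch, not part of the original answer): for $Y = X + N$ with fixed input power $P$, a Gaussian input attains $\tfrac{1}{2}\log(1+P/\sigma^2)$ nats, while a uniform input with the same variance does strictly worse. The script below uses $I(X;Y) = h(Y) - h(Y|X) = h(Y) - h(N)$, valid because the noise is additive and independent; the particular values of $P$ and $\sigma$ are arbitrary choices.

```python
# Compare I(X;Y) for Gaussian vs. uniform inputs of equal variance P
# over the additive Gaussian noise channel Y = X + N, N ~ N(0, sigma^2).
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

P, sigma = 1.0, 0.5          # input power and noise std (arbitrary)

# Gaussian input: closed form I = 0.5 * log(1 + P / sigma^2)  (nats)
I_gauss = 0.5 * np.log(1.0 + P / sigma**2)

# Uniform input on [-a, a] with the same variance P  =>  a = sqrt(3 P)
a = np.sqrt(3.0 * P)

def p_y(y):
    # density of Y = X + N: convolution of Uniform(-a, a) with N(0, sigma^2)
    return (norm.cdf((y + a) / sigma) - norm.cdf((y - a) / sigma)) / (2.0 * a)

def integrand(y):
    p = p_y(y)
    return -p * np.log(p) if p > 0 else 0.0

h_Y, _ = quad(integrand, -a - 8 * sigma, a + 8 * sigma)  # differential entropy h(Y)
h_N = 0.5 * np.log(2.0 * np.pi * np.e * sigma**2)        # h(N) for Gaussian noise
I_unif = h_Y - h_N                                        # I(X;Y) = h(Y) - h(N)

print(f"I with Gaussian input: {I_gauss:.4f} nats")
print(f"I with uniform input : {I_unif:.4f} nats")        # expect: smaller
```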