In information theory the information entropy of a vector of probabilities $\bf p$ can be defined as
$$I_b({\bf p}) = -\sum_{i} p_i \log_b(p_i) = -{\bf p}^T \log_b({\bf p})$$
where ${\bf 1}^T{\bf p} = 1$ is the normalization condition (the probabilities sum to 100%), which must hold together with nonnegativity $p_i \geq 0 \ \forall i$.
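For concreteness, here is a minimal NumPy sketch of the definition above (the language is my choice, not the post's), using the standard convention $0 \log 0 = 0$:

```python
import numpy as np

def entropy(p, base=2):
    """Shannon entropy -p^T log_b(p), with the convention 0*log(0) = 0."""
    p = np.asarray(p, dtype=float)
    # normalization and nonnegativity conditions from the question
    assert np.all(p >= 0) and np.isclose(p.sum(), 1.0), "p must be a probability vector"
    nz = p > 0  # drop zero entries: lim_{p -> 0} p*log(p) = 0
    return -np.dot(p[nz], np.log(p[nz]) / np.log(base))

print(entropy([0.25, 0.25, 0.25, 0.25]))  # uniform over 4 outcomes -> 2.0 bits
```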
Now to the question: how could this quantity be made linear, in the sense that we could express it with linear algebra and solve for it using ordinary least squares, as we would for a linear equation system or a linear least-squares minimization?
$p \log p$ isn't nice around $p=0$ because of the behavior of $\log p$. For some issues that can arise, e.g., when estimating entropy from data, see arXiv:1407.0381 [cs.IT]; Section 4 of that paper outlines polynomial approximation techniques for $-p \log p$. If you knew all the elements of ${\bf p}$ were bounded away from $0$, you could use something like a Taylor series about a probability vector with full support (i.e., no zero elements).
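As a rough illustration of that last idea (a sketch, not the paper's method): expanding $f(p) = -p\ln p$ to second order about some $p_0 > 0$ gives $f(p) \approx f(p_0) - (\ln p_0 + 1)(p - p_0) - (p - p_0)^2 / (2p_0)$, which replaces the transcendental term with a quadratic in $p$ — the kind of object least-squares machinery can handle:

```python
import numpy as np

def neg_plogp(p):
    return -p * np.log(p)

def taylor2(p, p0):
    """Second-order Taylor expansion of -p*log(p) about p0 (requires p0 > 0)."""
    d = p - p0
    return neg_plogp(p0) - (np.log(p0) + 1.0) * d - d**2 / (2.0 * p0)

p0 = 0.25  # expansion point, chosen away from 0
for p in [0.20, 0.25, 0.30]:
    print(p, neg_plogp(p), taylor2(p, p0))
```

The approximation is only good near $p_0$, and degrades badly as $p \to 0$ (the second derivative $-1/p$ blows up), which is exactly the difficulty flagged above.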
That being said, entropy is concave, so maximizing entropy over a convex subset of probability distributions isn't hard. This functionality is built into packages like CVX.
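To show that concavity does the heavy lifting even without a modeling package like CVX, here is a self-contained NumPy sketch of a classic constrained max-entropy problem (maximize entropy on a fixed support subject to a prescribed mean). The maximizer has the Gibbs form $p_i \propto e^{\lambda x_i}$, and the multiplier $\lambda$ can be found by bisection; the function name and the loaded-die example are my own choices:

```python
import numpy as np

def maxent_with_mean(x, m, lo=-50.0, hi=50.0, iters=200):
    """Maximize entropy over distributions on support x subject to E[X] = m.
    The optimum has the Gibbs form p_i ∝ exp(lam * x_i); bisect on lam,
    using the fact that the resulting mean is increasing in lam."""
    x = np.asarray(x, dtype=float)

    def mean_for(lam):
        w = np.exp(lam * (x - x.max()))  # shift exponent for numerical stability
        p = w / w.sum()
        return p @ x, p

    p = None
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        mu, p = mean_for(mid)
        if mu < m:
            lo = mid
        else:
            hi = mid
    return p

x = np.arange(1, 7)           # faces of a die
p = maxent_with_mean(x, 4.5)  # Jaynes' loaded-die example: mean 4.5 instead of 3.5
print(p, p @ x)
```

With a mean above 3.5, the weights tilt monotonically toward the high faces, as expected.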
Minimizing a concave function isn't straightforward in general, but entropy has structural properties you can exploit (e.g., concentrating mass toward a point distribution reduces entropy). In practice, though, you'll be maximizing entropy in far more cases than minimizing it.
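That "point mass reduces entropy" intuition is easy to check numerically; here is a small sketch interpolating from uniform to a point mass (since entropy is concave and the uniform distribution is its maximizer, entropy is nonincreasing along this segment):

```python
import numpy as np

def entropy_bits(p):
    p = np.asarray(p, dtype=float)
    nz = p > 0  # convention 0*log(0) = 0
    return -np.dot(p[nz], np.log2(p[nz]))

uniform = np.full(4, 0.25)
point = np.array([1.0, 0.0, 0.0, 0.0])
# mixing toward the point mass drives entropy from 2 bits down to 0
for t in [0.0, 0.5, 0.9, 1.0]:
    p = (1 - t) * uniform + t * point
    print(t, entropy_bits(p))
```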