In Example 5.5 of the book Convex Optimization, the authors derived the dual problem for the optimization problem
$$ \begin{aligned} & \text{min} && \log \sum_i^m \exp y_i\\ & \text{s.t.} && Ax + b = y \end{aligned} $$
where $A \in \mathbb R^{m \times n}$, $x \in \mathbb R^n$, and $b, y \in \mathbb R^m$, using the conjugate function of log-sum-exp. (The derivation can be found here, on page 268.)
My question: is it possible to derive the dual problem without using the conjugate function?
The following is my current attempt:
The Lagrangian is
$$L(x, y, v) = \log \sum_i^m \exp y_i + v^T(Ax + b - y)$$
where $v \in \mathbb R^m$ is the Lagrange multiplier.
Setting the partial derivatives of $L$ with respect to $x$ and $y$ to zero gives
$$\frac{\partial L}{\partial x} = A^Tv = 0$$
$$\frac{\partial L}{\partial y} = \frac{1}{\sum_i^m \exp y_i} \left(\begin{array}{c} \exp y_1 \\ \exp y_2 \\ \vdots \\ \exp y_m \end{array}\right) - v = 0$$
So we have
$$A^T v = 0$$
$$\frac{\exp y_i}{\sum_j^m \exp y_j} = v_i \ \ \forall i$$
I've also noticed that the second equation implies that $\textbf 1^T v = 1$ and $v \succeq 0$.
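A quick numerical check of that observation, as a sketch in plain Python. The test point `y` is arbitrary and the helper name `softmax` is my own, not something from the book:

```python
import math

def softmax(y):
    """Gradient of log-sum-exp at y, shifted by max(y) for numerical stability."""
    shift = max(y)
    exps = [math.exp(yi - shift) for yi in y]
    total = sum(exps)
    return [e / total for e in exps]

y = [0.3, -1.2, 2.0]  # arbitrary illustrative point
v = softmax(y)

sum_v = sum(v)                           # should be 1 up to rounding
all_positive = all(vi > 0 for vi in v)   # every entry is strictly positive
```

So any $v$ obtained from the stationarity condition in $y$ lies on the probability simplex.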
Here's where I get stuck. I don't know how to continue to express the dual function.
Thanks to the hints and guidance given by @BrianBorchers and @MichaelGrant, I've figured out the answer. I am posting it here so it might help others.
From
$$\frac{\exp y_i}{\sum_j^m \exp y_j} = v_i \ \ \forall i$$
we can easily express $y_i$ as
$$y_i = \log\left(v_i \sum_j^m \exp y_j\right)$$
Substituting this and $A^Tv = 0$ into the Lagrangian, we have
$$\begin{aligned} L(y, v) &= \log\sum_i^m \exp y_i + b^Tv - \sum_i^m v_i \log\left(v_i \sum_j^m \exp y_j\right)\\ &= \log\sum_i^m \exp y_i + b^Tv - \sum_i^m v_i \left(\log v_i + \log\sum_j^m \exp y_j\right)\\ &= \log\sum_i^m \exp y_i + b^Tv - \sum_i^m v_i \log v_i - \left(\log\sum_j^m \exp y_j\right)\sum_i^m v_i\\ &= b^Tv - \sum_i^m v_i\log v_i \qquad (\text{since } \textbf 1^Tv = 1)\\ &= g(v) \end{aligned}$$
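To double-check the algebra, here is a minimal numerical sketch in plain Python. It assumes $A^Tv = 0$, so the $x$-term of the Lagrangian is dropped. The vectors `y` and `b` are arbitrary test data, and all function names are my own:

```python
import math

def log_sum_exp(y):
    """Numerically stable log(sum_i exp(y_i))."""
    shift = max(y)
    return shift + math.log(sum(math.exp(yi - shift) for yi in y))

def softmax(y):
    """v_i = exp(y_i) / sum_j exp(y_j), the stationarity condition in y."""
    lse = log_sum_exp(y)
    return [math.exp(yi - lse) for yi in y]

def lagrangian_without_x(y, v, b):
    """L(x, y, v) with the term x^T A^T v dropped (assumes A^T v = 0)."""
    return log_sum_exp(y) + sum(vi * (bi - yi) for vi, bi, yi in zip(v, b, y))

def dual_function(v, b):
    """g(v) = b^T v - sum_i v_i log v_i, for v with strictly positive entries."""
    return sum(bi * vi for bi, vi in zip(b, v)) - sum(vi * math.log(vi) for vi in v)

y = [0.3, -1.2, 2.0]   # arbitrary test point
b = [1.0, 0.5, -0.7]   # arbitrary offset vector
v = softmax(y)

# Difference between the Lagrangian at the stationary v and g(v)
gap = abs(lagrangian_without_x(y, v, b) - dual_function(v, b))
```

The two values should agree up to floating-point rounding, which matches the cancellation of the $\log\sum_j^m \exp y_j$ terms in the derivation.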
With the dual function in place we can proceed to write out the dual problem:
$$\begin{aligned} & \text{max} && b^Tv - \sum_i^m v_i \log v_i\\ & \text{s.t.} && A^Tv = 0\\ &&& \textbf 1^Tv = 1\\ &&& v \succeq 0 \end{aligned}$$
As @MichaelGrant pointed out, the steps above are effectively deriving the conjugate function.
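As a sanity check on the dual problem, here is a small sketch on a toy instance of my own choosing (not from the book): $m = 2$, $n = 1$, $A = (1, -1)^T$. The constraints $A^Tv = 0$ and $\textbf 1^Tv = 1$ then force $v = (1/2, 1/2)$, so the dual optimum can be evaluated directly. The primal is minimized with a simple ternary search, which is my choice for this illustration and relies on the objective being convex in $x$:

```python
import math

def primal_objective(x, b):
    """log-sum-exp of y = A x + b for the toy matrix A = [[1], [-1]]."""
    y = [x + b[0], -x + b[1]]
    shift = max(y)
    return shift + math.log(sum(math.exp(yi - shift) for yi in y))

def minimize_convex_1d(f, lo, hi, iters=200):
    """Ternary search for the minimizer of a convex function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2

def dual_objective(v, b):
    """b^T v - sum_i v_i log v_i."""
    return sum(bi * vi for bi, vi in zip(b, v)) - sum(vi * math.log(vi) for vi in v)

b = [0.4, -1.0]  # arbitrary offsets for the toy instance

x_star = minimize_convex_1d(lambda x: primal_objective(x, b), -50.0, 50.0)
primal_value = primal_objective(x_star, b)

v = [0.5, 0.5]   # the only dual-feasible point for this A
dual_value = dual_objective(v, b)
```

For this instance both values work out to $(b_1 + b_2)/2 + \log 2$, so the primal and dual optima coincide, as one would expect for a convex problem with affine constraints.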