I need help with one of the problems in the Cover and Thomas book "Elements of Information Theory".
The question is about two parallel Gaussian channels, with input $X_i$, output $Y_i$ and noise $Z_i$, $i = 1, 2$. $Z_1 \sim N(0,N_1)$ and $Z_2 \sim N(0,N_2)$ are independent Gaussian random variables and $Y_i = X_i + Z_i$. We wish to allocate power to the two parallel channels. Let $\beta_1$ and $\beta_2$ be fixed. Consider a total cost constraint $\beta_1P_1 + \beta_2P_2 \le \beta$, where $P_i$ is the power allocated to the $i^{th}$ channel and $\beta_i$ is the cost per unit power in that channel. Thus $P_1 \ge 0$ and $P_2 \ge 0$ can be chosen subject to the cost constraint $\beta$. Evaluate the capacity and find $P_1$, $P_2$ that achieve capacity for $\beta_1 = 1, \beta_2 = 2, N_1 = 3, N_2 = 2$ and $\beta = 10$.
The solution says that
We put all the signal power into the channel with less weighted noise ($\beta_iN_i$) until the total weighted power of noise + signal in that channel equals the weighted noise power in the other channel. After that, we will split any additional power between the two channels according to their weights.
I don't understand the bolded part. I thought that in water-filling, once the difference between the two channels is "filled", then the additional power is split evenly between the two channels? Like in the figure below, once the power is added to the red line, the additional power is then allocated equally between the 2 channels (blue level). At least that's what I got intuitively based on the concept of "water-filling".
Also, for the calculation of the capacity, the solution says that
Power is put into channel 1 until $\beta = 1$. After that we would put power according to their weights, i.e. we would divide remaining power of 9 in the ratio 2 is to 1. Thus we would set $P_1 = 6 + 1$ and $P_2 = 3$, and so that $v = 10$.
My thinking is as follows.
$$\beta_1P_1 = v - \beta_1N_1$$ $$\beta_2P_2 = v - \beta_2N_2$$
Hence, for the 2 channels, we have
$$2v = \beta_1P_1 + \beta_2P_2 + \beta_1N_1 + \beta_2N_2$$
Substituting the values, I have
$$2v = P_1 + 2P_2 + 7$$
$P_1 + 2P_2$ has to satisfy the constraint $P_1 + 2P_2 \le \beta = 10$, so setting it equal to 10, I have
$$v = 8.5$$ $$P_1 = 8.5 - 3 = 5.5$$ $$P_2 = 0.5(8.5-4) = 2.25$$
Capacity is calculated accordingly.
Am I wrong in my approach?

Splitting the additional power evenly is what water-filling gives when the channels have the same cost per unit power. That's not the case here. But you can transform this problem into that one:
Here, we wish to maximize
$$\log\left(1 + \frac{P_1}{N_1}\right) + \log\left(1 + \frac{P_2}{N_2}\right)$$ subject to the restriction $$\beta_1P_1 + \beta_2 P_2 \le \beta$$
Let's call $p_i = \beta_i P_i$ and $n_i = \beta_i N_i$ for $i = 1, 2$. Since $P_i/N_i = p_i/n_i$, the objective function is now
$$\log\left(1 + \frac{p_1}{n_1}\right) + \log\left(1 + \frac{p_2}{n_2}\right)$$ subject to the restriction $$p_1 + p_2 \le \beta$$
In the example: $\beta_1=1$, $\beta_2=2$, $n_1=3$, $n_2=4$, $\beta=10$.
Now, this is the standard allocation problem (studied in the textbook), which is solved by the water-filling procedure: we take the less noisy channel (here, channel $1$), assign the difference ($1=n_2-n_1$) to it, and then distribute the remaining power ($\beta-1=9$) equally between the two channels.
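The procedure can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `water_fill` and `weighted_water_fill` are made up for this example, and the capacity is reported in bits with the usual factor $\tfrac12$ per real Gaussian channel.

```python
import math

def water_fill(noise, budget):
    """Standard water-filling with unit costs.

    Returns (level, powers), where powers[i] = max(level - noise[i], 0)
    and sum(powers) == budget (up to rounding).
    """
    order = sorted(range(len(noise)), key=lambda i: noise[i])
    level = 0.0
    # Try filling the k least noisy channels, k = 1..n, and keep the
    # largest k whose water level lies above the k-th channel's noise.
    for k in range(1, len(noise) + 1):
        candidate = (budget + sum(noise[i] for i in order[:k])) / k
        if candidate > noise[order[k - 1]]:
            level = candidate
        else:
            break
    powers = [max(level - n, 0.0) for n in noise]
    return level, powers

def weighted_water_fill(N, costs, budget):
    """Water-filling under the cost constraint sum(costs[i] * P[i]) <= budget.

    Uses the change of variables p_i = costs[i] * P_i, n_i = costs[i] * N_i.
    """
    n = [c * Ni for c, Ni in zip(costs, N)]
    level, p = water_fill(n, budget)
    P = [pi / c for pi, c in zip(p, costs)]  # back to the real powers
    return level, P

# Values from the question: N = (3, 2), costs = (1, 2), budget = 10.
level, P = weighted_water_fill([3, 2], [1, 2], 10)
capacity_bits = sum(0.5 * math.log2(1 + Pi / Ni) for Pi, Ni in zip(P, [3, 2]))
print(level, P, capacity_bits)  # level 8.5, P = [5.5, 2.25]
```

In the transformed units the two channels receive $5.5$ and $4.5$, i.e. the first unit goes to channel 1 and the remaining $9$ is split evenly.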
(Note that splitting the "transformed" powers $p_1,p_2$ evenly is equivalent to splitting the original/real powers $P_1,P_2$ in inverse proportion to their weights; hence the quoted solution is right.)
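As a worked check with the numbers of the example (a sketch only, taking logarithms in base 2 and including the factor $\tfrac12$ of the capacity formula):

$$p_1 = 1 + \tfrac{9}{2} = 5.5, \qquad p_2 = \tfrac{9}{2} = 4.5$$

$$P_1 = \frac{p_1}{\beta_1} = 5.5, \qquad P_2 = \frac{p_2}{\beta_2} = 2.25, \qquad \beta_1 P_1 + \beta_2 P_2 = 5.5 + 4.5 = 10$$

$$C = \tfrac12\log_2\left(1 + \frac{5.5}{3}\right) + \tfrac12\log_2\left(1 + \frac{2.25}{2}\right) = \tfrac12\log_2\frac{8.5^2}{12} \approx 1.29 \text{ bits}$$

Beyond the first unit, channel 1 gets $4.5$ and channel 2 gets $2.25$ of real power, which is the ratio 2 to 1. These values agree with the ones computed in the question. By contrast, $P_1 = 7$, $P_2 = 3$ would cost $7 + 2\cdot 3 = 13 > 10$, so the 2 to 1 split apparently has to be applied to the remaining cost budget of 9, not to 9 units of power.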