Suppose I have a random sample of size 200. Each individual in the sample has two observations, wage in 2000 and wage in 2005, called $w_{00}$ and $w_{05}$. I want to test whether $\mu_{2000} = \mu_{2005}$. I can do it in two ways:
The first way is to carry out the usual hypothesis test for equality of two means, estimating the sample mean and variance for each year; the degrees of freedom are 398.
The second way is to transform it into a test about a single mean. Specifically, I'm testing whether $E(w_{00} - w_{05}) = 0$. Then I would just take the sample mean of the differences and divide it by its standard error; the degrees of freedom are 199.
What's the difference between these two methods and which one is more appropriate?
The main issue in your question is the distinction between a 'paired' t test and a 'two-sample' t test of data $X_1$ and $X_2$. (I have changed the title of your question accordingly.)
If you have two (possibly correlated) salary values on each individual, one from the first year and the other from the second year, then you should use the paired test:
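Let $D_i = X_{1i} - X_{2i},$ for $i = 1, \dots, n$ and use the test statistic $T = \frac{\bar D}{S_D/\sqrt{n}},$ where $\bar D$ and $S_D$ are the sample mean and sample standard deviation of the $D_i,$ and $T \sim \mathsf{T}(\nu = n-1)$ under $H_0.$ With $n = 200$ pairs, $\nu = 199.$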
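As a minimal sketch of the paired statistic just defined, here is a small Python function using only the standard library. The function name `paired_t` and the four-pair toy data are made up for illustration; they are not the salary data discussed in this answer.

```python
import math
import statistics


def paired_t(x1, x2):
    """Return (t statistic, degrees of freedom) for a paired t test."""
    if len(x1) != len(x2):
        raise ValueError("paired samples must have the same length")
    d = [a - b for a, b in zip(x1, x2)]   # D_i = X_1i - X_2i
    n = len(d)
    d_bar = statistics.mean(d)            # sample mean of the differences
    s_d = statistics.stdev(d)             # sample SD (divisor n - 1)
    t = d_bar / (s_d / math.sqrt(n))      # T = D-bar / (S_D / sqrt(n))
    return t, n - 1


# Hypothetical toy data: four subjects measured twice.
t, df = paired_t([10, 12, 9, 11], [8, 11, 7, 10])
print(f"t = {t:.3f} on {df} degrees of freedom")
```

The P-value would then come from Student's t distribution with `df` degrees of freedom, which the standard library does not provide; a statistics package is needed for that step.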
If you have two independent samples (different subjects in different years), then you should use a two-sample t test. (The Welch 'separate variances' test is generally preferred over the 'pooled variances' test, unless you have strong information in advance that two populations have equal variances. However, with $n$'s as large as 200 there will be little difference between these two versions of the two-sample test.)
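For reference, the two versions of the two-sample statistic can be written out as follows. The notation $\bar X_j,$ $S_j^2,$ $n_j$ for the sample mean, sample variance and size of sample $j$ is introduced here for illustration.

$$T_{\text{Welch}} = \frac{\bar X_1 - \bar X_2}{\sqrt{S_1^2/n_1 + S_2^2/n_2}}, \qquad \nu \approx \frac{\left(S_1^2/n_1 + S_2^2/n_2\right)^2}{\dfrac{(S_1^2/n_1)^2}{n_1 - 1} + \dfrac{(S_2^2/n_2)^2}{n_2 - 1}},$$

$$T_{\text{pooled}} = \frac{\bar X_1 - \bar X_2}{S_p\sqrt{1/n_1 + 1/n_2}}, \qquad S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}, \qquad \nu = n_1 + n_2 - 2.$$

When $n_1 = n_2 = n,$ we have $S_p^2 = (S_1^2 + S_2^2)/2,$ so the two statistics are numerically identical and only the degrees of freedom differ: 398 for the pooled test, and somewhere between 199 and 398 for the Welch test when $n = 200.$ At either value the t distribution is close to standard normal, which is why the two versions give nearly the same P-value here.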
Technically, both paired and 2-sample tests require normal data in order to use Student's t distribution to compute critical values or P-values. However, both kinds of t tests are reasonably accurate unless the data are markedly skewed with relatively many outliers. With salary data across a broad population, you should check for this: look at the differences $D_i$ for the paired test, and at each sample separately for the 2-sample test.
The distinction is of considerable practical importance. Using a 2-sample test for data that are actually paired can result in failure to detect a real difference.
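The reason can be seen from the variance of the mean difference. Writing $\sigma_1^2, \sigma_2^2$ for the two population variances and $\rho$ for the correlation between the two measurements on the same individual (symbols introduced here for illustration), a paired design gives

$$\operatorname{Var}(\bar D) = \frac{\sigma_1^2 + \sigma_2^2 - 2\rho\,\sigma_1\sigma_2}{n},$$

whereas the squared standard error used by a 2-sample test estimates

$$\frac{\sigma_1^2 + \sigma_2^2}{n},$$

which is correct only when $\rho = 0.$ Salaries of the same person in two different years are typically strongly positively correlated, so the 2-sample denominator is too large, the t statistic is too small, and the test loses power.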
Example. Here are fake normal data for salaries (in thousands of dollars), simulated according to a paired design with $n = 200$ with a true difference of about 2 (\$2000).
Descriptive statistics:
Results of tests (using R statistical software): The correct paired test finds a highly significant difference (P-value < 0.001), but the incorrect 2-sample test does not (P-value of the Welch test about 10%, shown; P-value of the pooled test about the same, not shown).
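A comparison of this kind can be sketched in Python (rather than the R used above) with only the standard library. The seed, the population parameters (mean 50, SD 15, noise SD 4) and the variable names are assumptions chosen for illustration; they are not the values behind the results quoted above, so the numbers printed will differ from those P-values. The P-values use a normal approximation to the t distribution, which is reasonable only because the degrees of freedom are large.

```python
import math
import random
import statistics

rng = random.Random(2024)   # arbitrary seed, for reproducibility only
n = 200

# Assumed paired design: second-year salary = first-year salary + 2 + noise
# (in thousands of dollars). These parameters are illustrative guesses.
x1 = [rng.gauss(50, 15) for _ in range(n)]
x2 = [a + 2 + rng.gauss(0, 4) for a in x1]


def two_sided_p(t):
    """Two-sided P-value from the normal approximation to Student's t."""
    return 2 * (1 - statistics.NormalDist().cdf(abs(t)))


# Paired test: one-sample t statistic on the differences D_i = X_1i - X_2i.
d = [a - b for a, b in zip(x1, x2)]
se_paired = statistics.stdev(d) / math.sqrt(n)
t_paired = statistics.mean(d) / se_paired

# Welch 2-sample test: treats x1 and x2 as if they were independent samples.
v1 = statistics.variance(x1) / n
v2 = statistics.variance(x2) / n
se_welch = math.sqrt(v1 + v2)
t_welch = (statistics.mean(x1) - statistics.mean(x2)) / se_welch
df_welch = (v1 + v2) ** 2 / (v1 ** 2 / (n - 1) + v2 ** 2 / (n - 1))

print(f"paired: t = {t_paired:.2f}, df = {n - 1}, approx P = {two_sided_p(t_paired):.4f}")
print(f"Welch:  t = {t_welch:.2f}, df = {df_welch:.1f}, approx P = {two_sided_p(t_welch):.4f}")
```

Both tests use the same numerator, because the mean of the differences equals the difference of the means. Only the standard error changes, and with positively correlated pairs the paired standard error is the smaller of the two.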