We use the chain rule for functions that can be seen as compositions of other functions. However, even a simple function like $y=x^2$ can be viewed as a composition of 2 functions: the inner function $y=x$ and the outer function $y=(\text{something})^2$.
Now, if we use the chain rule instead of the power rule, it still works and we still get $y'=2x$. But I'm wondering whether the "power rule" is really just a shortcut for the chain rule in the case of polynomials (polynomials meaning, no composite functions).
*NOTE: the answer was given to me in one of the comments. Basically, the power rule is what applies when the inner function of the composition is just the identity function ($f(x)=x$), which of course is the case only for polynomials.
I would say that you can always make a power rule problem into a chain rule problem in an unenlightening way. Yes, if $f(x) = x^n$, then we can define $g(x) = x$, so that $$ \frac{d}{dx}f(x) = \frac{d}{dx}f(g(x)) = f'(g(x))g'(x) = n [g(x)]^{n-1} \cdot 1 = nx^{n-1} $$ Note, however, that we would need to know the power rule in order to find the "outer" derivative $f'(x)$.
Slightly more interesting is the fact that if $f(x) = x^n$ and $g(x) = x^m$, then we can write $$ \frac{d}{dx}x^{mn} = \frac{d}{dx}(x^m)^n = \frac{d}{dx}f(g(x)) = n[g(x)]^{n-1} g'(x) = n[x^{m}]^{n-1} \cdot mx^{m-1} =\\ (mn)x^{m(n-1) + m-1} = (mn)x^{mn - 1} $$ Of course, it is easier to use the power rule outright.
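As a quick sanity check of the computation above, here is a minimal numerical sketch. The function names (`central_difference`, `chain_rule_derivative`, `power_rule_derivative`) and the sample values of `x`, `m`, `n` are my own choices for illustration and are not part of the argument. It compares the chain rule expression with the power rule applied directly to $x^{mn}$, and with a finite-difference estimate.

```python
def central_difference(func, x, h=1e-6):
    """Approximate func'(x) with a symmetric difference quotient."""
    return (func(x + h) - func(x - h)) / (2 * h)


def chain_rule_derivative(x, m, n):
    """Derivative of (x**m)**n via the chain rule with g(x) = x**m."""
    g = x ** m                      # inner function g(x)
    g_prime = m * x ** (m - 1)      # inner derivative g'(x)
    return n * g ** (n - 1) * g_prime


def power_rule_derivative(x, m, n):
    """Derivative of x**(m*n) via the power rule applied directly."""
    return (m * n) * x ** (m * n - 1)


# Sample values chosen only for illustration.
x, m, n = 1.5, 3, 2
chain = chain_rule_derivative(x, m, n)
power = power_rule_derivative(x, m, n)
numeric = central_difference(lambda t: t ** (m * n), x)

print(chain, power, numeric)
```

The first two values should agree up to rounding, and the finite-difference value should be close to both, which is of course only a spot check and not a proof.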
We could derive the power rule (for real exponents $n$, with $x > 0$ so that $\ln(x)$ is defined) using the chain rule together with the rule $\frac{d}{dx} \ln(x)= \frac 1x$. In particular, if we define $y = x^n$, we can use "logarithmic differentiation". $$ y = x^n \implies\\ \ln(y) = \ln(x^n) \implies\\ \ln(y) = n\ln(x) \implies\\ \frac{d}{dx}\ln(y) = \frac{d}{dx}n \ln(x) \implies\\ \frac{1}y \frac{dy}{dx} = \frac{n}{x} \implies\\ \frac{dy}{dx} = \frac{ny}{x} $$ If we substitute $y$ back in, we find $$ \frac{dy}{dx} = \frac{n(x^n)}{x} = nx^{n-1}. $$
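To see that the formula $\frac{dy}{dx} = \frac{ny}{x}$ really does cover non-integer exponents, here is a small numerical sketch. It assumes $x > 0$ (so that the logarithm makes sense), and the helper names `log_diff_derivative` and `numeric_derivative` are hypothetical, chosen just for this example.

```python
import math


def log_diff_derivative(x, n):
    """dy/dx = n*y/x for y = x**n, as obtained by logarithmic differentiation.

    Assumes x > 0 so that ln(x) and ln(y) are defined.
    """
    if x <= 0:
        raise ValueError("this derivation assumes x > 0")
    y = x ** n
    return n * y / x


def numeric_derivative(x, n, h=1e-6):
    """Symmetric difference quotient for t -> t**n at t = x."""
    return ((x + h) ** n - (x - h) ** n) / (2 * h)


# Exponents chosen only for illustration: fractional, negative, irrational.
x = 2.0
for n in (2.5, -1.3, math.pi):
    from_logs = log_diff_derivative(x, n)
    from_power_rule = n * x ** (n - 1)
    estimate = numeric_derivative(x, n)
    print(n, from_logs, from_power_rule, estimate)
```

For each exponent the three printed derivatives should agree closely, which is consistent with the power rule holding for real $n$ on $x > 0$.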
We could also derive the power rule (for positive integers $n$) using just the multivariate chain rule. In particular, define $$ f(x_1,\dots,x_n) = \prod_{k=1}^n x_k, \quad g(t) = (t,\dots,t) $$ We then find that $$ \frac{\partial f}{\partial x_k} = \prod_{j \neq k} x_j = \frac{1}{x_k} \prod_{j=1}^n x_j \quad (x_k \neq 0), \qquad g'(t) = (1,\dots,1) $$ Writing $g_k$ for the $k$-th component of $g$, it therefore follows that $$ \frac{d}{dt}f(g(t)) = \sum_{k=1}^n \frac{\partial f}{\partial x_k}(g(t)) \cdot g_k'(t) = \sum_{k=1}^n t^{n-1} \cdot 1 = nt^{n-1} $$ so that's kind of neat.
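The multivariate argument can also be spot-checked numerically. This is only a sketch under my own naming (`partial_of_product` and `derivative_along_diagonal` are hypothetical helpers). It computes each partial derivative as the product of all the other coordinates, which is the same quantity as dividing the full product by $x_k$ but avoids dividing by zero. It uses `math.prod`, which needs Python 3.8 or later.

```python
import math


def partial_of_product(point, k):
    """Partial derivative of f(x_1, ..., x_n) = x_1 * ... * x_n w.r.t. x_k.

    Computed as the product of every coordinate except the k-th (0-based).
    """
    return math.prod(v for j, v in enumerate(point) if j != k)


def derivative_along_diagonal(t, n):
    """d/dt f(g(t)) for g(t) = (t, ..., t), via the multivariate chain rule."""
    point = [t] * n          # g(t)
    g_prime = [1.0] * n      # g'(t) = (1, ..., 1)
    return sum(partial_of_product(point, k) * g_prime[k] for k in range(n))


# Sample values chosen only for illustration.
t, n = 1.5, 4
print(derivative_along_diagonal(t, n), n * t ** (n - 1))
```

Each of the $n$ terms in the sum contributes $t^{n-1}$, so the two printed numbers should match.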