Proof: Let $\{V_\alpha\}$ be an open cover of $f(X)$. Let $U_\alpha=f^{-1}(V_\alpha)$. By compactness, $X$ has a finite subcover $U_1,..., U_n$. Therefore, $V_1,..., V_n$ cover $f(X)$.
The proof is very simple, but I'm having trouble understanding why we can apply $f^{-1}$ to the sets $V_\alpha$. From what I understand about functions, $f^{-1}$ should only be defined on a subset of $Y$. But since a given $V_\alpha$ is not necessarily contained in $Y$, we can't map $V_\alpha$ to $X$ with $f^{-1}$. Correct?
An open cover of $f(X)$ is a family of open subsets $V_\alpha$ of $Y$ such that $$ f(X)\subseteq\bigcup_{\alpha}V_\alpha $$ The inverse image $f^{-1}(V)=\{x\in X : f(x)\in V\}$ is defined for every subset $V$ of $Y$, so you can certainly consider $f^{-1}(V_\alpha)$ (see final note 1) and, by elementary properties of the inverse image operation, $$ X\subseteq\bigcup_\alpha f^{-1}(V_\alpha) \tag{1} $$ Each $f^{-1}(V_\alpha)$ is open because $f$ is continuous, so by compactness of $X$ there are $\alpha_1,\dots,\alpha_n$ such that $$ X\subseteq\bigcup_{i=1}^n f^{-1}(V_{\alpha_i}) \tag{2} $$ (see final note 2). This implies that $$ f(X)\subseteq \bigcup_{i=1}^n V_{\alpha_i} $$ with an easy verification.
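For completeness, here is one way to spell out that last verification. It is only a sketch, using nothing beyond the definition $f^{-1}(V)=\{x\in X : f(x)\in V\}$; the names $x$ and $y$ are auxiliary and do not appear in the proof above. Take $y\in f(X)$ and pick $x\in X$ with $y=f(x)$. Then, by $(2)$, $$ x\in\bigcup_{i=1}^n f^{-1}(V_{\alpha_i}) \implies x\in f^{-1}(V_{\alpha_i}) \text{ for some } i \implies y=f(x)\in V_{\alpha_i} $$ so $y\in\bigcup_{i=1}^n V_{\alpha_i}$. Since $y$ was arbitrary, the finitely many sets $V_{\alpha_1},\dots,V_{\alpha_n}$ cover $f(X)$.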
Final notes.
1. The set $f^{-1}(V_\alpha)$ could well be empty, if $V_\alpha$ doesn't intersect $f(X)$, but this is not relevant as far as the proof is concerned.
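2. In the displays $(1)$ and $(2)$, $\subseteq$ could be replaced by $=$, but actually the same argument shows that if $A$ is a compact subset of $X$, then $f(A)$ is compact, and in that case only the inclusion $A\subseteq\bigcup_\alpha f^{-1}(V_\alpha)$ holds, so it's better to have $\subseteq$.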