Let $f:X\to Y$ be a morphism of schemes, and let $k(y)$ be the residue field of a point $y\in Y$. The fibre of $f$ over $y$ is defined to be the scheme $X_y=X\times_Y \operatorname{Spec} k(y)$.
It is said that $X_y$ is homeomorphic to $f^{-1}(y)$. If we consider affine schemes, the point $y$ corresponds to a prime ideal $p$ of $\mathcal{O}_Y(Y)$, so $f^{-1}(y)$ should correspond to a prime ideal $q$ of $\mathcal{O}_X(X)$, which is a single point. But sometimes $f^{-1}(y)$ consists of more than one point, so I am not sure what $X_y$ looks like.
Besides, it seems that if $H$ is a subscheme of $Y$, then $f^{-1}(H)$ can be regarded as $X\times_Y H$. I hope someone can give me a clear picture. Thanks!
Why should $f^{-1}(y)$ correspond to only one prime ideal $q$ in $X$?
In the affine case you know that $f : X = \mathrm{Spec} \ B \to \mathrm{Spec} \ A = Y$ corresponds to a morphism of rings $\varphi : A \to B$. Now for $y \in Y$ one has $f^{-1}(y) = \{x \in X \mid \varphi^{-1}(x) = y \}$ which is not necessarily just a single point.
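For a concrete instance (an illustrative choice, not taken from the question), let $A=\mathbb{C}[t]$, $B=\mathbb{C}[x]$ and $\varphi(t)=x^2$, so that $f$ is the squaring map on the affine line. For $b\neq 0$, both maximal ideals $(x-b)$ and $(x+b)$ contract to the same prime:

$$\varphi^{-1}\big((x\mp b)\big)=\{\,g(t)\in\mathbb{C}[t] \mid g(x^2)\in(x\mp b)\,\}=\{\,g(t) \mid g(b^2)=0\,\}=(t-b^2).$$

Since $(x-b)\neq(x+b)$, the fibre over $y=(t-b^2)$ consists of exactly these two points (the only other prime of $\mathbb{C}[x]$, namely $(0)$, contracts to $(0)$).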
This is what $f^{-1}(y)$ looks like in the affine case, or more precisely what its underlying topological space (with the subspace topology from $X$) looks like.
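One can check that this set is homeomorphic to $\mathrm{Spec} \ (B \otimes_{A} k(y))$: writing $\mathfrak p$ for the prime ideal $y$ of $A$, one has $B\otimes_A k(y)\cong B_{\mathfrak p}/\mathfrak p B_{\mathfrak p}$, where $B_{\mathfrak p}$ is the localization of $B$ at $\varphi(A\setminus\mathfrak p)$, and the primes of this ring correspond exactly to the primes $\mathfrak q$ of $B$ with $\varphi^{-1}(\mathfrak q)=\mathfrak p$. The ring $B\otimes_A k(y)$ also equips $f^{-1}(y)$ with a scheme structure; in fact one defines $f^{-1}(y) := X \times_Y \mathrm{Spec} \ k(y) = \mathrm{Spec} \ (B \otimes_A k(y))$.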
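As a sketch of how this recipe computes a fibre, take the illustrative example $A=\mathbb{Z}$, $B=\mathbb{Z}[i]\cong\mathbb{Z}[x]/(x^2+1)$ and $y=(p)$ for a prime $p$. Then $k(y)=\mathbb{F}_p$ and $B\otimes_A k(y)\cong\mathbb{F}_p[x]/(x^2+1)$, whose prime ideals correspond to the distinct monic irreducible factors of $x^2+1$ over $\mathbb{F}_p$. The hypothetical helper `fibre_over` below lists them by brute force, assuming $p$ is prime.

```python
def fibre_over(p):
    """Prime ideals of F_p[x]/(x^2 + 1), i.e. the points of the fibre of
    Spec Z[i] -> Spec Z over (p). Assumes p is prime (not checked)."""
    # Roots of x^2 + 1 in F_p, found by trying every residue.
    roots = [r for r in range(p) if (r * r + 1) % p == 0]
    if not roots:
        # x^2 + 1 is irreducible mod p: one point, with residue field F_{p^2}.
        return ["(x^2 + 1)"]
    # Each distinct root r gives the maximal ideal (x - r). A repeated root
    # (this happens only for p = 2) gives one point with a non-reduced local ring.
    return [f"(x - {r})" for r in roots]


for p in (2, 3, 5, 13):
    print(p, fibre_over(p))
```

For $p=5$ the fibre has two points, for $p=3$ it has one point with residue field $\mathbb{F}_9$, and for $p=2$ it has one point whose local ring $\mathbb{F}_2[x]/((x-1)^2)$ is not reduced. So the scheme structure of the fibre carries information that its underlying space does not.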
Regarding your second question: One defines $f^{-1}(H) := X \times_Y H$ as the base change of the inclusion $i : H \to Y$ along $f$. Note that the projection $f^{-1}(H) \to X$ is an embedding, as it is the base change of an embedding.
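In the affine setting above, if $H$ is a closed subscheme, say $H=\mathrm{Spec}(A/I)$ for an ideal $I\subseteq A$ (an assumption made here for concreteness), the base change can be computed explicitly:

$$f^{-1}(H)=X\times_Y H=\mathrm{Spec}\big(B\otimes_A A/I\big)\cong\mathrm{Spec}\big(B/\varphi(I)B\big).$$

So $f^{-1}(H)$ is the closed subscheme of $X$ cut out by the image of $I$, and its underlying set $\{\mathfrak q\in\mathrm{Spec}\,B \mid \varphi(I)\subseteq\mathfrak q\}=\{\mathfrak q \mid I\subseteq\varphi^{-1}(\mathfrak q)\}$ is exactly the set-theoretic preimage of $H$.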
Writing down the universal property explicitly may help you accept this as a useful and natural definition:
Giving a morphism $Z \to f^{-1}(H)$ is the same as giving morphisms $p : Z \to X$ and $q : Z \to H$ such that $f \circ p = i \circ q$.
That is, a morphism $p : Z \to X$ factors through $f^{-1}(H)$ if and only if the composition $f \circ p : Z \to Y$ factors through the embedding $i : H \to Y$ (and since $i$ is a monomorphism, such a factorization is unique).
Note also that this definition gives the expected answer in, for example, the category of sets:
If $f : A \to B$ is any map of sets and $B' \subseteq B$, then $f^{-1}(B')$, together with the inclusion $f^{-1}(B')\to A$ and the restriction $f^{-1}(B')\to B'$ of $f$, satisfies the universal property of $A \times_B B'$. If you don't want to use the universal property: $A \times_B B' = \{(a,b') \in A \times B' \mid f(a) = b'\}$, so that $f^{-1}(B') = p_1(A\times_B B')$, where $p_1$ denotes the projection to the first factor. Moreover, the $b'$ in $(a,b')$ carries no additional information: by construction $b' = f(a)$, so $b'$ can be reconstructed from $a$ alone, and hence $p_1$ is a bijection $A\times_B B'\to f^{-1}(B')$.
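The set-theoretic picture can be checked on a small example. This is a minimal sketch using an arbitrary illustrative map $f(a)=a^2$ on a finite set; the helper names `fibre_product`, `preimage` and `square` are hypothetical. It builds $A\times_B B'$ as a set of pairs, checks that $p_1$ is a bijection onto $f^{-1}(B')$, and checks the universal property for one test map $p:Z\to A$.

```python
def fibre_product(A, B_prime, f):
    """A x_B B' realised as the set of pairs (a, b') with f(a) == b'."""
    return {(a, b) for a in A for b in B_prime if f(a) == b}


def preimage(A, B_prime, f):
    """The set-theoretic preimage f^{-1}(B') as a subset of A."""
    return {a for a in A if f(a) in B_prime}


def square(a):
    """Illustrative map f : A -> B."""
    return a * a


A = set(range(-3, 4))  # A = {-3, ..., 3}
B_prime = {1, 4}       # B' is a subset of B = integers

pairs = fibre_product(A, B_prime, square)
projected = {a for (a, _) in pairs}  # image of the first projection p_1

# p_1 maps A x_B B' onto f^{-1}(B') ...
assert projected == preimage(A, B_prime, square)
# ... and injectively, because the second coordinate is always f(a).
assert all(b == square(a) for (a, b) in pairs)
assert len(projected) == len(pairs)

# Universal property for one test map p : Z -> A:
# p lands in f^{-1}(B') exactly when f(p(z)) lands in B' for every z.
Z = ["u", "v"]
p = {"u": -2, "v": 1}.get
assert all(p(z) in projected for z in Z) == all(square(p(z)) in B_prime for z in Z)
```

Because the second coordinate is determined by the first, storing pairs is redundant; this is exactly why $f^{-1}(B')$ itself can serve as the fibre product.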