In Introduction to the Theory of Computation by Michael Sipser, there's an example which shows how to convert the regular expression $ (ab|a)^{*}$ into an NFA. The "standard" method results in 8 states, and there's an additional challenge given to the reader to find the smallest NFA which has only 2 states.
Below is the diagram that I arrived at. It accepts $ a, aa, abab, aab, aaababaa $, etc. but fails when the string starts with $b$. How should I fix this?

Also, is there any general approach or guidelines to find such "smallest" NFA's and DFA's in the general case? (PS: I don't know if this is explained later in the book. Please ignore this additional question, if a textbook will usually explain it at a later stage.)
I'll answer your questions in order.
Redesign your FA. You almost have it right, and you would if you switched the transitions between states $A$ and $B$, giving this picture:
With the start state $A$ being the only final state. One way to think of this is to handle the repeated $a$s with a loop from $A$ to itself and the repeated $ab$s with the transitions from $A$ to itself, through $B$.
Sipser's construction algorithm from a regular expression to an equivalent NFA is conceptually simple but, as you've implied, gives an NFA with a lot of states and epsilon-transitions. (The construction in the other way, from an NFA to an equivalent regular expression, explained in the next seven pages, is so ghastly that one almost never does it manually unless under the duress of a homework exercise.)
At the start of section 1.2 (in the third edition of Sipser), there is a construction which removes epsilon-moves from an NFA and another which converts an NFA without epsilon-moves to an equivalent DFA. There is another, the Myhill-Nerode algorithm, that, given a DFA, constructs an equivalent DFA with the minimal number of states. Unfortunately, Sipser only outlines it in an exercise (1.52), though he does provide an answer (which I think is a bit too telegraphic). These, when combined, will guarantee a minimal-state DFA, given any NFA. As an aside, doing this for the example you gave will yield an equivalent DFA with two states. Can you find it?
The best answer to your question, though, probably won't be completely satisfying right now: "It gets easier with practice." With a stock of examples in your toolbox, you will eventually get to the stage where you can do simple exercised like this fairly easily.