I'm reading Shafarevich's Basic Algebraic Geometry 1. For a variety $X\subset\mathbb A_k^n$ in affine space that goes through $0$, suppose its ideal is $I(X)=\langle f_1,\dots,f_m\rangle$, and for a fixed $0\neq a\in\mathbb A_k^n$ let $L=\{ta\mid t\in k\}$ be the line through $0$ in the direction of $a$. Substituting $L$ into each $f_i$, we get $f_i(ta)=tL_i(a)+G_i(ta)$ where $L_i$ is the linear term and $G_i(ta)$ is divisible by $t^2$. Then the condition for tangency is that $L_i(a)=0\;\forall i$.
This is a very intuitive definition for tangents. However, he later gives the example of the following curve in the plane $\mathbb A^2$: $y(y-x^2)=0$, the union of a parabola and its tangent at $0$. The tangent space at $0$ is then found to be all of $\mathbb A^2$, even though both components have the same tangent line $y=0$ there.
This, to me, is a very unintuitive result. Is it meaningful? How is it (morally?) justified that this is actually a well-defined (in the non-mathematical sense) definition?
In algebraic geometry, you can prove purely abstractly that your notion of tangent space coincides with the ordinary definition at regular points of a variety. Indeed, the $L_i$ in your description are just the 'first order' terms, which are well known to be identifiable with all the usual notions of a tangent space from, say, calculus or differential geometry, such as the space of derivations at a point.
The only question is what happens at a singular point. It seems to me that the notion you want is that of the tangent cone, not the tangent plane - at a singular point you seem to be interested in the collection of all 'directions' leaving the singularity, and this is why you find the result counterintuitive - the singularity has two 'tangents' but they are the same direction, so you want the tangent plane to be one dimensional. This isn't really what the definition of the tangent plane does though. Instead, the tangent plane is the kernel of the Jacobian matrix - this definition exactly mirrors what we learn in our first course on manifolds embedded in $\mathbb{R}^n$, and so is somehow the 'correct' definition. At an ordinary point, the vectors in the kernel of the Jacobian can be identified with tangent directions as we like, but at singular points the rank of the Jacobian drops and the kernel is larger, as we discuss more below. This is good, but it leaves at least one thing to be desired - it is not obvious that singularity is independent of the choice of generators for the ideal of the variety.
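To make the difference concrete for your curve, here is a short worked computation (my own sketch of the two notions, not something quoted from Shafarevich or Hartshorne). Write $f = y(y - x^2)$ and substitute the line $(x, y) = (ta_1, ta_2)$:

$$f(ta_1, ta_2) = ta_2\,(ta_2 - t^2a_1^2) = t^2a_2^2 - t^3a_1^2a_2.$$

There is no term linear in $t$, so $L(a) = 0$ for every $a$ and the tangent space at $0$ is all of $\mathbb{A}^2$. The lowest-order term that does appear is $t^2a_2^2$, which vanishes exactly when $a_2 = 0$, so the tangent cone is the line $y = 0$ (counted twice), which is the 'one direction' you were expecting.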
Instead, we have the following result, taken from Hartshorne with some small changes from my own notes. (As an aside, I strongly dislike Shafarevich because I feel he hides the algebra. Not that I particularly enjoy algebra, but I feel that without some structure, the proofs of the theorems become uninspired polynomial pushing games.)
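$\textbf{Definition:}$ Let $A$ be a Noetherian local ring with maximal ideal $\mathfrak{m}$ and residue field $k = A/\mathfrak{m}$. $A$ is called a regular local ring if the dimension of $\mathfrak{m}/\mathfrak{m}^2$ as a $k$-vector space equals the dimension of $A$, i.e. $\dim_k \mathfrak{m}/\mathfrak{m}^2 = \dim{A}$.
The theorem below is easy enough to prove, so I'll reproduce Hartshorne's proof.
$\textbf{Theorem:}$ Let $Y \subseteq \mathbb{A}^n$ be an affine variety, and $P \in Y$ be a point. Then $Y$ is nonsingular at $P$ if and only if the local ring $\mathcal{O}_{P,Y}$ is a regular local ring. (Here, following Hartshorne, $Y$ of dimension $r$ is called nonsingular at $P$ if the Jacobian matrix $\big(\frac{\partial f_i}{\partial x_j}(P)\big)$ of a set of generators $f_1, \dots, f_m$ of the ideal of $Y$ has rank $n - r$.)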
$\textit{Proof:}$
Let $p = P$, viewed as a point of $\mathbb{A}^n$, and let $\mathfrak{a}_p = (x_1 - a_1, ..., x_n - a_n)$, where the $a_i$ are the coordinates of $p$. This ideal is the maximal ideal of $A = k[x_1, ..., x_n]$ corresponding to $p$. We define a $k$-linear map $\theta: A \to k^n$ by writing $$\theta(f) = \bigg \langle \frac{\partial f}{\partial x_1}(p), ..., \frac{\partial f}{\partial x_n}(p) \bigg \rangle.$$ Now it is apparent that the vectors $\theta(x_i - a_i)$ are the standard basis of $k^n$. Furthermore, we have $\theta(\mathfrak{a}_p^2) = 0$: by the product rule, $\frac{\partial (gh)}{\partial x_j}(p) = g(p)\frac{\partial h}{\partial x_j}(p) + h(p)\frac{\partial g}{\partial x_j}(p) = 0$ whenever $g, h \in \mathfrak{a}_p$, since both vanish at $p$. For instance, $(x_i - a_i)^2 = x_i^2 - 2x_ia_i + a_i^2$ is sent by $\theta$ to $(0,..., 2x_i - 2a_i , 0, ..., 0)$ evaluated at $p$, which is $0$ since the $a_i$ are the coordinates of $p$. Since the classes of the $x_i - a_i$ span $\mathfrak{a}_p/\mathfrak{a}_p^2$ and map to a basis of $k^n$, $\theta$ induces an isomorphism $\theta': \mathfrak{a}_p/\mathfrak{a}^2_p \to k^n$.
Next, let $\mathfrak{b}$ be the ideal of $Y$ in $A$, and suppose that $\mathfrak{b}$ is generated by polynomials $f_1, ..., f_m$. The rows of the Jacobian matrix $J$ at $p$ are the vectors $\theta(f_i)$, and these span $\theta(\mathfrak{b})$ (for $g \in A$ the product rule gives $\theta(gf_i) = g(p)\theta(f_i)$, since $f_i(p) = 0$), so the rank of $J$ is the dimension of $\theta(\mathfrak{b})$ as a subspace of $k^n$. Transporting this along the isomorphism $\theta'$ above, it is the dimension of $(\mathfrak{b} + \mathfrak{a}_p^2)/\mathfrak{a}_p^2$ as a subspace of $\mathfrak{a}_p / \mathfrak{a}_p^2$. On the other hand, the local ring $\mathcal{O}_p$ of $p$ on $Y$ is obtained from $A$ by dividing the polynomial ring by $\mathfrak{b}$ and localizing at the maximal ideal $\mathfrak{a}_p$. Then if $\mathfrak{m}$ is the maximal ideal of $\mathcal{O}_p$, it follows that $$\mathfrak{m}/\mathfrak{m}^2 \simeq \mathfrak{a}_p/ ( \mathfrak{b} + \mathfrak{a}_p^2).$$ All of these being finite-dimensional vector spaces over $k$, we can count up the respective dimensions and observe that $\dim_k \mathfrak{m}/\mathfrak{m}^2 + \operatorname{rank} J = n$. If the dimension of $Y$ is $r$, then the punchline to this analysis is that $\mathcal{O}_p$ is a local ring of dimension $r$, and so is regular if and only if the dimension of $\mathfrak{m}/\mathfrak{m}^2$ is $r$, i.e. if and only if the rank of $J$ is exactly $n-r$, which is what we wanted to prove.
-
The takeaway here is twofold. The first part is a general fact sitting behind the last line: if $A$ is a Noetherian local ring with maximal ideal $\mathfrak{m}$ and residue field $k$, then the dimension over $k$ of $\mathfrak{m}/\mathfrak{m}^2$ is at least the dimension of $A$ - the dimension jumps $\textit{up}$ at the singularities. The second is that this expresses the condition of (non)singularity only in terms of intrinsically defined data. There is no reference to the generators of the ideal in the definition of a regular local ring.
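As a sanity check, here is the dimension count from the proof carried out for the plane curve $y(y - x^2) = 0$ (a worked instance under the notation above, with $n = 2$, $r = 1$ and $\mathfrak{b} = (y^2 - x^2y)$; it is my own illustration rather than part of Hartshorne's proof). At the origin both monomials of the generator lie in $\mathfrak{a}_0^2 = (x, y)^2$, so

$$\mathfrak{b} + \mathfrak{a}_0^2 = \mathfrak{a}_0^2, \qquad \mathfrak{m}/\mathfrak{m}^2 \simeq \mathfrak{a}_0/\mathfrak{a}_0^2, \qquad \dim_k \mathfrak{m}/\mathfrak{m}^2 = 2, \qquad \operatorname{rank} J(0) = 0,$$

and indeed $2 + 0 = n$. Since $\dim \mathcal{O}_0 = 1 < 2$, the local ring at the origin is not regular. At the point $(1, 1)$ on the parabola, by contrast,

$$J(1,1) = \begin{bmatrix} -2xy & 2y - x^2 \end{bmatrix}\Big|_{(1,1)} = \begin{bmatrix} -2 & 1 \end{bmatrix}, \qquad \operatorname{rank} J = 1, \qquad \dim_k \mathfrak{m}/\mathfrak{m}^2 = 2 - 1 = 1 = \dim \mathcal{O}_{(1,1)},$$

so that point is nonsingular.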
In your case, the tangent space at $0$ is $2$-dimensional, and the only $2$-dimensional subspace of the plane is the whole plane, so that's what it is. This is a cruddy answer, though, because you can (and should) object that you can just embed $\mathbb{A}^2 \hookrightarrow \mathbb{A}^n$ for any $n \geq 3$ you like, and then my answer is useless. So let's take a look at what happens if we look at the same curves, but now embedded one dimension up, for simplicity. We still have the same equation $y(y-x^2) = 0$, but let's also now impose $z = 0$, so that we have cut this out as a reducible plane curve in $3$-space. The argument in the proof above has shown that singularity can be detected independently of the embedding into affine space, so it is enough to look at these two equations and inspect their Jacobian matrix.
We now have a $2 \times 3$ matrix, with one row for each of the equations $y(y-x^2)$ and $z$: $$J = \begin{bmatrix} -2xy & 2y-x^2 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$
Evaluating this at $0$, the first row vanishes, so the rank drops to $1$ and the kernel of the matrix is exactly the $x,y$ plane. This shows us that you don't always just get 'everything' at a singularity, and illustrates that the extra direction does not come from a true tangency, as you note, but is visible as a drop in the rank of the Jacobian.
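If you want to experiment with other points or other equations, the computation above can be sketched in a few lines of plain Python. This is only a minimal sketch under some assumptions of mine: it works over the rationals rather than a general field $k$, it stores a polynomial in $x, y, z$ as a dictionary from exponent triples to coefficients, and the helper names (`partial`, `evaluate`, `jacobian_at`, `rank`, `tangent_space_dim`) are made up for this illustration.

```python
from fractions import Fraction

# Assumption: a polynomial in (x, y, z) is a dict {(i, j, k): c}
# standing for the sum of the terms c * x^i * y^j * z^k.

def partial(poly, var):
    """Formal partial derivative in variable number var (0=x, 1=y, 2=z)."""
    out = {}
    for exps, c in poly.items():
        e = exps[var]
        if e == 0:
            continue  # the term does not involve this variable
        new = list(exps)
        new[var] = e - 1
        key = tuple(new)
        out[key] = out.get(key, 0) + c * e
    return out

def evaluate(poly, point):
    """Value of the polynomial at a point, as an exact Fraction."""
    total = Fraction(0)
    for exps, c in poly.items():
        term = Fraction(c)
        for v, e in zip(point, exps):
            term *= Fraction(v) ** e
        total += term
    return total

def jacobian_at(polys, point):
    """Jacobian matrix (one row per polynomial) evaluated at the point."""
    n = len(point)
    return [[evaluate(partial(f, j), point) for j in range(n)] for f in polys]

def rank(matrix):
    """Rank by Gaussian elimination with exact arithmetic."""
    rows = [row[:] for row in matrix]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def tangent_space_dim(polys, point):
    """Dimension of the kernel of the Jacobian: n minus its rank."""
    return len(point) - rank(jacobian_at(polys, point))

# The curve from the question, embedded in 3-space:
f = {(0, 2, 0): 1, (2, 1, 0): -1}   # y^2 - x^2*y = y(y - x^2)
g = {(0, 0, 1): 1}                  # z

print(tangent_space_dim([f, g], (0, 0, 0)))  # 2: the singular point
print(tangent_space_dim([f, g], (1, 1, 0)))  # 1: a point on the parabola
print(tangent_space_dim([f, g], (1, 0, 0)))  # 1: a point on the line y = 0
```

The tangent space dimension is $2$ only at the origin and $1$ at the other two points, matching the rank drop described above. Note that the sketch trusts you to hand it generators of the full (radical) ideal of the variety; it does not check this.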
I apologize for writing a small article here on this subject. I've been writing notes on Hartshorne's book for a while to help me learn this myself, and so I lifted large chunks of this answer from my notes, and I hope that it helps you!