Is it always true that if you sum over some variable, you can "remove" that variable from each factor of the product inside the sum? For example:
$ \sum_x P(x, y)P(y | x)P(y | x, z) = P(y)P(y)P(y|z)? $
It seems that this is often the case, but is this a general "rule" you can rely on?
The notation you use is a little ambiguous, but as you describe it, the rule is incorrect in general. For example, take $X, Y, Z$ to be independent discrete random variables, and let $X$ take values $x_1, \dots , x_n$. According to your conjecture we should find that
$$\sum_{i=1}^n \mathbb{P}(X = x_i, Y = y) \mathbb{P}(X = x_i, Z = z) = \mathbb{P}(Y=y) \mathbb{P}(Z = z) $$
But this is false, since independence of $X, Y, Z$ gives
$$\sum_{i=1}^n \mathbb{P}(X = x_i) \mathbb{P}(Y = y) \mathbb{P}(X = x_i) \mathbb{P}(Z = z) = \mathbb{P}(Y=y) \mathbb{P}(Z = z) \sum_{i=1}^n \mathbb{P}(X = x_i)^2 $$
The sum of squares of the probabilities is not in general 1, take $X \sim \text{Bernoulli}(p)$ for example; that is, $X$ takes value $1$ with probability $p$ and $0$ with probability $(1-p)$. We do have that $(1-p) + p = 1$ but in general $(1-p)^2 + p^2 \neq 1$.
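A quick numerical sanity check of this counterexample (the value of $p$ and the marginals for $Y$ and $Z$ below are arbitrary toy numbers, not from the question):

```python
# Counterexample check: for independent X, Y, Z with X ~ Bernoulli(p),
# the proposed "rule" fails because sum_i P(X = x_i)^2 != 1.
p = 0.3                      # P(X = 1); any p other than 0 or 1 works
px = [1 - p, p]              # distribution of X over {0, 1}
py, pz = 0.6, 0.2            # arbitrary toy values for P(Y = y), P(Z = z)

# LHS: sum_i P(X = x_i, Y = y) * P(X = x_i, Z = z), using independence
lhs = sum(px_i * py * px_i * pz for px_i in px)
# RHS: the conjectured result P(Y = y) * P(Z = z)
rhs = py * pz

print(sum(q ** 2 for q in px))  # (1-p)^2 + p^2 = 0.58, not 1
print(lhs, rhs)                 # lhs = 0.58 * rhs, so they differ
```

Any $p \in (0, 1)$ makes the sum of squares strictly less than $1$, so the two sides can only agree in degenerate cases.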
You should try to understand why marginalization works in the first place; it all comes back to the following law of probability.
If $A, B$ are disjoint events then,
$$\mathbb{P}(A \cup B) = \mathbb{P}(A) + \mathbb{P}(B), \ \ (*)$$
From this we have that if $B_1, \dots, B_n$ is a partition of the sample space, meaning
$$\bigcup_{i=1}^n B_i = \Omega, \ B_i \cap B_j = \emptyset, \ i \neq j$$
Then, since
$$A = A \cap \Omega = A \cap \bigcup_{i=1}^n B_i = \bigcup_{i=1}^n (A \cap B_i), $$
applying the additivity of probability $(*)$ (the sets $A \cap B_i$ are pairwise disjoint) yields
$$\mathbb{P} (A) = \sum_{i=1}^n \mathbb{P}(A \cap B_i) $$
Now, coming back to the language of events as values of random variables, how does this give marginalization? Consider a discrete random variable $X$ taking values $x_1, \dots , x_n$. Then the events $\{X = x_1 \}, \{X = x_2 \}, \dots , \{X = x_n \}$ constitute a partition of the sample space, since the cases are exhaustive and disjoint!
So if we let $A = \{Y = y\}$ and $B_i = \{X = x_i \}$ we recover the familiar procedure of "marginalization":
$$\mathbb{P} (Y = y) = \sum_{i=1}^n \mathbb{P}(Y = y, X = x_i), \ \ (**)$$
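To see $(**)$ in action, here is a small numeric check on a toy joint distribution (the table entries are hypothetical numbers chosen to sum to $1$):

```python
# Marginalization (**): summing the joint P(Y = y, X = x_i) over all
# values of X recovers the marginal P(Y = y).
joint = {  # toy joint distribution P(Y, X); entries sum to 1
    ("y1", "x1"): 0.10, ("y1", "x2"): 0.25,
    ("y2", "x1"): 0.30, ("y2", "x2"): 0.35,
}
xs = ["x1", "x2"]

def marginal_y(y):
    """P(Y = y) via the partition {X = x_1}, ..., {X = x_n}."""
    return sum(joint[(y, x)] for x in xs)

print(marginal_y("y1"))  # 0.10 + 0.25 = 0.35
print(marginal_y("y2"))  # 0.30 + 0.35 = 0.65
```

The two marginals themselves sum to $1$, as they must.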
Note that this also works when there is a conditional probability, as long as the conditioning does not involve the variable you are marginalizing over. Again, link it back to the basics: a conditional probability has the form
$$\mathbb{P}(A | C) = \frac{\mathbb{P}(A \cap C)}{\mathbb{P}(C)} $$
From our marginalization formula $(**)$, if we want to condition on $Z = z$, for example, we can write
$$\mathbb{P} (Y = y, Z = z) = \sum_{i=1}^n \mathbb{P}(Y = y, X = x_i, Z = z)$$
Dividing both sides by $\mathbb{P}(Z = z)$,
$$\frac{\mathbb{P} (Y = y, Z = z)}{\mathbb{P}(Z = z)} = \sum_{i=1}^n \frac{\mathbb{P}(Y = y, X = x_i, Z = z)}{\mathbb{P}(Z = z)}$$
So in fact we recover our "marginalization" for conditional probabilities with just simple arithmetic,
$$\mathbb{P}(Y = y | Z = z) = \sum_{i=1}^n \mathbb{P}(Y = y, X = x_i| Z = z)$$
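The same division-by-$\mathbb{P}(Z = z)$ step can be checked numerically on a toy three-variable joint distribution (again, the entries below are hypothetical numbers summing to $1$):

```python
# Conditional marginalization: dividing each term of (**) by P(Z = z)
# gives P(Y = y | Z = z) = sum_i P(Y = y, X = x_i | Z = z).
joint_yxz = {  # toy joint P(Y, X, Z) on a 2x2x2 space; entries sum to 1
    ("y1", "x1", "z1"): 0.05, ("y1", "x2", "z1"): 0.10,
    ("y2", "x1", "z1"): 0.15, ("y2", "x2", "z1"): 0.20,
    ("y1", "x1", "z2"): 0.10, ("y1", "x2", "z2"): 0.15,
    ("y2", "x1", "z2"): 0.05, ("y2", "x2", "z2"): 0.20,
}
xs, ys = ["x1", "x2"], ["y1", "y2"]

def p_z(z):
    """P(Z = z), by marginalizing out both X and Y."""
    return sum(joint_yxz[(y, x, z)] for y in ys for x in xs)

def p_y_given_z(y, z):
    """Left-hand side: P(Y = y | Z = z) = P(Y = y, Z = z) / P(Z = z)."""
    return sum(joint_yxz[(y, x, z)] for x in xs) / p_z(z)

def sum_joint_given_z(y, z):
    """Right-hand side: sum_i P(Y = y, X = x_i | Z = z)."""
    return sum(joint_yxz[(y, x, z)] / p_z(z) for x in xs)

print(p_y_given_z("y1", "z1"), sum_joint_given_z("y1", "z1"))  # both 0.3
```

Both sides agree for every $(y, z)$ pair, exactly as the arithmetic above guarantees.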
In summary, when reasoning about identities like this, try to link them back to the fundamental laws of probability.