I understand that the Kronecker product basically multiplies each element of one matrix by the whole of the other matrix, but I don't understand exactly how it is supposed to be used when scaling gates to the right number of qubits or when creating the states of the system.
For example, $H\otimes H$ produces the Hadamard gate for a 2-qubit system. Likewise, $H^{\otimes n}$ produces the Hadamard gate for an $n$-qubit system. This is, as far as I know, true for any single-qubit gate. But how does this change if you are applying the Hadamard gate to only the second or third qubit instead of the first? And what about gates like the controlled NOT gate? I read here that you use the swap gate, but how do you scale the swap gate?
(I'm trying to learn this for use in Python, with numpy.kron.)
Each qubit can be thought of as a vector in $\mathbb{C}^2$, and a system of many qubits lives in the tensor product of some copies of $\mathbb{C}^2$. For example, a three qubit system lives in $\mathbb{C}^2 \otimes \mathbb{C}^2 \otimes \mathbb{C}^2$. This means that any operator you want to apply to this system has to be a map $\mathbb{C}^8 \to \mathbb{C}^8$.
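As a rough sketch of how states are built this way with `numpy.kron` (the names `ket0`, `ket1` and `state` are just illustrative, and I'm assuming the convention that the leftmost factor is the first qubit):

```python
import numpy as np
from functools import reduce

# Single-qubit basis states as vectors in C^2.
ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)

# The three-qubit state |010> is the Kronecker product of the
# three single-qubit states, taken left to right.
state = reduce(np.kron, [ket0, ket1, ket0])

print(state.shape)               # (8,)
print(np.argmax(np.abs(state)))  # 2, i.e. binary 010
```

The resulting vector has $2^3 = 8$ entries, which is why any operator on the three-qubit system has to be an $8 \times 8$ matrix.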
Gates such as $X: \mathbb{C}^2 \to \mathbb{C}^2$ can only be applied to a single qubit, so you have to "lift" them into position before you can apply them to the correct qubit in a three qubit system. This is done by taking a tensor product with the identity operator $I$ in every other slot. For example, the operator $I \otimes X \otimes I$ will apply the $X$ gate to the middle qubit only.
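In NumPy this lifting could look something like the following. The helper `lift` is a hypothetical name of my own, not a library function, and it assumes qubits are counted from 0 starting at the leftmost tensor factor:

```python
import numpy as np
from functools import reduce

I = np.eye(2)
X = np.array([[0, 1],
              [1, 0]])

def lift(gate, target, n_qubits):
    """Place a single-qubit gate on `target` and the identity elsewhere."""
    factors = [gate if k == target else I for k in range(n_qubits)]
    return reduce(np.kron, factors)

# I (x) X (x) I: apply X to the middle qubit of three.
IXI = lift(X, target=1, n_qubits=3)

# Act on |000> (index 0); the result should be |010> (index 2).
ket000 = np.zeros(8)
ket000[0] = 1
out = IXI @ ket000
print(np.argmax(out))  # 2
```

The same helper works for a Hadamard on the second or third qubit: pass the $H$ matrix and change `target`.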
If you have some gate, perhaps called $C: \mathbb{C}^4 \to \mathbb{C}^4$, which operates on two qubits at once, you only have to tensor in the identity operator for the one qubit which is left out. For example, $I \otimes C$ will apply the $C$ gate to the second and third qubits, and leave the first alone.
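Taking $C$ to be the controlled NOT gate as an example (with the control on the first of its two qubits, which is an assumption about the convention), a minimal sketch would be:

```python
import numpy as np

I = np.eye(2)

# CNOT with the first of its two qubits as control, second as target.
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])

# I (x) CNOT: acts on qubits 2 and 3, leaves qubit 1 alone.
I_CNOT = np.kron(I, CNOT)

# CNOT (x) I: acts on qubits 1 and 2, leaves qubit 3 alone.
CNOT_I = np.kron(CNOT, I)

# |110> has index 6. Under I (x) CNOT, qubit 2 (the control) is 1,
# so qubit 3 flips and we get |111>, index 7.
ket110 = np.zeros(8)
ket110[6] = 1
out = I_CNOT @ ket110
print(np.argmax(out))  # 7
```

Note that this only covers a two-qubit gate acting on neighbouring qubits, since the Kronecker product places the $4 \times 4$ block on adjacent tensor factors.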