The concept of this "norm" does not come from a desire to generalize the $L^p$ norm. It comes from the desire to compress information. When you save a picture as a JPEG file, the information is stored as a sequence of Fourier-type coefficients (for JPEG, those of a discrete cosine transform). To reduce the size of the file, we want to store as few coefficients as possible and still have a decent image. Hence, one is led to consider minimization problems like
$$
\|x\|_0 + \alpha \|x-x^*\|_2 \to\min \tag1
$$
where $x^*$ is the original image (or signal), $x$ is the compressed one, and $\alpha > 0$ is a weight. Formula (1) attempts to balance compression (small $\|x\|_0$, i.e. few nonzero coefficients) against accuracy (small $\|x-x^*\|_2$).
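To make the trade-off in (1) concrete, here is a minimal Python sketch. The coefficient values, the weight `alpha`, and the helper `keep_largest` are all made up for illustration. It relies on the fact that, for a fixed number $k$ of stored coefficients, the closest $x$ to $x^*$ in the $2$-norm keeps the $k$ largest-magnitude entries of $x^*$ and zeroes the rest, so it suffices to compare the objective over $k$.

```python
import math

def l0(x):
    """Number of nonzero entries of x (the "L0 norm")."""
    return sum(1 for v in x if v != 0)

def l2(x):
    """Euclidean norm of x."""
    return math.sqrt(sum(v * v for v in x))

def objective(x, x_star, alpha):
    """Value of ||x||_0 + alpha * ||x - x_star||_2, as in (1)."""
    diff = [a - b for a, b in zip(x, x_star)]
    return l0(x) + alpha * l2(diff)

def keep_largest(x_star, k):
    """Hypothetical helper: zero all but the k largest-magnitude entries."""
    order = sorted(range(len(x_star)), key=lambda i: abs(x_star[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(x_star)]

# Illustrative "coefficients" of the original signal (assumed values).
x_star = [5.0, -3.0, 0.5, 0.1, -0.05]
alpha = 2.0  # assumed weight on the accuracy term

for k in range(len(x_star) + 1):
    x = keep_largest(x_star, k)
    print(k, round(objective(x, x_star, alpha), 3))

# Number of stored coefficients that gives the smallest objective.
best_k = min(
    range(len(x_star) + 1),
    key=lambda k: objective(keep_largest(x_star, k), x_star, alpha),
)
print("best k:", best_k)
```

With these particular numbers the two large coefficients are kept and the three small ones are dropped. A larger `alpha` penalizes inaccuracy more and pushes the minimizer toward keeping more coefficients.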
The terminology "$L_0$" comes from applied mathematicians who needed a name for the thing they were minimizing. Pure mathematicians would not give that name to something that isn't a norm.
Yes, there is a formal similarity with $\|x\|_p^p = \sum_k |x_k|^p$ if we adopt the (unconventional) convention $0^0 = 0$: then
$$\|x\|_0 = \sum_k |x_k|^0,$$
which simply counts the nonzero entries of $x$.
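
The formal similarity can be checked numerically. The sketch below uses a made-up vector `x` and hypothetical helper names. Python evaluates `0 ** 0` as `1`, so the convention $0^0 = 0$ has to be coded explicitly. The second function evaluates $\sum_k |x_k|^p$ for small $p > 0$, which approaches the same count, since $|t|^p \to 1$ for $t \ne 0$ while $0^p = 0$.

```python
def pow_zero(t):
    """|t|**0 under the convention 0**0 = 0 (Python itself gives 0**0 == 1)."""
    return 0 if t == 0 else abs(t) ** 0

def l0_via_powers(x):
    """Sum of |x_k|**0 with 0**0 = 0, i.e. the number of nonzero entries."""
    return sum(pow_zero(t) for t in x)

def lp_power_sum(x, p):
    """Sum of |x_k|**p for p > 0 (the p-th power of the usual p-norm)."""
    return sum(abs(t) ** p for t in x)

x = [3.0, 0.0, -0.2, 0.0, 7.5]  # illustrative vector with three nonzero entries

print(l0_via_powers(x))
for p in (1.0, 0.1, 0.001):
    print(p, lp_power_sum(x, p))
```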