I recently took a better look at the operator norm defined on a matrix $\mathbf A \in \Bbb{K}^{n\times n}$ as follows:
$$ \|\mathbf A\|_p=\sup\{\|\mathbf Ax\|_p \mid x\in\Bbb{K}^n\land\|x\|=1\} $$
The first time I looked at this I thought "ok, lets calculate it for a few example matrices". I started with $n = 3$ and $p = 2$, just to start "simple". Let $$ \mathbf A = \left[\begin{matrix}a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33}\end{matrix}\right]\quad a_{ij}\in\Bbb{K}^n $$ Now if we're going to minimize $\|\mathbf Ax\|_2$ ($ = \|\mathbf Ax\|$), we might as well make it easy on ourselves and only minimize $\|\mathbf Ax\|^2$ so as to not worry about that annoying radical. We get $$ \begin{align} \|\mathbf Ax\|^2 & = (a_{11}x_1 + a_{12}x_2 + a_{13}x_3)^2 + (a_{21}x_1 + a_{22}x_2 + a_{23}x_3)^2 + (a_{31}x_1 + a_{32}x_2 + a_{33}x_3)^2 \\ & = Ax_1^2 + Bx_2^2 + Cx_3^2 + Dx_1x_2 + Ex_1x_3 + Fx_2x_3 \end{align} $$ where $$ \begin{align} A & = a_{11}^2+a_{21}^2+a_{31}^2 \\ B & = a_{12}^2+a_{22}^2+a_{32}^2 \\ C & = a_{13}^2+a_{23}^2+a_{33}^2 \\ D & = 2(a_{11}a_{12} + a_{21}a_{22} + a_{31}a_{33}) \\ E & = 2(a_{11}a_{13} + a_{21}a_{23} + a_{31}a_{33}) \\ F & = 2(a_{12}a_{13} + a_{22}a_{23} + a_{32}a_{33}) \end{align} $$ Now lets define $$ G(x_1,\ x_2,\ x_3) = Ax_1^2+Bx_2^2+Cx_3^2+Dx_1x_2+Ex_1x_3+Fx_2x_3 $$ So if we want to minimize $||\mathbf Ax||^2$, we're either going to have to minimize $$ N(x_1,\ x_2,\ x_3) = \frac{G(x_1,\ x_2,\ x_3)}{x_1^2+x_2^2+x_3^2} $$ or simply minimize $G$ with the constraint $g(x_1,\ x_2,\ x_3) = x_1^2 + x_2^2 + x_3^2 = 1$. The latter seemed easier to me, so I gave it a shot using Lagrange multipliers.
As usual, I defined $$ \mathcal{L}(x_1,\ x_2,\ x_3,\ \lambda) = G(x_1,\ x_2,\ x_3)-\lambda g(x_1,\ x_2,\ x_3) $$ setting it's gradient to zero gives $$ \nabla \mathcal L = 0 \implies \begin{cases} 2(A - \lambda)x_1 + Dx_2 + Ex_3 & = 0 \\ Dx_1 + 2(B - \lambda)x_2 + Fx_3 & = 0 \\ Ex_1 + Fx_2 + 2(C - \lambda)x_3 & = 0 \\ x_1^2 + x_2^2 + x_3^2 - 1 & = 0 \end{cases} $$ Now this is where I really started to get stuck. I tried solving the first three equations for $x_1,\ x_2,$ and $x_3$ but didn't end up with anything I could use. I tried solving for $x_1$ in terms of $x_2,\ x_3,$ and $\lambda$, then $x_2$ in terms of $x_3$ and $\lambda$, and then subbing that all into the third equation, but ended up with either $x_3 = 0$ or $$ 4\lambda^3 - 4\lambda^2(A+B+C) + \lambda(4AB+4AC+4BC-D2+E^2+F^2)-4ABC-AF^2-BE^2+CD^2+DEF = 0 $$ which, although technically solvable for $\lambda$ via the cubic equation, would be incredibly messy.
Now, I probably created my own roadblock for this problem, because I didn't want to think about the system of equations logically and just wanted to bash it out. Regardless of my approach, it seems like the operator norm is a very difficult thing to calculate, and I only analyzed the case where $n = 3$ and $p = 2$. What about the general case? What if $n = 75$ and $p = 9/4$? How on earth would you calculate it then?
The questions above are rhetorical, however, and my actual question is as follows:
Why define such an ordinary norm for matrices which is so difficult to calculate in general?
I see the operator norm everywhere, and it seems like the standard norm for a lot of theorems (unless I'm mistaken and ||A|| just means any matrix norm). So why would we define such a standard norm in a way that is so difficult to calculate? What's the point? Is it easy to work with in theorems? I get that it intuitively makes sense as a norm, but it can't possibly be that easy to work with, especially in comparison to things like the Frobenius norm.
So why do we care about this definition?
First, as others have mentioned, the operator norm has many nice properties that make it convenient to use in proofs (most basically the fact that, by definition, it satisfies $\| Ax \| \le \| A \| \|x \|$). You might, for example, end up with factors of the operator norm in various bounds; even if you can't calculate the operator norm, if you can upper or lower bound it as appropriate then you can still extract information from these bounds. To really see the operator norm in action you can try learning some functional analysis; it really starts to be useful in the infinite-dimensional setting.
Second, here's how you calculate the operator norm (edit: when $p=2$). Let me assume that $A$ is real for simplicity although it doesn't matter much. You want to maximize $\langle Ax, Ax \rangle$ as $x$ ranges over all unit vectors. This is equivalent to maximizing
$$\langle A^T A x, x \rangle.$$
Now, unlike $A$, the matrix $A^T A$ is symmetric, and so by the spectral theorem it has an orthonormal basis of eigenvectors. These are the right singular vectors $r_i$ of $A$, and the corresponding eigenvalues are the squares $\sigma_i^2$ of the singular values of $A$ (up to the appearance of some zeroes, which don't matter for this calculation). If we write $x$ in this basis as
$$x = \sum x_i r_i$$
we get that
$$\langle Ax, Ax \rangle = \sum \sigma_i^2 x_i^2$$
where $\langle x, x \rangle = \sum x_i^2 = 1$. This is a much easier optimization problem! It follows that $\langle Ax, Ax \rangle$ is maximized when $x$ is equal to a right singular vector corresponding to the largest singular value $\sigma_1$, and that its maximum value is $\sigma_1^2$. Hence $\sigma_1$ is the operator norm of $A$. Note that if $A$ is normal it coincides with the absolute value of the largest eigenvalue (in absolute value) of $A$.
The largest singular value can be calculated in various ways. See the Wikipedia article on singular value decomposition for details.