I am trying to understand why the convolution kernel, $$\left[\begin{array}{rrr} -1&-1&-1\\ 2&2&2\\ -1&-1&-1 \end{array}\right]$$ detects the edges in an image. If anyone has a mathematical reason for this, please post it. Thanks.
Also, why does a $3$-by-$3$ matrix with every entry equal to $1/9$ work for denoising?
Convolution with the matrix $\left[\begin{array}{rrr} -1&-1&-1\\ 2&2&2\\ -1&-1&-1 \end{array}\right]$ gives a value near zero on an even region of the array (the entries in each column of the kernel sum to zero, so a constant region maps to exactly zero), and a value of large magnitude where a point differs sharply from the point directly above or below it. You can try performing the convolution on an array with a horizontal edge, such as
$$\left[\begin{array}{ccccc} 0&0&0&0&0\\ 0&0&0&0&0\\ 0&0&0&0&0\\ 1&1&1&1&1\\ 1&1&1&1&1\\ 1&1&1&1&1\\ \end{array}\right]$$
Around the horizontal edge, the convolution gives a larger absolute value than in the more even areas. As @copper.hat pointed out, this kernel responds well to horizontal or nearly horizontal edges but not to vertical edges, because it does not compute differences between columns.
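If you want to check this numerically, here is a minimal sketch in plain Python. The helper `convolve_valid` and the array names are my own, not from any library, and it only evaluates positions where the kernel fits entirely inside the array (no padding). It applies the kernel to the horizontal-edge array above and to a vertical-edge array I made up for comparison.

```python
# Edge kernel from the question; the entries in each column sum to zero.
KERNEL = [
    [-1, -1, -1],
    [ 2,  2,  2],
    [-1, -1, -1],
]

# Horizontal edge: three rows of 0 followed by three rows of 1.
HORIZONTAL = [[0] * 5 for _ in range(3)] + [[1] * 5 for _ in range(3)]

# Vertical edge (an assumed comparison case): every row is 0 0 1 1 1.
VERTICAL = [[0, 0, 1, 1, 1] for _ in range(6)]


def convolve_valid(image, kernel):
    """Convolve at every position where the kernel fits inside the image."""
    kh, kw = len(kernel), len(kernel[0])
    # True convolution flips the kernel in both directions before sliding.
    # This particular kernel is unchanged by the flip.
    flipped = [row[::-1] for row in kernel[::-1]]
    out = []
    for i in range(len(image) - kh + 1):
        out_row = []
        for j in range(len(image[0]) - kw + 1):
            total = sum(
                flipped[a][b] * image[i + a][j + b]
                for a in range(kh)
                for b in range(kw)
            )
            out_row.append(total)
        out.append(out_row)
    return out


horizontal_response = convolve_valid(HORIZONTAL, KERNEL)
vertical_response = convolve_valid(VERTICAL, KERNEL)

for row in horizontal_response:
    print(row)
# [0, 0, 0]
# [-3, -3, -3]
# [3, 3, 3]
# [0, 0, 0]
```

The two nonzero rows sit on either side of the edge, while `vertical_response` is zero everywhere: each column of the vertical-edge array is constant, so the $-1, 2, -1$ weights cancel.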
For your second question, the 3-by-3 matrix $\dfrac{1}{9}\left[\begin{array}{rrr} 1&1&1\\ 1&1&1\\ 1&1&1 \end{array}\right]$ replaces each point by the average of its 3-by-3 neighborhood, and you can think of it as a low-pass filter that averages out image noise. The factor $\frac{1}{9}$ is better than other values like $\frac{1}{3}$ because the kernel entries then sum to $1$, so the convolution does not change the overall intensity level. Think of performing this convolution on a large array of $1$'s: with $\frac{1}{9}$ the result is still all $1$'s, whereas with $\frac{1}{3}$ every value becomes $3$. You would not want denoising to make the image brighter (if we are talking about brightness, for example).
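The same point can be checked with a small sketch. `box_filter` is a hypothetical helper of mine, not a library function, and it again only evaluates positions where the 3-by-3 window fits inside the array. I use `Fraction` so the arithmetic is exact. The last example, a flat patch of $5$'s with one outlier of $14$, is made-up data to illustrate the averaging.

```python
from fractions import Fraction


def box_filter(image, weight):
    """Apply a 3-by-3 kernel whose nine entries all equal `weight`."""
    rows, cols = len(image), len(image[0])
    return [
        [
            sum(weight * image[i + a][j + b] for a in range(3) for b in range(3))
            for j in range(cols - 2)
        ]
        for i in range(rows - 2)
    ]


# A flat array of 1's: weight 1/9 leaves it at 1, weight 1/3 triples it.
ones = [[1] * 5 for _ in range(5)]
ninth = box_filter(ones, Fraction(1, 9))
third = box_filter(ones, Fraction(1, 3))

# A flat patch with one noisy pixel in the middle (illustrative data).
noisy = [
    [5, 5, 5],
    [5, 14, 5],
    [5, 5, 5],
]
smoothed = box_filter(noisy, Fraction(1, 9))

print(ninth[0][0], third[0][0], smoothed[0][0])  # 1 3 6
```

The outlier $14$ is pulled down to $6$, close to the surrounding value of $5$, which is the denoising effect. The array of $1$'s comes back unchanged only when the weights sum to $1$.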