An early result one meets when first studying ring theory is that $$(x+y)^p=x^p+y^p$$ for all $x,y$ in a commutative ring $R$ of prime characteristic $p$. The seemingly obvious way to prove this is the Binomial Theorem. So obvious, in fact, that most proofs I've seen concentrate on the fact that the prime $p$ divides ${p \choose i}$ (for $0<i<p$) and say little about the application of the Binomial Theorem itself.
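As a quick sanity check of the divisibility fact those proofs rely on, here is a minimal Python sketch. The function name `middle_binomials_divisible` is my own, and the check only tests a handful of small values; it is an illustration, not a proof.

```python
from math import comb  # available in Python 3.8+


def middle_binomials_divisible(p: int) -> bool:
    """Return True if p divides C(p, i) for every i with 0 < i < p."""
    return all(comb(p, i) % p == 0 for i in range(1, p))


# Small primes: every "middle" binomial coefficient should be a multiple of p.
for p in (2, 3, 5, 7, 11, 13):
    print(p, middle_binomials_divisible(p))

# A composite value for contrast: C(4, 2) = 6 is not a multiple of 4.
print(4, middle_binomials_divisible(4))
```

The composite case is included only to show that primality of $p$ matters for this step.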
I am confused about how the Binomial Theorem can be applied so directly in a general commutative ring $R$ with no further properties assumed. Expanding and simplifying just a little, we have for $x,y\in R$, $$(x+y)^p=\sum_{i=0}^p {p \choose i}x^{p-i}y^i = {p \choose 0}x^py^0+{p \choose 1}x^{p-1}y+\cdots+{p \choose p}x^0y^p$$
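To make the troublesome terms concrete, here is the case $p=3$, taking the formula at face value (this small case is only an illustration): $$(x+y)^3 = {3 \choose 0}x^3y^0+{3 \choose 1}x^2y+{3 \choose 2}xy^2+{3 \choose 3}x^0y^3 = x^3y^0+3x^2y+3xy^2+x^0y^3$$ The two middle terms vanish in characteristic $3$, which leaves the end terms $x^3y^0$ and $x^0y^3$, the only places where a zero exponent appears.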
Here's my question. As I understand it, unless a ring has unity, a non-zero element raised to the zero power is taken to be undefined, much as negative exponents have no meaning in a general ring unless the elements in question are stated to have multiplicative inverses. So how should the terms $x^py^0$ and $x^0y^p$ in the expansion be read when $R$ is not assumed to have unity?
This is where I am hung up on this seemingly elementary result. I understand the rest of the proof just fine (i.e., that each of the 'middle' terms of the sum vanishes because $p$ divides ${p \choose i}$ for $0<i<p$ and $R$ has characteristic $p$). If I am missing something painfully obvious here, I apologize. Thank you for any help.