Here's an algorithm that multiplies two $n\times n$ matrices (actually, it outputs $A\cdot A$).
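The pseudocode itself isn't reproduced here, but the stated counts pin it down: it must be the standard triple loop with $c_{i,j}$ initialized to $0$ and $k$ running over all $n$ values. A sketch in Python (a reconstruction consistent with those counts, not the original listing):

```python
def square_matrix(a):
    """Compute C = A * A by the standard triple loop.

    Reconstructed sketch: c[i][j] starts at 0 and k runs over all n
    values, so each of the n^3 inner iterations performs one
    multiplication, one addition, and one store into c[i][j].
    """
    n = len(a)
    c = [[0] * n for _ in range(n)]   # n^2 stores of the initial 0
    for i in range(n):
        for j in range(n):
            for k in range(n):        # k takes all n values
                # 1 multiplication, 1 addition, 1 store per iteration
                c[i][j] = c[i][j] + a[i][k] * a[k][j]
    return c
```

Counting from this version: multiplications $n^3$, additions $n^3$ (one per inner iteration), and stores $n^2 + n^3 = n^2(n+1)$ (the $n^2$ initializations plus one store per inner iteration), which matches the store count quoted in the question.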

The number of additions that are carried out is $n^2(n-1)$ and the number of multiplications is $n^3$. The number of store-instructions is $n^2(n+1)$. The number of read-instructions is of similar magnitude.
Why is the number of additions not $n^3$? I also don't see why the number of store-instructions is $n^2(n + 1)$.
The way this is written, there are $n^3$ additions.
This can be reduced to $n^2(n-1)$ additions by initializing $c_{i,j} = a_{i,1}a_{1,j}$ and having $k$ run from $2$ to $n$.
If I were coding this, I would also use a temporary variable for $c_{i,j}$ and store the total once at the end.
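Putting both suggestions together, a sketch of the rewritten loop (in 0-based Python indexing, so the answer's "$k$ from $2$ to $n$" becomes `k` from `1` to `n-1`):

```python
def square_matrix_fewer_adds(a):
    """Compute C = A * A with n^2 * (n - 1) additions.

    Each entry is initialized to a[i][0] * a[0][j] (no addition), the
    remaining n - 1 terms are accumulated in a local temporary, and the
    result is stored into c[i][j] once, reducing stores to n^2.
    """
    n = len(a)
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            total = a[i][0] * a[0][j]   # initialization: a multiplication, no addition
            for k in range(1, n):       # n - 1 iterations -> n - 1 additions
                total += a[i][k] * a[k][j]
            c[i][j] = total             # single store per entry
    return c
```

The multiplication count is still $n^3$; only the additions drop to $n^2(n-1)$ and the stores to $n^2$.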