I'm reading the Wikipedia page on convolutional neural networks, along with some other papers and references. In the article's 'Building Blocks' section, under the 'Convolutional layer' subsection, the depth of the input volume is mentioned for the first time: "The layer's parameters consist of a set of learnable filters (or kernels), which have a small receptive field, but extend through the full depth of the input volume."
Can someone please explain what this means? What are the length, breadth and depth of the input volume? Is the length just the number of pixels in one dimension, the breadth the number of pixels in the other dimension, and the depth simply the number of features for the image?
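
To make the question concrete, here is a minimal sketch of how I currently read that sentence. The sizes (a 4x4 image with 3 channels, a 2x2 filter) and the function name `apply_filter` are made up for illustration and are not from the article. I'm also assuming stride 1, no padding, and that for a first layer on an RGB image the depth is the 3 colour channels.

```python
# Hypothetical sizes, chosen only for illustration.
height, width, depth = 4, 4, 3   # e.g. a 4x4 RGB image: depth = 3 colour channels
k = 2                            # small receptive field: the filter covers k x k pixels

# Input volume indexed as volume[row][col][channel], filled with ones.
volume = [[[1.0] * depth for _ in range(width)] for _ in range(height)]

# One filter: k x k spatially, but spanning ALL `depth` channels.
filt = [[[0.5] * depth for _ in range(k)] for _ in range(k)]

def apply_filter(volume, filt):
    """Slide one filter over the volume (stride 1, no padding)."""
    h, w, d = len(volume), len(volume[0]), len(volume[0][0])
    k = len(filt)
    out = []
    for i in range(h - k + 1):
        row = []
        for j in range(w - k + 1):
            # Sum over the k x k window AND over every channel.
            total = sum(
                volume[i + di][j + dj][c] * filt[di][dj][c]
                for di in range(k) for dj in range(k) for c in range(d)
            )
            row.append(total)
        out.append(row)
    return out

activation_map = apply_filter(volume, filt)
print(len(activation_map), len(activation_map[0]))  # 3 3: one filter gives a 2D map
print(activation_map[0][0])                         # 6.0 = 2 * 2 * 3 * 0.5
```

If this reading is right, the filter is small only in height and width, while its third dimension always matches the depth of whatever volume it is applied to.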