I'm learning about Computer Graphics and there is one point really puzzling me.
I understand that vertices (vectors) represent points in space and that transformation matrices represent changes that could be made to said vertices.
However, when it comes to defining what is visible to a camera, I don't understand how a single matrix can represent the entire space in view.
For example, suppose I have a cube and a cylinder somewhere in 3D space. How can I create a camera that "sees" them? How can I describe a matrix that fits the entire frustum?
I am finding this concept incredibly confusing.
AFAIK you wouldn't treat the camera, as you describe it, as an object inside your matrix. There are many ways to implement a camera, but let me loosely describe one. This isn't meant to be technically rigorous, just to give you an idea of how it can work.
Think of it more like the following:
Your (model-view) matrix represents your world state. Your "camera" just looks at that world from a fixed position. Instead of moving the camera, you move the entire world (by changing the matrix).
You don't rotate the camera left, you rotate your world right.
You don't move your camera forward, you move your world backward.
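To make the two rules above concrete, here is a minimal sketch in plain Python. It assumes the OpenGL-style convention of a camera looking down the negative z axis, and it only handles translation plus a turn about the vertical axis. The names rotate_y and world_to_camera are made up for this illustration and are not part of any library.

```python
import math

def rotate_y(point, angle):
    """Rotate a point about the y axis by `angle` radians (right-handed)."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z)

def world_to_camera(point, cam_pos, cam_yaw):
    """Express a world-space point relative to the camera (hypothetical helper).

    This applies the inverse of the camera's own movement to the point.
    """
    # Undo the camera's translation: shift the world the opposite way.
    moved = tuple(p - c for p, c in zip(point, cam_pos))
    # Undo the camera's rotation: turn the world the opposite way.
    return rotate_y(moved, -cam_yaw)

cube_center = (0.0, 0.0, -5.0)

# Camera at the origin, looking down -z: the cube is 5 units ahead.
before = world_to_camera(cube_center, (0.0, 0.0, 0.0), 0.0)

# Camera moved 2 units forward (toward -z): equivalent to pulling the
# world 2 units toward the camera, so the cube is now 3 units ahead.
after = world_to_camera(cube_center, (0.0, 0.0, -2.0), 0.0)

# Camera turned 90 degrees to the left: equivalent to rotating the world
# 90 degrees to the right, so an object on the left ends up straight ahead.
turned = world_to_camera((-5.0, 0.0, 0.0), (0.0, 0.0, 0.0), math.pi / 2)
```

In a real renderer the same "undo the camera's movement" step is usually packed into a single 4x4 view matrix rather than applied per point like this.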
This way you could define a plane like z=0 as your camera's position and treat anything with z>0 ("behind the camera") as not visible. Starting from there, you could extend your frustum into the distance (up to some arbitrary "max view distance") to determine what would be visible.
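A rough sketch of that visibility check, again in plain Python and again assuming the camera sits at the origin looking down -z. The function name in_frustum and the default values for field of view, aspect ratio, near and far distance are arbitrary choices for illustration, not values taken from any particular API.

```python
import math

def in_frustum(point, fov_y_deg=60.0, aspect=16 / 9, near=0.1, far=100.0):
    """Return True if a camera-space point lies inside the view frustum.

    Hypothetical helper: the point must already be expressed relative to
    the camera, i.e. after the world has been moved "the opposite way".
    """
    x, y, z = point
    depth = -z  # distance in front of the camera; negative means behind it
    if depth < near or depth > far:
        return False  # behind the camera, too close, or past max view distance
    # The visible area grows linearly with distance from the camera.
    half_height = depth * math.tan(math.radians(fov_y_deg) / 2)
    half_width = half_height * aspect
    return abs(x) <= half_width and abs(y) <= half_height

ahead = in_frustum((0.0, 0.0, -5.0))      # straight ahead, within range
behind = in_frustum((0.0, 0.0, 5.0))      # z > 0: behind the camera
too_far = in_frustum((0.0, 0.0, -200.0))  # beyond the far distance
off_side = in_frustum((100.0, 0.0, -5.0)) # far outside the field of view
```

Graphics APIs typically do not test points one by one like this. They fold the same near/far/field-of-view parameters into a projection matrix and clip against the resulting volume, but the idea of what counts as "in view" is the same.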
So, no matter what the camera sees, it all stays in your world matrix. Your camera is more of a transformation applied to your world state; in OpenGL it is stored in a different matrix. I would recommend reading up on the difference between the Model Matrix and the View Matrix in OpenGL; there are some nice visual explanations.