To explain why: when you transform the camera, you’re actually transforming the vertices of every mesh in the “world” (in space). The camera doesn’t exist as something that gets drawn; it’s simply an object that calculates and holds the matrices describing how the vertices are transformed. Matrices are used because they let you combine translation, rotation and scale into a single multiplication per vertex. When you want to translate “the camera” +100px along the X axis, what you actually do is translate all the vertices drawn in the world by -100px.
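A rough sketch of that last point, in plain Python and assuming a camera that can only translate (no rotation or projection). The names `translate` and `to_view_space` and the sample coordinates are made up for illustration; they don't come from any particular engine.

```python
def translate(point, offset):
    """Move a point (or the camera position) by an offset."""
    return tuple(p + o for p, o in zip(point, offset))

def to_view_space(vertex, camera_position):
    """For a translation-only camera, view space is just 'vertex minus camera'."""
    return tuple(v - c for v, c in zip(vertex, camera_position))

vertex = (250.0, 40.0, 0.0)   # a vertex somewhere in the world
camera = (0.0, 0.0, 0.0)      # camera starts at the origin

before = to_view_space(vertex, camera)

# Move "the camera" +100 on X ...
camera = translate(camera, (100.0, 0.0, 0.0))
after = to_view_space(vertex, camera)

# ... and the vertex ends up shifted by -100 on X from the camera's point of view.
shift = tuple(a - b for a, b in zip(after, before))
print(shift)
```

The world vertex itself never changes; only the transform applied to it before drawing does.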
The camera is a real object in the same sense as anything else. You can use exactly the same code to calculate its transform as you use for the rest of your objects. You then get the view matrix by taking the inverse of the camera's transform matrix.
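Here is a minimal sketch of that idea in plain Python, assuming the camera transform is rigid (a rotation about the Y axis plus a translation, no scale), so the inverse can be built from the transposed rotation. The helpers `camera_world_matrix`, `invert_rigid` and `transform_point` are hypothetical names for illustration; a real engine would normally use its own general matrix inverse.

```python
import math

def camera_world_matrix(yaw, position):
    """4x4 row-major world transform: rotate about Y by yaw, then translate."""
    c, s = math.cos(yaw), math.sin(yaw)
    x, y, z = position
    return [
        [c,   0.0, s,   x],
        [0.0, 1.0, 0.0, y],
        [-s,  0.0, c,   z],
        [0.0, 0.0, 0.0, 1.0],
    ]

def invert_rigid(m):
    """Invert a rotation+translation matrix: R^-1 = R^T, t' = -(R^T t)."""
    rt = [[m[j][i] for j in range(3)] for i in range(3)]   # transposed rotation
    t = [m[i][3] for i in range(3)]                        # original translation
    inv_t = [-sum(rt[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [rt[i] + [inv_t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def transform_point(m, p):
    """Apply a 4x4 matrix to a 3D point (w = 1)."""
    v = (p[0], p[1], p[2], 1.0)
    return tuple(sum(m[i][k] * v[k] for k in range(4)) for i in range(3))

# The camera gets a world transform like any other object ...
camera = camera_world_matrix(yaw=0.0, position=(100.0, 0.0, 0.0))
# ... and the view matrix is simply its inverse.
view = invert_rigid(camera)

print(transform_point(view, (250.0, 40.0, 0.0)))
```

With the camera sitting at +100 on X and no rotation, the vertex comes out 100 units further left in view space, which is the same result as moving every vertex by -100 directly.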