Chapter 5 Viewing
The model we use to describe our graphics is that of a virtual camera viewing an object. In this chapter we consider a number of ways in which the virtual camera might be described.
There are a number of important topics covered in this chapter.
1) the use of the model-view matrix to switch from the world frame in which we defined our objects to their representation in a frame in which the camera is at the origin.
2) the selection of a projection method to use, either parallel or perspective.
3) the formation of a projection matrix to be concatenated with the model-view matrix.
Classical Viewing
We examine classical viewing techniques first for at least two reasons.
1) Many of the tasks previously done by hand drawing are now being done by computer graphics, so we should be able to automate the classical hand-drawing techniques.
2) Study of the classical methods helps us to understand some of the difficulties associated with computer graphics.
The concept of viewing is based on the idea of projections, with the two main varieties being parallel and perspective. Parallel projections are the easiest to understand. Consider parallel projection onto the (X, Y)-plane: every point (X, Y, Z) in three-dimensional space is projected onto the point (X, Y) and every line connecting points (X1, Y1, Z1) and (X2, Y2, Z2) is projected onto a line connecting points (X1, Y1) and (X2, Y2) in the (X, Y)-plane. These projections are useful for a number of representations, but have the drawback that they do not represent depth information or present an illusion of three dimensions.
Perspective drawing originated as a way to represent three-dimensional scenes on two-dimensional surfaces, such as canvas. One of the primary tricks in representing a three-dimensional scene is the vanishing point, a point to which parallel lines seem to converge. This effect is seen when looking at railroad tracks vanishing into the distance; visually the two tracks seem to get closer together as the distance from the observer increases.
The implementation of this idea in a computer graphics system is based on a point called the COP (Center of Projection). The graphic image of an object is formed on a plane called the Projection Plane. The image is formed by projecting the essential vertices of the object and connecting the projected vertices by lines appropriate to rendering the object. Each vertex is projected along a straight line, called a Projector, from the vertex to the COP. The intersection of the projector and the projection plane forms the image of the vertex. Perspective viewing is central to much of computer graphics, including all animation in which realistic images are important.
In considering the topic of viewing in computer graphics, we are using the camera model in which the lens of the camera is at the origin of its coordinate system. We now consider positioning the camera and then consider simple methods of projection.
Positioning the Camera
In the default set-up, the camera is positioned at the origin of coordinates. The first thing to consider when building a graphic scene is the possibility that the camera should be located at some other point. Generally, we use rotation and translation to specify the camera position.
Translation is used to move the camera to a selected spot and rotation is used to specify the direction in which the camera is pointed.
The camera frame is specified by two vectors: the normal vector, which determines the direction in which the camera is pointing, and a vector indicating which way is up. The up vector is not required to be perpendicular to the normal vector, but it cannot be collinear with it. It is interesting to speculate about what would happen if the camera were pointed straight up; that is, if the normal vector were parallel to the up vector. It is likely that the up vector is defined to indicate what will appear up in the camera view. Consider a camera flat on the floor pointed up at the ceiling. The “up vector” would be parallel to the floor.
The camera coordinate system is called either the viewing-coordinate system or the u-v-n system, the latter named after the three vectors U, V, and N used to specify the system. We now describe the method of generating these three vectors from the two vectors used to specify the coordinate system.
The vector N is the easiest to specify: it is the normal vector indicating the pointing direction of the camera. The vector V is specified next, based on the up vector and the normal vector.
Let Vup be the up vector. If Vup is perpendicular to the vector N (Vup · N = 0), then V = Vup. Otherwise, V is the projection of Vup on the plane perpendicular to the vector N. This component of Vup is determined by first determining the component parallel to N and then subtracting that component from Vup, as shown in the figure below.
In this figure, I use the symbol W to represent the vector Vup, to avoid excessive subscripts. We consider W and N as rooted at the origin of the camera’s frame. Since W is neither perpendicular nor parallel to N, it can be broken into a component parallel to N and a component perpendicular to N. We first compute the component of W parallel to N.
We consider the dot product of W and N, expressed in terms of the cosine of the angle θ between the two vectors: $W \cdot N = |W|\,|N| \cos\theta$, so the magnitude of the component of W parallel to N is given by $|W| \cos\theta = \frac{W \cdot N}{|N|}$ and the vector component of W parallel to N is given by $\frac{W \cdot N}{|N|} \cdot \frac{N}{|N|} = \frac{W \cdot N}{N \cdot N}\,N$, as $N \cdot N = |N|^2$.
In terms of the vector Vup, we have the component parallel to N as $\frac{V_{up} \cdot N}{N \cdot N}\,N$ and the component of Vup perpendicular to N as $V = V_{up} - \frac{V_{up} \cdot N}{N \cdot N}\,N$. In a bit simpler form, the component is $V = V_{up} - a\,N$, where $a = \frac{V_{up} \cdot N}{N \cdot N}$ is a real number.
We generate U, the third vector of the u-v-n system, as the cross product of the two vectors V and N, taken in that order: U = V x N. In generating these three vectors describing the camera coordinate system, it is common to represent them as unit vectors.
As an example, consider a camera with orientation given by the vector (0, 3, 4) with the up vector being given by Vup = (0, 0, 1). In this example, we shall convert all of the vectors to unit vectors. The first step is to convert the vector (0, 3, 4) to a unit vector for N; thus
N = (0.0, 0.6, 0.8).
With N already normalized, the formula for V is V = Vup – a·N, where a = Vup·N, so
a = (0, 0, 1)·(0.0, 0.6, 0.8) = 0 + 0 + 0.8 = 0.8, and
V = (0, 0, 1) – 0.8·(0.0, 0.6, 0.8) = (0, 0, 1) – (0.0, 0.48, 0.64) = (0.0, – 0.48, 0.36).
As computed, V is not a unit vector. |V|² = (0.48)² + (0.36)² = 0.36, so |V| = 0.6 and we use the unit vector V / |V| = (0.0 / 0.6, – 0.48 / 0.6, 0.36 / 0.6) = (0.0, – 0.8, 0.6).
The third vector U is then determined as the cross product U = V x N, or
$$U = \begin{vmatrix} i & j & k \\ 0.0 & -0.8 & 0.6 \\ 0.0 & 0.6 & 0.8 \end{vmatrix}$$
so Ux = – 0.64 – 0.36 = – 1.0, Uy = 0.0, and Uz = 0.0.
In summary, we have the u-v-n coordinate system described by the three unit vectors
U = ( – 1.0, 0.0, 0.0 )
V = ( 0.0, – 0.8, 0.6)
N = ( 0.0, 0.6, 0.8).
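The construction of the u-v-n frame can be sketched in a few lines of Python. This is only an illustration of the steps described above, not part of any graphics library; the function names (`normalize`, `dot`, `cross`, `uvn_frame`) are hypothetical, and the cross product order U = V x N follows the worked example.

```python
import math

def normalize(v):
    """Return v scaled to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    """Cross product a x b of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def uvn_frame(direction, v_up):
    """Build unit vectors U, V, N from a view direction and an up vector."""
    n = normalize(direction)
    a = dot(v_up, n)                  # N is a unit vector, so N.N = 1
    v = tuple(w - a * c for w, c in zip(v_up, n))
    v = normalize(v)                  # breaks down if v_up is parallel to N
    u = cross(v, n)                   # U = V x N, as in the worked example
    return u, v, n

# The worked example: view direction (0, 3, 4), up vector (0, 0, 1).
u, v, n = uvn_frame((0.0, 3.0, 4.0), (0.0, 0.0, 1.0))
```

Up to floating-point rounding, the three returned vectors agree with the values of U, V, and N in the summary.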
OpenGL provides a number of ways to specify these vectors. The first method involves three functions to specify the camera location, direction of view, and direction of the up vector.
An easier way is to specify the location of a point that the camera is “looking at”; the vector N is then constructed along the line from the camera to that point.
gluLookAt (CX, CY, CZ, AtX, AtY, AtZ, UpX, UpY, UpZ)
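As a rough illustration of how the "looking at" point determines the vector N, the following Python sketch forms the unit vector from the camera position (CX, CY, CZ) toward the point (AtX, AtY, AtZ). The function name `view_normal` is hypothetical, and the sketch says nothing about how OpenGL itself stores or orients its axes internally.

```python
import math

def view_normal(eye, at):
    """Unit vector N pointing from the camera position toward the look-at point."""
    d = tuple(a - e for a, e in zip(at, eye))
    length = math.sqrt(sum(c * c for c in d))
    if length == 0.0:
        raise ValueError("eye and at must be distinct points")
    return tuple(c / length for c in d)

# Camera at (1, 2, 3) looking at (1, 5, 7): the difference is (0, 3, 4).
n = view_normal((1.0, 2.0, 3.0), (1.0, 5.0, 7.0))
```

With this choice of points the direction is the same (0, 3, 4) used in the worked example, so N comes out as (0.0, 0.6, 0.8).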
Perspective Projections
We consider a camera pointed along the negative z-axis. Images are projected onto a plane perpendicular to the z-axis specified by the equation z = d. These projections are formed by a projector from the point (x, y, z) to the origin of coordinates. The coordinates of the projection are $(x_p, y_p, z_p) = (x_p, y_p, d)$.
We can use similar triangles to compute $(x_p, y_p, z_p)$ as a function of (x, y, z) and d. The results are:
$$x_p = \frac{x}{z/d}, \qquad y_p = \frac{y}{z/d}, \qquad \text{and} \qquad z_p = d.$$
These equations are nonlinear, representing a nonuniform foreshortening. There are a number of effects of this projection, including the fact that the images of objects farther from the center of projection appear smaller.
Another effect of this projection is that it is irreversible. To understand this point, consider the illustration of a projection onto the plane z = 1 (O.K. – it really should be negative). The point (2a, 2b, 2) projects onto the point (a, b, 1) on the plane of projection, but so do the points (3a, 3b, 3), (4a, 4b, 4), and so on. The issue is that in projecting a three-dimensional point onto a two-dimensional plane, we lose information on distance from the camera.
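A small Python sketch makes the loss of depth information concrete. It assumes the similar-triangles result, in which x and y are each divided by z/d; the function name `project` and the sample values of a and b are illustrative only.

```python
def project(point, d):
    """Project (x, y, z) onto the plane z = d along a line through the origin."""
    x, y, z = point
    if z == 0:
        raise ValueError("a point with z = 0 has no projection on the plane")
    s = z / d                       # foreshortening factor z/d
    return (x / s, y / s, d)

# Points (2a, 2b, 2), (3a, 3b, 3), (4a, 4b, 4) projected onto the plane z = 1.
a, b = 0.5, -1.5
images = [project((k * a, k * b, k), 1.0) for k in (2.0, 3.0, 4.0)]
```

All three entries of `images` are the same point (a, b, 1), so the projected image alone cannot tell us which of the original points produced it.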
There are a number of situations in which we want to preserve distance information. Consider the problem of hidden surface removal. Objects closer to the camera will obscure parts of objects farther from the camera. We want our graphical representation to reflect that. Consider the following figure with a square and a circle.
What we see here suggests that the circle is in front of the square, or equivalently that the square is at a greater distance from the camera. We want our rendering routines to have drawn this figure only if what is suggested is actually the case and not just because the square was drawn first and then the circle was drawn. The only way to ensure correct depth cues is to store depth information.
The way to handle this problem is to introduce a modification of homogeneous coordinates. Recall that, up to this time, we have used homogeneous coordinates only to distinguish points from vectors, so that for coordinates (x, y, z),
$\begin{bmatrix} x & y & z & 0 \end{bmatrix}^T$ is a vector and $\begin{bmatrix} x & y & z & 1 \end{bmatrix}^T$ is a point. We have constructed transformation matrices and shown that these four-dimensional entities transform in a way appropriate to the three-dimensional vectors and points being represented. We extend the idea of homogeneous coordinates to represent depth information for projected points by using a non-zero real number in the fourth position of the representation.
Thus for w ≠ 0.0, we have the point $\begin{bmatrix} x & y & z & 1 \end{bmatrix}^T$ by the representation $\begin{bmatrix} wx & wy & wz & w \end{bmatrix}^T$. We now state the conclusion that should be obvious from the above formulation. Suppose a point has the homogeneous coordinate representation $\begin{bmatrix} x & y & z & w \end{bmatrix}^T$. This is the point $\begin{bmatrix} x/w & y/w & z/w & 1 \end{bmatrix}^T$.
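We can now represent the perspective projection onto the plane z = d by the matrix
$$M = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 1/d & 0 \end{bmatrix}.$$
Note that the determinant of M is 0.
M projects the point $p = \begin{bmatrix} x & y & z & 1 \end{bmatrix}^T$ to the point
$$q = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 1/d & 0 \end{bmatrix} \cdot \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix} = \begin{bmatrix} x \\ y \\ z \\ z/d \end{bmatrix}.$$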
Recall that we retrieve the projected point from this representation of point q by dividing the first three components by the fourth component, so
$$x_p = \frac{x}{z/d}, \qquad y_p = \frac{y}{z/d}, \qquad \text{and} \qquad z_p = \frac{z}{z/d} = d,$$
as we have stated previously.
This process is called perspective division.
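The matrix form and the perspective division can be checked numerically. The sketch below assumes that M is the identity matrix with its last row replaced by (0, 0, 1/d, 0), so that the fourth component of the product is z/d; the helper names are hypothetical and plain Python lists stand in for a matrix library.

```python
def perspective_matrix(d):
    """4x4 matrix whose last row places z/d in the fourth component."""
    return [[1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 1.0 / d, 0.0]]

def mat_vec(m, p):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m[i][j] * p[j] for j in range(4)) for i in range(4)]

def perspective_divide(q):
    """Divide the first three components by the fourth (which must be non-zero)."""
    w = q[3]
    return [q[0] / w, q[1] / w, q[2] / w]

d = -2.0                                   # projection plane z = -2
p = [3.0, 6.0, -4.0, 1.0]                  # the point (3, 6, -4)
q = mat_vec(perspective_matrix(d), p)      # homogeneous result [x, y, z, z/d]
projected = perspective_divide(q)          # Cartesian point on the plane z = d
```

Here z/d = 2, so the point (3, 6, -4) lands at (1.5, 3.0, -2.0), which lies on the plane z = d as expected.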
Orthographic projections are a bit easier to represent. The orthographic projection onto the plane z = d is represented by the matrix
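$$M = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & d \\ 0 & 0 & 0 & 1 \end{bmatrix},$$
with d = 0 for projection onto the plane z = 0.
The reader is invited to verify that
$$\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & d \\ 0 & 0 & 0 & 1 \end{bmatrix} \cdot \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix} = \begin{bmatrix} x \\ y \\ d \\ 1 \end{bmatrix}.$$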
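The verification can also be done numerically. This sketch assumes the orthographic matrix is the identity with its third row replaced by (0, 0, 0, d), which discards z and substitutes d; the function names are hypothetical.

```python
def orthographic_matrix(d=0.0):
    """4x4 matrix that keeps x and y and replaces z by the constant d."""
    return [[1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 0.0, d],
            [0.0, 0.0, 0.0, 1.0]]

def mat_vec(m, p):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m[i][j] * p[j] for j in range(4)) for i in range(4)]

p = [3.0, 6.0, -4.0, 1.0]                      # the point (3, 6, -4)
q = mat_vec(orthographic_matrix(-2.0), p)      # projection onto z = -2
q0 = mat_vec(orthographic_matrix(), p)         # projection onto z = 0
```

The fourth component stays 1, so no division is needed: x and y are unchanged and only z is replaced.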
The first topic we must discuss is that of clipping. Clipping arises from the fact that a camera has a finite field of view. Technically, this field of view is the projection of the imaging surface through the lens onto object space. Since the last statement might be somewhat confusing, we give a simple example of clipping. Stand about four feet from the center of a standard window and look out of the building. The edges of the window limit the field of view in a way similar to the limits placed on a camera.
In our example, we consider the window as the plane of projection and consider the field of view as being defined by the limits of the window – it would be a truncated pyramid with the part of the pyramid between the window and the viewer’s eye not being considered. In theory the pyramid is semi-infinite in that it extends to infinity – whatever that means.
Consider the following thought experiment. One is looking out of a window at the Andromeda galaxy, usually considered the most distant object visible to the unaided human eye. In clear skies with good viewing, the galaxy is easily visible, the major problem being light pollution from nearby cities. The galaxy is at a distance of about two million light years (about $1.9 \times 10^{22}$ meters); not an infinite distance, but big enough for me.
The viewing volume in OpenGL is defined as a frustum, which is a finite pyramid with the top cut off. There is a near (front) clipping plane and a far (back) clipping plane. Points closer to the camera than the front clipping plane or farther than the back clipping plane are not projected into the view. Orthographic views are handled in a similar fashion except that the frustum is replaced by a right parallelepiped, in which the rectangle forming the front clipping plane is the same size as the rectangle forming the back clipping plane.
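To make the two view volumes concrete, here is a minimal Python sketch of a containment test for each. It assumes the camera is at the origin looking down the negative z-axis, that near and far are positive distances with 0 < near < far, and that the bounds xmin, xmax, ymin, ymax describe the rectangle on the front clipping plane. The function names are hypothetical, and real clipping is performed by the graphics pipeline rather than by code like this.

```python
def inside_frustum(point, xmin, xmax, ymin, ymax, near, far):
    """True if a camera-frame point lies inside the perspective view volume."""
    x, y, z = point
    depth = -z                      # distance in front of the camera
    if depth < near or depth > far:
        return False
    scale = depth / near            # the cross-section grows linearly with depth
    return (xmin * scale <= x <= xmax * scale and
            ymin * scale <= y <= ymax * scale)

def inside_box(point, xmin, xmax, ymin, ymax, near, far):
    """True if a camera-frame point lies inside the orthographic view volume."""
    x, y, z = point
    depth = -z
    return (near <= depth <= far and
            xmin <= x <= xmax and
            ymin <= y <= ymax)

bounds = (-1.0, 1.0, -1.0, 1.0, 1.0, 10.0)
far_off_axis = (3.0, 0.0, -5.0)     # x = 3 at depth 5
in_perspective = inside_frustum(far_off_axis, *bounds)
in_orthographic = inside_box(far_off_axis, *bounds)
```

The same point is inside the frustum, whose cross-section has widened to x between -5 and 5 at depth 5, but outside the parallelepiped, whose cross-section stays at x between -1 and 1.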
In both perspective and orthographic viewing, one of the specifications of the view volume is the dimensions of the rectangle serving as the front clipping plane, denoted by the variables Xmin, Xmax, Ymin, and Ymax. Based on these numbers, we define the aspect ratio as
Aspect = $\frac{X_{max} - X_{min}}{Y_{max} - Y_{min}}$. Typically, the aspect ratio is greater than 1, although certain displays specialized for word processing have aspect ratios approximating 8.5/11.0 = 0.77.
As an example, the aspect ratio of a VGA display is 640 / 480 = 4/3.
Another way to specify the view volume for perspective viewing is by use of field of view as well as the near and far clipping distances. The field of view is described by two angles, one for the angle in the X-direction and one for the angle in the Y-direction. An equivalent definition provides one field of view (perhaps the Y-direction) and the aspect ratio. The field of view is easily computed for the symmetric case in which Xmin = – Xmax and Ymin = – Ymax. For example, the field of view in the x direction is given by
$$\theta_X = 2 \tan^{-1}\left(\frac{X_{max}}{near}\right).$$
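The aspect ratio and the field of view are both simple functions of the front clipping rectangle. The Python sketch below assumes the symmetric case for the field of view; the function names and the sample numbers are illustrative only.

```python
import math

def aspect_ratio(xmin, xmax, ymin, ymax):
    """Width of the front clipping rectangle divided by its height."""
    return (xmax - xmin) / (ymax - ymin)

def field_of_view_deg(half_extent, near):
    """Full viewing angle in degrees for a symmetric view volume."""
    return math.degrees(2.0 * math.atan(half_extent / near))

# A symmetric 8 x 6 front clipping rectangle at distance near = 4.
aspect = aspect_ratio(-4.0, 4.0, -3.0, 3.0)    # 8 / 6 = 4/3
fov_x = field_of_view_deg(4.0, 4.0)            # 2 * atan(1), a right angle
fov_y = field_of_view_deg(3.0, 4.0)            # 2 * atan(3/4)
```

Because the rectangle is wider than it is tall, the field of view in the X-direction is larger than the one in the Y-direction.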
There are two OpenGL functions used to specify perspective views. These functions seem to be self-explanatory.
glFrustum (xmin, xmax, ymin, ymax, near, far)
gluPerspective (fovy, aspect, near, far)
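The two specifications describe the same kind of view volume, and one can be converted into the other. The following Python sketch computes the symmetric front clipping rectangle that corresponds to a vertical field of view and an aspect ratio. It assumes fovy is given in degrees and that aspect is width divided by height; the function name `frustum_from_perspective` is hypothetical and is not an OpenGL call.

```python
import math

def frustum_from_perspective(fovy_deg, aspect, near):
    """Return (xmin, xmax, ymin, ymax) on the near plane for a symmetric view."""
    ymax = near * math.tan(math.radians(fovy_deg) / 2.0)   # half the height
    xmax = ymax * aspect                                   # half the width
    return (-xmax, xmax, -ymax, ymax)

# A 90-degree vertical field of view, 4:3 aspect ratio, near plane at distance 1.
xmin, xmax, ymin, ymax = frustum_from_perspective(90.0, 4.0 / 3.0, 1.0)
```

The resulting four values, together with the same near and far distances, are what one would pass to the six-argument form to obtain an equivalent symmetric view volume.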
There is one OpenGL function used to specify orthographic (parallel) views.
glOrtho (xmin, xmax, ymin, ymax, near, far)