# Projection Matrices with Vulkan – Part 2 Deriving a perspective projection matrix for Vulkan

## Recap

Recall that in Part 1 we discussed the differences between OpenGL and Vulkan when it comes to the fixed function parts of the graphics pipeline. We looked at how OpenGL’s use of a left-handed set of coordinate axes for clip-space meant that projection matrices for OpenGL also incorporate a z-axis flip to switch from a right-handed eye space to a left-handed clip space.

We then went on to explain how we can apply a post-view correction matrix that performs a rotation of 180 degrees about the eye-space x-axis which will reorient the eye space axes such that they are aligned with the Vulkan clip space axes.

In this article we shall derive a perspective projection matrix, that transforms a vertex from the rotated eye space into the Vulkan clip space. Thanks to the fact that we have already taken care of aligning the source and destination space axes, all we have to care about is the projection itself. There is no need to introduce any axis inversions or other sleights of hand. We hope that this article when coupled with Part 1 will give you a full understanding of your transformations and allow you to make modifications without adding special cases. Let’s get cracking!

## Defining the Problem

We will look at deriving the perspective projection matrix for a view volume, defined by 6 planes forming a frustum (rectangular truncated pyramid). Let’s assume that the camera is located at the origin O in our “rotated eye space” and looking along the **positive** z-axis. From here on in we will just refer to this “rotated eye space” as “eye space” for brevity and we will use the subscript “eye” for quantities in this space.

The following two diagrams show the view volume from top-down and side-elevation views. You may want to middle-click on them to get the full SVGs in separate browser tabs so that you can refer back to them.

The planes forming the frustum are defined by:

**Near plane**is defined by z_{eye} = n. This is the plane that we will project the vertices on to. Think of it as the window on to the virtual world through which we will look.**Far plane**is defined by z_{eye} = f. This defines the maximum distance to which we can see. Anything beyond this will be clipped to the far plane.**Left and right planes**are defined by specifying the x-coordinate of the near plane x_{eye} = l and x_{eye} = r, then projecting those back to the origin O. Note that r > l.**Top and bottom planes**are defined by specifying the y-coordinate of the near plane y_{eye} = t and y_{eye} = b, then projecting those back to the origin O. Note that b > t which is in the opposite sense that you may be used to. This is because we rotated our eye space coordinate system so that y increases downwards.

Within the view volume, we define a point \bm{p}_{eye} = (x_e, y_e, z_e)^T representing a vertex that we wish to transform into clip space. If we trace a ray back from \bm{p}_{eye} to the origin, then we label the point where the ray crosses the near plane as \bm{p}_{proj} = (x_p, y_p, z_p)^T. Note that \bm{p}_{proj} is still in eye space coordinates.

We know that clip space uses 4 dimensional homogeneous coordinates. We shall call the resulting point in clip-space \bm{p}_{clip} = (x_c, y_c, z_c, w_c)^T. Our job then is to find a 4×4 projection matrix, P such that:

\bm{p}_{clip} = P \bm{p}_{eye} \qquad \rm{or} \qquad \begin{pmatrix} x_c \\ y_c \\ z_c \\ w_c \end{pmatrix} = P \begin{pmatrix} x_e \\ y_e \\ z_e \\ 1 \end{pmatrix} \qquad (\dagger)

## Deriving the Perspective Projection Matrix

Clip space is an intermediate coordinate system used by Vulkan and the other graphics APIs to perform clipping of geometry. Once that is complete, the clip space homogenous coordinates are projected back to Cartesian space by dividing all components by the 4th component, w_c. In order to allow for perspective-correct interpolation of per-vertex attributes to happen, the 4th component must be equal to the eye space depth or w_c = z_e. This normalisation process then yields the vertex position in normalised device coordinates (NDC) as:

\bm{p}_{ndc} = \begin{pmatrix} x_n \\ y_n \\ z_n \end{pmatrix} = \begin{pmatrix} x_c / z_e \\ y_c / z_e \\ z_c / z_e \\ \end{pmatrix} \qquad (\ast)

Since we always want w_c = z_e, this means that the final row of P will be (0, 0, 1, 0). Notice that because our z-axis is aligned with the clip-space z-axis there is no negation required here.

So, at this stage we know that the projection matrix looks like this:

P = \begin{pmatrix} \cdot & \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot & \cdot \\ 0 & 0 & 1 & 0 \\ \end{pmatrix}

Let’s carry on and fill in the blanks.

### Projection of the x-coordinate

Looking back at Figure 1, we can see by the properties of similar triangles that:

\frac{x_p}{z_p} = \frac{x_e}{z_e} \implies \frac{x_p}{n} = \frac{x_e}{z_e}

since on the near plane z_p = n. Rearranging this very slightly we get:

x_p = \frac{n x_e}{z_e} \qquad (i)

Let us now consider how the projected vertex positions map through to normalised device coordinates. In Vulkan’s NDC, the view volume becomes a cuboid where -1 \leq x_n \leq 1, -1 \leq y_n \leq 1, and 0 \leq z_n \leq 1. We want the x component of \bm{p}_{ndc} to vary linearly with the x component of the projected point, x_p. If it was not a linear relationship then objects would appear to be distorted across the screen or to move with apparently varying velocities.

We know that the extremities of the view volume in the x direction are defined by x_p = l and x_p = r. These map to -1 and +1 in normalised device coordinates respectively. We can therefore say that at x_p = l, x_n = -1 and x_p = r, x_n = 1. Using this information we can plot the following graph for x_n = m x_p + c.

That’s right, more of your high school maths is going to be used to find the gradient and intercept of this equation!

The gradient, m is given by:

m = \frac{\Delta y}{\Delta x} = \frac{1 - (-1)}{r - l} = \frac{2}{r - l}

Substituting the gradient back in we get a simple equation to solve to find the intercept, c:

x_n = \frac{2 x_p}{r - l} + c

substituting in x_n = 1 and x_p = r:

1 = \frac{2 r}{r - l} + c \implies c = 1 - \frac{2 r}{r - l} \implies c = \frac{r - l - 2r}{r - l} \implies c = - \frac{r + l}{r - l}

We then get the following expression for x_n as a function of x_p:

x_n = \frac{2 x_p}{r - l} - \frac{r + l}{r - l} \qquad (ii)

Substituting in for x_p from equation (i) into equation (ii) and factorising gives:

\begin{align*} x_n &= \frac{2 n x_e}{(r - l) z_e} - \frac{r + l}{r - l} \\ \implies x_n &= \frac{2 n x_e}{(r - l) z_e} - \frac{r + l}{r - l} \frac{z_e}{z_e} \\ \implies x_n &= \frac{1}{z_e} \left( \left( \frac{2n}{r - l} \right) x_e - \frac{r + l}{r - l} z_e \right) \\ \implies x_n z_e &= \left( \frac{2n}{r - l} \right) x_e - \frac{r + l}{r - l} z_e \\ \end{align*}

Recall from the first component of (\ast) that x_n z_e = x_c. Substituting this in for the left-hand side of the previous equation gives:

x_c = \left( \frac{2n}{r - l} \right) x_e - \frac{r + l}{r - l} z_e

which is now directly comparable to the equation for the 1st component of (\dagger) and comparing coefficients allows us to immediately read off the first row of the projection matrix as (\frac{2n}{r - l}, 0, -\frac{r + l}{r - l}, 0). This also makes intuitive sense looking back at Figure 1 as the x component of the clip space point should only depend upon the x and z components of the eye space position (the eye space y component does not affect it).

As it stands here, the projection matrix looks like this:

P = \begin{pmatrix} \frac{2n}{r - l} & 0 & -\frac{r + l}{r - l} & 0 \\ \cdot & \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot & \cdot \\ 0 & 0 & 1 & 0 \\ \end{pmatrix}

### Projection of the y-coordinate

The good news, is that the analysis in the y direction is exactly analogous to what we just did for the x direction. Without further ado, from Figure 2 and by the properties of similar triangles and since on the near plane z_p = n:

\frac{y_p}{z_p} = \frac{y_e}{z_e} \implies \frac{y_p}{n} = \frac{y_e}{z_e}.

Which then gives:

y_p = \frac{n y_e}{z_e} \qquad (iii)

We know that the extremities of the view volume in the y direction are defined by y_p = t and y_p = b. These map to -1 and +1 in normalised device coordinates respectively. We can therefore say that at y_p = t, y_n = -1 and y_p = b, y_n = 1. Using this information we can plot the following graph for y_n = m y_p + c.

As before, we have a linear equation to find the gradient and intercept of. The gradient, m is given by:

m = \frac{\Delta y}{\Delta x} = \frac{1 - (-1)}{b - t} = \frac{2}{b - t}

Substituting the gradient back in we get a simple equation to solve to find the intercept, c:

y_n = \frac{2 y_p}{b - t} + c

substituting in y_n = 1 and y_p = b:

1 = \frac{2 b}{b - t} + c \implies c = 1 - \frac{2 b}{b - t} \implies c = \frac{b - t - 2b}{b - t} \implies c = - \frac{b + t}{b - t}

We then get the following expression for y_n as a function of y_p:

y_n = \frac{2 y_p}{b - t} - \frac{b + t}{b - t} \qquad (iv)

Substituting in for y_p from equation (iii) into equation (iv) and factorising gives:

\begin{align*} y_n &= \frac{2 n y_e}{(b - t) z_e} - \frac{b + t}{b - t} \\ \implies y_n &= \frac{2 n y_e}{(b - t) z_e} - \frac{b + t}{b - t} \frac{z_e}{z_e} \\ \implies y_n &= \frac{1}{z_e} \left( \left( \frac{2n}{b - t} \right) y_e - \frac{b + t}{b - t} z_e \right) \\ \implies y_n z_e &= \left( \frac{2n}{b - t} \right) y_e - \frac{b + t}{b - t} z_e \\ \end{align*}

Recall from the second component of (\ast) that y_n z_e = y_c. Substituting this in for the left-hand side of the previous equation gives:

y_c = \left( \frac{2n}{b - t} \right) y_e - \frac{b + t}{b - t} z_e

This time, comparing to the second component of (\dagger) we can read off the coefficients for the second row of the projection matrix as (0, \frac{2n}{b - t}, -\frac{b + t}{b - t}, 0). Once again a quick intuitive check against Figure 2 matches what we have found. The projected and clip space y coordinates do not depend upon the x component of the eye space position.

At the three quarters stage, the projection matrix is now:

P = \begin{pmatrix} \frac{2n}{r - l} & 0 & -\frac{r + l}{r - l} & 0 \\ 0 & \frac{2n}{b-t} & -\frac{b + t}{b - t} & 0 \\ \cdot & \cdot & \cdot & \cdot \\ 0 & 0 & 1 & 0 \\ \end{pmatrix}

We are almost there now. We have just the z-axis mapping left to deal with.

### Mapping the z-coordinate

The analysis of the z-axis is a little different to that of the x and y dimensions. For Vulkan, we wish to map eye space depths such that:

- the near plane, z_e = n, maps to z_n = 0 and
- the far plane, z_e = f, maps to z_n = 1

The z components of the projected point and the normalised device coordinates point should not depend upon the x and y components. This means that for the 3rd row of the projection matrix the first two elements will be 0. The remaining two elements we will denote by A and B respectively:

P = \begin{pmatrix} \frac{2n}{r - l} & 0 & -\frac{r + l}{r - l} & 0 \\ 0 & \frac{2n}{b-t} & -\frac{b + t}{b - t} & 0 \\ 0 & 0 & A & B \\ 0 & 0 & 1 & 0 \\ \end{pmatrix}

Combining this with the 3rd row of (\dagger) we see that:

z_c = A z_e + B

Now if we divide both sides by z_e and recalling that from (\ast) that z_n = z_c / z_e we can write:

z_n = A + \frac{B}{z_e}. \qquad (v)

Substituting in our boundary conditions (shown in the bullet points above) into equation (v) we get a pair of simultaneous equations for A and B:

\begin{align*} A + \frac{B}{n} &= 0 \qquad (vi) \\ A + \frac{B}{f} &= 1 \qquad (vii) \\ \end{align*}

We can subtract equation (vi) from equation (vii) to eliminate A:

\begin{gather*} \frac{B}{f} - \frac{B}{n} = 1 \implies \frac{Bn - Bf}{nf} = 1 \implies \frac{B(n - f)}{nf} = 1 \implies B = \frac{nf}{n - f} \\ \implies B = - \frac{nf}{f - n} \qquad (viii) \\ \end{gather*}

Now to find A we can substitute (viii) back into (vii):

\begin{align*} A &- \frac{n f}{f(f - n)} = 1 \\ \implies A &- \frac{n}{f - n} = 1 \\ \implies A &= 1 + \frac{n}{f - n} \\ \implies A &= \frac{f - n + n}{f - n} \\ \implies A &= \frac{f}{f - n} \qquad (ix) \end{align*}

Substituting equations (viii) and (ix) back into the projection matrix, we finally arrive at the result for a perspective projection matrix useable with Vulkan in conjunction with the post-view rotation matrix from Part 1:

P = \begin{pmatrix} \frac{2n}{r - l} & 0 & -\frac{r + l}{r - l} & 0 \\ 0 & \frac{2n}{b-t} & -\frac{b + t}{b - t} & 0 \\ 0 & 0 & \frac{f}{f - n} & - \frac{n f}{f - n} \\ 0 & 0 & 1 & 0 \\ \end{pmatrix} \qquad (x)

## Using the Projection Matrix in Practice

Recall that equation (x) is the matrix to perform the projection operation from the **rotated eye space** coordinates to the right-handed **clip space** coordinates used by Vulkan. What does this mean? Well, it means that we should include the post-view correction matrix into our calculations when transforming vertices. Given a vertex position in model space, \bm{p}_{model}, we can transform it into clip space by the following:

\bm{p}_{clip} = P X V M \bm{p}_{model}

As we saw in Part 1, the post-view correction matrix is just a constant that performs the 180 degree rotation about the x-axis, we can combine this into our calculation of the projection matrix, P. This is analogous to how the OpenGL projection matrix typically includes the z-axis flip to change from a right-handed to left-handed coordinate system. Combining the post-view rotation and Vulkan projection matrix gives:

\begin{align*} Q &= P X \\ \implies Q &= \begin{pmatrix} \frac{2n}{r - l} & 0 & -\frac{r + l}{r - l} & 0 \\ 0 & \frac{2n}{b-t} & -\frac{b + t}{b - t} & 0 \\ 0 & 0 & \frac{f}{f - n} & - \frac{n f}{f - n} \\ 0 & 0 & 1 & 0 \\ \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & 1 \\ \end{pmatrix} \\ \implies Q &= \begin{pmatrix} \frac{2n}{r - l} & 0 & \frac{r + l}{r - l} & 0 \\ 0 & -\frac{2n}{b-t} & \frac{b + t}{b - t} & 0 \\ 0 & 0 & -\frac{f}{f - n} & - \frac{n f}{f - n} \\ 0 & 0 & -1 & 0 \\ \end{pmatrix} \qquad (xi) \\ \end{align*}

**Edit: Fixed the signs of the 1st and 2nd rows of the 3rd column in the matrix above (copy and paste error). Thanks to FourierTransform and Ziflin in the comments for pointing this out!**

Before you rush off and implement equation (xi) in your favourite editor and language, there is one final piece of subtlety to consider! Recall that when we began deriving the perspective projection matrix, we set things up so that our source coordinate system was the rotated eye space so that its axes were already aligned with the clip space destination coordinate system. Refer back to Figures 1 and 2 and note the orientation of the axes. In particular that the y axis increases in a downward direction.

The thing to keep in mind is that the parameters used in (xi) are actually specified in the rotated eye space coordinate system. This has implications:

**x axis:**Nothing to change here. Since we rotate about the x-axis to get from eye space to rotated eye space, the x component of any position does not change.**y axis:**The 180 degree rotation about the x axis will affect the y components of any positions. The following diagram shows a blue view volume in the non-rotated eye space – the z-axis increases to the left and the near plane is positioned on the negative z side. The view volume is in the upper right quadrant and in this case both the top and bottom values for the near plane are positive. In the lower left quadrant, in green, we also show the rotated view volume. Notice that the 180 degree rotation causes the**signs of the t and b parameters to be negated**.**z axis:**Technically, the 180 degree rotation would also negate the z components of any positions. However, developers are already used to specifying the near and far plane parameters, n and f, as distances from the z_e = 0 plane. This is exactly what happens when creating an OpenGL projection matrix for example. Since we already specified n and f as positive values in the rotated eye space, we can just treat the inputs to any function that we write as positive distances for the near and far plane and stay in keeping with what developers are used to.

Putting this together, we can create a function to produce a Vulkan projection matrix and optionally have it incorporate the post-view correction rotation matrix. All we have to remember is that if we are opting in to include the post-view correction, then the top and bottom parameters are treated as being specified in the non-rotated eye space. If we do not opt in, then they are specified in rotated eye space.

In practise, this works well because often you want to minimise the amount of floating point arithmetic going on per frame so opting in allows the developer to specify top and bottom in the usual eye space coordinates which is closer to the chosen world space system (often y-up too), than the rotated eye space.

Using the popular glm library, we can declare a function as:

```
enum class ApplyPostViewCorrection : uint8_t {
No,
Yes
};
struct AsymmetricPerspectiveOptions {
float left{ -1.0f };
float right{ 1.0f };
float bottom{ -1.0f };
float top{ 1.0f };
float nearPlane{ 0.1f };
float farPlane{ 100.0f };
ApplyPostViewCorrection applyPostViewCorrection{ ApplyPostViewCorrection::Yes };
};
glm::mat4 perspective(const AsymmetricPerspectiveOptions &options);
```

The implementation turns out to be very easy once we know equations (x) and (xi):

```
glm::mat4 perspective(const AsymmetricPerspectiveOptions &options)
{
const auto twoNear = 2.0f * options.nearPlane;
const auto rightMinusLeft = options.right - options.left;
const auto farMinusNear = options.farPlane - options.nearPlane;
if (options.applyPostViewCorrection == ApplyPostViewCorrection::No) {
const auto bottomMinusTop = options.bottom - options.top;
const glm::mat4 m = {
twoNear / rightMinusLeft,
0.0f,
0.0f,
0.0f,
0.0f,
twoNear / bottomMinusTop,
0.0f,
0.0f,
-(options.right + options.left) / rightMinusLeft,
-(options.bottom + options.top) / bottomMinusTop,
options.farPlane / farMinusNear,
1.0f,
0.0f,
0.0f,
-options.nearPlane * options.farPlane / farMinusNear,
0.0f
};
return m;
} else {
// If we are applying the post view correction, we need to negate the signs of the
// top and bottom planes to take into account the fact that the post view correction
// rotate them 180 degrees around the x axis.
//
// This has the effect of treating the top and bottom planes as if they were specified
// in the non-rotated eye space coordinate system.
//
// We do not need to flip the signs of the near and far planes as these are always
// treated as positive distances from the camera.
const auto bottom = -options.bottom;
const auto top = -options.top;
const auto bottomMinusTop = bottom - top;
// In addition to negating the top and bottom planes, we also need to post-multiply
// the projection matrix by the post view correction matrix. This amounts to negating
// the y and z axes of the projection matrix.
const glm::mat4 m = {
twoNear / rightMinusLeft,
0.0f,
0.0f,
0.0f,
0.0f,
-twoNear / (bottomMinusTop),
0.0f,
0.0f,
(options.right + options.left) / rightMinusLeft,
(bottom + top) / bottomMinusTop,
-options.farPlane / farMinusNear,
-1.0f,
0.0f,
0.0f,
-options.nearPlane * options.farPlane / farMinusNear,
0.0f
};
return m;
}
}
```

## Summary

In this article we have shown how to build a perspective projection matrix to transform vertices from rotated eye space to clip space all from first principles. The requirement for perspective correct interpolation and the perspective divide yielded the 4th row of the the projection matrix. We have then shown how we can construct a linear relationship between the x or y components of the eye space projected point on the near plane to the normalised device coordinate point, and from there back to clip space. We then showed how to map the eye space depth component onto the normalised device coordinate depth. Finally we have given some practical tips about combining the projection matrix with the post-view rotation matrix.

We hope that this has removed some of the mystery surrounding the perspective projection matrix and how using an OpenGL projection matrix can cause your rendered results to be upside down. Armed with this knowledge you will have no need for the various hacks mentioned earlier.

In the next article, we will take a look at some more variations on the projection matrix and some more tips for using it in applications. Thank you for reading!

If you like this article and want to read similar material, consider subscribing via our RSS feed.

Subscribe to KDAB TV for similar informative short video content.

KDAB provides market leading software consulting and development services and training in Qt, C++ and 3D/OpenGL. Contact us.

Thanks!

However just to be clear, all of this is assuming that camera space basis is y-up and right-handed, right? Therefore the vertices that we want to render must have a z negative component in camera space?

Also, I haven’t read the code yet but there’s a mistake with Q = PX (after the part “Combining the post-view rotation…”): The first and second component of the third column should be positive.

Btw glm seems to have a right-handed perspective matrix with depth 0 to 1 (perspectiveRH_ZO), but I’m not sure if it’s the same as what you did here.

It does appear that Q=PX (xi) equation is wrong as you mentioned. It looks like it is correct in the code. I’m also not sure the explanation for negating bottom and top when using it is correct. The reason is that we are used to specifying top and bottom values in a “Y-up” space and therefore want to use top=1 and bottom=-1. For use with Vulkan’s *Y-down* clip space, these need to be negated. This is why the solutions you mentioned at the start of the last article are needed (including here by negating them in the perspective matrix). The amount of rotation needed to ‘correct’ the desired world/view space should not matter.

Thanks for finding the typos with the negated terms. Fixed now. The assumption is that we are in a right-handed coordinate system all the way through from world space to clip space. The comments about flipping the signs of bottom, top and z are just for specifying the projection matrix. With OpenGL, the z should technically be specified negative but it’s always treated as just distance from the z = 0 plane due to the z-axis flip that happens. For Vulkan with its right-handed clip space, the near and far are the z-coordinates of the planes (as well as the distances).

I still find discussing coordinate systems very tricky as different people think about them in different ways. I find this approach of pre-rotating eye space to align it with the clip space axes to be useful for me. Anyway, I hope that you found the article interesting and thanks again for commenting!