<?xml version="1.0" encoding="UTF-8"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/"><channel><title>KDAB</title><description>Global software consultancy specialising in Qt, C++, Rust, Slint and Embedded Linux. Consulting, development, training and open-source tools since 1999.</description><link>https://www.kdab.com/</link><language>en-gb</language><item><title>Simplifying 3D Stereo Visualization – an Automated Approach</title><link>https://www.kdab.com/simplifying-3d-stereo-visualization-an-automated-approach/</link><guid isPermaLink="true">https://www.kdab.com/simplifying-3d-stereo-visualization-an-automated-approach/</guid><description>&lt;p data-block-key=&quot;kgi1l&quot;&gt;KDAB and Schneider Digital developed a system that automates stereo 3D setup by dynamically calculating focal distance, camera separation, pop-out, and field of view. A focus-area method inspired by digital cameras continuously adjusts depth, enabling a plug-and-play experience without manual tuning.&lt;/p&gt;</description><pubDate>Thu, 22 May 2025 06:51:05 GMT</pubDate><content:encoded>&lt;h1&gt;Simplifying 3D Stereo Visualization – an Automated Approach&lt;/h1&gt;&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;exqil&quot;&gt;As early as the 1980s, the first stereo 3D visualizations appeared on computers using shutter glasses and anaglyph glasses. The theoretical foundations of stereoscopy are largely established and considered solved. Nevertheless, there is still room for improvement when it comes to usability.&lt;/p&gt;&lt;p data-block-key=&quot;dmgkr&quot;&gt;Together with &lt;a href=&quot;https://www.schneider-digital.com/de/&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;Schneider Digital&lt;/a&gt;, a leading manufacturer of 3D stereo display systems, KDAB tackled the issue of usability by developing a demonstrator. 
In a second phase, we applied the findings and, with support from Bullinger GmbH as a sponsor, created a 3D stereo prototype for QGIS, an open-source geospatial application.&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;image-variable-size-block&quot;&gt;
    &lt;div class=&quot;image-variable-positioning-block right-margin-auto left-margin-auto width-100 &quot; &gt;
            &lt;div class=&quot;image-variable-size-image&quot;&gt;
                
                
                
                &lt;img id=&quot;Blog_Schneider_Timo_Screenshot&quot; src=&quot;https://eu-central-1.linodeobjects.com/wagtail-production/images/Blog_Schneider_Timo_Screenshot.original.png&quot; class=&quot;Blog_Schneider_Timo_Screenshot&quot; alt=&quot;Blog_Schneider_Timo_Screenshot&quot;&gt;
                
                
        &lt;/div&gt;
        &lt;div class=&quot;image-variable-size-caption text-center&quot;&gt;&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;dxcky&quot;&gt;Screenshot: Schneider / KDAB 3D-Stereo-Demonstrator OpenGL / Vulkan&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;

&lt;div class=&quot;rich-text&quot;&gt;&lt;h2 id=&quot;static-configurations-in-3d-visualization&quot; anchor=&quot;static-configurations-in-3d-visualization&quot; data-block-key=&quot;exqil&quot;&gt;&lt;b&gt;Static configurations in 3D visualization&lt;/b&gt;&lt;/h2&gt;&lt;p data-block-key=&quot;5qt4p&quot;&gt;When considering static configurations - such as the playback of a 3D movie in a cinema - all parameters are predefined. The screen size is approximately the same for all viewers. Virtual conditions such as the virtual camera distance, the focal distance to the object, and the “pop-out” effect are either fixed or implicitly embedded in the media. Viewers can simply put on their 3D glasses and enjoy the experience without any prior setup.&lt;/p&gt;&lt;h2 id=&quot;challenges-of-dynamic-real-time-3d-scenes&quot; anchor=&quot;challenges-of-dynamic-real-time-3d-scenes&quot; data-block-key=&quot;216ii&quot;&gt;&lt;b&gt;Challenges of dynamic real-time 3D scenes&lt;/b&gt;&lt;/h2&gt;&lt;p data-block-key=&quot;av2br&quot;&gt;The situation is different with dynamic, real-time 3D scenes. Display sizes vary, and the virtual scale of the scenes can differ significantly. This introduces several parameters that must be configured in advance to ensure a comfortable 3D stereo experience:&lt;/p&gt;&lt;ul&gt;&lt;li data-block-key=&quot;7vljd&quot;&gt;&lt;b&gt;Camera separation&lt;/b&gt;: The virtual distance between the two cameras responsible for rendering the left and right eye perspectives.&lt;/li&gt;&lt;li data-block-key=&quot;5t7t3&quot;&gt;&lt;b&gt;Focal distance&lt;/b&gt;: The virtual distance from the camera to the object of focus. 
At this point, the two camera frustums intersect, and the images for both eyes align on the screen plane.&lt;/li&gt;&lt;li data-block-key=&quot;9u557&quot;&gt;&lt;b&gt;Pop-out&lt;/b&gt;: Determines how much a 3D object appears to protrude from the screen (positive values) or recede into it (negative values).&lt;/li&gt;&lt;li data-block-key=&quot;b7m9k&quot;&gt;&lt;b&gt;FOV (Field of View)&lt;/b&gt;: The vertical field of view of the virtual camera.&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;
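To make the four parameters concrete, here is a minimal sketch of how they might be grouped in code. The names and default values are purely illustrative assumptions, not the demonstrator's actual API:

```cpp
#include <cassert>

// Illustrative grouping of the four stereo parameters discussed above;
// names and defaults are assumptions, not the demonstrator's real API.
struct StereoParameters {
    float cameraSeparation = 0.065f;  // virtual distance between the two cameras
    float focalDistance = 10.0f;      // distance at which the camera frustums intersect
    float popOut = 0.0f;              // > 0 protrudes from the screen, < 0 recedes into it
    float verticalFovDegrees = 45.0f; // vertical field of view of the virtual camera
};
```

The sections below discuss how each of these values can be derived automatically instead of being hand-tuned.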



&lt;div class=&quot;image-variable-size-block&quot;&gt;
    &lt;div class=&quot;image-variable-positioning-block right-margin-auto left-margin-auto width-100 &quot; &gt;
            &lt;div class=&quot;image-variable-size-image&quot;&gt;
                
                
                
                &lt;img id=&quot;Blog_Schneider_Timo_3D-PluraView-Application-Terrasolid-Terrastereo-6&quot; src=&quot;https://eu-central-1.linodeobjects.com/wagtail-production/images/Blog_Schneider_Timo_3D-PluraView-Application-Te.original.jpg&quot; class=&quot;Blog_Schneider_Timo_3D-PluraView-Application-Terrasolid-Terrastereo-6&quot; alt=&quot;Blog_Schneider_Timo_3D-PluraView-Application-Terrasolid-Terrastereo-6&quot;&gt;
                
                
        &lt;/div&gt;
        &lt;div class=&quot;image-variable-size-caption text-center&quot;&gt;&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;dxcky&quot;&gt;3D PluraView&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;



&lt;div class=&quot;image-variable-size-block&quot;&gt;
    &lt;div class=&quot;image-variable-positioning-block right-margin-auto left-margin-auto width-100 &quot; &gt;
            &lt;div class=&quot;image-variable-size-image&quot;&gt;
                
                
                
                &lt;img id=&quot;Blog_Schneider_Timo_3D-PluraView-Application-esri-ArcGIS-1&quot; src=&quot;https://eu-central-1.linodeobjects.com/wagtail-production/images/Blog_Schneider_Timo_3D-PluraView-Application-es.original.jpg&quot; class=&quot;Blog_Schneider_Timo_3D-PluraView-Application-esri-ArcGIS-1&quot; alt=&quot;Blog_Schneider_Timo_3D-PluraView-Application-esri-ArcGIS-1&quot;&gt;
                
                
        &lt;/div&gt;
        &lt;div class=&quot;image-variable-size-caption text-center&quot;&gt;&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;dxcky&quot;&gt;3D PluraView&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;

&lt;div class=&quot;rich-text&quot;&gt;&lt;h2 id=&quot;automating-parameter-configuration-for-user-comfort&quot; anchor=&quot;automating-parameter-configuration-for-user-comfort&quot; data-block-key=&quot;exqil&quot;&gt;&lt;b&gt;Automating parameter configuration for user comfort&lt;/b&gt;&lt;/h2&gt;&lt;p data-block-key=&quot;aa58t&quot;&gt;The goal is to define all these parameters automatically, based on virtual and physical conditions, so that the user ideally doesn’t need to adjust anything manually.&lt;/p&gt;&lt;p data-block-key=&quot;b2e29&quot;&gt;The field of view can be determined relatively easily: while it&apos;s generally a free parameter set according to the task (e.g., CAD, games, etc.) or personal preference, we can compute a good starting value from the real-world viewing angle, which depends on the display size and the viewer&apos;s physical distance from it.&lt;/p&gt;&lt;p data-block-key=&quot;2ihke&quot;&gt;Automatically determining the focal distance is more challenging. The trivial options are: 1) the user adjusts a slider to manually set the focal distance, or 2) the user clicks in the 3D scene to set a focal point. While the second method is relatively convenient, both require user interaction. We’ve developed a third approach inspired by digital cameras. Most allow users to define a &quot;focus area&quot;, and the camera adjusts its optics to focus within this region. Similarly, in our case we cast several rays within a defined focus area into the 3D scene and calculate the optimal focal distance using median or average depth values. This focus field is adjustable in size and position, is initially centered in the image, and typically works out of the box. The focal distance is then updated each frame, allowing continuous adaptation. Within the focus field, incorrect focus is effectively eliminated.&lt;/p&gt;&lt;/div&gt;
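The two computations described above can be sketched as follows. This is a simplified illustration under assumed units and function names, not the demonstrator's actual code:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Initial vertical field of view from the physical viewing geometry:
// the angle the display subtends at the viewer's position. Both
// arguments must use the same physical unit (e.g. centimeters).
double fovFromViewingGeometry(double displayHeight, double viewerDistance)
{
    return 2.0 * std::atan((displayHeight / 2.0) / viewerDistance); // radians
}

// Focus-area method: cast several rays inside the focus rectangle,
// collect the depths where they hit geometry, and take the median as
// the focal distance. The median is robust against a few rays that
// hit the far background or miss geometry entirely.
double focalDistanceFromSamples(std::vector<double> depths)
{
    const std::size_t mid = depths.size() / 2;
    std::nth_element(depths.begin(), depths.begin() + mid, depths.end());
    return depths[mid];
}
```

Re-running focalDistanceFromSamples() on fresh ray hits every frame gives the continuous, per-frame adaptation described above.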



&lt;div class=&quot;image-variable-size-block&quot;&gt;
    &lt;div class=&quot;image-variable-positioning-block right-margin-auto left-margin-auto width-100 &quot; &gt;
            &lt;div class=&quot;image-variable-size-image&quot;&gt;
                
                
                
                &lt;img id=&quot;Blog_Schneider_Timo_3D-PluraView-Case-Study-Hamburg-01&quot; src=&quot;https://eu-central-1.linodeobjects.com/wagtail-production/images/Blog_Schneider_Timo_3D-PluraView-Case-Study-Ham.original.jpg&quot; class=&quot;Blog_Schneider_Timo_3D-PluraView-Case-Study-Hamburg-01&quot; alt=&quot;Blog_Schneider_Timo_3D-PluraView-Case-Study-Hamburg-01&quot;&gt;
                
                
        &lt;/div&gt;
        &lt;div class=&quot;image-variable-size-caption text-center&quot;&gt;&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;dxcky&quot;&gt;3D PluraView&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;



&lt;div class=&quot;image-variable-size-block&quot;&gt;
    &lt;div class=&quot;image-variable-positioning-block right-margin-auto left-margin-auto width-100 &quot; &gt;
            &lt;div class=&quot;image-variable-size-image&quot;&gt;
                
                
                
                &lt;img id=&quot;Blog_Schneider_Timo_3D-PluraView-Case-Study-Hamburg-07&quot; src=&quot;https://eu-central-1.linodeobjects.com/wagtail-production/images/Blog_Schneider_Timo_3D-PluraView-Case-Study-Ham.original_cY7c0zs.jpg&quot; class=&quot;Blog_Schneider_Timo_3D-PluraView-Case-Study-Hamburg-07&quot; alt=&quot;Blog_Schneider_Timo_3D-PluraView-Case-Study-Hamburg-07&quot;&gt;
                
                
        &lt;/div&gt;
        &lt;div class=&quot;image-variable-size-caption text-center&quot;&gt;&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;dxcky&quot;&gt;3D PluraView&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;

&lt;div class=&quot;rich-text&quot;&gt;&lt;h2 id=&quot;fine-tuning-the-3d-experience&quot; anchor=&quot;fine-tuning-the-3d-experience&quot; data-block-key=&quot;exqil&quot;&gt;&lt;b&gt;Fine-tuning the 3D experience&lt;/b&gt;&lt;/h2&gt;&lt;p data-block-key=&quot;ceb03&quot;&gt;Once the focal distance is determined, objects closer to the viewer appear in front of the screen (popping out), and those farther away appear inside it. Depending on taste or application, this impression can be fine-tuned by shifting the pop-out effect. Technically, this can be done by adjusting the focal distance, but requiring the user to do so manually whenever the focal distance changes would be inconvenient. Therefore, we introduced a dedicated pop-out parameter in our demonstrator, which indirectly adjusts the focal distance each frame “under the hood” without requiring any user input.&lt;/p&gt;&lt;h2 id=&quot;defining-camera-separation-for-optimal-viewing&quot; anchor=&quot;defining-camera-separation-for-optimal-viewing&quot; data-block-key=&quot;8dl7g&quot;&gt;&lt;b&gt;Defining camera separation for optimal viewing&lt;/b&gt;&lt;/h2&gt;&lt;p data-block-key=&quot;c2h4t&quot;&gt;The final piece is &lt;b&gt;camera separation&lt;/b&gt;, which may initially seem counterintuitive. Aren’t human eyes all roughly the same distance apart? Yes and no. 3D graphics coordinates are not bound to real-world units - a &quot;unit&quot; might represent a millimeter, a meter, or even a kilometer. So what value should be used for camera separation? A common solution, also used in our demonstrator, is to define a base value (e.g., 1/30) and multiply it by the focal distance to derive the camera separation. While this isn’t physically accurate - human eye distance doesn’t change (if it did, that would be disconcerting) - it works well in practice for stereo 3D rendering.&lt;/p&gt;&lt;/div&gt;
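Both rules can be stated in a few lines of code. The base factor of 1/30 matches the value mentioned above; the linear pop-out shift is an illustrative assumption, since the article doesn't specify exactly how the demonstrator adjusts the focal distance internally:

```cpp
#include <cassert>
#include <cmath>

// Camera separation derived from the focal distance: a fixed base
// factor multiplied by the current focal distance, so the stereo
// effect scales with whatever virtual unit the scene happens to use.
double cameraSeparation(double focalDistance, double baseFactor = 1.0 / 30.0)
{
    return baseFactor * focalDistance;
}

// Dedicated pop-out parameter: rather than asking the user to retune
// the focal distance, shift it "under the hood" each frame. This
// particular linear formula is an assumption for illustration only.
double effectiveFocalDistance(double measuredFocalDistance, double popOut)
{
    return measuredFocalDistance * (1.0 + popOut);
}
```

Because both values are recomputed per frame from the measured focal distance, neither requires any user input once set.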



&lt;div class=&quot;image-variable-size-block&quot;&gt;
    &lt;div class=&quot;image-variable-positioning-block right-margin-auto left-margin-auto width-100 &quot; &gt;
            &lt;div class=&quot;image-variable-size-image&quot;&gt;
                
                
                
                &lt;img id=&quot;Blog_Schneider_Timo_IMG_0492&quot; src=&quot;https://eu-central-1.linodeobjects.com/wagtail-production/images/Blog_Schneider_Timo_IMG_0492.original.png&quot; class=&quot;Blog_Schneider_Timo_IMG_0492&quot; alt=&quot;Blog_Schneider_Timo_IMG_0492&quot;&gt;
                
                
        &lt;/div&gt;
        &lt;div class=&quot;image-variable-size-caption text-center&quot;&gt;&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;dxcky&quot;&gt;VR Wall&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;



&lt;div class=&quot;image-variable-size-block&quot;&gt;
    &lt;div class=&quot;image-variable-positioning-block right-margin-auto left-margin-auto width-100 &quot; &gt;
            &lt;div class=&quot;image-variable-size-image&quot;&gt;
                
                
                
                &lt;img id=&quot;Blog_Schneider_Timo_IMG_1371&quot; src=&quot;https://eu-central-1.linodeobjects.com/wagtail-production/images/Blog_Schneider_Timo_IMG_1371.original.png&quot; class=&quot;Blog_Schneider_Timo_IMG_1371&quot; alt=&quot;Blog_Schneider_Timo_IMG_1371&quot;&gt;
                
                
        &lt;/div&gt;
        &lt;div class=&quot;image-variable-size-caption text-center&quot;&gt;&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;dxcky&quot;&gt;VR Wall&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;

&lt;div class=&quot;rich-text&quot;&gt;&lt;h2 id=&quot;conclusion-an-immersive-3d-stereo-experience&quot; anchor=&quot;conclusion-an-immersive-3d-stereo-experience&quot; data-block-key=&quot;exqil&quot;&gt;&lt;b&gt;Conclusion: an immersive 3D stereo experience&lt;/b&gt;&lt;/h2&gt;&lt;p data-block-key=&quot;85mqs&quot;&gt;With these techniques, we’ve managed to define all four parameters in a way that allows users to immediately immerse themselves in a 3D stereo experience without needing to configure settings beforehand or during runtime.&lt;/p&gt;&lt;p data-block-key=&quot;bvggq&quot;&gt;The stereo demonstrator runs on both OpenGL and Vulkan. The source code for the OpenGL / Qt3D version is freely available at: &lt;a href=&quot;https://github.com/KDABLabs/stereo3ddemo&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;https://github.com/KDABLabs/stereo3ddemo&lt;/a&gt;&lt;/p&gt;&lt;/div&gt;&lt;p&gt;The post &lt;a href=&quot;https://www.kdab.com/simplifying-3d-stereo-visualization-an-automated-approach/&quot;&gt;Simplifying 3D Stereo Visualization – an Automated Approach&lt;/a&gt; appeared first on &lt;a href=&quot;https://www.kdab.com&quot;&gt;KDAB&lt;/a&gt;.&lt;/p&gt;</content:encoded><category>3d</category><category>c++</category><category>desktop</category><category>open source</category><category>showcase</category><category>ux/ui</category></item><item><title>KDGpu 0.5.0 is here!</title><link>https://www.kdab.com/kdgpu-0-5-0-is-here/</link><guid isPermaLink="true">https://www.kdab.com/kdgpu-0-5-0-is-here/</guid><description>&lt;p&gt;Since we first announced it last year, our Vulkan wrapper KDGpu has been busy evolving to meet customer needs and our own. Our last post announced the public release of v0.1.0, and version 0.5.0 is available today. 
It&amp;#x27;s never been easier to interact with modern graphics technologies, enabling you to focus on the big picture […]&lt;/p&gt;</description><pubDate>Thu, 30 May 2024 06:30:29 GMT</pubDate><content:encoded>&lt;h1&gt;KDGpu 0.5.0 is here!&lt;/h1&gt;&lt;div class=&quot;rich-text&quot;&gt;&lt;p&gt;Since &lt;a href=&quot;https://www.kdab.com/kdgpu-v-0-1-0-is-released/&quot;&gt;we first announced it&lt;/a&gt; last year, our Vulkan wrapper KDGpu has been busy evolving to meet customer needs and our own. Our last post announced the public release of v0.1.0, and version 0.5.0 is available today. It&apos;s never been easier to interact with modern graphics technologies, enabling you to focus on the big picture instead of hassling with the intricacies and nuances of Vulkan.&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;image-variable-size-block&quot;&gt;
    &lt;div class=&quot;image-variable-positioning-block right-margin-auto left-margin-auto width-100 &quot; &gt;
            &lt;div class=&quot;image-variable-size-image&quot;&gt;
                
                
                
                &lt;img id=&quot;Screenshot_19_124838.png&quot; src=&quot;https://eu-central-1.linodeobjects.com/wagtail-production/images/Screenshot_19_124838.original.png&quot; class=&quot;Screenshot_19_124838.png&quot; alt=&quot;Screenshot_19_124838.png&quot;&gt;
                
                
        &lt;/div&gt;
        &lt;div class=&quot;image-variable-size-caption text-center&quot;&gt;&lt;div class=&quot;rich-text&quot;&gt;&lt;p&gt;The PBR example in the new KDGpu Examples repository.&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;

&lt;div class=&quot;rich-text&quot;&gt;&lt;h2 id=&quot;wider-device-support&quot; anchor=&quot;wider-device-support&quot;&gt;Wider device support&lt;/h2&gt;&lt;p&gt;KDGpu now supports a wider array of devices, such as older versions of Android. For some context, additional features in Vulkan are supported by extensions. If said features become part of the &quot;core&quot; specification, they are automatically included in Vulkan 1.2, 1.3, and so on. In the past, KDGpu required the device to fully support Vulkan 1.2, which limited what devices you could target. In newer KDGpu versions (&amp;gt;0.4.6), it now runs on certain Vulkan 1.1 devices (like the Meta Quest) as long as the required extensions are supported.&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;image-variable-size-block&quot;&gt;
    &lt;div class=&quot;image-variable-positioning-block right-margin-auto left-margin-auto width-50 &quot; &gt;
            &lt;div class=&quot;image-variable-size-image&quot;&gt;
                
                
                
                &lt;img id=&quot;IMG_3932.jpg&quot; src=&quot;https://eu-central-1.linodeobjects.com/wagtail-production/images/IMG_3932.original.jpg&quot; class=&quot;IMG_3932.jpg&quot; alt=&quot;IMG_3932.jpg&quot;&gt;
                
                
        &lt;/div&gt;
        &lt;div class=&quot;image-variable-size-caption text-center&quot;&gt;&lt;div class=&quot;rich-text&quot;&gt;&lt;p&gt;A KDGpu example running natively on an Android device.&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;

&lt;div class=&quot;rich-text&quot;&gt;&lt;p&gt;We also added native examples for Android, which can be run straight from &lt;a href=&quot;https://developer.android.com/studio&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;Android Studio&lt;/a&gt;! There&apos;s also better iOS support alongside &lt;a href=&quot;https://github.com/KDAB/KDGpu/tree/main/examples/hello_triangle_apple&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;a native Apple example&lt;/a&gt;.&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;image-variable-size-block&quot;&gt;
    &lt;div class=&quot;image-variable-positioning-block right-margin-auto left-margin-auto width-20 &quot; &gt;
            &lt;div class=&quot;image-variable-size-image&quot;&gt;
                
                
                
                &lt;img id=&quot;76FC8DE4-425A-4E8B-AA24-A62E2F3B31EF_4_5005_c.jpeg&quot; src=&quot;https://eu-central-1.linodeobjects.com/wagtail-production/images/76FC8DE4-425A-4E8B-AA24-A62E2F3B31EF_4_5005_c.original.jpg&quot; class=&quot;76FC8DE4-425A-4E8B-AA24-A62E2F3B31EF_4_5005_c.jpeg&quot; alt=&quot;76FC8DE4-425A-4E8B-AA24-A62E2F3B31EF_4_5005_c.jpeg&quot;&gt;
                
                
        &lt;/div&gt;
        &lt;div class=&quot;image-variable-size-caption text-center&quot;&gt;&lt;div class=&quot;rich-text&quot;&gt;&lt;p&gt;the KDGpu Hello Triangle example running in the iOS simulator&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;

&lt;div class=&quot;rich-text&quot;&gt;&lt;h2 id=&quot;external-memory-and-images-support&quot; anchor=&quot;external-memory-and-images-support&quot;&gt;External memory and images support&lt;/h2&gt;&lt;p&gt;When writing applications using KDGpu, you will inevitably have to interface with other APIs or libraries that don&apos;t support KDGpu, or perhaps don&apos;t support Vulkan at all. For example, you might generate an image using Vulkan and then need to pass it to &lt;a href=&quot;https://developer.nvidia.com/cuda-zone&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;CUDA&lt;/a&gt; for further processing. With KDGpu, it&apos;s now possible to grab texture and buffer objects and get their external memory handles:&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;formatted-code&quot;&gt;
    &lt;pre&gt;&lt;code class=&quot;language-cpp  line-numbers &quot;&gt;const TextureOptions textureOptions = {
    .type = TextureType::TextureType2D,
    .format = Format::R8G8B8A8_SNORM,
    .extent = { 512, 512, 1 },
    .mipLevels = 1,
    .usage = TextureUsageFlagBits::SampledBit,
    .memoryUsage = MemoryUsage::GpuOnly,
    .externalMemoryHandleType = ExternalMemoryHandleTypeFlagBits::OpaqueFD,
};

Texture t = device.createTexture(textureOptions);
const MemoryHandle externalHandleOrFD = t.externalMemoryHandle();
&lt;/code&gt;&lt;/pre&gt;
 &lt;/div&gt;


&lt;div class=&quot;rich-text&quot;&gt;&lt;p&gt;Additionally, we have added methods to adopt existing VkImages as native KDGpu objects to better support libraries like OpenXR.&lt;/p&gt;&lt;h2 id=&quot;easy-fast-xr&quot; anchor=&quot;easy-fast-xr&quot;&gt;Easy &amp;amp; fast XR&lt;/h2&gt;&lt;p&gt;&lt;a href=&quot;https://www.khronos.org/OpenXR/&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;OpenXR&lt;/a&gt; is the leading API for writing cross-platform VR/AR experiences. Like Vulkan, code directly using OpenXR tends to be verbose and requires a lot of setup. To alleviate this, KDGpu now includes an optional library called KDXr. It wraps OpenXR and integrates seamlessly with KDGpu. It takes care of initialization, provides the C++ classes you expect, and makes it painless to integrate XR functionality into your application, including support for XR compositor layers, head tracking, input handling, and haptic feedback.&lt;/p&gt;&lt;p&gt;For example, to set up a projection view you subclass the XrProjectionLayer type:&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;formatted-code&quot;&gt;
    &lt;pre&gt;&lt;code class=&quot;language-cpp  line-numbers &quot;&gt;class ProjectionLayer : public XrProjectionLayer
{
public:&lt;/code&gt;&lt;/pre&gt;
 &lt;/div&gt;


&lt;div class=&quot;rich-text&quot;&gt;&lt;p&gt;Then implement the required methods, such as renderView(), to render into each eye:&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;formatted-code&quot;&gt;
    &lt;pre&gt;&lt;code class=&quot;language-cpp  line-numbers &quot;&gt;void ProjectionLayer::renderView()
{
    m_fence.wait();
    m_fence.reset();

    // Update the scene data once per frame
    if (m_currentViewIndex == 0) {
        updateTransformUbo();
    }

    // Update the per-view camera matrices
    updateViewUbo();

    auto commandRecorder = m_device-&amp;gt;createCommandRecorder();

    // Set up the render pass using the current color and depth texture views
    m_opaquePassOptions.colorAttachments[0].view = m_colorSwapchains[m_currentViewIndex].textureViews[m_currentColorImageIndex];
    m_opaquePassOptions.depthStencilAttachment.view = m_depthSwapchains[m_currentViewIndex].textureViews[m_currentDepthImageIndex];
    auto opaquePass = commandRecorder.beginRenderPass(m_opaquePassOptions);

   // Do the rest of your rendering commands to this pass...&lt;/code&gt;&lt;/pre&gt;
 &lt;/div&gt;


&lt;div class=&quot;rich-text&quot;&gt;&lt;p&gt;Finally, add this layer to the compositor; in our examples, this step is abstracted away for you:&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;formatted-code&quot;&gt;
    &lt;pre&gt;&lt;code class=&quot;language-cpp  line-numbers &quot;&gt;// Create a projection layer to render the 3D scene
const XrProjectionLayerOptions projectionLayerOptions = {
    .device = &amp;amp;m_device,
    .queue = &amp;amp;m_queue,
    .session = &amp;amp;m_session,
    .colorSwapchainFormat = m_colorSwapchainFormat,
    .depthSwapchainFormat = m_depthSwapchainFormat,
    .samples = m_samples.get()
};
m_projectionLayer = createCompositorLayer&amp;lt;ProjectionLayer&amp;gt;(projectionLayerOptions);
m_projectionLayer-&amp;gt;setReferenceSpace(m_referenceSpace);&lt;/code&gt;&lt;/pre&gt;
 &lt;/div&gt;


&lt;div class=&quot;rich-text&quot;&gt;&lt;p&gt;You can &lt;a href=&quot;https://github.com/KDAB/KDGpu/tree/main/examples/hello_xr&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;view the complete example here&lt;/a&gt;. In this new release, we&apos;re continuing to work on multiview support! KDXr supports multiview out of the box (see the example layer code) and you can check out the &lt;a href=&quot;https://github.com/KDAB/KDGpu/tree/main/examples/multiview&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;multiview example&lt;/a&gt;.&lt;/p&gt;&lt;h2 id=&quot;more-in-depth-examples-are-now-available&quot; anchor=&quot;more-in-depth-examples-are-now-available&quot;&gt;More in-depth examples are now available&lt;/h2&gt;&lt;p&gt;The examples sitting in our main repository are no more than small tests, which don&apos;t show the true benefits of using KDGpu in large graphical applications. So, in addition to our previous examples, we now have &lt;a href=&quot;https://github.com/KDAB/kdgpu-examples&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;a dedicated KDGpu Examples repository&lt;/a&gt;!&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;image-variable-size-block&quot;&gt;
    &lt;div class=&quot;image-variable-positioning-block right-margin-auto left-margin-auto width-75 &quot; &gt;
            &lt;div class=&quot;image-variable-size-image&quot;&gt;
                
                
                
                &lt;img id=&quot;Screenshot_19_012418.png&quot; src=&quot;https://eu-central-1.linodeobjects.com/wagtail-production/images/Screenshot_19_012418.original.png&quot; class=&quot;Screenshot_19_012418.png&quot; alt=&quot;Screenshot_19_012418.png&quot;&gt;
                
                
        &lt;/div&gt;
        &lt;div class=&quot;image-variable-size-caption text-center&quot;&gt;&lt;div class=&quot;rich-text&quot;&gt;&lt;p&gt;Screenshot from our N-Body Compute example.&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;

&lt;div class=&quot;rich-text&quot;&gt;&lt;h2 id=&quot;and-more&quot; anchor=&quot;and-more&quot;&gt;And more!&lt;/h2&gt;&lt;p&gt;There are also small improvements such as being able to request custom extensions and ignore specific validation layer warnings. Check out the &lt;a href=&quot;https://github.com/KDAB/KDGpu/blob/main/ChangeLog&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;changelog on GitHub&lt;/a&gt; for a full list of what&apos;s been changed.&lt;/p&gt;&lt;p&gt;Let us know what you think about the improvements we&apos;ve made, and what could be useful for you in the future!&lt;/p&gt;&lt;/div&gt;&lt;p&gt;The post &lt;a href=&quot;https://www.kdab.com/kdgpu-0-5-0-is-here/&quot;&gt;KDGpu 0.5.0 is here!&lt;/a&gt; appeared first on &lt;a href=&quot;https://www.kdab.com&quot;&gt;KDAB&lt;/a&gt;.&lt;/p&gt;</content:encoded><category>android</category><category>3d</category><category>c++</category><category>tools</category></item><item><title>Projection Matrices with Vulkan - Part 2</title><link>https://www.kdab.com/projection-matrices-with-vulkan-part-2/</link><guid isPermaLink="true">https://www.kdab.com/projection-matrices-with-vulkan-part-2/</guid><description>&lt;p data-block-key=&quot;3e5n5&quot;&gt;Recap Recall that in Part 1 we discussed the differences between OpenGL and Vulkan when it comes to the fixed function parts of the graphics pipeline. 
We looked at how OpenGL&amp;#x27;s use of a left-handed set of coordinate axes for clip-space meant that projection matrices for OpenGL also incorporate a z-axis flip to switch from […]&lt;/p&gt;</description><pubDate>Thu, 21 Dec 2023 09:00:00 GMT</pubDate><content:encoded>&lt;h1&gt;Projection Matrices with Vulkan - Part 2&lt;/h1&gt;&lt;div class=&quot;rich-text&quot;&gt;&lt;h2 id=&quot;recap&quot; anchor=&quot;recap&quot; data-block-key=&quot;kz97f&quot;&gt;Recap&lt;/h2&gt;&lt;p data-block-key=&quot;9ixj4&quot;&gt;Recall that in &lt;a href=&quot;https://www.kdab.com/projection-matrices-with-vulkan-part-1/&quot;&gt;Part 1&lt;/a&gt; we discussed the differences between OpenGL and Vulkan when it comes to the fixed function parts of the graphics pipeline. We looked at how OpenGL&amp;#x27;s use of a left-handed set of coordinate axes for clip-space meant that projection matrices for OpenGL also incorporate a z-axis flip to switch from a right-handed eye space to a left-handed clip space.&lt;/p&gt;&lt;p data-block-key=&quot;tined&quot;&gt;We then went on to explain how we can apply a post-view correction matrix that performs a rotation of 180 degrees about the eye-space x-axis which will reorient the eye space axes such that they are aligned with the Vulkan clip space axes.&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;image-variable-size-block&quot;&gt;
    &lt;div class=&quot;image-variable-positioning-block right-margin-auto left-margin-auto width-100 &quot; &gt;
            &lt;div class=&quot;image-variable-size-image&quot;&gt;
                
                
                
                &lt;img id=&quot;view-to-clip-correction.png&quot; src=&quot;https://eu-central-1.linodeobjects.com/wagtail-production/images/view-to-clip-correction.original.png&quot; class=&quot;view-to-clip-correction.png&quot; alt=&quot;view-to-clip-correction.png&quot;&gt;
                
                
        &lt;/div&gt;
        &lt;div class=&quot;image-variable-size-caption text-end&quot;&gt;&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;ayx6b&quot;&gt;Rotating the eye space coordinate axes to align them with the Vulkan clip space axes as a step prior to applying the projection matrix.&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;

&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;5xpz9&quot;&gt;In this article we shall derive a perspective projection matrix that transforms a vertex from the rotated eye space into the Vulkan clip space. Since we have already taken care of aligning the source and destination space axes, all we have to care about is the projection itself. There is no need to introduce any axis inversions or other sleights of hand. We hope that this article, coupled with Part 1, will give you a full understanding of your transformations and allow you to make modifications without adding special cases. Let&amp;#x27;s get cracking!&lt;/p&gt;&lt;h2 id=&quot;defining-the-problem&quot; anchor=&quot;defining-the-problem&quot; data-block-key=&quot;00mg6&quot;&gt;Defining the Problem&lt;/h2&gt;&lt;p data-block-key=&quot;9d2iq&quot;&gt;We will derive the perspective projection matrix for a view volume defined by 6 planes forming a frustum (a rectangular truncated pyramid). Let&amp;#x27;s assume that the camera is located at the origin&lt;/p&gt;&lt;div data-katex-embed=&quot;O&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;eaa0h&quot;&gt;in our &amp;quot;rotated eye space&amp;quot; and looking along the &lt;b&gt;positive&lt;/b&gt; z-axis. From here on, we will refer to this &amp;quot;rotated eye space&amp;quot; simply as &amp;quot;eye space&amp;quot; for brevity, and we will use the subscript &amp;quot;eye&amp;quot; for quantities in this space.&lt;/p&gt;&lt;p data-block-key=&quot;30hep&quot;&gt;The following two diagrams show the view volume from top-down and side-elevation views. You may want to middle-click on them to open the full images in separate browser tabs so that you can refer back to them.&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;image-variable-size-block&quot;&gt;
    &lt;div class=&quot;image-variable-positioning-block right-margin-auto left-margin-auto width-100 &quot; &gt;
            &lt;div class=&quot;image-variable-size-image&quot;&gt;
                
                
                
                &lt;img id=&quot;perspective-projection-asymmetric-top-down.width-1200&quot; src=&quot;https://eu-central-1.linodeobjects.com/wagtail-production/images/perspective-projection-asymmetric-top-down.widt.original.png&quot; class=&quot;perspective-projection-asymmetric-top-down.width-1200&quot; alt=&quot;perspective-projection-asymmetric-top-down.width-1200&quot;&gt;
                
                
        &lt;/div&gt;
        &lt;div class=&quot;image-variable-size-caption text-end&quot;&gt;&lt;div class=&quot;rich-text&quot;&gt;&lt;/div&gt;&lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;



&lt;div class=&quot;image-variable-size-block&quot;&gt;
    &lt;div class=&quot;image-variable-positioning-block right-margin-auto left-margin-auto width-100 &quot; &gt;
            &lt;div class=&quot;image-variable-size-image&quot;&gt;
                
                
                
                &lt;img id=&quot;perspective-projection-asymmetric-side-ele.max-3000x3000&quot; src=&quot;https://eu-central-1.linodeobjects.com/wagtail-production/images/perspective-projection-asymmetric-side-ele.max-.original.png&quot; class=&quot;perspective-projection-asymmetric-side-ele.max-3000x3000&quot; alt=&quot;perspective-projection-asymmetric-side-ele.max-3000x3000&quot;&gt;
                
                
        &lt;/div&gt;
        &lt;div class=&quot;image-variable-size-caption text-end&quot;&gt;&lt;div class=&quot;rich-text&quot;&gt;&lt;/div&gt;&lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;
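To make the conventions in the two diagrams concrete, here is a minimal numeric sketch (the parameter values are purely illustrative, not taken from the figures) of the six quantities that define the frustum, following this article's conventions: camera at the origin looking along the positive z-axis, r > l, and b > t because y increases downwards.

```python
# Illustrative frustum parameters for the rotated eye space described above.
# Camera at the origin, looking along +z; y increases downwards, so b > t.
n, f = 0.1, 100.0   # near and far plane distances (0 < n < f)
l, r = -0.2, 0.2    # left/right extents of the near plane (r > l)
t, b = -0.15, 0.15  # top/bottom extents of the near plane (b > t, y-down)

# Sanity-check the ordering conventions used throughout the derivation.
assert 0 < n < f
assert r > l
assert b > t
print("frustum:", n, f, l, r, t, b)
```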

&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;4kmt3&quot;&gt;The planes forming the frustum are defined by:&lt;/p&gt;&lt;ul&gt;&lt;li data-block-key=&quot;ch4fr&quot;&gt;&lt;b&gt;Near plane&lt;/b&gt; is defined by&lt;/li&gt;&lt;/ul&gt;&lt;div data-katex-embed=&quot;z_{eye} = n&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;dj74g&quot;&gt;. This is the plane that we will project the vertices on to. Think of it as the window on to the virtual world through which we will look.&lt;/p&gt;&lt;ul&gt;&lt;li data-block-key=&quot;3exux&quot;&gt;&lt;b&gt;Far plane&lt;/b&gt; is defined by&lt;/li&gt;&lt;/ul&gt;&lt;div data-katex-embed=&quot;z_{eye} = f&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;cj83c&quot;&gt;. This defines the maximum distance to which we can see. Anything beyond this will be clipped to the far plane.&lt;/p&gt;&lt;ul&gt;&lt;li data-block-key=&quot;n18k9&quot;&gt;&lt;b&gt;Left and right planes&lt;/b&gt; are defined by specifying the x-coordinate of the near plane&lt;/li&gt;&lt;/ul&gt;&lt;div data-katex-embed=&quot;x_{eye} = l&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;2jn77&quot;&gt;and&lt;/p&gt;&lt;div data-katex-embed=&quot;x_{eye} = r&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;7phcf&quot;&gt;, then projecting those back to the origin&lt;/p&gt;&lt;div data-katex-embed=&quot;O&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;drmgr&quot;&gt;. 
Note that&lt;/p&gt;&lt;div data-katex-embed=&quot;r &amp;gt; l&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;51a29&quot;&gt;.&lt;/p&gt;&lt;ul&gt;&lt;li data-block-key=&quot;nnjo5&quot;&gt;&lt;b&gt;Top and bottom planes&lt;/b&gt; are defined by specifying the y-coordinate of the near plane&lt;/li&gt;&lt;/ul&gt;&lt;div data-katex-embed=&quot;y_{eye} = t&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;836kn&quot;&gt;and&lt;/p&gt;&lt;div data-katex-embed=&quot;y_{eye} = b&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;4sgu6&quot;&gt;, then projecting those back to the origin&lt;/p&gt;&lt;div data-katex-embed=&quot;O&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;alqmk&quot;&gt;. Note that&lt;/p&gt;&lt;div data-katex-embed=&quot;b&amp;gt;t&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;3gacr&quot;&gt;which is the opposite sense to what you may be used to. This is because we rotated our eye space coordinate system so that y increases downwards.&lt;/p&gt;&lt;p data-block-key=&quot;91g0c&quot;&gt;Within the view volume, we define a point&lt;/p&gt;&lt;div data-katex-embed=&quot;\bm{p}_{eye} = (x_e, y_e, z_e)^T&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;ff0i6&quot;&gt;representing a vertex that we wish to transform into clip space. If we trace a ray back from&lt;/p&gt;&lt;div data-katex-embed=&quot;\bm{p}_{eye}&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;lmbq&quot;&gt;to the origin, then we label the point where the ray crosses the near plane as&lt;/p&gt;&lt;div data-katex-embed=&quot;\bm{p}_{proj} = (x_p, y_p, z_p)^T&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;7g39h&quot;&gt;. Note that&lt;/p&gt;&lt;div data-katex-embed=&quot;\bm{p}_{proj}&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;dp0jk&quot;&gt;is still in eye space coordinates.&lt;/p&gt;&lt;p data-block-key=&quot;m4otu&quot;&gt;We know that clip space uses 4-dimensional homogeneous coordinates. 
We shall call the resulting point in clip-space&lt;/p&gt;&lt;div data-katex-embed=&quot;\bm{p}_{clip} = (x_c, y_c, z_c, w_c)^T&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;7dta4&quot;&gt;. Our job then is to find a 4x4 projection matrix,&lt;/p&gt;&lt;div data-katex-embed=&quot;P&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;b9b52&quot;&gt;such that:&lt;/p&gt;&lt;p data-block-key=&quot;dw727&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;78ud7&quot;&gt;&lt;/p&gt;&lt;div data-katex-embed=&quot;\bm{p}_{clip} = P \bm{p}_{eye}
\qquad \rm{or} \qquad
\begin{pmatrix}
x_c \\ y_c \\ z_c \\ w_c
\end{pmatrix}
=
P \begin{pmatrix}
x_e \\ y_e \\ z_e \\ 1
\end{pmatrix}
\qquad (\dagger)&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;5tj9r&quot;&gt;&lt;/p&gt;&lt;h2 id=&quot;deriving-the-perspective-projection-matrix&quot; anchor=&quot;deriving-the-perspective-projection-matrix&quot; data-block-key=&quot;j52mu&quot;&gt;Deriving the Perspective Projection Matrix&lt;/h2&gt;&lt;p data-block-key=&quot;qsus2&quot;&gt;Clip space is an intermediate coordinate system used by Vulkan and the other graphics APIs to perform clipping of geometry. Once that is complete, the clip space homogeneous coordinates are projected back to Cartesian space by dividing all components by the 4th component,&lt;/p&gt;&lt;div data-katex-embed=&quot;w_c&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;6k0b7&quot;&gt;. To allow perspective-correct interpolation of per-vertex attributes, the 4th component must be equal to the eye space depth, or&lt;/p&gt;&lt;div data-katex-embed=&quot;w_c = z_e&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;c8t5a&quot;&gt;. This normalisation process then yields the vertex position in normalised device coordinates (NDC) as:&lt;/p&gt;&lt;p data-block-key=&quot;c1gje&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;4tmv9&quot;&gt;&lt;/p&gt;&lt;div data-katex-embed=&quot; \bm{p}_{ndc} = \begin{pmatrix}
 x_n \\ y_n \\ z_n
 \end{pmatrix} = \begin{pmatrix}
 x_c / z_e \\
 y_c / z_e \\
 z_c / z_e \\
 \end{pmatrix}
 \qquad (\ast)&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;7s938&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;1ausg&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;tbfpe&quot;&gt;Since we always want&lt;/p&gt;&lt;div data-katex-embed=&quot;w_c = z_e&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;7slkk&quot;&gt;, this means that the final row of&lt;/p&gt;&lt;div data-katex-embed=&quot;P&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;67c9s&quot;&gt;will be&lt;/p&gt;&lt;div data-katex-embed=&quot;(0, 0, 1, 0)&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;48iun&quot;&gt;. Notice that because our z-axis is aligned with the clip-space z-axis there is no negation required here.&lt;/p&gt;&lt;p data-block-key=&quot;ygt0d&quot;&gt;So, at this stage we know that the projection matrix looks like this:&lt;/p&gt;&lt;p data-block-key=&quot;zlj8j&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;5tt05&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;mf9u&quot;&gt;&lt;/p&gt;&lt;div data-katex-embed=&quot; P = \begin{pmatrix}
 \cdot &amp;amp; \cdot &amp;amp; \cdot &amp;amp; \cdot \\
 \cdot &amp;amp; \cdot &amp;amp; \cdot &amp;amp; \cdot \\
 \cdot &amp;amp; \cdot &amp;amp; \cdot &amp;amp; \cdot \\
 0 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0 \\
 \end{pmatrix}&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;78tua&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;a4ov1&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;3m69s&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;okkiy&quot;&gt;Let&amp;#x27;s carry on and fill in the blanks.&lt;/p&gt;&lt;h3 id=&quot;projection-of-the-x-coordinate&quot; anchor=&quot;projection-of-the-x-coordinate&quot; data-block-key=&quot;c6p22&quot;&gt;Projection of the x-coordinate&lt;/h3&gt;&lt;p data-block-key=&quot;z1b7b&quot;&gt;Looking back at Figure 1, we can see by the properties of similar triangles that:&lt;/p&gt;&lt;p data-block-key=&quot;xiut0&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;6uo26&quot;&gt;&lt;/p&gt;&lt;div data-katex-embed=&quot; \frac{x_p}{z_p} = \frac{x_e}{z_e} \implies \frac{x_p}{n} = \frac{x_e}{z_e}&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;6daa6&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;37pge&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;el07o&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;quodp&quot;&gt;since on the near plane&lt;/p&gt;&lt;div data-katex-embed=&quot;z_p = n&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;e0p6r&quot;&gt;. Rearranging this very slightly we get:&lt;/p&gt;&lt;p data-block-key=&quot;3lv0q&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;2jq7l&quot;&gt;&lt;/p&gt;&lt;div data-katex-embed=&quot; x_p = \frac{n x_e}{z_e} \qquad (i)&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;e2o2q&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;dsr12&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;egcnu&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;gddy7&quot;&gt;Let us now consider how the projected vertex positions map through to normalised device coordinates. 
In Vulkan&amp;#x27;s NDC, the view volume becomes a cuboid where&lt;/p&gt;&lt;div data-katex-embed=&quot;-1 \leq x_n \leq 1&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;6id4g&quot;&gt;,&lt;/p&gt;&lt;div data-katex-embed=&quot;-1 \leq y_n \leq 1&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;2ee4l&quot;&gt;, and&lt;/p&gt;&lt;div data-katex-embed=&quot;0 \leq z_n \leq 1&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;5rlvi&quot;&gt;. We want the x component of&lt;/p&gt;&lt;div data-katex-embed=&quot;\bm{p}_{ndc}&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;1vgbt&quot;&gt;to vary linearly with the x component of the projected point,&lt;/p&gt;&lt;div data-katex-embed=&quot;x_p&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;fmhrn&quot;&gt;. If the relationship were not linear, objects would appear distorted across the screen or would seem to move with varying velocities.&lt;/p&gt;&lt;p data-block-key=&quot;xdy9h&quot;&gt;We know that the extremities of the view volume in the x direction are defined by&lt;/p&gt;&lt;div data-katex-embed=&quot;x_p = l&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;3rchp&quot;&gt;and&lt;/p&gt;&lt;div data-katex-embed=&quot;x_p = r&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;7q509&quot;&gt;. These map to -1 and +1 in normalised device coordinates respectively. We can therefore say that at&lt;/p&gt;&lt;div data-katex-embed=&quot;x_p = l, x_n = -1&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;2m1t1&quot;&gt;and&lt;/p&gt;&lt;div data-katex-embed=&quot;x_p = r, x_n = 1&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;e1dd4&quot;&gt;. Using this information we can plot the following graph for&lt;/p&gt;&lt;div data-katex-embed=&quot;x_n = m x_p + c&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;2m3th&quot;&gt;:&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;image-variable-size-block&quot;&gt;
    &lt;div class=&quot;image-variable-positioning-block right-margin-auto left-margin-auto width-100 &quot; &gt;
            &lt;div class=&quot;image-variable-size-image&quot;&gt;
                
                
                
                &lt;img id=&quot;ndc-x-1.max-3000x3000&quot; src=&quot;https://eu-central-1.linodeobjects.com/wagtail-production/images/ndc-x-1.max-3000x3000.original.png&quot; class=&quot;ndc-x-1.max-3000x3000&quot; alt=&quot;ndc-x-1.max-3000x3000&quot;&gt;
                
                
        &lt;/div&gt;
        &lt;div class=&quot;image-variable-size-caption text-end&quot;&gt;&lt;div class=&quot;rich-text&quot;&gt;&lt;/div&gt;&lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;
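As a quick numerical cross-check of the graph above (with purely illustrative values for l and r), we can recover the gradient and intercept of the line x_n = m x_p + c directly from the two boundary points (l, -1) and (r, +1):

```python
# Solve x_n = m * x_p + c from the two boundary points (l, -1) and (r, +1).
l, r = -0.2, 0.5          # illustrative near-plane extents, r > l

m = (1 - (-1)) / (r - l)  # gradient between the two known points
c = -1 - m * l            # intercept from x_n(l) = -1

# These should match the closed forms 2/(r - l) and -(r + l)/(r - l).
assert abs(m - 2 / (r - l)) < 1e-12
assert abs(c - (-(r + l) / (r - l))) < 1e-12
assert abs(m * l + c + 1) < 1e-12 and abs(m * r + c - 1) < 1e-12
print("m =", m, "c =", c)
```

Any choice of r > l gives the same closed forms, which is exactly what the algebra that follows confirms.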

&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;67auh&quot;&gt;That&amp;#x27;s right, more of your high school maths is going to be used to find the gradient and intercept of this equation!&lt;/p&gt;&lt;p data-block-key=&quot;ynobm&quot;&gt;The gradient,&lt;/p&gt;&lt;div data-katex-embed=&quot;m&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;2r3nc&quot;&gt;is given by:&lt;/p&gt;&lt;p data-block-key=&quot;crc0b&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;fmtn1&quot;&gt;&lt;/p&gt;&lt;div data-katex-embed=&quot; m = \frac{\Delta y}{\Delta x} = \frac{1 - (-1)}{r - l} = \frac{2}{r - l}&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;dt8l2&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;bf3b5&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;6rh0g&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;lyfzh&quot;&gt;Substituting the gradient back in we get a simple equation to solve to find the intercept,&lt;/p&gt;&lt;div data-katex-embed=&quot;c&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;6ft4f&quot;&gt;:&lt;/p&gt;&lt;p data-block-key=&quot;4jojk&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;am8bo&quot;&gt;&lt;/p&gt;&lt;div data-katex-embed=&quot; x_n = \frac{2 x_p}{r - l} + c&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;9dd3g&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;a77o4&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;eci1c&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;9csu1&quot;&gt;substituting in&lt;/p&gt;&lt;div data-katex-embed=&quot;x_n = 1&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;35f8n&quot;&gt;and&lt;/p&gt;&lt;div data-katex-embed=&quot;x_p = r&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;8acih&quot;&gt;:&lt;/p&gt;&lt;p data-block-key=&quot;v7enu&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;2dr3a&quot;&gt;&lt;/p&gt;&lt;div data-katex-embed=&quot; 1 = \frac{2 r}{r - l} + c \implies c = 1 - \frac{2 r}{r - l} \implies c = \frac{r - l - 2r}{r - l} \implies c = - \frac{r + l}{r - l}&quot;&gt;&lt;/div&gt;&lt;p 
data-block-key=&quot;ej50h&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;ecaqh&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;22bq3&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;5wimr&quot;&gt;We then get the following expression for&lt;/p&gt;&lt;div data-katex-embed=&quot;x_n&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;b6pg1&quot;&gt;as a function of&lt;/p&gt;&lt;div data-katex-embed=&quot;x_p&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;fn50&quot;&gt;:&lt;/p&gt;&lt;p data-block-key=&quot;tkx61&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;u41t&quot;&gt;&lt;/p&gt;&lt;div data-katex-embed=&quot; x_n = \frac{2 x_p}{r - l} - \frac{r + l}{r - l} \qquad (ii)&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;4mhi8&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;2481j&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;6rrlt&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;qngu5&quot;&gt;Substituting in for&lt;/p&gt;&lt;div data-katex-embed=&quot;x_p&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;eaas6&quot;&gt;from equation&lt;/p&gt;&lt;div data-katex-embed=&quot;(i)&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;69pad&quot;&gt;into equation&lt;/p&gt;&lt;div data-katex-embed=&quot;(ii)&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;5du5v&quot;&gt;and factorising gives:&lt;/p&gt;&lt;p data-block-key=&quot;afkuu&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;4c9kb&quot;&gt;&lt;/p&gt;&lt;div data-katex-embed=&quot; x_n = \frac{2 n x_e}{(r - l) z_e} - \frac{r + l}{r - l}

&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;a0b2a&quot;&gt;&lt;/p&gt;&lt;div data-katex-embed=&quot; \implies x_n = \frac{2 n x_e}{(r - l) z_e} - \frac{r + l}{r - l} \frac{z_e}{z_e}
&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;7k063&quot;&gt;&lt;/p&gt;&lt;div data-katex-embed=&quot; \implies x_n = \frac{1}{z_e} \left( \left( \frac{2n}{r - l} \right) x_e - \frac{r + l}{r - l} z_e \right)
&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;81cn9&quot;&gt;&lt;/p&gt;&lt;div data-katex-embed=&quot; \implies x_n z_e = \left( \frac{2n}{r - l} \right) x_e - \frac{r + l}{r - l} z_e&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;ri5k&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;8a8pt&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;tr5lv&quot;&gt;Recall from the first component of&lt;/p&gt;&lt;div data-katex-embed=&quot;(\ast)&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;9vocp&quot;&gt;that&lt;/p&gt;&lt;div data-katex-embed=&quot;x_n z_e = x_c&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;itt0&quot;&gt;. Substituting this in for the left-hand side of the previous equation gives:&lt;/p&gt;&lt;p data-block-key=&quot;bbo90&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;1d5dw&quot;&gt;&lt;/p&gt;&lt;div data-katex-embed=&quot; x_c = \left( \frac{2n}{r - l} \right) x_e - \frac{r + l}{r - l} z_e&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;e90lq&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;35a2b&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;5kt4a&quot;&gt;which is now directly comparable to the equation for the 1st component of&lt;/p&gt;&lt;div data-katex-embed=&quot;(\dagger)&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;6saef&quot;&gt;and comparing coefficients allows us to immediately read off the first row of the projection matrix as&lt;/p&gt;&lt;div data-katex-embed=&quot;(\frac{2n}{r - l}, 0, -\frac{r + l}{r - l}, 0)&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;bgm78&quot;&gt;. 
This also makes intuitive sense looking back at Figure 1 as the x component of the clip space point should only depend upon the x and z components of the eye space position (the eye space y component does not affect it).&lt;/p&gt;&lt;p data-block-key=&quot;xcavu&quot;&gt;As it stands here, the projection matrix looks like this:&lt;/p&gt;&lt;p data-block-key=&quot;8uld9&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;1k1nr&quot;&gt;&lt;/p&gt;&lt;div data-katex-embed=&quot; P = \begin{pmatrix}
 \frac{2n}{r - l} &amp;amp; 0 &amp;amp; -\frac{r + l}{r - l} &amp;amp; 0 \\
 \cdot &amp;amp; \cdot &amp;amp; \cdot &amp;amp; \cdot \\
 \cdot &amp;amp; \cdot &amp;amp; \cdot &amp;amp; \cdot \\
 0 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0 \\
 \end{pmatrix}&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;70bn0&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;bri4c&quot;&gt;&lt;/p&gt;&lt;h3 id=&quot;projection-of-the-y-coordinate&quot; anchor=&quot;projection-of-the-y-coordinate&quot; data-block-key=&quot;jp7fu&quot;&gt;Projection of the y-coordinate&lt;/h3&gt;&lt;p data-block-key=&quot;kodgv&quot;&gt;The good news is that the analysis in the y direction is exactly analogous to what we just did for the x direction. Without further ado, from Figure 2, by the properties of similar triangles, and since on the near plane&lt;/p&gt;&lt;div data-katex-embed=&quot;z_p = n&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;s3o9&quot;&gt;:&lt;/p&gt;&lt;p data-block-key=&quot;5xw5b&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;em4fv&quot;&gt;&lt;/p&gt;&lt;div data-katex-embed=&quot; \frac{y_p}{z_p} = \frac{y_e}{z_e} \implies \frac{y_p}{n} = \frac{y_e}{z_e}.&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;fj9ac&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;5duuj&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;7473j&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;82nav&quot;&gt;This then gives:&lt;/p&gt;&lt;p data-block-key=&quot;mbg6z&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;7i3n4&quot;&gt;&lt;/p&gt;&lt;div data-katex-embed=&quot; y_p = \frac{n y_e}{z_e} \qquad (iii)&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;42req&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;5a9b0&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;8djot&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;72738&quot;&gt;We know that the extremities of the view volume in the y direction are defined by&lt;/p&gt;&lt;div data-katex-embed=&quot;y_p = t&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;bm60n&quot;&gt;and&lt;/p&gt;&lt;div data-katex-embed=&quot;y_p = b&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;6uc9g&quot;&gt;. These map to -1 and +1 in normalised device coordinates respectively. 
We can therefore say that at&lt;/p&gt;&lt;div data-katex-embed=&quot;y_p = t, y_n = -1&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;e85me&quot;&gt;and&lt;/p&gt;&lt;div data-katex-embed=&quot;y_p = b, y_n = 1&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;329qa&quot;&gt;. Using this information we can plot the following graph for&lt;/p&gt;&lt;div data-katex-embed=&quot;y_n = m y_p + c&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;d8441&quot;&gt;.&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;image-variable-size-block&quot;&gt;
    &lt;div class=&quot;image-variable-positioning-block right-margin-auto left-margin-auto width-100 &quot; &gt;
            &lt;div class=&quot;image-variable-size-image&quot;&gt;
                
                
                
                &lt;img id=&quot;ndc-y.max-3000x3000&quot; src=&quot;https://eu-central-1.linodeobjects.com/wagtail-production/images/ndc-y.max-3000x3000.original.png&quot; class=&quot;ndc-y.max-3000x3000&quot; alt=&quot;ndc-y.max-3000x3000&quot;&gt;
                
                
        &lt;/div&gt;
        &lt;div class=&quot;image-variable-size-caption text-end&quot;&gt;&lt;div class=&quot;rich-text&quot;&gt;&lt;/div&gt;&lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;
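The same numerical cross-check works for the y mapping (again with purely illustrative values). Note the y-down convention here: since b > t, it is t that maps to -1 and b that maps to +1.

```python
# Solve y_n = m * y_p + c from the two boundary points (t, -1) and (b, +1),
# using the y-down convention b > t.
t, b = -0.1, 0.3          # illustrative near-plane extents, b > t

m = (1 - (-1)) / (b - t)  # gradient between the two known points
c = -1 - m * t            # intercept from y_n(t) = -1

# These should match the closed forms 2/(b - t) and -(b + t)/(b - t).
assert abs(m - 2 / (b - t)) < 1e-12
assert abs(c - (-(b + t) / (b - t))) < 1e-12
assert abs(m * t + c + 1) < 1e-12 and abs(m * b + c - 1) < 1e-12
print("m =", m, "c =", c)
```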

&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;y35hs&quot;&gt;As before, we have a linear equation to find the gradient and intercept of. The gradient,&lt;/p&gt;&lt;div data-katex-embed=&quot;m&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;94ih1&quot;&gt;is given by:&lt;/p&gt;&lt;p data-block-key=&quot;7e4cm&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;deiv1&quot;&gt;&lt;/p&gt;&lt;div data-katex-embed=&quot; m = \frac{\Delta y}{\Delta x} = \frac{1 - (-1)}{b - t} = \frac{2}{b - t}&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;bte08&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;cl34u&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;9259h&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;dfjxt&quot;&gt;Substituting the gradient back in we get a simple equation to solve to find the intercept,&lt;/p&gt;&lt;div data-katex-embed=&quot;c&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;2al3g&quot;&gt;:&lt;/p&gt;&lt;p data-block-key=&quot;93mpk&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;cvrll&quot;&gt;&lt;/p&gt;&lt;div data-katex-embed=&quot; y_n = \frac{2 y_p}{b - t} + c&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;7oabl&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;1ell4&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;bhv1p&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;6fb65&quot;&gt;substituting in&lt;/p&gt;&lt;div data-katex-embed=&quot;y_n = 1&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;3l6ir&quot;&gt;and&lt;/p&gt;&lt;div data-katex-embed=&quot;y_p = b&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;5vlr3&quot;&gt;:&lt;/p&gt;&lt;p data-block-key=&quot;bwoe8&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;99doo&quot;&gt;&lt;/p&gt;&lt;div data-katex-embed=&quot; 1 = \frac{2 b}{b - t} + c \implies c = 1 - \frac{2 b}{b - t} \implies c = \frac{b - t - 2b}{b - t} \implies c = - \frac{b + t}{b - t}&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;8qeo&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;drtep&quot;&gt;&lt;/p&gt;&lt;p 
data-block-key=&quot;cpe31&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;jjv51&quot;&gt;We then get the following expression for&lt;/p&gt;&lt;div data-katex-embed=&quot;y_n&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;ensm4&quot;&gt;as a function of&lt;/p&gt;&lt;div data-katex-embed=&quot;y_p&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;b69dl&quot;&gt;:&lt;/p&gt;&lt;p data-block-key=&quot;ghaoa&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;96gsb&quot;&gt;&lt;/p&gt;&lt;div data-katex-embed=&quot; y_n = \frac{2 y_p}{b - t} - \frac{b + t}{b - t} \qquad (iv)&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;brd9h&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;fepij&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;ka5l&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;dc2em&quot;&gt;Substituting in for&lt;/p&gt;&lt;div data-katex-embed=&quot;y_p&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;870a3&quot;&gt;from equation&lt;/p&gt;&lt;div data-katex-embed=&quot;(iii)&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;dd17o&quot;&gt;into equation&lt;/p&gt;&lt;div data-katex-embed=&quot;(iv)&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;t6nk&quot;&gt;and factorising gives:&lt;/p&gt;&lt;p data-block-key=&quot;j5ogz&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;feg7q&quot;&gt;&lt;/p&gt;&lt;div data-katex-embed=&quot; y_n = \frac{2 n y_e}{(b - t) z_e} - \frac{b + t}{b - t}
&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;droof&quot;&gt;&lt;/p&gt;&lt;div data-katex-embed=&quot; \implies y_n = \frac{2 n y_e}{(b - t) z_e} - \frac{b + t}{b - t} \frac{z_e}{z_e}
&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;bh988&quot;&gt;&lt;/p&gt;&lt;div data-katex-embed=&quot; \implies y_n = \frac{1}{z_e} \left( \left( \frac{2n}{b - t} \right) y_e - \frac{b + t}{b - t} z_e \right)
&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;7s5u6&quot;&gt;&lt;/p&gt;&lt;div data-katex-embed=&quot; \implies y_n z_e = \left( \frac{2n}{b - t} \right) y_e - \frac{b + t}{b - t} z_e&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;819r1&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;30sfv&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;dth7a&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;uv2cm&quot;&gt;Recall from the second component of&lt;/p&gt;&lt;div data-katex-embed=&quot;(\ast)&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;f9i62&quot;&gt;that&lt;/p&gt;&lt;div data-katex-embed=&quot;y_n z_e = y_c&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;d3i55&quot;&gt;. Substituting this in for the left-hand side of the previous equation gives:&lt;/p&gt;&lt;p data-block-key=&quot;tbe9m&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;7milq&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;bah1l&quot;&gt;&lt;/p&gt;&lt;div data-katex-embed=&quot; y_c = \left( \frac{2n}{b - t} \right) y_e - \frac{b + t}{b - t} z_e&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;3c0h7&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;1v3n1&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;e740f&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;yvsjn&quot;&gt;This time, comparing to the second component of&lt;/p&gt;&lt;div data-katex-embed=&quot;(\dagger)&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;8ms00&quot;&gt;we can read off the coefficients for the second row of the projection matrix as&lt;/p&gt;&lt;div data-katex-embed=&quot;(0, \frac{2n}{b - t}, -\frac{b + t}{b - t}, 0)&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;3c5s0&quot;&gt;. Once again a quick intuitive check against Figure 2 matches what we have found. 
The projected and clip space y coordinates do not depend upon the x component of the eye space position.&lt;/p&gt;&lt;p data-block-key=&quot;dewk5&quot;&gt;At the three quarters stage, the projection matrix is now:&lt;/p&gt;&lt;p data-block-key=&quot;oaokw&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;7auhv&quot;&gt;&lt;/p&gt;&lt;div data-katex-embed=&quot; P = \begin{pmatrix}
 \frac{2n}{r - l} &amp;amp; 0 &amp;amp; -\frac{r + l}{r - l} &amp;amp; 0 \\
 0 &amp;amp; \frac{2n}{b-t} &amp;amp; -\frac{b + t}{b - t} &amp;amp; 0 \\
 \cdot &amp;amp; \cdot &amp;amp; \cdot &amp;amp; \cdot \\
 0 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0 \\
 \end{pmatrix}&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;78fr1&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;dir3g&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;cfcmu&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;p2aol&quot;&gt;We are almost there now. We have just the z-axis mapping left to deal with.&lt;/p&gt;&lt;h3 id=&quot;mapping-the-z-coordinate&quot; anchor=&quot;mapping-the-z-coordinate&quot; data-block-key=&quot;w2xlj&quot;&gt;Mapping the z-coordinate&lt;/h3&gt;&lt;p data-block-key=&quot;knndh&quot;&gt;The analysis of the z-axis is a little different to that of the x and y dimensions. For Vulkan, we wish to map eye space depths such that:&lt;/p&gt;&lt;ul&gt;&lt;li data-block-key=&quot;35wjn&quot;&gt;the near plane,&lt;/li&gt;&lt;/ul&gt;&lt;div data-katex-embed=&quot;z_e = n&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;eglb7&quot;&gt;, maps to&lt;/p&gt;&lt;div data-katex-embed=&quot;z_n = 0&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;9g3a4&quot;&gt;and&lt;/p&gt;&lt;ul&gt;&lt;li data-block-key=&quot;600gx&quot;&gt;the far plane,&lt;/li&gt;&lt;/ul&gt;&lt;div data-katex-embed=&quot;z_e = f&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;d2364&quot;&gt;, maps to&lt;/p&gt;&lt;div data-katex-embed=&quot;z_n = 1&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;9ge8n&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;f2vxw&quot;&gt;The z components of the projected point and the normalised device coordinates point should not depend upon the x and y components. This means that for the 3rd row of the projection matrix the first two elements will be 0. 
The remaining two elements we will denote by&lt;/p&gt;&lt;div data-katex-embed=&quot;A&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;1laoc&quot;&gt;and&lt;/p&gt;&lt;div data-katex-embed=&quot;B&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;d68bs&quot;&gt;respectively:&lt;/p&gt;&lt;p data-block-key=&quot;536hi&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;fm4d9&quot;&gt;&lt;/p&gt;&lt;div data-katex-embed=&quot; P = \begin{pmatrix}
 \frac{2n}{r - l} &amp;amp; 0 &amp;amp; -\frac{r + l}{r - l} &amp;amp; 0 \\
 0 &amp;amp; \frac{2n}{b-t} &amp;amp; -\frac{b + t}{b - t} &amp;amp; 0 \\
 0 &amp;amp; 0 &amp;amp; A &amp;amp; B \\
 0 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0 \\
 \end{pmatrix}&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;7o93n&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;us3d&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;12ac2&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;ka22o&quot;&gt;Combining this with the 3rd row of&lt;/p&gt;&lt;div data-katex-embed=&quot;(\dagger)&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;cstdl&quot;&gt;we see that:&lt;/p&gt;&lt;p data-block-key=&quot;u2x8k&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;bnvdo&quot;&gt;&lt;/p&gt;&lt;div data-katex-embed=&quot; z_c = A z_e + B&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;52co5&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;9iiut&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;4olmm&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;pap1a&quot;&gt;Now, dividing both sides by&lt;/p&gt;&lt;div data-katex-embed=&quot;z_e&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;brtgg&quot;&gt;and recalling from&lt;/p&gt;&lt;div data-katex-embed=&quot;(\ast)&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;39jn&quot;&gt;that&lt;/p&gt;&lt;div data-katex-embed=&quot;z_n = z_c / z_e&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;46i4b&quot;&gt;, we can write:&lt;/p&gt;&lt;p data-block-key=&quot;i5yht&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;ci2nr&quot;&gt;&lt;/p&gt;&lt;div data-katex-embed=&quot; z_n = A + \frac{B}{z_e}. 
\qquad (v)&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;2u2td&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;ehmb9&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;fic0r&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;a91ms&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;c672o&quot;&gt;Substituting our boundary conditions (shown in the bullet points above) into equation&lt;/p&gt;&lt;div data-katex-embed=&quot;(v)&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;cfa34&quot;&gt;we get a pair of simultaneous equations for&lt;/p&gt;&lt;div data-katex-embed=&quot;A&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;fgbeq&quot;&gt;and&lt;/p&gt;&lt;div data-katex-embed=&quot;B&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;4s7ih&quot;&gt;:&lt;/p&gt;&lt;p data-block-key=&quot;tnojf&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;a08f0&quot;&gt;&lt;/p&gt;&lt;div data-katex-embed=&quot; A + \frac{B}{n} = 0 \qquad (vi)
&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;93brj&quot;&gt;&lt;/p&gt;&lt;div data-katex-embed=&quot; A + \frac{B}{f} = 1 \qquad (vii)&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;4qgei&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;9cfmj&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;5qa5a&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;4b67r&quot;&gt;We can subtract equation&lt;/p&gt;&lt;div data-katex-embed=&quot;(vi)&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;asjsh&quot;&gt;from equation&lt;/p&gt;&lt;div data-katex-embed=&quot;(vii)&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;38rru&quot;&gt;to eliminate&lt;/p&gt;&lt;div data-katex-embed=&quot;A&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;5m1rp&quot;&gt;:&lt;/p&gt;&lt;p data-block-key=&quot;gvweo&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;6l029&quot;&gt;&lt;/p&gt;&lt;div data-katex-embed=&quot; \frac{B}{f} - \frac{B}{n} = 1 \implies \frac{Bn - Bf}{nf} = 1 \implies \frac{B(n - f)}{nf} = 1 \implies B = \frac{nf}{n - f}
&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;cqo9k&quot;&gt;&lt;/p&gt;&lt;div data-katex-embed=&quot; \implies B = - \frac{nf}{f - n} \qquad (viii)&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;9a081&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;bv14t&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;3cnoi&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;y6gyv&quot;&gt;Now to find&lt;/p&gt;&lt;div data-katex-embed=&quot;A&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;f4uvp&quot;&gt;we can substitute&lt;/p&gt;&lt;div data-katex-embed=&quot;(viii)&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;9b3dn&quot;&gt;back into&lt;/p&gt;&lt;div data-katex-embed=&quot;(vii)&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;usst&quot;&gt;:&lt;/p&gt;&lt;p data-block-key=&quot;d204x&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;d8t3t&quot;&gt;&lt;/p&gt;&lt;div data-katex-embed=&quot; A - \frac{n f}{f(f - n)} = 1
&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;5q9v7&quot;&gt;&lt;/p&gt;&lt;div data-katex-embed=&quot; \implies A - \frac{n}{f - n} = 1
&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;f0ilf&quot;&gt;&lt;/p&gt;&lt;div data-katex-embed=&quot; \implies A = 1 + \frac{n}{f - n}
&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;c3bo3&quot;&gt;&lt;/p&gt;&lt;div data-katex-embed=&quot; \implies A = \frac{f - n + n}{f - n}
&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;3ns5v&quot;&gt;&lt;/p&gt;&lt;div data-katex-embed=&quot; \implies A = \frac{f}{f - n} \qquad (ix)&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;6jldm&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;5vol8&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;fqoi1&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;qxlay&quot;&gt;Substituting equations&lt;/p&gt;&lt;div data-katex-embed=&quot;(viii)&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;31m7&quot;&gt;and&lt;/p&gt;&lt;div data-katex-embed=&quot;(ix)&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;d5umj&quot;&gt;back into the projection matrix, we finally arrive at the result for a perspective projection matrix useable with Vulkan in conjunction with the post-view rotation matrix from Part 1:&lt;/p&gt;&lt;p data-block-key=&quot;85m27&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;56vg1&quot;&gt;&lt;/p&gt;&lt;div data-katex-embed=&quot; P = \begin{pmatrix}
 \frac{2n}{r - l} &amp;amp; 0 &amp;amp; -\frac{r + l}{r - l} &amp;amp; 0 \\
 0 &amp;amp; \frac{2n}{b-t} &amp;amp; -\frac{b + t}{b - t} &amp;amp; 0 \\
 0 &amp;amp; 0 &amp;amp; \frac{f}{f - n} &amp;amp; - \frac{n f}{f - n} \\
 0 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0 \\
 \end{pmatrix} \qquad (x)&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;al9ne&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;mr4h&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;flq09&quot;&gt;&lt;/p&gt;&lt;h2 id=&quot;using-the-projection-matrix-in-practice&quot; anchor=&quot;using-the-projection-matrix-in-practice&quot; data-block-key=&quot;vjrnl&quot;&gt;Using the Projection Matrix in Practice&lt;/h2&gt;&lt;p data-block-key=&quot;p9fs0&quot;&gt;Recall that equation&lt;/p&gt;&lt;div data-katex-embed=&quot;(x)&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;8fs01&quot;&gt;is the matrix to perform the projection operation from the &lt;b&gt;rotated eye space&lt;/b&gt; coordinates to the right-handed &lt;b&gt;clip space&lt;/b&gt; coordinates used by Vulkan. What does this mean? Well, it means that we should include the post-view correction matrix in our calculations when transforming vertices. Given a vertex position in model space,&lt;/p&gt;&lt;div data-katex-embed=&quot;\bm{p}_{model}&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;dj0a6&quot;&gt;, we can transform it into clip space by the following:&lt;/p&gt;&lt;p data-block-key=&quot;1dvje&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;1n179&quot;&gt;&lt;/p&gt;&lt;div data-katex-embed=&quot; \bm{p}_{clip} = P X V M \bm{p}_{model}&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;9s8e7&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;fvn1n&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;4s5m0&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;9qwpu&quot;&gt;As we saw in Part 1, the post-view correction matrix is just a constant that performs the 180 degree rotation about the x-axis, so we can combine it into our calculation of the projection matrix,&lt;/p&gt;&lt;div data-katex-embed=&quot;P&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;720is&quot;&gt;. This is analogous to how the OpenGL projection matrix typically includes the z-axis flip to change from a right-handed to left-handed coordinate system. 
Combining the post-view rotation and Vulkan projection matrix gives:&lt;/p&gt;&lt;p data-block-key=&quot;6g2ru&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;a8101&quot;&gt;&lt;/p&gt;&lt;div data-katex-embed=&quot; Q = P X&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;e125h&quot;&gt;&lt;/p&gt;&lt;div data-katex-embed=&quot; \implies Q =
 \begin{pmatrix}
 \frac{2n}{r - l} &amp;amp; 0 &amp;amp; -\frac{r + l}{r - l} &amp;amp; 0 \\
 0 &amp;amp; \frac{2n}{b-t} &amp;amp; -\frac{b + t}{b - t} &amp;amp; 0 \\
 0 &amp;amp; 0 &amp;amp; \frac{f}{f - n} &amp;amp; - \frac{n f}{f - n} \\
 0 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0 \\
 \end{pmatrix}
 \begin{pmatrix}
 1 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 \\
 0 &amp;amp; -1 &amp;amp; 0 &amp;amp; 0 \\
 0 &amp;amp; 0 &amp;amp; -1 &amp;amp; 0 \\
 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 1 \\
 \end{pmatrix} \\&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;5fvgb&quot;&gt;&lt;/p&gt;&lt;div data-katex-embed=&quot; \implies Q =
 \begin{pmatrix}
 \frac{2n}{r - l} &amp;amp; 0 &amp;amp; \frac{r + l}{r - l} &amp;amp; 0 \\
 0 &amp;amp; -\frac{2n}{b-t} &amp;amp; \frac{b + t}{b - t} &amp;amp; 0 \\
 0 &amp;amp; 0 &amp;amp; -\frac{f}{f - n} &amp;amp; - \frac{n f}{f - n} \\
 0 &amp;amp; 0 &amp;amp; -1 &amp;amp; 0 \\
 \end{pmatrix} \qquad (xi) \\&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;9lv1e&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;2eq5a&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;pq326&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;bfv84&quot;&gt;&lt;b&gt;Edit: Fixed the signs of the 1st and 2nd rows of the 3rd column in the matrix above (copy and paste error). Thanks to FourierTransform and Ziflin in the comments for pointing this out!&lt;/b&gt;&lt;/p&gt;&lt;p data-block-key=&quot;37a7e&quot;&gt;Before you rush off and implement equation&lt;/p&gt;&lt;div data-katex-embed=&quot;(xi)&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;dhu90&quot;&gt;in your favourite editor and language, there is one final piece of subtlety to consider! Recall that when we began deriving the perspective projection matrix, we set things up so that our source coordinate system was the rotated eye space so that its axes were already aligned with the clip space destination coordinate system. Refer back to Figures 1 and 2 and note the orientation of the axes. In particular that the y axis increases in a downward direction.&lt;/p&gt;&lt;p data-block-key=&quot;j8ajd&quot;&gt;The thing to keep in mind is that the parameters used in&lt;/p&gt;&lt;div data-katex-embed=&quot;(xi)&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;c77fr&quot;&gt;are actually specified in the rotated eye space coordinate system. This has implications:&lt;/p&gt;&lt;ul&gt;&lt;li data-block-key=&quot;t62sn&quot;&gt;&lt;b&gt;x axis:&lt;/b&gt; Nothing to change here. Since we rotate about the x-axis to get from eye space to rotated eye space, the x component of any position does not change.&lt;/li&gt;&lt;li data-block-key=&quot;1nc2t&quot;&gt;&lt;b&gt;y axis:&lt;/b&gt; The 180 degree rotation about the x axis will affect the y components of any positions. The following diagram shows a blue view volume in the non-rotated eye space - the z-axis increases to the left and the near plane is positioned on the negative z side. 
The view volume is in the upper right quadrant and in this case both the top and bottom values for the near plane are positive. In the lower left quadrant, in green, we also show the rotated view volume. Notice that the 180 degree rotation causes the &lt;b&gt;signs of the&lt;/b&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div data-katex-embed=&quot;t&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;degva&quot;&gt;&lt;b&gt;and&lt;/b&gt;&lt;/p&gt;&lt;div data-katex-embed=&quot;b&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;5qsir&quot;&gt;&lt;b&gt;parameters to be negated&lt;/b&gt;.&lt;/p&gt;&lt;ul&gt;&lt;li data-block-key=&quot;83vid&quot;&gt;&lt;b&gt;z axis:&lt;/b&gt; Technically, the 180 degree rotation would also negate the z components of any positions. However, developers are already used to specifying the near and far plane parameters,&lt;/li&gt;&lt;/ul&gt;&lt;div data-katex-embed=&quot;n&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;fuon0&quot;&gt;and&lt;/p&gt;&lt;div data-katex-embed=&quot;f&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;18iug&quot;&gt;, as distances from the&lt;/p&gt;&lt;div data-katex-embed=&quot;z_e = 0&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;62fdi&quot;&gt;plane. This is exactly what happens when creating an OpenGL projection matrix for example. Since we already specified&lt;/p&gt;&lt;div data-katex-embed=&quot;n&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;cir2l&quot;&gt;and&lt;/p&gt;&lt;div data-katex-embed=&quot;f&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;odd9&quot;&gt;as positive values in the rotated eye space, we can just treat the inputs to any function that we write as positive distances for the near and far plane and stay in keeping with what developers are used to.&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;image-variable-size-block&quot;&gt;
    &lt;div class=&quot;image-variable-positioning-block right-margin-auto left-margin-auto width-100 &quot; &gt;
            &lt;div class=&quot;image-variable-size-image&quot;&gt;
                
                
                
                &lt;img id=&quot;rotated-eye-space-view-volume.max-3000x3000&quot; src=&quot;https://eu-central-1.linodeobjects.com/wagtail-production/images/rotated-eye-space-view-volume.max-3000x3000.original.png&quot; class=&quot;rotated-eye-space-view-volume.max-3000x3000&quot; alt=&quot;rotated-eye-space-view-volume.max-3000x3000&quot;&gt;
                
                
        &lt;/div&gt;
        &lt;div class=&quot;image-variable-size-caption text-end&quot;&gt;&lt;div class=&quot;rich-text&quot;&gt;&lt;/div&gt;&lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;

&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;niv9j&quot;&gt;Putting this together, we can create a function to produce a Vulkan projection matrix and optionally have it incorporate the post-view correction rotation matrix. All we have to remember is that if we are opting in to include the post-view correction, then the top and bottom parameters are treated as being specified in the non-rotated eye space. If we do not opt in, then they are specified in rotated eye space.&lt;/p&gt;&lt;p data-block-key=&quot;tx34v&quot;&gt;In practice, this works well: you usually want to minimise the amount of floating point arithmetic per frame, and opting in allows the developer to specify top and bottom in the usual eye space coordinates, which are closer to the chosen world space system (often y-up too) than the rotated eye space.&lt;/p&gt;&lt;p data-block-key=&quot;mlwns&quot;&gt;Using the popular glm library, we can declare a function as:&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;formatted-code&quot;&gt;
    &lt;pre&gt;&lt;code class=&quot;language-plain  line-numbers &quot;&gt;enum class ApplyPostViewCorrection : uint8_t {
    No,
    Yes
};

struct AsymmetricPerspectiveOptions {
    float left{ -1.0f };
    float right{ 1.0f };
    float bottom{ -1.0f };
    float top{ 1.0f };
    float nearPlane{ 0.1f };
    float farPlane{ 100.0f };
    ApplyPostViewCorrection applyPostViewCorrection{ ApplyPostViewCorrection::Yes };
};

glm::mat4 perspective(const AsymmetricPerspectiveOptions &amp;amp;options);&lt;/code&gt;&lt;/pre&gt;
 &lt;/div&gt;


&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;ne9qx&quot;&gt;The implementation turns out to be very easy once we know equations&lt;/p&gt;&lt;div data-katex-embed=&quot;(x)&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;8bt83&quot;&gt;and&lt;/p&gt;&lt;div data-katex-embed=&quot;(xi)&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;51q0k&quot;&gt;:&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;formatted-code&quot;&gt;
    &lt;pre&gt;&lt;code class=&quot;language-plain  line-numbers &quot;&gt;glm::mat4 perspective(const AsymmetricPerspectiveOptions &amp;amp;options)
{
    const auto twoNear = 2.0f * options.nearPlane;
    const auto rightMinusLeft = options.right - options.left;
    const auto farMinusNear = options.farPlane - options.nearPlane;

    if (options.applyPostViewCorrection == ApplyPostViewCorrection::No) {
        const auto bottomMinusTop = options.bottom - options.top;

        const glm::mat4 m = {
            twoNear / rightMinusLeft,
            0.0f,
            0.0f,
            0.0f,

            0.0f,
            twoNear / bottomMinusTop,
            0.0f,
            0.0f,

            -(options.right + options.left) / rightMinusLeft,
            -(options.bottom + options.top) / bottomMinusTop,
            options.farPlane / farMinusNear,
            1.0f,

            0.0f,
            0.0f,
            -options.nearPlane * options.farPlane / farMinusNear,
            0.0f
        };

        return m;
    } else {
        // If we are applying the post view correction, we need to negate the signs of the
        // top and bottom planes to take into account the fact that the post view correction
    // rotates them 180 degrees around the x axis.
        //
        // This has the effect of treating the top and bottom planes as if they were specified
        // in the non-rotated eye space coordinate system.
        //
        // We do not need to flip the signs of the near and far planes as these are always
        // treated as positive distances from the camera.
        const auto bottom = -options.bottom;
        const auto top = -options.top;
        const auto bottomMinusTop = bottom - top;

        // In addition to negating the top and bottom planes, we also need to post-multiply
        // the projection matrix by the post view correction matrix. This amounts to negating
        // the y and z axes of the projection matrix.
        const glm::mat4 m = {
            twoNear / rightMinusLeft,
            0.0f,
            0.0f,
            0.0f,

            0.0f,
            -twoNear / (bottomMinusTop),
            0.0f,
            0.0f,

            (options.right + options.left) / rightMinusLeft,
            (bottom + top) / bottomMinusTop,
            -options.farPlane / farMinusNear,
            -1.0f,

            0.0f,
            0.0f,
            -options.nearPlane * options.farPlane / farMinusNear,
            0.0f
        };

        return m;
    }
}&lt;/code&gt;&lt;/pre&gt;
 &lt;/div&gt;


&lt;div class=&quot;rich-text&quot;&gt;&lt;h2 id=&quot;summary&quot; anchor=&quot;summary&quot; data-block-key=&quot;bjzrs&quot;&gt;Summary&lt;/h2&gt;&lt;p data-block-key=&quot;124px&quot;&gt;In this article we have shown how to build a perspective projection matrix to transform vertices from rotated eye space to clip space, all from first principles. The requirement for perspective-correct interpolation and the perspective divide yielded the 4th row of the projection matrix. We then showed how to construct a linear relationship between the x or y components of the eye space point projected onto the near plane and the normalised device coordinate point, and from there back to clip space. We then showed how to map the eye space depth component onto the normalised device coordinate depth. Finally, we gave some practical tips about combining the projection matrix with the post-view rotation matrix.&lt;/p&gt;&lt;p data-block-key=&quot;ecl4z&quot;&gt;We hope that this has removed some of the mystery surrounding the perspective projection matrix and how using an OpenGL projection matrix can cause your rendered results to be upside down. Armed with this knowledge you will have no need for the various hacks mentioned earlier.&lt;/p&gt;&lt;p data-block-key=&quot;vph80&quot;&gt;In the next article, we will take a look at some more variations on the projection matrix and some more tips for using it in applications. 
Thank you for reading!&lt;/p&gt;&lt;/div&gt;&lt;p&gt;The post &lt;a href=&quot;https://www.kdab.com/projection-matrices-with-vulkan-part-2/&quot;&gt;Projection Matrices with Vulkan - Part 2&lt;/a&gt; appeared first on &lt;a href=&quot;https://www.kdab.com&quot;&gt;KDAB&lt;/a&gt;.&lt;/p&gt;</content:encoded><dc:creator>Sean Harmer</dc:creator><category>3d</category><category>performance</category></item><item><title>Projection Matrices with Vulkan - Part 1</title><link>https://www.kdab.com/projection-matrices-with-vulkan-part-1/</link><guid isPermaLink="true">https://www.kdab.com/projection-matrices-with-vulkan-part-1/</guid><description>&lt;p data-block-key=&quot;7w8ak&quot;&gt;Introduction When someone with an OpenGL background begins using Vulkan, one of the very common outcomes - beyond the initial one of &amp;quot;OMG how much code does it take to draw a triangle?&amp;quot; - is that the resulting image is upside down. Searching the web for this will give many hits on discussions about coordinate […]&lt;/p&gt;</description><pubDate>Thu, 23 Nov 2023 08:00:00 GMT</pubDate><content:encoded>&lt;h1&gt;Projection Matrices with Vulkan - Part 1&lt;/h1&gt;&lt;div class=&quot;rich-text&quot;&gt;&lt;h2 id=&quot;introduction&quot; anchor=&quot;introduction&quot; data-block-key=&quot;t1jdp&quot;&gt;Introduction&lt;/h2&gt;&lt;p data-block-key=&quot;az8bt&quot;&gt;When someone with an OpenGL background begins using Vulkan, one of the very common outcomes - beyond the initial one of &amp;quot;OMG how much code does it take to draw a triangle?&amp;quot; - is that the resulting image is upside down.&lt;/p&gt;&lt;p data-block-key=&quot;r1u59&quot;&gt;Searching the web for this will give many hits on discussions about coordinate systems being flipped with suggested solutions being to do things like:&lt;/p&gt;&lt;ul&gt;&lt;li data-block-key=&quot;zw5a8&quot;&gt;Invert all of your &lt;code&gt;gl_Position.y&lt;/code&gt; coordinates in all of your vertex shaders.&lt;/li&gt;&lt;li 
data-block-key=&quot;mhed9&quot;&gt;Provide a negative height viewport to flip the viewport transformation applied by Vulkan.&lt;/li&gt;&lt;li data-block-key=&quot;g24jp&quot;&gt;Perform some magic incantation on your transformation matrices such as negating the y-axis.&lt;/li&gt;&lt;/ul&gt;&lt;p data-block-key=&quot;93v3c&quot;&gt;All of these approaches have downsides: needing to touch all of your vertex shaders; requiring hardware that supports negative viewport heights; not really understanding the implications of randomly flipping an axis in a transformation matrix; or having to invert your geometry winding order.&lt;/p&gt;&lt;p data-block-key=&quot;kh1f2&quot;&gt;This post aims to explain what is different between OpenGL and Vulkan transformations and how we can adapt our code to get the desired results, with the bonus of actually understanding what is going on. This final point is crucial when it comes time to make changes later, so that you don&amp;#x27;t end up in the common situation of randomly flipping axes until you get what you want but probably break something else.&lt;/p&gt;&lt;h2 id=&quot;left-vs-right-handed-coordinate-systems&quot; anchor=&quot;left-vs-right-handed-coordinate-systems&quot; data-block-key=&quot;vxh9y&quot;&gt;Left- vs Right-handed Coordinate Systems&lt;/h2&gt;&lt;p data-block-key=&quot;ablgq&quot;&gt;As a quick aside, it is important in what follows to know if we are dealing with a left-handed or right-handed coordinate system at any given time. First of all, what does it even mean for a coordinate system to be left-handed or right-handed?&lt;/p&gt;&lt;p data-block-key=&quot;4k4rt&quot;&gt;Well, it&amp;#x27;s just a way of defining the relative orientations of the coordinate axes. 
In the following pictures we can use our thumb, first finger, and middle finger to represent the x, y, and z axes (or basis vectors if you prefer).&lt;/p&gt;&lt;p data-block-key=&quot;8mbc3&quot;&gt;In a right-handed coordinate system we use those digits on our right hand so that the x-axis points to the right, say, the y-axis points up, leaving the z-axis (middle finger) to point towards us.&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;image-variable-size-block&quot;&gt;
    &lt;div class=&quot;image-variable-positioning-block right-margin-auto left-margin-auto width-33 &quot; &gt;
            &lt;div class=&quot;image-variable-size-image&quot;&gt;
                
                
                
                &lt;img id=&quot;right-handed.jpg&quot; src=&quot;https://eu-central-1.linodeobjects.com/wagtail-production/images/right-handed.original.jpg&quot; class=&quot;right-handed.jpg&quot; alt=&quot;right-handed.jpg&quot;&gt;
                
                
        &lt;/div&gt;
        &lt;div class=&quot;image-variable-size-caption text-center&quot;&gt;&lt;div class=&quot;rich-text&quot;&gt;&lt;/div&gt;&lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;

&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;quy4v&quot;&gt;Conversely, in a left-handed coordinate system we can still have the x-axis pointing to the right and the y-axis pointing up, but this time the z-axis increases away from us.&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;image-variable-size-block&quot;&gt;
    &lt;div class=&quot;image-variable-positioning-block right-margin-auto left-margin-auto width-33 &quot; &gt;
            &lt;div class=&quot;image-variable-size-image&quot;&gt;
                
                
                
                &lt;img id=&quot;left-handed.jpg&quot; src=&quot;https://eu-central-1.linodeobjects.com/wagtail-production/images/left-handed.original.jpg&quot; class=&quot;left-handed.jpg&quot; alt=&quot;left-handed.jpg&quot;&gt;
                
                
        &lt;/div&gt;
        &lt;div class=&quot;image-variable-size-caption text-center&quot;&gt;&lt;div class=&quot;rich-text&quot;&gt;&lt;/div&gt;&lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;

&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;lwejd&quot;&gt;Converting from a right-handed coordinate system to a left-handed coordinate system or vice versa can be achieved by simply flipping the sign of a single axis (or any odd number of axes).&lt;/p&gt;&lt;p data-block-key=&quot;kg1kv&quot;&gt;As we shall see, different graphics APIs use left- or right-handed coordinate systems at various stages of processing. This stuff can be a major source of confusion for graphics developers if they do not keep track of coordinate systems and often results in &amp;quot;oh hey, it works if I flip the sign of this column but I have no idea why&amp;quot;.&lt;/p&gt;&lt;h2 id=&quot;common-coordinate-systems-in-3d-graphics&quot; anchor=&quot;common-coordinate-systems-in-3d-graphics&quot; data-block-key=&quot;4owqo&quot;&gt;Common Coordinate Systems in 3D Graphics&lt;/h2&gt;&lt;p data-block-key=&quot;dwah6&quot;&gt;Let&amp;#x27;s take a quick tour of the coordinate systems used in 3D graphics at various stages of the (extended) pipeline. We will begin with OpenGL and then go on to discuss Vulkan and its differences. Note that the uses of the coordinate systems are the same in both systems, but as we shall see, there are some small but important changes between the two APIs. It is these differences that we need to be aware of in order to make our applications behave the way we want them to.&lt;/p&gt;&lt;p data-block-key=&quot;nutcm&quot;&gt;Here is a quick summary of the coordinate systems, what they are used for and where they occur.&lt;/p&gt;&lt;h3 id=&quot;model-space-or-object-space&quot; anchor=&quot;model-space-or-object-space&quot; data-block-key=&quot;fki22&quot;&gt;Model Space or Object Space&lt;/h3&gt;&lt;p data-block-key=&quot;f81ej&quot;&gt;This is any coordinate system that a 3D artist chooses to use when creating a particular asset. If modelling a chair, then they may use units of cm perhaps. 
If modelling a mountain range, a more suitable choice of unit may be km. Different tools also have different conventions for the orientation of axes. Blender, for example, uses a z-up convention whereas, as we shall see later, many real-time 3D applications use y-up as their chosen orientation. Ultimately, it does not matter just so long as we know which conventions are used. Objects in model space are also often located close to the origin, for convenience when being modelled and for when we later wish to position them.&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;image-variable-size-block&quot;&gt;
    &lt;div class=&quot;image-variable-positioning-block right-margin-auto left-margin-auto width-50 &quot; &gt;
            &lt;div class=&quot;image-variable-size-image&quot;&gt;
                
                
                
                &lt;img id=&quot;blender-rh-zup.png&quot; src=&quot;https://eu-central-1.linodeobjects.com/wagtail-production/images/blender-rh-zup.original.png&quot; class=&quot;blender-rh-zup.png&quot; alt=&quot;blender-rh-zup.png&quot;&gt;
                
                
        &lt;/div&gt;
        &lt;div class=&quot;image-variable-size-caption text-center&quot;&gt;&lt;div class=&quot;rich-text&quot;&gt;&lt;/div&gt;&lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;

&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;yzfq5&quot;&gt;Model space is often right-handed but it is usually decided by the tool author or generative code author.&lt;/p&gt;&lt;h3 id=&quot;world-space&quot; anchor=&quot;world-space&quot; data-block-key=&quot;1ma0e&quot;&gt;World Space&lt;/h3&gt;&lt;p data-block-key=&quot;b5kve&quot;&gt;World space is what we are most familiar with and is typically what you create using game engine editors. World space is a coordinate system where everything is brought into consistent units whether the units we choose are microns, centimeters, meters, kilometers etc. How we define world space in our applications is up to us. It may well differ depending upon what it is we are trying to simulate. Cellular microscopy applications probably make more sense using suitable units such as microns or perhaps even nanometers. Whereas a space simulation is probably better off using kilometers or maybe something even larger – whatever allows you to make best use of the limited precision of floating point numbers.&lt;/p&gt;&lt;p data-block-key=&quot;us0r&quot;&gt;World space is also where we would rotate objects coming from various definitions of model space so that they make sense in the larger scene. For example, if a chair was modeled with the z-up convention and it wasn&amp;#x27;t rotated when it was exported, then when we place it into world space we would also apply the rotation here, so that it looks correct in a y-up convention.&lt;/p&gt;&lt;p data-block-key=&quot;rtkdv&quot;&gt;To create a consistent scene, we scale, rotate and translate our various 3D assets so that they are positioned relative to each other as we wish. The way we do this is to pre-multiply the vertex positions of the 3D assets by a &amp;quot;Model Matrix&amp;quot; for that asset. 
The Model Matrix, or just&lt;/p&gt;&lt;div data-katex-embed=&quot;M&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;50he6&quot;&gt;for short, is a 4x4 matrix that encodes the scaling, rotation and translation operations needed to correctly position the asset.&lt;/p&gt;&lt;p data-block-key=&quot;eof2d&quot;&gt;World space is often right-handed but it is up to the application developer to decide.&lt;/p&gt;&lt;h3 id=&quot;camera-or-view-or-eye-space&quot; anchor=&quot;camera-or-view-or-eye-space&quot; data-block-key=&quot;1j2ko&quot;&gt;Camera or View or Eye Space&lt;/h3&gt;&lt;p data-block-key=&quot;65tr3&quot;&gt;This next space goes by various names in the literature and online such as eye space, camera space or view space. Ultimately they all mean the same thing which is that the objects in our 3D world are transformed to be relative to our virtual camera. Wait, our what?&lt;/p&gt;&lt;p data-block-key=&quot;vglly&quot;&gt;Well, in order to be able to visualize our virtual 3D worlds on a display device, we must choose a position and orientation from which to view it. This is typically achieved by placing a virtual camera into the world. Yes, the camera entity is also positioned in world space by way of a transformation just like the assets mentioned above. View space is often defined to be a right-handed coordinate system where:&lt;/p&gt;&lt;ul&gt;&lt;li data-block-key=&quot;t3frq&quot;&gt;the x-axis points to the right;&lt;/li&gt;&lt;li data-block-key=&quot;k612o&quot;&gt;the y-axis points upwards;&lt;/li&gt;&lt;li data-block-key=&quot;n4u3j&quot;&gt;and the z-axis is such that we are looking down the negative z-axis.&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;


&lt;div class=&quot;image-variable-size-block&quot;&gt;
    &lt;div class=&quot;image-variable-positioning-block right-margin-auto left-margin-auto width-50 &quot; &gt;
            &lt;div class=&quot;image-variable-size-image&quot;&gt;
                
                
                
                &lt;img id=&quot;coordinate-systems-viewspace.png&quot; src=&quot;https://eu-central-1.linodeobjects.com/wagtail-production/images/coordinate-systems-viewspace.original.png&quot; class=&quot;coordinate-systems-viewspace.png&quot; alt=&quot;coordinate-systems-viewspace.png&quot;&gt;
                
                
        &lt;/div&gt;
        &lt;div class=&quot;image-variable-size-caption text-center&quot;&gt;&lt;div class=&quot;rich-text&quot;&gt;&lt;/div&gt;&lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;

&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;x0cpk&quot;&gt;Typically a camera is only rotated and translated to place it into world space and so the units of measurement are still whatever you decided upon for World space. Therefore, the transformation to get our 3D entities from world space and into view space consists only of a translation and rotation. The matrix for transforming from World space to View space is typically called the &amp;quot;View Matrix&amp;quot; or just&lt;/p&gt;&lt;div data-katex-embed=&quot;V&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;ft1t7&quot;&gt;.&lt;/p&gt;&lt;p data-block-key=&quot;bj4eh&quot;&gt;View space is often right-handed but it is up to the developer to decide.&lt;/p&gt;&lt;h3 id=&quot;clip-space&quot; anchor=&quot;clip-space&quot; data-block-key=&quot;4knm6&quot;&gt;Clip Space&lt;/h3&gt;&lt;p data-block-key=&quot;dhd8h&quot;&gt;In addition to a position and orientation, our virtual camera also needs to provide some additional information that helps us convert from a purely mathematical model of our 3D world to how it should appear on screen. 
We need a way to map points in View space onto specific pixel coordinates on the display.&lt;/p&gt;&lt;p data-block-key=&quot;ke7vc&quot;&gt;The first step towards this is the conversion from View space to &amp;quot;Clip Space&amp;quot; which is achieved by multiplying the View space positions by a so-called &amp;quot;Projection Matrix&amp;quot; (abbreviated to&lt;/p&gt;&lt;div data-katex-embed=&quot;P&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;2civ2&quot;&gt;).&lt;/p&gt;&lt;p data-block-key=&quot;2avzo&quot;&gt;There are various ways to calculate a projection matrix,&lt;/p&gt;&lt;div data-katex-embed=&quot;P&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;13p19&quot;&gt;, depending upon if you wish to use an orthographic projection or a perspective projection.&lt;/p&gt;&lt;ul&gt;&lt;li data-block-key=&quot;3bmix&quot;&gt;&lt;b&gt;Orthographic projection:&lt;/b&gt; Often used in CAD applications as parallel lines in the world remain parallel on screen. Angles are preserved. The view volume (portion of scene that will appear on screen) is a cuboid.&lt;/li&gt;&lt;li data-block-key=&quot;2l5dp&quot;&gt;&lt;b&gt;Perspective projection:&lt;/b&gt; Often used in games and other applications as this mimics the way our eyes work. Distant objects appear smaller. Angles are not preserved. The view volume for a perspective projection is a frustum (truncated rectangular pyramid).&lt;/li&gt;&lt;/ul&gt;&lt;p data-block-key=&quot;ht4df&quot;&gt;Ultimately, the projection matrix transforms the view volume into a cuboid in clip space with a characteristic size of w. Thanks to the way that perspective projection matrices are constructed, the w component is equal to the z-depth in eye space of the point being transformed. 
This is so that we can later use this to perform the perspective divide operation and get perspective-correct interpolation of our geometry&amp;#x27;s attributes (see below).&lt;/p&gt;&lt;p data-block-key=&quot;z6n1x&quot;&gt;Don&amp;#x27;t worry too much about the details of this. Conceptually, it squashes things around so that anything that was inside the view volume (cuboid or frustum) ends up inside a cuboidal volume. The exact details of this depend upon which graphics API you are using (see even further below).&lt;/p&gt;&lt;h3 id=&quot;normalised-device-coordinates&quot; anchor=&quot;normalised-device-coordinates&quot; data-block-key=&quot;8igl7&quot;&gt;Normalised Device Coordinates&lt;/h3&gt;&lt;p data-block-key=&quot;3ohvh&quot;&gt;The next step along our path to getting something to appear on screen involves the use of NDC space or Normalized Device Coordinates. This step is easy though. All we do to get from Clip space to NDC space is divide the x, y, and z components of each vertex by the 4th w component (and then discard the 4th component, which is now guaranteed to be exactly 1), a process known as homogenization or the perspective divide.&lt;/p&gt;&lt;p data-block-key=&quot;v2lg7&quot;&gt;Why even do this? Well, as the name suggests, clip space is used by the fixed function parts of the GPU to clip geometry so that it only has to rasterize parts that will actually be visible on the display. Any coordinate that has a magnitude exceeding the value w will be clipped.&lt;/p&gt;&lt;p data-block-key=&quot;edlll&quot;&gt;It is this step that &amp;quot;bakes in&amp;quot; the perspective effect if using a perspective transformation.&lt;/p&gt;&lt;p data-block-key=&quot;tu5p8&quot;&gt;The end result is that our visible part of the scene is now contained within a cuboid with a characteristic length of 1. 
Again, see below for the differences between graphics APIs.&lt;/p&gt;&lt;p data-block-key=&quot;s425i&quot;&gt;NDC space is a nice, simple, normalized coordinate system to reason about. We&amp;#x27;re now just a small step away from getting our 3D world to appear at the correct set of pixels on the display.&lt;/p&gt;&lt;h3 id=&quot;framebuffer-or-window-space&quot; anchor=&quot;framebuffer-or-window-space&quot; data-block-key=&quot;bo4k6&quot;&gt;Framebuffer or Window Space&lt;/h3&gt;&lt;p data-block-key=&quot;81np6&quot;&gt;The final step of the process is to convert from NDC to Window Space or Framebuffer Space or Viewport Space. Again, more names for the same thing. It’s basically the pixel coordinates in your application window.&lt;/p&gt;&lt;p data-block-key=&quot;r0ype&quot;&gt;The conversion from NDC to Framebuffer space is controlled by the viewport transformation that you can configure in your graphics API of choice. This transformation is just a bias (offset) and scaling operation. This makes intuitive sense when you consider that we are converting from the normalized coordinates in NDC space to pixels. The levels of scale and bias are controlled by which portion of the window you wish to display to; specifically, its offset and dimensions. The details of how to set the viewport transformation vary between graphics APIs.&lt;/p&gt;&lt;h2 id=&quot;coordinate-systems-in-practice&quot; anchor=&quot;coordinate-systems-in-practice&quot; data-block-key=&quot;kcyy5&quot;&gt;Coordinate Systems in Practice&lt;/h2&gt;&lt;p data-block-key=&quot;7nuxc&quot;&gt;The above descriptions sound very scary and intimidating but in practice they are not so bad once we understand what is going on. Spending a little time to understand the sequence of operations is very worthwhile and is infinitely better than randomly changing the signs of various elements to make something work in your one particular case. 
It&amp;#x27;s only a matter of time until your random tweak breaks something else.&lt;/p&gt;&lt;p data-block-key=&quot;hpt3k&quot;&gt;Take a look at the following diagram that summarizes the path that data takes through the graphics pipeline and the transformations/operations at each stage:&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;image-variable-size-block&quot;&gt;
    &lt;div class=&quot;image-variable-positioning-block right-margin-auto left-margin-auto width-100 &quot; &gt;
            &lt;div class=&quot;image-variable-size-image&quot;&gt;
                
                
                
                &lt;img id=&quot;pipeline-coords.png&quot; src=&quot;https://eu-central-1.linodeobjects.com/wagtail-production/images/pipeline-coords.original.png&quot; class=&quot;pipeline-coords.png&quot; alt=&quot;pipeline-coords.png&quot;&gt;
                
                
        &lt;/div&gt;
        &lt;div class=&quot;image-variable-size-caption text-center&quot;&gt;&lt;div class=&quot;rich-text&quot;&gt;&lt;/div&gt;&lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;

&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;3po4q&quot;&gt;A few things to note:&lt;/p&gt;&lt;ol&gt;&lt;li data-block-key=&quot;rlhxx&quot;&gt;The transformations from Model Space to Clip Space are performed in the programmable Vertex Shader stage of the graphics pipeline.&lt;/li&gt;&lt;li data-block-key=&quot;fh5qu&quot;&gt;Rather than doing three distinct matrix multiplications for every vertex, we often combine the&lt;/li&gt;&lt;/ol&gt;&lt;div data-katex-embed=&quot;M&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;vsdc&quot;&gt;,&lt;/p&gt;&lt;div data-katex-embed=&quot;V&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;9cvoh&quot;&gt;, and&lt;/p&gt;&lt;div data-katex-embed=&quot;P&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;5ke4f&quot;&gt;matrices into a single matrix on the CPU and pass the result into the vertex shader. This allows a vertex to be transformed all the way to Clip Space with a single matrix multiplication.&lt;/p&gt;&lt;ol start=&quot;3&quot;&gt;&lt;li data-block-key=&quot;k62m4&quot;&gt;The clipping and perspective divide operations are fixed function (hardwired in silicon) operations. Each graphics API specifies the coordinate systems in which these happen.&lt;/li&gt;&lt;li data-block-key=&quot;znjul&quot;&gt;The scale and bias transformation to go from NDC to Framebuffer Space is fixed function too but is controlled via API calls such as &lt;code&gt;glViewport()&lt;/code&gt; or &lt;code&gt;vkCmdSetViewport()&lt;/code&gt;.&lt;/li&gt;&lt;/ol&gt;&lt;p data-block-key=&quot;flzz0&quot;&gt;The upshot of all of this is that we need to create the Model, View and Projection matrices to get our vertex data correctly into Clip Space. How we do this differs subtly between the different graphics APIs such as OpenGL vs Vulkan as we shall see now. These differences are what often lead to some issues when migrating from OpenGL to Vulkan. 
This is especially true when using helper libraries that were written with the expectation of only ever being used with OpenGL.&lt;/p&gt;&lt;p data-block-key=&quot;qf1x6&quot;&gt;As stated above, the Model, World and View spaces are defined by the content creation tools (Model Space) or by us as application/library developers (World and View spaces). It is only when we get to clip space that we have to be concerned about what the graphics API we are using expects to receive.&lt;/p&gt;&lt;h3 id=&quot;opengl-coordinate-systems&quot; anchor=&quot;opengl-coordinate-systems&quot; data-block-key=&quot;6edu7&quot;&gt;OpenGL Coordinate Systems&lt;/h3&gt;&lt;p data-block-key=&quot;g47lk&quot;&gt;With OpenGL, the fixed function parts of the pipeline all use &lt;b&gt;left-handed&lt;/b&gt; coordinate systems as shown here:&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;image-variable-size-block&quot;&gt;
    &lt;div class=&quot;image-variable-positioning-block right-margin-auto left-margin-auto width-100 &quot; &gt;
            &lt;div class=&quot;image-variable-size-image&quot;&gt;
                
                
                
                &lt;img id=&quot;opengl-fixed-function-coords.png&quot; src=&quot;https://eu-central-1.linodeobjects.com/wagtail-production/images/opengl-fixed-function-coords.original.png&quot; class=&quot;opengl-fixed-function-coords.png&quot; alt=&quot;opengl-fixed-function-coords.png&quot;&gt;
                
                
        &lt;/div&gt;
        &lt;div class=&quot;image-variable-size-caption text-center&quot;&gt;&lt;div class=&quot;rich-text&quot;&gt;&lt;/div&gt;&lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;

&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;gb571&quot;&gt;If we stick with the common conventions of using a right-handed set of coordinate systems for Model Space, World Space and View Space, then the transformation from View Space to Clip Space must also flip the handedness of the coordinate system somehow.&lt;/p&gt;&lt;p data-block-key=&quot;u6l48&quot;&gt;Recall that to go from View Space to Clip Space, we multiply our View Space vertex by the projection matrix&lt;/p&gt;&lt;div data-katex-embed=&quot;P&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;eunj9&quot;&gt;. Usually we would use some library to create a projection matrix for us such as glm or even glFrustum, if you are still using OpenGL 1.x!&lt;/p&gt;&lt;p data-block-key=&quot;lbr2x&quot;&gt;There are various ways to parameterize a perspective projection matrix but to keep it simple let&amp;#x27;s stick with the left (&lt;/p&gt;&lt;div data-katex-embed=&quot;l&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;aoumr&quot;&gt;), right (&lt;/p&gt;&lt;div data-katex-embed=&quot;r&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;e172e&quot;&gt;), top (&lt;/p&gt;&lt;div data-katex-embed=&quot;t&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;br15f&quot;&gt;), bottom (&lt;/p&gt;&lt;div data-katex-embed=&quot;b&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;ffiqh&quot;&gt;), near (&lt;/p&gt;&lt;div data-katex-embed=&quot;n&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;bsnt6&quot;&gt;) and far (&lt;/p&gt;&lt;div data-katex-embed=&quot;f&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;dc4d2&quot;&gt;) parameterisation as per the glFrustum specification. This assumes the virtual camera (or eye) is at the origin and that the near and far values are the distances to the near and far clip planes along the negative z-axis. The near plane is the plane to which our scene will be projected. 
The left, right, top and bottom values specify the positions on the near plane used to define the clip planes that form the view volume - a frustum in the case of a perspective transform.&lt;/p&gt;&lt;p data-block-key=&quot;ov9u6&quot;&gt;With this parameterisation, the projection matrix for OpenGL looks like this:&lt;/p&gt;&lt;p data-block-key=&quot;css4g&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;5amc7&quot;&gt;&lt;/p&gt;&lt;div data-katex-embed=&quot; P = \begin{pmatrix}
 \tfrac{2n}{r-l} &amp;amp; 0 &amp;amp; \tfrac{r+l}{r-l} &amp;amp; 0 \\
 0 &amp;amp; \tfrac{2n}{t-b} &amp;amp; \tfrac{t+b}{t-b} &amp;amp; 0 \\
 0 &amp;amp; 0 &amp;amp; -\tfrac{f+n}{f-n} &amp;amp; -\tfrac{2fn}{f-n} \\
 0 &amp;amp; 0 &amp;amp; -1 &amp;amp; 0
 \end{pmatrix}&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;2r5tt&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;21h9b&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;5pf9c&quot;&gt;&lt;b&gt;Do not blindly use this as your projection matrix! It is specifically for OpenGL!&lt;/b&gt;&lt;/p&gt;&lt;p data-block-key=&quot;lmvgo&quot;&gt;OK, that looks reasonable and matches various texts on OpenGL programming. It works perfectly well for OpenGL because it not only performs the perspective projection transform but it also bakes in the flip from right-handed coordinates to left-handed coordinates. This last little fact seems to be something that many texts gloss over and so goes unnoticed by many graphics developers. So where does this happen? That pesky little -1 in the 3rd column of the 4th row is what does it. This has the effect of flipping the z-axis and using -z as the w component, causing the change in handedness.&lt;/p&gt;&lt;p data-block-key=&quot;ldp2u&quot;&gt;If we then blindly use the same matrix to calculate a perspective projection for Vulkan, which does not need the handedness flip, we end up in trouble. This is typically followed by Google searches leading to one of the many hacks to provide a &amp;quot;fix&amp;quot;.&lt;/p&gt;&lt;p data-block-key=&quot;93t9e&quot;&gt;Instead, let&amp;#x27;s use our understanding of the problem domain to come up with a proper correction for use with Vulkan.&lt;/p&gt;&lt;h3 id=&quot;vulkan-coordinate-systems&quot; anchor=&quot;vulkan-coordinate-systems&quot; data-block-key=&quot;ixxb0&quot;&gt;Vulkan Coordinate Systems&lt;/h3&gt;&lt;p data-block-key=&quot;gzcs7&quot;&gt;In contrast to OpenGL, the fixed function coordinate systems used in Vulkan remain &lt;b&gt;right-handed&lt;/b&gt; in keeping with the earlier coordinate systems as shown here:&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;image-variable-size-block&quot;&gt;
    &lt;div class=&quot;image-variable-positioning-block right-margin-auto left-margin-auto width-100 &quot; &gt;
            &lt;div class=&quot;image-variable-size-image&quot;&gt;
                
                
                
                &lt;img id=&quot;vulkan-fixed-function-coords.png&quot; src=&quot;https://eu-central-1.linodeobjects.com/wagtail-production/images/vulkan-fixed-function-coords.original.png&quot; class=&quot;vulkan-fixed-function-coords.png&quot; alt=&quot;vulkan-fixed-function-coords.png&quot;&gt;
                
                
        &lt;/div&gt;
        &lt;div class=&quot;image-variable-size-caption text-center&quot;&gt;&lt;div class=&quot;rich-text&quot;&gt;&lt;/div&gt;&lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;

&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;qn2mi&quot;&gt;Notice that even though z increases into the distance and y increases downwards, it is still in fact a right-handed coordinate system. You can convince yourself of this with some flexible rotations of your right hand similar to the photographs above.&lt;/p&gt;&lt;p data-block-key=&quot;l4ajm&quot;&gt;Let&amp;#x27;s think about what we need conceptually without getting bogged down in the math - for now at least, we will save that for next time. With the OpenGL perspective projection matrix we have something that takes care of the transformation of the view frustum into a cube in clip space. The problem we have when using it with Vulkan is the flip in the handedness of the coordinate system thanks to that -1 we mentioned in the previous section. Setting that perspective component to 1 instead of -1 prevents the flip in handedness - there&amp;#x27;s a bit more to it as we will see in part 2 but that takes care of the change in handedness.&lt;/p&gt;&lt;p data-block-key=&quot;2n5z0&quot;&gt;We still need to reorient our coordinate axes from View Space (x-right, y-up, looking down the negative z-axis) to Vulkan&amp;#x27;s Clip Space (x-right, y-down, looking down the positive z-axis). Since the start and end coordinate systems are both right-handed, this does not involve an axis flip as in the OpenGL case. Instead, all we need to do is to perform a rotation of 180 degrees about the x-axis. This gives us exactly the change in orientation that we need.&lt;/p&gt;&lt;p data-block-key=&quot;j2kak&quot;&gt;This means that before we see how to construct a projection matrix, we should reorient our coordinate axes to already be aligned with the desired clip space orientation. To do this, we inject a 180 degree rotation of the eye space coordinates around the x-axis before we later apply the actual projection. This rotation is shown here:&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;image-variable-size-block&quot;&gt;
    &lt;div class=&quot;image-variable-positioning-block right-margin-auto left-margin-auto width-100 &quot; &gt;
            &lt;div class=&quot;image-variable-size-image&quot;&gt;
                
                
                
                &lt;img id=&quot;view-to-clip-correction.png&quot; src=&quot;https://eu-central-1.linodeobjects.com/wagtail-production/images/view-to-clip-correction.original.png&quot; class=&quot;view-to-clip-correction.png&quot; alt=&quot;view-to-clip-correction.png&quot;&gt;
                
                
        &lt;/div&gt;
        &lt;div class=&quot;image-variable-size-caption text-center&quot;&gt;&lt;div class=&quot;rich-text&quot;&gt;&lt;/div&gt;&lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;

&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;a1iu0&quot;&gt;Recall from high school maths that the matrix for a rotation about the x-axis of&lt;/p&gt;&lt;div data-katex-embed=&quot;180\degree&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;52n6a&quot;&gt;(&lt;/p&gt;&lt;div data-katex-embed=&quot;\pi&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;bihnf&quot;&gt;radians) is easily constructed as:&lt;/p&gt;&lt;p data-block-key=&quot;5qf17&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;99peo&quot;&gt;&lt;/p&gt;&lt;div data-katex-embed=&quot; X =
 \begin{pmatrix}
 1 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 \\
 0 &amp;amp; \cos{\pi} &amp;amp; \sin{\pi} &amp;amp; 0 \\
 0 &amp;amp; -\sin{\pi} &amp;amp; \cos{\pi} &amp;amp; 0 \\
 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 1 \\
 \end{pmatrix}
 =
 \begin{pmatrix}
 1 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 \\
 0 &amp;amp; -1 &amp;amp; 0 &amp;amp; 0 \\
 0 &amp;amp; 0 &amp;amp; -1 &amp;amp; 0 \\
 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 1 \\
 \end{pmatrix}&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;cniid&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;9a2vq&quot;&gt;&lt;/p&gt;&lt;p data-block-key=&quot;k8qzb&quot;&gt;This also makes sense intuitively, as the y and z components of any vector it multiplies will be negated by the -1 elements. Note that we have two &amp;quot;axis flips&amp;quot;, so it still maintains the right-handedness of the coordinate system as desired.&lt;/p&gt;&lt;p data-block-key=&quot;1tuin&quot;&gt;So, in the end, all we need to do is include this &amp;quot;correction matrix&amp;quot;, X, into our usual chain of matrices when calculating the combined model-view-projection matrix that gets passed to the vertex shader. With the correction included, our combined matrix is calculated as&lt;/p&gt;&lt;div data-katex-embed=&quot;A = (PX)VM = PXVM&quot;&gt;&lt;/div&gt;&lt;p data-block-key=&quot;f6a11&quot;&gt;. That means the transforms applied in order (right to left) are:&lt;/p&gt;&lt;ol&gt;&lt;li data-block-key=&quot;rgvaf&quot;&gt;Model to World&lt;/li&gt;&lt;li data-block-key=&quot;qpnbu&quot;&gt;World to Eye/View&lt;/li&gt;&lt;li data-block-key=&quot;fjgdn&quot;&gt;Eye/View to &lt;b&gt;Rotated Eye/View&lt;/b&gt;&lt;/li&gt;&lt;li data-block-key=&quot;9ealq&quot;&gt;Rotated Eye/View to Clip&lt;/li&gt;&lt;/ol&gt;&lt;p data-block-key=&quot;p841u&quot;&gt;With the above in place, we can transform vertices all the way from Model Space through to Vulkan&amp;#x27;s Clip Space and beyond. All that remains for us next time is to see how to actually construct the perspective projection matrix. However, we are now in a good position (and orientation) to derive the perspective projection matrix as our source (rotated eye space) and destination (clip space) coordinate systems are now aligned. 
All we have to worry about is the actual projection of vertices onto the near plane.&lt;/p&gt;&lt;p data-block-key=&quot;o6772&quot;&gt;Once we complete this next step, we will be able to avoid any of the ugly hacks mentioned at the start of this article and we will have a full understanding of how our vertices are transformed all the way from Blender through to appearing on our screens. Thanks for reading!&lt;/p&gt;&lt;p data-block-key=&quot;lgn05&quot;&gt;Part 2 is available &lt;a href=&quot;https://www.kdab.com/projection-matrices-with-vulkan-part-2/&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;&lt;/div&gt;&lt;p&gt;The post &lt;a href=&quot;https://www.kdab.com/projection-matrices-with-vulkan-part-1/&quot;&gt;Projection Matrices with Vulkan - Part 1&lt;/a&gt; appeared first on &lt;a href=&quot;https://www.kdab.com&quot;&gt;KDAB&lt;/a&gt;.&lt;/p&gt;</content:encoded><dc:creator>Sean Harmer</dc:creator><category>3d</category><category>performance</category></item><item><title>Optimizing and Sharing Shader Structures</title><link>https://www.kdab.com/optimizing-and-sharing-shader-structures/</link><guid isPermaLink="true">https://www.kdab.com/optimizing-and-sharing-shader-structures/</guid><description>&lt;p data-block-key=&quot;u9r2r&quot;&gt;When writing large graphics applications in Vulkan or OpenGL, there&amp;#x27;s many data structures that need to be passed from the CPU to the GPU and vice versa. There are subtle differences in alignment, padding and so on between C++ and GLSL to keep track of as well. I&amp;#x27;m going to cover a tool I wrote […]&lt;/p&gt;</description><pubDate>Thu, 24 Aug 2023 07:00:00 GMT</pubDate><content:encoded>&lt;h1&gt;Optimizing and Sharing Shader Structures&lt;/h1&gt;&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;caknr&quot;&gt;When writing large graphics applications in Vulkan or OpenGL, there&amp;#x27;s many data structures that need to be passed from the CPU to the GPU and vice versa. 
There are subtle differences in alignment, padding and so on between C++ and GLSL to keep track of as well. I&amp;#x27;m going to cover a tool I wrote that generates safe and optimal code. This helps not only the GPU but the programmer writing shaders too. Here&amp;#x27;s a rundown of the problems I&amp;#x27;m trying to solve and how you can implement a similar system in your own programs.&lt;/p&gt;&lt;p data-block-key=&quot;z0eka&quot;&gt;This tool specifically targets and references Vulkan rules, but similar rules exist in OpenGL.&lt;/p&gt;&lt;h2 id=&quot;reasoning&quot; anchor=&quot;reasoning&quot; data-block-key=&quot;fq3el&quot;&gt;Reasoning&lt;/h2&gt;&lt;p data-block-key=&quot;7mml6&quot;&gt;Here&amp;#x27;s an example of real code, exposing options to a post-processing stage.&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;formatted-code&quot;&gt;
    &lt;pre&gt;&lt;code class=&quot;language-glsl  line-numbers &quot;&gt;layout(push_constant) uniform PushConstant {
    vec4 viewport;
    vec4 options;
    vec4 transform_ops;
    vec4 ao_options;
    vec4 ao_options2;
    vec4 proj_info;
    mat4 cameraProj;
    mat4 invProj;
};&lt;/code&gt;&lt;/pre&gt;
 &lt;/div&gt;


&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;chdte&quot;&gt;Even for the person who wrote this code, it&amp;#x27;s hard to tell what each option does at a glance. This is a great way to create bugs, since it&amp;#x27;s extremely easy to mix up accessors like &lt;i&gt;ao_options.x&lt;/i&gt; and &lt;i&gt;ao_options.y&lt;/i&gt;. Ideally, we want these options to be separated but there&amp;#x27;s a reason why they&amp;#x27;re packed in the first place.&lt;/p&gt;&lt;h3 id=&quot;alignment-rules&quot; anchor=&quot;alignment-rules&quot; data-block-key=&quot;xfryy&quot;&gt;Alignment rules&lt;/h3&gt;&lt;p data-block-key=&quot;dhkoj&quot;&gt;Say you&amp;#x27;re beginning to explore &lt;a href=&quot;https://en.wikipedia.org/wiki/Phong_shading&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;Phong shading&lt;/a&gt;, and you want to expose a position and a color property so you can change them while the program is running. In a 3D environment, there are three axes (X, Y and Z) so naturally the position must be a vec3. It also makes sense for the light color to be a vec3. When emitted from a light, its color can&amp;#x27;t really be &amp;quot;transparent&amp;quot; so we don&amp;#x27;t need the alpha channel. The GLSL code so far looks like this:&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;formatted-code&quot;&gt;
    &lt;pre&gt;&lt;code class=&quot;language-glsl  line-numbers &quot;&gt;#version 430

out vec4 finalColor;

layout(binding = 0) buffer block {
    vec3 position;
    vec3 color;
} light;

void main() {
    const vec3 dummy = vec3(1) - light.position;
    finalColor = vec4(vec3(1.0, 1.0, 1.0) * light.color, 1.0);
}&lt;/code&gt;&lt;/pre&gt;
 &lt;/div&gt;


&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;t5m8f&quot;&gt;(There&amp;#x27;s no Phong formula here, we want to make sure the GLSL compiler doesn&amp;#x27;t optimize anything out.)&lt;/p&gt;&lt;p data-block-key=&quot;43uqn&quot;&gt;When writing the structure on the C++ side, you might write something like this:&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;formatted-code&quot;&gt;
    &lt;pre&gt;&lt;code class=&quot;language-cpp  line-numbers &quot;&gt;struct Light {
    glm::vec3 position;
    glm::vec3 color;
} light;

light.position = {1, 5, 0};
light.color = {3, 2, -1};&lt;/code&gt;&lt;/pre&gt;
 &lt;/div&gt;


&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;0ass3&quot;&gt;For this example I used the &lt;a href=&quot;https://github.com/KhronosGroup/Vulkan-ValidationLayers/blob/main/docs/debug_printf.md&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;debug printf&lt;/a&gt; system, which is part of the &lt;a href=&quot;https://www.lunarg.com/vulkan-sdk/&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;Vulkan SDK&lt;/a&gt; so we can confirm the exact values. The output is as follows:&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;formatted-code&quot;&gt;
    &lt;pre&gt;&lt;code class=&quot;language-plain  line-numbers &quot;&gt;Position = (1.000000, 5.000000, 0.000000)
Color = (2.000000, -1.000000, 0.000000)&lt;/code&gt;&lt;/pre&gt;
 &lt;/div&gt;


&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;5twoj&quot;&gt;As you can see, the first value of color is getting chopped off when reading it in the shader. The usual solution to the problem is to use a vec4 instead:&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;formatted-code&quot;&gt;
    &lt;pre&gt;&lt;code class=&quot;language-cpp  line-numbers &quot;&gt;struct Light {
    glm::vec4 position;
    glm::vec4 color;
};&lt;/code&gt;&lt;/pre&gt;
 &lt;/div&gt;


&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;ktqtd&quot;&gt;And to confirm, this does indeed fix the issue:&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;formatted-code&quot;&gt;
    &lt;pre&gt;&lt;code class=&quot;language-plain  line-numbers &quot;&gt;Position = (1.000000, 5.000000, 0.000000)
Color = (3.000000, 2.000000, -1.000000)&lt;/code&gt;&lt;/pre&gt;
 &lt;/div&gt;


&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;g7kqv&quot;&gt;But why does it work when we change it to a vec4? &lt;a href=&quot;https://registry.khronos.org/vulkan/specs/1.3-extensions/html/vkspec.html#interfaces-resources-layout&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;This section&lt;/a&gt; from the Vulkan specification spells it out for us:&lt;/p&gt;&lt;/div&gt;
&lt;p&gt;&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;sqswz&quot;&gt;The base alignment of the type of an OpTypeStruct member is defined recursively as follows:&lt;/p&gt;&lt;ul&gt;&lt;li data-block-key=&quot;7shm7&quot;&gt;A scalar has a base alignment equal to its scalar alignment.&lt;/li&gt;&lt;li data-block-key=&quot;40kal&quot;&gt;A two-component vector has a base alignment equal to twice its scalar alignment.&lt;/li&gt;&lt;li data-block-key=&quot;9ei87&quot;&gt;&lt;b&gt;A three- or four-component vector has a base alignment equal to four times its scalar alignment.&lt;/b&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;&lt;/p&gt;

&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;gi1wj&quot;&gt;The third bullet point hits it right on the head: vec4 and vec3 have the &lt;i&gt;same alignment!&lt;/i&gt; An alternative solution could be to use alignas:&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;formatted-code&quot;&gt;
    &lt;pre&gt;&lt;code class=&quot;language-cpp  line-numbers &quot;&gt;struct Light {
    glm::vec3 color;
    alignas(16) glm::vec3 position;
};&lt;/code&gt;&lt;/pre&gt;
 &lt;/div&gt;


&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;ymjgi&quot;&gt;There are a bunch more nitty-gritty alignment issues that stem from differences between C++ and GLSL, and this is one of them. In my opinion, this shouldn&amp;#x27;t be necessary for the programmer to handle themselves.&lt;/p&gt;&lt;h3 id=&quot;passing-booleans&quot; anchor=&quot;passing-booleans&quot; data-block-key=&quot;ti9lz&quot;&gt;Passing booleans&lt;/h3&gt;&lt;p data-block-key=&quot;fvue3&quot;&gt;Another example of esoteric shader rules arises when you try passing booleans. Take a look at this C++ structure, which seems okay at first glance:&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;formatted-code&quot;&gt;
    &lt;pre&gt;&lt;code class=&quot;language-cpp  line-numbers &quot;&gt;struct TestBuffer {
    bool a = false;
    bool b = true;
    bool c = false;
    bool d = true;
};&lt;/code&gt;&lt;/pre&gt;
 &lt;/div&gt;


&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;bvkqs&quot;&gt;And this is how it&amp;#x27;s defined in GLSL:&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;formatted-code&quot;&gt;
    &lt;pre&gt;&lt;code class=&quot;language-glsl  line-numbers &quot;&gt;layout(binding = 0) buffer readonly TestBuffer {
    bool a, b, c, d;
};&lt;/code&gt;&lt;/pre&gt;
 &lt;/div&gt;


&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;qo0il&quot;&gt;When sent to the shader, the values of the structure end up like this:&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;formatted-code&quot;&gt;
    &lt;pre&gt;&lt;code class=&quot;language-plain  line-numbers &quot;&gt;a = 1, b = 0, c = 0, d = 0&lt;/code&gt;&lt;/pre&gt;
 &lt;/div&gt;


&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;yctyu&quot;&gt;This is because &lt;a href=&quot;https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#OpTypeBool&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;SPIR-V doesn&amp;#x27;t define a physical size for bool&lt;/a&gt;, so it could be represented as anything (such as an unsigned integer). In this case, you actually want to define them as integers:&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;formatted-code&quot;&gt;
    &lt;pre&gt;&lt;code class=&quot;language-glsl  line-numbers &quot;&gt;layout(binding = 0) buffer readonly TestBuffer {
    int a, b, c, d;
};&lt;/code&gt;&lt;/pre&gt;
 &lt;/div&gt;


&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;68bl2&quot;&gt;This is a little disappointing, because the semantic meaning of a boolean option is lost when you declare it as an integer. You could also pack many booleans into the space of one 32-bit integer, which would be a possible space-saving optimization in the future.&lt;/p&gt;&lt;h3 id=&quot;sharing-structures&quot; anchor=&quot;sharing-structures&quot; data-block-key=&quot;2zpq0&quot;&gt;Sharing structures&lt;/h3&gt;&lt;p data-block-key=&quot;5qqaq&quot;&gt;The last problem is keeping the structures in sync. There&amp;#x27;s usually one instance of the structure written in C++ and many copies in GLSL shaders. This is fragile: if the member order changes in one copy, parts of the structure fall out of sync, and the mismatch can easily escape notice. Having &lt;i&gt;one&lt;/i&gt; definition for all shaders and C++ would be a huge improvement!&lt;/p&gt;&lt;h2 id=&quot;struct-compiler&quot; anchor=&quot;struct-compiler&quot; data-block-key=&quot;9343j&quot;&gt;Struct compiler&lt;/h2&gt;&lt;p data-block-key=&quot;8wbzg&quot;&gt;What I ended up with is a new pre-processing step, which I called the &amp;quot;struct compiler&amp;quot;. I searched the Internet to see if someone had already made a tool like this, but couldn&amp;#x27;t find much - maybe shader reflection is more popular. I learned a lot from making this tool anyway. Its main goals are:&lt;/p&gt;&lt;ul&gt;&lt;li data-block-key=&quot;li67o&quot;&gt;Define the shader structures in one centralized file.&lt;/li&gt;&lt;li data-block-key=&quot;ydn3g&quot;&gt;Allow structures to be written at a higher level, decoupling the actual member order, alignment and packing from the logic. 
This enables the compiler to optimize the structure in the future, maybe beyond what we can reasonably hand-write.&lt;/li&gt;&lt;li data-block-key=&quot;yplzx&quot;&gt;Make the structure usable in both GLSL and C++.&lt;/li&gt;&lt;/ul&gt;&lt;p data-block-key=&quot;8fzl0&quot;&gt;First you write a &lt;i&gt;.struct&lt;/i&gt; file, describing the required members and their types. Here&amp;#x27;s the same post-processing structure showcased at the beginning, but now written in the compiler&amp;#x27;s custom syntax:&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;formatted-code&quot;&gt;
    &lt;pre&gt;&lt;code class=&quot;language-glsl  line-numbers &quot;&gt;primary PostPushConstant {
    viewport: vec4
    camera_proj: mat4
    inv_proj: mat4
    inv_view: mat4

    enable_aa: bool
    enable_dof: bool

    exposure: float
    display_color_space: int
    tonemapping: int

    ao_radius: float
    ao_r2: float
    ao_rneginvr2: float
    ao_rdotvbias: float
    ao_intensity: float
    ao_bias: float
}&lt;/code&gt;&lt;/pre&gt;
 &lt;/div&gt;


&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;esjph&quot;&gt;This looks much better, doesn&amp;#x27;t it? Even without knowing anything else about the actual shader, you can guess which options do what with some accuracy. Here&amp;#x27;s what it might look like, compiled to C++:&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;formatted-code&quot;&gt;
    &lt;pre&gt;&lt;code class=&quot;language-cpp  line-numbers &quot;&gt;struct PostPushConstant {
    glm::mat4 camera_proj;
    glm::mat4 inv_proj;
    glm::mat4 inv_view;
    glm::vec4 viewport;
    glm::ivec4 enable_aa_enable_dof_display_color_space_tonemapping_;
    glm::vec4 exposure_ao_radius_ao_r2_ao_rneginvr2_;
    glm::vec4 ao_rdotvbias_ao_intensity_ao_bias_;
    ...
};&lt;/code&gt;&lt;/pre&gt;
 &lt;/div&gt;


&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;2eqrp&quot;&gt;(Setters and getters like &lt;i&gt;set_exposure()&lt;/i&gt; and &lt;i&gt;exposure()&lt;/i&gt; are used instead of accessing the glm::vec4 manually.)&lt;/p&gt;&lt;p data-block-key=&quot;7zpke&quot;&gt;I hook the generation step into my build system so it runs automatically; all you need to do is include the auto-generated header. To use the structure in GLSL, I created a new directive that inserts the GLSL version of the structure produced by the struct compiler. The same system that generates the C++ headers also generates the GLSL that is inserted wherever this directive is found:&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;formatted-code&quot;&gt;
    &lt;pre&gt;&lt;code class=&quot;language-plain  line-numbers &quot;&gt;#use_struct(push_constant, post, post_push_constant)&lt;/code&gt;&lt;/pre&gt;
 &lt;/div&gt;


&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;v7tcj&quot;&gt;(The syntax could use some work, but the first argument is the usage, the second argument is the name of the struct, and the third argument is a unique name for the instance.)&lt;/p&gt;&lt;p data-block-key=&quot;4ihu7&quot;&gt;Since the member order and names are undefined, you must access the members through setters and getters in both GLSL and C++. I think this is a worthwhile trade-off for readable code.&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;formatted-code&quot;&gt;
    &lt;pre&gt;&lt;code class=&quot;language-plain  line-numbers &quot;&gt;vec3 ao_result = pow(ao, ao_intensity())&lt;/code&gt;&lt;/pre&gt;
 &lt;/div&gt;


&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;njhpa&quot;&gt;This tool runs as a pre-processing step offline, before shader compilation begins. The tool&amp;#x27;s source code, taken from one of my personal projects, is &lt;a href=&quot;https://git.sr.ht/~redstrate/structcompiler&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;available here&lt;/a&gt;. It was written quickly and I don&amp;#x27;t recommend using it directly, but I&amp;#x27;m confident that this idea is worth pursuing.&lt;/p&gt;&lt;/div&gt;&lt;p&gt;The post &lt;a href=&quot;https://www.kdab.com/optimizing-and-sharing-shader-structures/&quot;&gt;Optimizing and Sharing Shader Structures&lt;/a&gt; appeared first on &lt;a href=&quot;https://www.kdab.com&quot;&gt;KDAB&lt;/a&gt;.&lt;/p&gt;</content:encoded><category>3d</category><category>c++</category><category>performance</category></item><item><title>KDGpu v.0.1.0 is released</title><link>https://www.kdab.com/kdgpu-v-0-1-0-is-released/</link><guid isPermaLink="true">https://www.kdab.com/kdgpu-v-0-1-0-is-released/</guid><description>&lt;p data-block-key=&quot;3jpi9&quot;&gt;We&amp;#x27;re pleased to announce we&amp;#x27;ve added a new library, KDGpu, to the arsenal of tools we invent to make our lives easier - and then share with you on KDAB&amp;#x27;s GitHub. Who is this for? 
If you want to become more productive with Vulkan or learn the concepts of modern explicit graphics APIs, then KDGpu […]&lt;/p&gt;</description><pubDate>Thu, 10 Aug 2023 06:00:00 GMT</pubDate><content:encoded>&lt;h1&gt;KDGpu v.0.1.0 is released&lt;/h1&gt;&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;wuakf&quot;&gt;We&amp;#x27;re pleased to announce we&amp;#x27;ve added a new library, KDGpu, to the arsenal of tools we invent to make our lives easier - and then share with you on KDAB&amp;#x27;s &lt;a href=&quot;https://github.com/KDAB/KDGpu&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;GitHub&lt;/a&gt;.&lt;/p&gt;&lt;h2 id=&quot;who-is-this-for&quot; anchor=&quot;who-is-this-for&quot; data-block-key=&quot;bfqr3&quot;&gt;Who is this for?&lt;/h2&gt;&lt;p data-block-key=&quot;cgo09&quot;&gt;If you want to become more productive with Vulkan or learn the concepts of modern explicit graphics APIs, then KDGpu is the library for you!&lt;/p&gt;&lt;p data-block-key=&quot;itjam&quot;&gt;KDGpu is a thin wrapper around Vulkan whose purpose it is to make modern graphics APIs more accessible and easier to learn. It cuts through the verbose syntax, makes managing object lifetimes much simpler and allows you to get your project working without having to be bogged down in intricacies involved in tasks such as synchronization or memory handling. The sensible option defaults in KDGpu mean that you can focus on solving the problem at hand. Furthermore, KDGpu exposes almost all of the power of raw Vulkan and is very well suited to teaching modern graphics APIs and their concepts.&lt;/p&gt;&lt;p data-block-key=&quot;rsr8k&quot;&gt;KDGpu enables you to make examples like this easily and with great readability:&lt;/p&gt;&lt;/div&gt;







&lt;div class=&quot;cookieconsent-optin-marketing overlay-embed-block&quot;&gt;
    &lt;div style=&quot;padding-bottom: 56.5%;&quot; class=&quot;responsive-object&quot;&gt;
    &lt;iframe width=&quot;200&quot; height=&quot;113&quot; src=&quot;https://www.youtube.com/embed/KoG3nhyEtcg?feature=oembed&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share&quot; referrerpolicy=&quot;strict-origin-when-cross-origin&quot; allowfullscreen title=&quot;KDGpu Compute Particles&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;/div&gt;




&lt;style&gt;
.overlay-embed-block .responsive-object {
    position: relative;
}

.overlay-embed-block .responsive-object iframe,
.overlay-embed-block .responsive-object object,
.overlay-embed-block .responsive-object embed {
    position: absolute;
    top: 0;
    left: 0;
    width: 100%;
    height: 100%;
}
&lt;/style&gt;



&lt;div class=&quot;rich-text&quot;&gt;&lt;h2 id=&quot;what-got-kdab-started-on-this&quot; anchor=&quot;what-got-kdab-started-on-this&quot; data-block-key=&quot;uakqy&quot;&gt;What got KDAB started on this?&lt;/h2&gt;&lt;p data-block-key=&quot;s708m&quot;&gt;As you are likely aware, KDAB offers training in 3D. So far, we have used OpenGL as the vehicle for these training courses, and for years it has performed very well, both for the courses and the industry.&lt;/p&gt;&lt;p data-block-key=&quot;ih9i2&quot;&gt;However, with the modern explicit graphics APIs that have come along (Vulkan, Metal, D3D12 and WebGPU), OpenGL has been left to languish a bit. On macOS, Apple has frozen OpenGL at version 4.1, which is annoying, as 4.2 brought all sorts of cool things to the party.&lt;/p&gt;&lt;p data-block-key=&quot;aqu74&quot;&gt;Replacing OpenGL in our training courses with Vulkan wasn&amp;#x27;t really an option due to the extreme complexity of this, the most verbose API available. We needed something like Vulkan but not Vulkan :-)&lt;/p&gt;&lt;p data-block-key=&quot;1g4zt&quot;&gt;&lt;b&gt;Say hello to KDGpu!&lt;/b&gt;&lt;/p&gt;&lt;h2 id=&quot;what-does-kdgpu-provide&quot; anchor=&quot;what-does-kdgpu-provide&quot; data-block-key=&quot;w8pv2&quot;&gt;What does KDGpu Provide?&lt;/h2&gt;&lt;p data-block-key=&quot;p6915&quot;&gt;Although we began work on KDGpu as a tool to aid teaching, it has grown beyond that into a library that is suitable for use in production projects, and we are using it as such ourselves. We will keep adding to the library as we find new features that we need, but we will keep KDGpu itself as lightweight as possible. It is also possible that in the future we may add other backends beyond Vulkan.&lt;/p&gt;&lt;p data-block-key=&quot;ruk7c&quot;&gt;For now though, what exactly is in the KDGpu repository?&lt;/p&gt;&lt;ul&gt;&lt;li data-block-key=&quot;023h2&quot;&gt;The KDGpu library. 
A collection of classes and options structs that make it easy, concise and clear to work with Vulkan.&lt;/li&gt;&lt;li data-block-key=&quot;e7qed&quot;&gt;An example framework called KDGpuExample that integrates with &lt;a href=&quot;https://github.com/KDAB/KDUtils&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;KDGui&lt;/a&gt;, providing a cross-platform windowing and event-loop implementation that makes it easy to experiment.&lt;/li&gt;&lt;li data-block-key=&quot;zzm6c&quot;&gt;A set of illustrative examples showing how to use KDGpu for common rendering tasks, from the typical hello_triangle through to rendering involving multiple render and compute passes.&lt;/li&gt;&lt;/ul&gt;&lt;p data-block-key=&quot;oelpg&quot;&gt;KDGpu is independent of any particular windowing system. You can use it purely with platform native APIs, or you can check out the included KDGpuExample library, which makes it trivial to use KDGpu with KDGui::Window and friends. Check out the handy &lt;a href=&quot;https://github.com/KDAB/KDBindings&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;KDBindings&lt;/a&gt; repository too for some more syntactic sugar.&lt;/p&gt;&lt;p data-block-key=&quot;da0sy&quot;&gt;The following images were all created with examples written with KDGpu. We will publish some more in-depth blogs on these and other examples in the future to show just how easy it is to reason about graphics issues without getting lost in hundreds of lines of code.&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;image-variable-size-block&quot;&gt;
    &lt;div class=&quot;image-variable-positioning-block right-margin-auto left-margin-auto width-100 &quot; &gt;
            &lt;div class=&quot;image-variable-size-image&quot;&gt;
                
                
                
                &lt;img id=&quot;model_with_textures.png&quot; src=&quot;https://eu-central-1.linodeobjects.com/wagtail-production/images/model_with_textures.original.png&quot; class=&quot;model_with_textures.png&quot; alt=&quot;model_with_textures.png&quot;&gt;
                
                
        &lt;/div&gt;
        &lt;div class=&quot;image-variable-size-caption text-start&quot;&gt;&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;cjc0q&quot;&gt;Using KDGpu to render a textured glTF 2 model. KDGpu makes it easy to manage resources for shaders. The glTF parsing was performed using the excellent tinygltf library.&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;



&lt;div class=&quot;image-variable-size-block&quot;&gt;
    &lt;div class=&quot;image-variable-positioning-block right-margin-auto left-margin-auto width-100 &quot; &gt;
            &lt;div class=&quot;image-variable-size-image&quot;&gt;
                
                
                
                &lt;img id=&quot;model_without_textures.png&quot; src=&quot;https://eu-central-1.linodeobjects.com/wagtail-production/images/model_without_textures.original.png&quot; class=&quot;model_without_textures.png&quot; alt=&quot;model_without_textures.png&quot;&gt;
                
                
        &lt;/div&gt;
        &lt;div class=&quot;image-variable-size-caption text-start&quot;&gt;&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;e1d9y&quot;&gt;Instanced rendering of the glTF 2 buggy model. Each sub-mesh is drawn using instancing to reduce the number of draw calls and binding operations.&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;



&lt;div class=&quot;image-variable-size-block&quot;&gt;
    &lt;div class=&quot;image-variable-positioning-block right-margin-auto left-margin-auto width-100 &quot; &gt;
            &lt;div class=&quot;image-variable-size-image&quot;&gt;
                
                
                
                &lt;img id=&quot;kdgpu_msdf_text_02.png&quot; src=&quot;https://eu-central-1.linodeobjects.com/wagtail-production/images/kdgpu_msdf_text_02.original.png&quot; class=&quot;kdgpu_msdf_text_02.png&quot; alt=&quot;kdgpu_msdf_text_02.png&quot;&gt;
                
                
        &lt;/div&gt;
        &lt;div class=&quot;image-variable-size-caption text-start&quot;&gt;&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;4ghzx&quot;&gt;Example showing various styles of text effect using multi-channel signed distance fields.&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;

&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;gbb36&quot;&gt;For now, you can get started by looking at the examples in the repository and at the &lt;a href=&quot;https://docs.kdab.com/kdgpu/unstable/index.html&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;documentation&lt;/a&gt;.&lt;/p&gt;&lt;p data-block-key=&quot;gimqg&quot;&gt;The KDGpuExample helper library is a great help to those starting out. It takes care of creating a device; performing frame-to-frame synchronisation (options for loops with and without the use of vkDeviceWaitIdle); managing the depth buffer; easy enabling of MSAA, etc. When you want to take the leap and manage these things yourself or integrate KDGpu into your own engine, then it is very easy to do so.&lt;/p&gt;&lt;h2 id=&quot;show-me-some-code&quot; anchor=&quot;show-me-some-code&quot; data-block-key=&quot;27qjj&quot;&gt;Show me some code!&lt;/h2&gt;&lt;p data-block-key=&quot;um1iu&quot;&gt;We won&amp;#x27;t go into all of the details in this announcement blog, but it would be remiss of us not to give you a taster of what KDGpu code looks like in practice.&lt;/p&gt;&lt;p data-block-key=&quot;rof2x&quot;&gt;The core of a graphics application is its render function, which gets called every frame to record the command buffers for the GPU to process. In KDGpu, the process of recording a command buffer to bind a graphics pipeline, set some state and actually draw something would look like this:&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;formatted-code&quot;&gt;
    &lt;pre&gt;&lt;code class=&quot;language-cpp  line-numbers &quot;&gt;void HelloTriangle::render()
{
    // m_device is a KDGpu::Device
    auto commandRecorder = m_device.createCommandRecorder();

    // m_opaquePassOptions is a KDGpu::RenderPassCommandRecorderOptions struct that we use to specify the render pass setup
    m_opaquePassOptions.colorAttachments[0].view = m_swapchainViews.at(m_currentSwapchainImageIndex);
    auto opaquePass = commandRecorder.beginRenderPass(m_opaquePassOptions);

    // Get ready to draw
    opaquePass.setPipeline(m_pipeline);
    opaquePass.setVertexBuffer(0, m_buffer);
    opaquePass.setIndexBuffer(m_indexBuffer);
    opaquePass.setBindGroup(0, m_transformBindGroup);

    // Record the draw command
    const DrawIndexedCommand drawCmd = { .indexCount = 3 };
    opaquePass.drawIndexed(drawCmd);

    // End the render pass and finish the command recording
    opaquePass.end();
    m_commandBuffer = commandRecorder.finish();

    // Submit command buffer
    ...
}&lt;/code&gt;&lt;/pre&gt;
 &lt;/div&gt;


&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;h06lu&quot;&gt;In the above code, we are referring to a few other objects of types such as KDGpu::Buffer (m_buffer and m_indexBuffer), KDGpu::BindGroup (m_transformBindGroup) and KDGpu::GraphicsPipeline (m_pipeline). The functions that consume these objects (and many others) are actually passing around handles to the objects rather than the objects themselves.&lt;/p&gt;&lt;p data-block-key=&quot;ak5ob&quot;&gt;The objects themselves are owning, move-only, strong references to the underlying Vulkan objects, whereas the templated Handle type is a weak reference that is very cheap to pass around. The handles are just generational indices into some object pools managed by KDGpu.&lt;/p&gt;&lt;p data-block-key=&quot;vpg3q&quot;&gt;To keep a GPU resource alive, simply keep the corresponding C++ object alive. To pass it to a consuming function, each class implements a convenient conversion-to-handle operator. This really simplifies resource management with natural C++ semantics.&lt;/p&gt;&lt;p data-block-key=&quot;ncytu&quot;&gt;Just having local named objects makes the explicit graphics APIs much easier to reason about than OpenGL&amp;#x27;s massive global state machine approach. KDGpu goes even further in reducing code verbosity, and its sane-defaults approach makes graphics programming a delight, accessible to mere mortals.&lt;/p&gt;&lt;p data-block-key=&quot;it2vo&quot;&gt;Of course, rendering is not very exciting without some GPU-based resources such as textures and buffers. KDGpu makes creating such resources very easy and intuitive.&lt;/p&gt;&lt;p data-block-key=&quot;mme13&quot;&gt;We begin with a quick look at the KDGpu::BufferOptions struct:&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;formatted-code&quot;&gt;
    &lt;pre&gt;&lt;code class=&quot;language-cpp  line-numbers &quot;&gt;struct BufferOptions {
    DeviceSize size;
    BufferUsageFlags usage;
    MemoryUsage memoryUsage;
    SharingMode sharingMode{ SharingMode::Exclusive };
    std::vector&amp;lt;uint32_t&amp;gt; queueTypeIndices{};
};&lt;/code&gt;&lt;/pre&gt;
 &lt;/div&gt;


&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;p9ncu&quot;&gt;Of note is the sensible default value for the sharing mode member. The option is there if we need to set it but in the common case we can just run with the default value. The code to create a buffer object to hold some index data is then pretty much as you would hope:&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;formatted-code&quot;&gt;
    &lt;pre&gt;&lt;code class=&quot;language-cpp  line-numbers &quot;&gt;std::array&amp;lt;uint32_t, 3&amp;gt; indexData = { 0, 1, 2 };
const DeviceSize dataByteSize = indexData.size() * sizeof(uint32_t);
const BufferOptions bufferOptions = {
    .size = dataByteSize,
    .usage = BufferUsageFlagBits::IndexBufferBit | BufferUsageFlagBits::TransferDstBit,
    .memoryUsage = MemoryUsage::GpuOnly
};
m_indexBuffer = m_device.createBuffer(bufferOptions);&lt;/code&gt;&lt;/pre&gt;
 &lt;/div&gt;


&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;e7umr&quot;&gt;The memory usage flags indicate that we wish the buffer to be resident in GPU memory, and the usage flags indicate that we will use it as a source of index data for some geometry and that the buffer should be capable of being the target of a copy operation (we need to get the data in there somehow).&lt;/p&gt;&lt;p data-block-key=&quot;8euob&quot;&gt;This pattern of using options structs and initializing them with C++20 designated initializers permeates the API. It makes the API easily discoverable, extensible and trivial to queue up for deferred invocations.&lt;/p&gt;&lt;p data-block-key=&quot;k7nld&quot;&gt;Uploading data to the above buffer is just as simple:&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;formatted-code&quot;&gt;
    &lt;pre&gt;&lt;code class=&quot;language-cpp  line-numbers &quot;&gt;const BufferUploadOptions uploadOptions = {
    .destinationBuffer = m_indexBuffer,
    .dstStages = PipelineStageFlagBit::VertexAttributeInputBit,
    .dstMask = AccessFlagBit::IndexReadBit,
    .data = indexData.data(),
    .byteSize = dataByteSize
};
m_queue.uploadBufferData(uploadOptions);&lt;/code&gt;&lt;/pre&gt;
 &lt;/div&gt;


&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;lwkyf&quot;&gt;Here we use uploadBufferData, a member function of KDGpu::Queue, to perform a synchronous upload of the data. If you instead wish to perform an asynchronous upload to keep the CPU doing useful work, then the KDGpuExample library has helpers for this too. There is some interplay with the concept of a frame, which the KDGpu::Queue on its own does not know about. Feel free to take a look at the code for the details; it is all very simple and clearly written.&lt;/p&gt;&lt;h2 id=&quot;find-out-more-and-get-up-and-running-with-kdgpu&quot; anchor=&quot;find-out-more-and-get-up-and-running-with-kdgpu&quot; data-block-key=&quot;6n35l&quot;&gt;Find out more and get up and running with KDGpu&lt;/h2&gt;&lt;p data-block-key=&quot;rc2eu&quot;&gt;In this blog post we have introduced KDAB&amp;#x27;s new library, KDGpu, seen some of the advantages it brings, and had a cursory look at a flavour of the API in use.&lt;/p&gt;&lt;p data-block-key=&quot;jovmc&quot;&gt;If this has whetted your appetite for GPU programming, look out for our follow-up posts, which will cover glTF rendering and more in much greater detail. We think that you will like KDGpu. We have certainly had fun writing it and then using it on projects. 
For now, please give &lt;a href=&quot;https://github.com/KDAB/KDGpu&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;KDGpu a try&lt;/a&gt;.&lt;/p&gt;&lt;/div&gt;&lt;p&gt;The post &lt;a href=&quot;https://www.kdab.com/kdgpu-v-0-1-0-is-released/&quot;&gt;KDGpu v.0.1.0 is released&lt;/a&gt; appeared first on &lt;a href=&quot;https://www.kdab.com&quot;&gt;KDAB&lt;/a&gt;.&lt;/p&gt;</content:encoded><dc:creator>Sean Harmer</dc:creator><category>3d</category><category>tools</category></item><item><title>Synchronization in Vulkan</title><link>https://www.kdab.com/synchronization-in-vulkan/</link><guid isPermaLink="true">https://www.kdab.com/synchronization-in-vulkan/</guid><description>&lt;p data-block-key=&quot;7nh6y&quot;&gt;An important part of working with Vulkan and other modern explicit rendering APIs is the synchronization of GPU/GPU and CPU/GPU workloads. In this article we will learn about what Vulkan needs us to synchronize and how to achieve it. We will talk about two high-level parts of the synchronization domain that we, as application and […]&lt;/p&gt;</description><pubDate>Thu, 22 Jun 2023 07:00:00 GMT</pubDate><content:encoded>&lt;h1&gt;Synchronization in Vulkan&lt;/h1&gt;&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;a2sh1&quot;&gt;An important part of working with Vulkan and other modern explicit rendering APIs is the synchronization of GPU/GPU and CPU/GPU workloads. In this article we will learn about what Vulkan needs us to synchronize and how to achieve it. 
We will talk about two high-level parts of the synchronization domain that we, as application and library developers, are responsible for:&lt;/p&gt;&lt;ol&gt;&lt;li data-block-key=&quot;bvb6w&quot;&gt;GPU↔GPU synchronization to ensure that certain GPU operations do not occur out of order,&lt;/li&gt;&lt;li data-block-key=&quot;59ncw&quot;&gt;CPU↔GPU synchronization to ensure that we maintain a certain level of latency and resource usage in our applications.&lt;/li&gt;&lt;/ol&gt;&lt;h2 id=&quot;gpugpu-synchronization&quot; anchor=&quot;gpugpu-synchronization&quot; data-block-key=&quot;j23xu&quot;&gt;GPU↔GPU Synchronization&lt;/h2&gt;&lt;p data-block-key=&quot;rmnki&quot;&gt;Whereas in OpenGL we could simply render to the GL_BACK buffer of the default framebuffer and then tell the system to swap the back and front buffers, with Vulkan we have to get more involved. Vulkan exposes the concept of a swapchain of images. This is essentially a collection of textures (VkImages) that are owned and managed by the swapchain and the window system integration (WSI). 
A typical frame in Vulkan looks something like this:&lt;/p&gt;&lt;ol&gt;&lt;li data-block-key=&quot;15m9e&quot;&gt;Acquire the index of the swapchain image to which we should render.&lt;/li&gt;&lt;li data-block-key=&quot;lxjdb&quot;&gt;Record one or more command buffers that ultimately output to the swapchain image from step 1.&lt;/li&gt;&lt;li data-block-key=&quot;hbzco&quot;&gt;Submit the command buffers from step 2 to a GPU queue for processing.&lt;/li&gt;&lt;li data-block-key=&quot;td98l&quot;&gt;Instruct the GPU presentation engine to display the final rendered swapchain image from step 3.&lt;/li&gt;&lt;li data-block-key=&quot;v5zrg&quot;&gt;Go back to step 1 and start over for the next frame.&lt;/li&gt;&lt;/ol&gt;&lt;p data-block-key=&quot;o2ada&quot;&gt;This may look innocuous at first glance but let&amp;#x27;s delve deeper.&lt;/p&gt;&lt;h3 id=&quot;a-day-at-the-races&quot; anchor=&quot;a-day-at-the-races&quot; data-block-key=&quot;ir5fl&quot;&gt;A day at the races&lt;/h3&gt;&lt;p data-block-key=&quot;girc5&quot;&gt;In step 1 we are asking the WSI to tell us the index of the next available swapchain image that we may render into. Now, just because this function tells us (and the CPU) that, for example, image index 1 is the image we should use as our render target, it does not mean that the GPU is actually ready to write to this image right now.&lt;/p&gt;&lt;p data-block-key=&quot;nupfh&quot;&gt;It is important to note that we are operating on two distinct timelines. There is the CPU timeline that we are familiar with when writing applications. Then there is also the GPU timeline on which the GPU processes the work that we give to it (from the CPU timeline).&lt;/p&gt;&lt;p data-block-key=&quot;gsq63&quot;&gt;In the case of acquiring a swapchain image index, we are actually asking the GPU to look into the future a little bit and tell us which image index &lt;b&gt;will become&lt;/b&gt; the next image to become ready for writing. 
However, when we call the function to acquire this image index, the GPU presentation engine may well still be reading from the image in question in order to display its contents from an earlier frame.&lt;/p&gt;&lt;p data-block-key=&quot;vf7ye&quot;&gt;Many people new to Vulkan (myself included) make the mistake of thinking that acquiring the swapchain image index means the image is ready to go right now. It&amp;#x27;s not!&lt;/p&gt;&lt;p data-block-key=&quot;5dm71&quot;&gt;In step 2, we are entirely operating on the CPU timeline and we can safely record command buffers without fear of trampling over anything happening on the GPU.&lt;/p&gt;&lt;p data-block-key=&quot;fca6n&quot;&gt;The same is true in step 3. We can happily submit the command buffers that will render to our swapchain image. However, this is where the problem arises. If the GPU presentation engine is still busy reading from the swapchain image when along comes a bundle of work telling the GPU to render into that same image, we have a potential problem. GPUs are thirsty beasts and are massively parallel machines that like to do as much as possible concurrently. Without some form of synchronization, it is clear that, if the GPU begins processing the command buffers, the presentation engine could end up reading data at the same time it is being written by another GPU thread. Say hello to our old friend undefined behaviour!&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;image-variable-size-block&quot;&gt;
    &lt;div class=&quot;image-variable-positioning-block right-margin-auto left-margin-auto width-100 &quot; &gt;
            &lt;div class=&quot;image-variable-size-image&quot;&gt;
                
                
                
                &lt;img id=&quot;01-overlapping-gpu-workloads.png&quot; src=&quot;https://eu-central-1.linodeobjects.com/wagtail-production/images/01-overlapping-gpu-workloads.original.png&quot; class=&quot;01-overlapping-gpu-workloads.png&quot; alt=&quot;01-overlapping-gpu-workloads.png&quot;&gt;
                
                
        &lt;/div&gt;
        &lt;div class=&quot;image-variable-size-caption text-center&quot;&gt;&lt;div class=&quot;rich-text&quot;&gt;&lt;/div&gt;&lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;

&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;otz4n&quot;&gt;It is now clear that we need some mechanism to instruct the GPU to not process these command buffers until the GPU presentation engine is done reading from the swapchain image we are rendering to.&lt;/p&gt;&lt;p data-block-key=&quot;szima&quot;&gt;The solution for synchronising blocks of GPU work in Vulkan is a &lt;b&gt;semaphore&lt;/b&gt; (VkSemaphore).&lt;/p&gt;&lt;p data-block-key=&quot;a6ch8&quot;&gt;The way it works is that in our application&amp;#x27;s initialisation code, we create a semaphore for the purposes of forcing the command buffer processing to begin only once the GPU presentation engine tells us it is done reading from the swapchain image it told us to use.&lt;/p&gt;&lt;p data-block-key=&quot;zdctk&quot;&gt;With this semaphore in hand, we can tell the GPU to switch it to a &amp;quot;signalled&amp;quot; state when the presentation engine is done reading from the image. The other half of the problem is solved when we submit the render command buffers to the GPU by handing the same semaphore to the call to vkQueueSubmit().&lt;/p&gt;&lt;p data-block-key=&quot;6agru&quot;&gt;We now have this kind of setup:&lt;/p&gt;&lt;ul&gt;&lt;li data-block-key=&quot;lovzq&quot;&gt;At initialisation, create a semaphore (&lt;a href=&quot;https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/vkCreateSemaphore.html&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;vkCreateSemaphore&lt;/a&gt;) in the unsignalled state.&lt;/li&gt;&lt;li data-block-key=&quot;ipc3l&quot;&gt;Pass the above semaphore to &lt;a href=&quot;https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/vkAcquireNextImageKHR.html&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;vkAcquireNextImageKHR&lt;/a&gt; as the semaphore argument so that it is signalled when the image is ready for writing.&lt;/li&gt;&lt;li data-block-key=&quot;ua0eg&quot;&gt;Pass the above 
semaphore to &lt;a href=&quot;https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/vkQueueSubmit.html&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;vkQueueSubmit&lt;/a&gt; (as one of the pWaitSemaphores arguments of the VkSubmitInfo struct) so that this set of command buffers is deferred until the semaphore is signalled.&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;


&lt;div class=&quot;image-variable-size-block&quot;&gt;
    &lt;div class=&quot;image-variable-positioning-block right-margin-auto left-margin-auto width-100 &quot; &gt;
            &lt;div class=&quot;image-variable-size-image&quot;&gt;
                
                
                
                &lt;img id=&quot;02-ordering-present-then-render.png&quot; src=&quot;https://eu-central-1.linodeobjects.com/wagtail-production/images/02-ordering-present-then-render.original.png&quot; class=&quot;02-ordering-present-then-render.png&quot; alt=&quot;02-ordering-present-then-render.png&quot;&gt;
                
                
        &lt;/div&gt;
        &lt;div class=&quot;image-variable-size-caption text-center&quot;&gt;&lt;div class=&quot;rich-text&quot;&gt;&lt;/div&gt;&lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;

&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;sa0p9&quot;&gt;Phew, we&amp;#x27;re all done, right? Nope, sadly not. Read on to see what else can go wrong and how to solve it.&lt;/p&gt;&lt;h3 id=&quot;im-not-ready-to-show-you-my-painting&quot; anchor=&quot;im-not-ready-to-show-you-my-painting&quot; data-block-key=&quot;m4ipi&quot;&gt;I&amp;#x27;m not ready to show you my painting&lt;/h3&gt;&lt;p data-block-key=&quot;9wk18&quot;&gt;We have solved the GPU-side race condition, preventing the start of rendering from clobbering the swapchain image whilst the presentation engine may still be reading from it. However, there is currently nothing to prevent presentation of the swapchain image from being requested whilst the rendering is still going on!&lt;/p&gt;&lt;p data-block-key=&quot;bh2c9&quot;&gt;That is, we have solved the potential race between steps 1 and 3, but there is another race between steps 3 and 4. Luckily, the problem is at heart exactly the same. We need to stop some incoming GPU work (the present request in step 4) from stepping on the toes of the already ongoing rendering work from step 3. 
That is, we need another application of GPU↔GPU synchronization which we know we can do with a semaphore.&lt;/p&gt;&lt;p data-block-key=&quot;cdoxr&quot;&gt;To solve this race condition we use the following approach:&lt;/p&gt;&lt;ul&gt;&lt;li data-block-key=&quot;waonv&quot;&gt;At initialisation, create another unsignalled semaphore.&lt;/li&gt;&lt;li data-block-key=&quot;kyyds&quot;&gt;In step 3 when we submit the command buffers for rendering, we pass in the semaphore to vkQueueSubmit as one of the pSignalSemaphores arguments.&lt;/li&gt;&lt;li data-block-key=&quot;fsjx7&quot;&gt;In step 4 we then pass this same semaphore to the call to &lt;a href=&quot;https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/vkQueuePresentKHR.html&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;vkQueuePresentKHR&lt;/a&gt; as one of the pWaitSemaphores arguments.&lt;/li&gt;&lt;/ul&gt;&lt;p data-block-key=&quot;f756j&quot;&gt;This works in a completely analogous way to the first problem that we solved. When we submit the render command buffers for processing, this second semaphore is unsignalled. When the command buffers finish execution, the GPU will transition the semaphore to the signalled state. The call to vkQueuePresentKHR has been configured to ensure the presentation engine waits for this condition to be true before beginning whatever work it needs to do to get that image on to our screen.&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;image-variable-size-block&quot;&gt;
    &lt;div class=&quot;image-variable-positioning-block right-margin-auto left-margin-auto width-100 &quot; &gt;
            &lt;div class=&quot;image-variable-size-image&quot;&gt;
                
                
                
                &lt;img id=&quot;03-ordering-render-then-present.png&quot; src=&quot;https://eu-central-1.linodeobjects.com/wagtail-production/images/03-ordering-render-then-present.original.png&quot; class=&quot;03-ordering-render-then-present.png&quot; alt=&quot;03-ordering-render-then-present.png&quot;&gt;
                
                
        &lt;/div&gt;
        &lt;div class=&quot;image-variable-size-caption text-center&quot;&gt;&lt;div class=&quot;rich-text&quot;&gt;&lt;/div&gt;&lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;

&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;3kj57&quot;&gt;With the above two race conditions brought under control, we can now safely loop around the sequence of steps 1-4 as many times as we like.&lt;/p&gt;&lt;p data-block-key=&quot;5kfnm&quot;&gt;Well, almost. There is a slight subtlety in that the swapchain has N images (typically 3 or so) but so far we have only created a single semaphore for the presentation→render ordering, and a second single semaphore for the render→presentation ordering. Usually, however, we do not want to render and present a single image and then wait around for the presentation to be done before starting over, as that is a big waste of cycles on both the CPU and GPU sides.&lt;/p&gt;&lt;p data-block-key=&quot;7i6i1&quot;&gt;As a side note, many Vulkan tutorials sidestep this by introducing a call to &lt;a href=&quot;https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/vkDeviceWaitIdle.html&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;vkDeviceWaitIdle&lt;/a&gt; or &lt;a href=&quot;https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/vkQueueWaitIdle.html&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;vkQueueWaitIdle&lt;/a&gt; somewhere in their main loop. This is fine for learning Vulkan and its concepts, but to get full performance we want to go further and allow the CPU and the GPU to work concurrently.&lt;/p&gt;&lt;p data-block-key=&quot;zj9mi&quot;&gt;One thing we can do is create enough semaphores that we have one for every frame we wish to have &amp;quot;in flight&amp;quot; at any time, for each of the 2 required synchronization points. 
We can then use the i&amp;#x27;th pair of semaphores for the i&amp;#x27;th in-flight frame, and when we get to the N&amp;#x27;th in-flight frame we loop back to the 0&amp;#x27;th pair of semaphores in round-robin fashion.&lt;/p&gt;&lt;p data-block-key=&quot;07wc7&quot;&gt;This then allows us to get potentially N frames ahead of the GPU on the CPU timeline. This, unfortunately, opens up our next can of worms.&lt;/p&gt;&lt;h2 id=&quot;cpugpu-synchronization&quot; anchor=&quot;cpugpu-synchronization&quot; data-block-key=&quot;vl93a&quot;&gt;CPU↔GPU Synchronization&lt;/h2&gt;&lt;p data-block-key=&quot;e15cz&quot;&gt;So far we have shown that using semaphores when enqueuing work for the GPU allows us to correctly order the work done on the GPU timeline. We have briefly mentioned that this does nothing to keep the CPU in sync with the GPU. As it stands, the CPU is free to schedule as much work in advance as we like (assuming sufficient available resources). This has a couple of issues, though:&lt;/p&gt;&lt;ul&gt;&lt;li data-block-key=&quot;9zuo0&quot;&gt;The more frames of work in advance the CPU schedules for the GPU, the more resources we need to hold command buffers, semaphores, etc. - not to mention the GPU resources to which the command buffers refer, such as buffers and textures. These GPU resources all have to be kept alive as long as any command buffers are referencing them.&lt;/li&gt;&lt;li data-block-key=&quot;suum0&quot;&gt;The second issue is that the further the CPU gets ahead of the GPU, the further our simulation state gets ahead of what we see. That means, the more frames ahead we allow the CPU to get, the higher our latency. Some latency can be good in that if we have a frame or two queued up already, a frame that then takes a bit longer to prepare can be absorbed unnoticed. 
However, too much latency and our application feels sluggish and unnatural to use, as it takes too long for our input to be responded to and for us to see the results of that on screen.&lt;/li&gt;&lt;/ul&gt;&lt;p data-block-key=&quot;q906r&quot;&gt;It is therefore essential to have a good handle on our system&amp;#x27;s latency, which in this case means the number of frames we allow to be &amp;quot;in flight&amp;quot; at any one time. That is, the number of frames&amp;#x27; worth of command buffers that have been submitted to the GPU queues or are currently being recorded. A common choice here is to allow 2-3 frames to be in flight at once. Bear in mind that this also depends upon other factors such as your display&amp;#x27;s refresh rate. If you are running on a high refresh rate display at, say, 240Hz, then each frame is only around for 1/4 of the time of a &amp;quot;standard&amp;quot; 60Hz display. If this is the case, you may wish to increase the number of frames in flight to compensate.&lt;/p&gt;&lt;p data-block-key=&quot;w88pa&quot;&gt;Let&amp;#x27;s parameterise the max number of frames that the CPU can get ahead as MAX_FRAMES_IN_FLIGHT. From our discussions in the previous sections we know that if we keep the CPU from getting more than MAX_FRAMES_IN_FLIGHT frames ahead, then we will only need MAX_FRAMES_IN_FLIGHT semaphores for each use of a semaphore within a frame.&lt;/p&gt;&lt;p data-block-key=&quot;b2kow&quot;&gt;So now the question is: how do we stop the CPU from racing ahead of the GPU? Specifically, we need a way to make the CPU timeline wait until the GPU timeline indicates that it is done processing a frame. In Vulkan, the answer to this is a fence (&lt;a href=&quot;https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VkFence.html&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;VkFence&lt;/a&gt;). 
Conceptually this is how we can structure a frame with fences to get the desired result (ignoring the use of semaphores for GPU↔GPU synchronization):&lt;/p&gt;&lt;ol&gt;&lt;li data-block-key=&quot;vz2kz&quot;&gt;In the application initialisation, create MAX_FRAMES_IN_FLIGHT fence objects in the signalled state.&lt;/li&gt;&lt;li data-block-key=&quot;hnus0&quot;&gt;Force the CPU timeline to wait until the fence for this frame becomes signalled or continue immediately if it is the first frame and the fence is already signalled (&lt;a href=&quot;https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/vkWaitForFences.html&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;vkWaitForFences&lt;/a&gt;).&lt;/li&gt;&lt;li data-block-key=&quot;dnxmk&quot;&gt;Reset the fence to the unsignalled state so that we can wait for it again in the future (&lt;a href=&quot;https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/vkResetFences.html&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;vkResetFences&lt;/a&gt;).&lt;/li&gt;&lt;li data-block-key=&quot;eqadz&quot;&gt;Acquire the swapchain image index (as before).&lt;/li&gt;&lt;li data-block-key=&quot;umx0y&quot;&gt;Record and submit the command buffers to perform the rendering for this frame. When it is time to submit the command buffers to the GPU queue, we can pass in the fence for this frame as the final argument to &lt;a href=&quot;https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/vkQueueSubmit.html&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;vkQueueSubmit&lt;/a&gt;. 
Just as with a semaphore, when the GPU queue finishes processing this command buffer submission, it will transition the fence to the signalled state.&lt;/li&gt;&lt;li data-block-key=&quot;ru41f&quot;&gt;Issue a GPU command to present the completed swapchain image (as before).&lt;/li&gt;&lt;li data-block-key=&quot;8mt18&quot;&gt;Go to step 2 and use the next fence (and set of semaphores).&lt;/li&gt;&lt;/ol&gt;&lt;/div&gt;


&lt;div class=&quot;image-variable-size-block&quot;&gt;
    &lt;div class=&quot;image-variable-positioning-block right-margin-auto left-margin-auto width-100 &quot; &gt;
            &lt;div class=&quot;image-variable-size-image&quot;&gt;
                
                
                
                &lt;img id=&quot;04-synchronising-cpu-and-gpu.png&quot; src=&quot;https://eu-central-1.linodeobjects.com/wagtail-production/images/04-synchronising-cpu-and-gpu.original.png&quot; class=&quot;04-synchronising-cpu-and-gpu.png&quot; alt=&quot;04-synchronising-cpu-and-gpu.png&quot;&gt;
                
                
        &lt;/div&gt;
        &lt;div class=&quot;image-variable-size-caption text-center&quot;&gt;&lt;div class=&quot;rich-text&quot;&gt;&lt;/div&gt;&lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;

&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;iyr8e&quot;&gt;With this approach, the CPU timeline can only get at most MAX_FRAMES_IN_FLIGHT frames ahead of the GPU before the call to vkWaitForFences in step 2 forces it to wait for the corresponding fence to become signalled by the GPU, which happens when the GPU completes the command buffer submission that went along with that fence.&lt;/p&gt;&lt;p data-block-key=&quot;532u5&quot;&gt;Making use of both fences and semaphores allows us to keep both the CPU and the GPU timelines making progress without races (between rendering and presentation) and without the CPU running away from us. These two synchronization primitives, fences and semaphores, solve similar but different problems:&lt;/p&gt;&lt;/div&gt;
&lt;div class=&quot;rich-text&quot;&gt;&lt;ul&gt;&lt;li data-block-key=&quot;pelu5&quot;&gt;A &lt;b&gt;VkFence&lt;/b&gt; is a synchronization primitive for keeping the CPU and GPU timelines in step with each other.&lt;/li&gt;&lt;li data-block-key=&quot;43nrq&quot;&gt;A &lt;b&gt;VkSemaphore&lt;/b&gt; is a synchronization primitive that ensures the ordering of GPU tasks.&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;

&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;f3vkv&quot;&gt;It is worth noting that a VkFence can also be queried as to its state from the CPU timeline rather than having to block until it becomes signalled (&lt;a href=&quot;https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/vkGetFenceStatus.html&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;vkGetFenceStatus&lt;/a&gt;). This allows your application to peek and see if a fence is signalled or not. If it is not yet signalled, your application may be able to make more use of the available time and go do something more productive than just blocking in vkWaitForFences. It all depends upon the design of your application.&lt;/p&gt;&lt;h2 id=&quot;other-considerations&quot; anchor=&quot;other-considerations&quot; data-block-key=&quot;6e2l1&quot;&gt;Other Considerations&lt;/h2&gt;&lt;h3 id=&quot;presentation-mode&quot; anchor=&quot;presentation-mode&quot; data-block-key=&quot;0t25f&quot;&gt;Presentation Mode&lt;/h3&gt;&lt;p data-block-key=&quot;utals&quot;&gt;We have seen above how we can utilise fences and semaphores to make our Vulkan applications well-behaved. It is also worth mentioning that, as an application author, you should consider your choice of swapchain presentation mode, as it can heavily impact how your application behaves and how many CPU/GPU cycles it uses. With OpenGL we would typically set up to have either:&lt;/p&gt;&lt;p data-block-key=&quot;2sud9&quot;&gt;VSync-enabled rendering for tear-free display, OR&lt;br/&gt; VSync-disabled rendering that goes as fast as it can but will probably show some image tearing.&lt;/p&gt;&lt;p data-block-key=&quot;i6w41&quot;&gt;With Vulkan we can still get these configurations, but there are also others that offer variations. 
As an example, &lt;a href=&quot;https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VkPresentModeKHR.html&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;VK_PRESENT_MODE_MAILBOX_KHR&lt;/a&gt; allows us to have tear-free display of the currently presented image (it is VSync enabled) whilst letting our application render as fast as possible. Very briefly, the way this works is that when the presentation engine is displaying swapchain image 0, our calls to vkAcquireNextImageKHR will only return the other swapchain image indices. When we subsequently tell the GPU to present those images, it will happily take the image and overwrite our previous presentation submission. When the next vertical blank occurs, the presentation engine will actually show the most up-to-date submitted swapchain image.&lt;/p&gt;&lt;p data-block-key=&quot;cy6ki&quot;&gt;In this manner we can render to, e.g., images 1 and 2 as many times as we like so that when the presentation engine moves along, it has the most up-to-date representation of our application&amp;#x27;s state possible.&lt;/p&gt;&lt;p data-block-key=&quot;f9s4n&quot;&gt;Depending upon which swapchain presentation mode you request, your application could be locked to the VSync frequency or not, which in turn can lead to large differences in how much of your available CPU and GPU resources are consumed. Are they out for a leisurely stroll (VSync enabled) or sprinting (VSync disabled or mailbox mode)?&lt;/p&gt;&lt;h3 id=&quot;multiple-windows-and-swapchains&quot; anchor=&quot;multiple-windows-and-swapchains&quot; data-block-key=&quot;u3aa6&quot;&gt;Multiple Windows and Swapchains&lt;/h3&gt;&lt;p data-block-key=&quot;puaes&quot;&gt;All of the above examples have assumed we are working with a single window surface and a single swapchain. This is the common case for games, but in desktop and embedded applications we may well have multiple windows or multiple screens or even multiple adapters. 
Vulkan, unlike OpenGL, is pretty flexible when it comes to threading. With some care, we can happily record the command buffers for different windows (swapchains) on different CPU threads. For swapchains sharing a common Vulkan device, we can even request them all to be presented in one function call rather than having to call the equivalent of swapbuffers on each of them sequentially. Once again, Vulkan and the WSI give you the tools; it&amp;#x27;s up to you how you utilise them.&lt;/p&gt;&lt;h3 id=&quot;timeline-semaphores&quot; anchor=&quot;timeline-semaphores&quot; data-block-key=&quot;pe26d&quot;&gt;Timeline Semaphores&lt;/h3&gt;&lt;p data-block-key=&quot;lb3rf&quot;&gt;A more recent addition to Vulkan, known as timeline semaphores, allows applications to use this synchronization primitive to work like a combination of a traditional semaphore and a fence. A timeline semaphore can be used just like a traditional (binary) semaphore to order packets of GPU work correctly, but it may also be waited upon by the CPU timeline (&lt;a href=&quot;https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/vkWaitSemaphores.html&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;vkWaitSemaphores&lt;/a&gt;). The CPU may also signal a timeline semaphore via a call to &lt;a href=&quot;https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/vkSignalSemaphore.html&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;vkSignalSemaphore&lt;/a&gt;. If supported by your Vulkan version and driver, you can use timeline semaphores to simplify your synchronization mechanisms.&lt;/p&gt;&lt;h3 id=&quot;pipeline-and-memory-barriers&quot; anchor=&quot;pipeline-and-memory-barriers&quot; data-block-key=&quot;mge4y&quot;&gt;Pipeline and Memory Barriers&lt;/h3&gt;&lt;p data-block-key=&quot;l8eq2&quot;&gt;This article has only concerned itself with the high-level or coarse synchronization requirements. 
Depending upon what you are doing inside your command buffers, you will also likely need to synchronise access to various other resources, such as textures and buffers, to ensure that different phases of your rendering are not trampling over each other. This is a large topic in itself and is covered in extreme detail by &lt;a href=&quot;https://www.khronos.org/blog/understanding-vulkan-synchronization&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;this article&lt;/a&gt; and the accompanying examples.&lt;/p&gt;&lt;h3 id=&quot;its-up-to-you&quot; anchor=&quot;its-up-to-you&quot; data-block-key=&quot;lu715&quot;&gt;It&amp;#x27;s up to you!&lt;/h3&gt;&lt;p data-block-key=&quot;93gj0&quot;&gt;A lot of what the OpenGL driver used to manage for us is now firmly on the plate of application and library developers who wish to make use of Vulkan or other explicit modern graphics APIs. Vulkan provides us with a plethora of tools, but it is up to us to decide how to make best use of them and how to map them onto the requirements of our applications. 
I hope that this article has helped explain some of the considerations of synchronization that you need to keep in mind when you decide to take the next step from the tutorial examples and remove that magic call to vkDeviceWaitIdle.&lt;/p&gt;&lt;/div&gt;&lt;p&gt;The post &lt;a href=&quot;https://www.kdab.com/synchronization-in-vulkan/&quot;&gt;Synchronization in Vulkan&lt;/a&gt; appeared first on &lt;a href=&quot;https://www.kdab.com&quot;&gt;KDAB&lt;/a&gt;.&lt;/p&gt;</content:encoded><dc:creator>Sean Harmer</dc:creator><category>3d</category></item><item><title>Shader Variants</title><link>https://www.kdab.com/shader-variants/</link><guid isPermaLink="true">https://www.kdab.com/shader-variants/</guid><description>&lt;p data-block-key=&quot;qn86o&quot;&gt;Background of Shaders One particular facet of modern graphics development that is often a pain - even for AAA games -- is shader variants! If you have bought an AAA game in recent years and wondered what the heck it is doing when it says it is compiling shaders for a long time (up to […]&lt;/p&gt;</description><pubDate>Thu, 27 Apr 2023 08:00:00 GMT</pubDate><content:encoded>&lt;h1&gt;Shader Variants&lt;/h1&gt;&lt;div class=&quot;rich-text&quot;&gt;&lt;h2 id=&quot;background-of-shaders&quot; anchor=&quot;background-of-shaders&quot; data-block-key=&quot;8e6xm&quot;&gt;Background of Shaders&lt;/h2&gt;&lt;p data-block-key=&quot;xhwum&quot;&gt;One particular facet of modern graphics development that is often a pain - even for AAA games -- is shader variants!&lt;/p&gt;&lt;p data-block-key=&quot;8xng9&quot;&gt;If you have bought an AAA game in recent years and wondered what the heck it is doing when it says it is compiling shaders for a long time (up to an hour or more for some recent PC titles on slower machines!), then this blog will explain it a little.&lt;/p&gt;&lt;p data-block-key=&quot;bax3m&quot;&gt;Modern graphics APIs (Vulkan, D3D12, Metal) like to know about everything that has to do with GPU state, up front. 
A large chunk of the GPU state is provided by so-called shader programs. These shader programs fill in various gaps in the graphics pipeline that used to be provided by fixed-function hardware back in the days of OpenGL 1.x.&lt;/p&gt;&lt;p data-block-key=&quot;jevk2&quot;&gt;As OpenGL (and DirectX) evolved, people wanted to do a wider range of things when processing vertices into colorful pixels on-screen. So, over time, the fixed-function silicon on GPUs has gradually been replaced by more and more general-purpose processors. As with CPUs, we now need to tell these processors what to do by writing small (and sometimes not-so-small) specialized programs called shader programs.&lt;/p&gt;&lt;p data-block-key=&quot;kncoo&quot;&gt;In OpenGL, we would write our shaders in the high-level GLSL language and feed that to the OpenGL driver as a string at runtime. The OpenGL driver would then compile the GLSL to GPU machine code, and we could then throw big piles of vertices and other resources like textures at it and marvel at the results -- or, more likely, swear a bit and wonder why we are staring at a black window yet again.&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;image-variable-size-block&quot;&gt;
    &lt;div class=&quot;image-variable-positioning-block right-margin-auto left-margin-auto width-100 &quot; &gt;
            &lt;div class=&quot;image-variable-size-image&quot;&gt;
                
                
                
                &lt;img id=&quot;blackscreen.jpg&quot; src=&quot;https://eu-central-1.linodeobjects.com/wagtail-production/images/blackscreen.original.jpg&quot; class=&quot;blackscreen.jpg&quot; alt=&quot;blackscreen.jpg&quot;&gt;
                
                
        &lt;/div&gt;
        &lt;div class=&quot;image-variable-size-caption text-center&quot;&gt;&lt;div class=&quot;rich-text&quot;&gt;&lt;/div&gt;&lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;

&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;428dn&quot;&gt;The necessity of including a complete compiler in the graphics driver was a huge burden for each of the GPU vendors. It also led to some strange problems for developers when running code on a new platform with a different GLSL compiler in the driver and hitting new and different bugs or shortcomings.&lt;/p&gt;&lt;p data-block-key=&quot;q6py9&quot;&gt;With the advent of modern graphics APIs, there has been a move toward consuming shader code in the form of a bytecode intermediate representation, such as &lt;a href=&quot;https://www.khronos.org/spir/&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;SPIR-V&lt;/a&gt;. SPIR-V is still not the final form of executable code required by the GPU silicon, but it is much closer than GLSL and means the Vulkan drivers no longer need the entire compiler front-end.&lt;/p&gt;&lt;p data-block-key=&quot;v9c1k&quot;&gt;Tooling, such as &lt;a href=&quot;https://developer.nvidia.com/nsight-graphics&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;nSight&lt;/a&gt; and &lt;a href=&quot;https://renderdoc.org/&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;RenderDoc&lt;/a&gt;, is able to decompile the SPIR-V shader code back to GLSL (or HLSL) to make it easier for you to debug your applications.&lt;/p&gt;&lt;p data-block-key=&quot;8356j&quot;&gt;The conversion from GLSL (or any other suitable language) to SPIR-V can still happen at runtime if that&amp;#x27;s what you need -- for example, in dynamic editor tools. However, for constrained applications, we can now compile the GLSL to SPIR-V up front at build time.&lt;/p&gt;&lt;p data-block-key=&quot;ghx33&quot;&gt;That&amp;#x27;s nice! We can simply add a few targets to our CMakeLists.txt and go home, right? 
Well, not quite.&lt;/p&gt;&lt;h2 id=&quot;the-need-for-shader-variants&quot; anchor=&quot;the-need-for-shader-variants&quot; data-block-key=&quot;w10tw&quot;&gt;The Need for Shader Variants&lt;/h2&gt;&lt;p data-block-key=&quot;faakg&quot;&gt;You see, shader developers are just as lazy as any other kind of developer and like to reduce the amount of copy/paste coding that we have to do. So, we add optional features to our shaders that can be compiled in or out by way of pre-processor #defines, just as with C/C++.&lt;/p&gt;&lt;p data-block-key=&quot;6manf&quot;&gt;Why is this even needed, though? Well, we don&amp;#x27;t always have full control over the data that our application will be fed. Imagine a generic &lt;a href=&quot;https://www.khronos.org/gltf/&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;glTF&lt;/a&gt; file viewer application. Some models that get loaded will use textures for the materials and include texture coordinates in the model&amp;#x27;s vertex data. Other models may just use vertex colors, completely leaving out texture coordinates.&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;image-variable-size-block&quot;&gt;
    &lt;div class=&quot;image-variable-positioning-block right-margin-auto left-margin-auto width-50 &quot; &gt;
            &lt;div class=&quot;image-variable-size-image&quot;&gt;
                
                
                
                &lt;img id=&quot;model_without_textures.png&quot; src=&quot;https://eu-central-1.linodeobjects.com/wagtail-production/images/model_without_textures.original.png&quot; class=&quot;model_without_textures.png&quot; alt=&quot;model_without_textures.png&quot;&gt;
                
                
        &lt;/div&gt;
        &lt;div class=&quot;image-variable-size-caption text-center&quot;&gt;&lt;div class=&quot;rich-text&quot;&gt;&lt;/div&gt;&lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;



&lt;div class=&quot;image-variable-size-block&quot;&gt;
    &lt;div class=&quot;image-variable-positioning-block right-margin-auto left-margin-auto width-50 &quot; &gt;
            &lt;div class=&quot;image-variable-size-image&quot;&gt;
                
                
                
                &lt;img id=&quot;model_with_textures.png&quot; src=&quot;https://eu-central-1.linodeobjects.com/wagtail-production/images/model_with_textures.original.png&quot; class=&quot;model_with_textures.png&quot; alt=&quot;model_with_textures.png&quot;&gt;
                
                
        &lt;/div&gt;
        &lt;div class=&quot;image-variable-size-caption text-center&quot;&gt;&lt;div class=&quot;rich-text&quot;&gt;&lt;/div&gt;&lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;

&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;4cfe3&quot;&gt;To handle this, our vertex shader&amp;#x27;s prologue may look something like this:&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;formatted-code&quot;&gt;
    &lt;pre&gt;&lt;code class=&quot;language-clike  line-numbers &quot;&gt;layout(location = 0) in vec3 vertexPosition;
layout(location = 1) in vec3 vertexNormal;
#ifdef TEXCOORD_0_ENABLED
layout(location = 2) in vec2 vertexTexCoord;
#endif

layout(location = 0) out vec3 normal;
#ifdef TEXCOORD_0_ENABLED
layout(location = 1) out vec2 texCoord;
#endif&lt;/code&gt;&lt;/pre&gt;
 &lt;/div&gt;


&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;mtxk9&quot;&gt;Then, in the main() function, we would have:&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;formatted-code&quot;&gt;
    &lt;pre&gt;&lt;code class=&quot;language-clike  line-numbers &quot;&gt;void main()
{
#ifdef TEXCOORD_0_ENABLED
    texCoord = vertexTexCoord;
#endif
    normal = normalize((camera.view * entity.model[gl_InstanceIndex] * vec4(vertexNormal, 0.0)).xyz);
    gl_Position = camera.projection * camera.view * entity.model[gl_InstanceIndex] * vec4(vertexPosition, 1.0);
}&lt;/code&gt;&lt;/pre&gt;
 &lt;/div&gt;


&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;xx9q6&quot;&gt;The fragment shader would have similar changes to handle the cases with and without texture coordinates.&lt;/p&gt;&lt;p data-block-key=&quot;2g84q&quot;&gt;Super, so we have one set of shader source files that can handle both models with textures and models without textures. How do we compile the shaders to get these shader variants?&lt;/p&gt;&lt;p data-block-key=&quot;wal0b&quot;&gt;Just as with C/C++ we have a compiler toolchain and, similarly, we invoke the compiler with the various -D options as needed, e.g.:&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;formatted-code&quot;&gt;
    &lt;pre&gt;&lt;code class=&quot;language-bash  line-numbers &quot;&gt;glslangValidator -o material-with-uvs.vert.spirv -DTEXCOORD_0_ENABLED material.vert    # With texture coords
glslangValidator -o material-without-uvs.vert.spirv material.vert                      # Without texture coords&lt;/code&gt;&lt;/pre&gt;
 &lt;/div&gt;


&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;yplbi&quot;&gt;Then, within our application, we can load the glTF model, inspect its data to see whether it uses textures, and then load the appropriate SPIR-V compiled shader.&lt;/p&gt;&lt;p data-block-key=&quot;xqogm&quot;&gt;Hooray! The job is done and we can go home now, right? Well, actually, no -- the project manager just called to say we also need to handle models that include the alpha cut-off feature and models that don&amp;#x27;t include it.&lt;/p&gt;&lt;p data-block-key=&quot;m1p9t&quot;&gt;Alpha cut-off is a feature of glTF files by which any pixels determined to have an alpha value less than some specified threshold simply get discarded. This is often used to cut away the transparent parts of quads used to render leaves of plants.&lt;/p&gt;&lt;p data-block-key=&quot;xsdmh&quot;&gt;Ok then -- let&amp;#x27;s simply repeat a process similar to that which we did for handling the presence, or absence, of texture coordinates.&lt;/p&gt;&lt;p data-block-key=&quot;f8fr2&quot;&gt;The fragment shader implementation of alpha cut-off is trivial:&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;formatted-code&quot;&gt;
    &lt;pre&gt;&lt;code class=&quot;language-clike  line-numbers &quot;&gt;void main()
{
    vec4 baseColor = ...;
#ifdef ALPHA_CUTOFF_ENABLED
    if (baseColor.a &amp;lt; material.alphaCutoff)
        discard;
#endif
    ...
    fragColor = baseColor;
}&lt;/code&gt;&lt;/pre&gt;
 &lt;/div&gt;


&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;c3exw&quot;&gt;We can then add suitable CMake targets to compile with and without this option.&lt;/p&gt;&lt;p data-block-key=&quot;wt6o8&quot;&gt;Of course, there&amp;#x27;s a catch. We have a combinatorial explosion of feature combinations. This only gets worse when we add the next optional feature or optional features that have various settings we wish to set at compile time, such as the number of taps used when sampling from a texture to perform a Gaussian blur.&lt;/p&gt;&lt;p data-block-key=&quot;rx6p5&quot;&gt;Clearly, we do not want to have to add several thousand combinations of features as CMake targets by hand! So, what can we do?&lt;/p&gt;&lt;h2 id=&quot;exploring-the-problem&quot; anchor=&quot;exploring-the-problem&quot; data-block-key=&quot;lifj4&quot;&gt;Exploring the Problem&lt;/h2&gt;&lt;p data-block-key=&quot;p2s9y&quot;&gt;Let&amp;#x27;s consider the above combination of the texture coordinates and alpha cut-off features. Our table of features and compiler flags looks like this:&lt;/p&gt;&lt;/div&gt;


&lt;table&gt;
    &lt;thead&gt;
        &lt;tr&gt;
            &lt;th scope=&quot;col&quot;&gt;&lt;/th&gt;
            &lt;th scope=&quot;col&quot;&gt;Tex Coord Off&lt;/th&gt;
            &lt;th scope=&quot;col&quot;&gt;Tex Coord On&lt;/th&gt;
        &lt;/tr&gt;
    &lt;/thead&gt;
    &lt;tbody&gt;
        &lt;tr&gt;
            &lt;th scope=&quot;row&quot;&gt;Alpha Cut-off Off&lt;/th&gt;
            &lt;td&gt;&lt;/td&gt;
            &lt;td&gt;-DTEXCOORD_0_ENABLED&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;th scope=&quot;row&quot;&gt;Alpha Cut-off On&lt;/th&gt;
            &lt;td&gt;-DALPHA_CUTOFF_ENABLED&lt;/td&gt;
            &lt;td&gt;-DTEXCOORD_0_ENABLED -DALPHA_CUTOFF_ENABLED&lt;/td&gt;
        &lt;/tr&gt;
    &lt;/tbody&gt;
&lt;/table&gt;

&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;8c5qf&quot;&gt;Adding another option would add another dimension to this table. The above mentioned option of blur filter taps with, say, 3, 5, 7, or 9 taps would add a 3rd dimension to the table and increase the number of options by another factor of 4, for a total of 16 possible configurations of this one shader program.&lt;/p&gt;&lt;p data-block-key=&quot;7kl25&quot;&gt;Adding just a handful of features, we can see that it would be all too easy to end up with thousands of combinations of compiled shaders from the single set of GLSL files!&lt;/p&gt;&lt;p data-block-key=&quot;h2y66&quot;&gt;How can we solve this in a nice and extensible way?&lt;/p&gt;&lt;p data-block-key=&quot;pjqsf&quot;&gt;It is easy enough to have nested loops to iterate over the available options for each of the specified axes of variations. But what if we don&amp;#x27;t know all of the axes of variation up front? What if they vary from shader to shader? Not all shaders will care about alpha cut-off or blur filter taps, for example.&lt;/p&gt;&lt;p data-block-key=&quot;xrs61&quot;&gt;We can&amp;#x27;t simply hard-wire a set number of nested loops to iterate over the combinations in our CMake files. We need something a bit more flexible and smarter.&lt;/p&gt;&lt;p data-block-key=&quot;y9a1o&quot;&gt;Let&amp;#x27;s think about the problem in a slightly different way.&lt;/p&gt;&lt;p data-block-key=&quot;0tusx&quot;&gt;To start with, let&amp;#x27;s represent a given configuration of our option space by a vector of length N, where N is the number of options. 
For now, let&amp;#x27;s set this to 3, for our options we have discussed:&lt;/p&gt;&lt;ol&gt;&lt;li data-block-key=&quot;e9qr5&quot;&gt;Texture Coordinates (Off or On)&lt;/li&gt;&lt;li data-block-key=&quot;6s4vl&quot;&gt;Alpha Cut-off (Off or On)&lt;/li&gt;&lt;li data-block-key=&quot;oezs7&quot;&gt;Blur filter taps (3, 5, 7, or 9)&lt;/li&gt;&lt;/ol&gt;&lt;p data-block-key=&quot;xkhok&quot;&gt;That is, we will have a vector like this:&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;formatted-code&quot;&gt;
    &lt;pre&gt;&lt;code class=&quot;language-clike &quot;&gt;[TexCoords Off, Alpha Cut-off Off, blur taps = 3]&lt;/code&gt;&lt;/pre&gt;
 &lt;/div&gt;


&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;o4voe&quot;&gt;To save some typing, let&amp;#x27;s now replace the wordy description of each element with a number representing the index of the option for that axis of variation:&lt;/p&gt;&lt;ol&gt;&lt;li data-block-key=&quot;hvqvm&quot;&gt;Texture Coordinates: (0 = Off, 1 = On)&lt;/li&gt;&lt;li data-block-key=&quot;skg1x&quot;&gt;Alpha Cut-off: (0 = Off, 1 = On)&lt;/li&gt;&lt;li data-block-key=&quot;1jegy&quot;&gt;Blur filter taps: (0 = 3 taps, 1 = 5 taps, 2 = 7 taps, 3 = 9 taps)&lt;/li&gt;&lt;/ol&gt;&lt;p data-block-key=&quot;j2720&quot;&gt;With this scheme in place, our above option set will be:&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;formatted-code&quot;&gt;
    &lt;pre&gt;&lt;code class=&quot;language-plain &quot;&gt;[0, 0, 0]&lt;/code&gt;&lt;/pre&gt;
 &lt;/div&gt;


&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;k618i&quot;&gt;And the vector representing texture coordinates on, no alpha cut-off, and 7 blur filter taps option will be:&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;formatted-code&quot;&gt;
    &lt;pre&gt;&lt;code class=&quot;language-plain &quot;&gt;[1, 0, 2]&lt;/code&gt;&lt;/pre&gt;
 &lt;/div&gt;


&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;vtgyr&quot;&gt;How does this help us? Well, it allows us to succinctly represent any combination of options; but it&amp;#x27;s even better than that. We can now easily go through the list of all possible combinations in a logical order. We begin by incrementing the final element of the vector over all possible values. Then we increment the previous element and repeat, like this:&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;formatted-code&quot;&gt;
    &lt;pre&gt;&lt;code class=&quot;language-plain  line-numbers &quot;&gt;[0, 0, 0]
[0, 0, 1]
[0, 0, 2]
[0, 0, 3]
[0, 1, 0]
[0, 1, 1]
[0, 1, 2]
[0, 1, 3]
[1, 0, 0]
[1, 0, 1]
[1, 0, 2]
[1, 0, 3]
[1, 1, 0]
[1, 1, 1]
[1, 1, 2]
[1, 1, 3]&lt;/code&gt;&lt;/pre&gt;
 &lt;/div&gt;
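As an aside, this enumeration is just the Cartesian product of the per-axis option index lists. A quick sketch in Ruby (the language we use for scripting later in this post) reproduces the same ordering; the variable names here are purely illustrative:

```ruby
# Index lists for each axis of variation (illustrative names):
tex_coords   = [0, 1]        # Off, On
alpha_cutoff = [0, 1]        # Off, On
blur_taps    = [0, 1, 2, 3]  # 3, 5, 7, or 9 taps

# Array#product varies the last axis fastest, matching the counting
# order shown above: [0, 0, 0], [0, 0, 1], ... [1, 1, 3].
combinations = tex_coords.product(alpha_cutoff, blur_taps)
combinations.each { |c| puts c.inspect }
```

Note that this flat one-liner already hints at the solution below: we never needed to write the nested loops explicitly.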


&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;b2vzu&quot;&gt;Note that the total number of option combinations is just the product of the number of options in each dimension or axis of variation, e.g. 2x2x4 = 16 in this example.&lt;/p&gt;&lt;p data-block-key=&quot;5zr9k&quot;&gt;The above sequence is exactly what we would get if we had 3 nested for-loops to iterate over the options at each level. How does this help us?&lt;/p&gt;&lt;p data-block-key=&quot;kk2mu&quot;&gt;Well, looking at the above sequence of options vectors, you may well notice the similarity to plain old counting of numbers. For each &amp;quot;decimal place&amp;quot; (element in the vector), starting with the final or least significant digit, we go up through each of the available values. Then, we increment the next least significant digit and repeat.&lt;/p&gt;&lt;p data-block-key=&quot;usx3w&quot;&gt;The only difference to how we are used to counting in decimal (base 10), binary, octal, or hexadecimal is that the base of each digit is potentially different. The base for each digit is simply the number of options available for that axis of variation (e.g. the texture coordinates can only be on or off (base = 2)). It&amp;#x27;s the same for the alpha cut-off. The blur taps option has a base of 4 (4 possible options).&lt;/p&gt;&lt;p data-block-key=&quot;fiwap&quot;&gt;We know how many combinations we need in total and we know that each combination can be represented by a vector that acts like a variable-base number. Therefore, if we can find a way to convert from a decimal number to the corresponding combination vector, we are in a good situation, as we will have converted a recursive approach (nested for-loops) into a flat linear approach. All we would need would be something like this pseudo-code:&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;formatted-code&quot;&gt;
    &lt;pre&gt;&lt;code class=&quot;language-clike  line-numbers &quot;&gt;for i = 0 to combination_count
   option_vector = calculate_option_vector(i)
   output_compiler_options(option_vector)
next i&lt;/code&gt;&lt;/pre&gt;
 &lt;/div&gt;


&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;4ewbm&quot;&gt;So how do we do this?&lt;/p&gt;&lt;h2 id=&quot;a-solution&quot; anchor=&quot;a-solution&quot; data-block-key=&quot;4sd1q&quot;&gt;A Solution&lt;/h2&gt;&lt;p data-block-key=&quot;hh4v0&quot;&gt;Converting a decimal number into a different base system is fairly easy. The process is described well at &lt;a href=&quot;https://www.tutorialspoint.com/computer_logical_organization/number_system_conversion.htm&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;https://www.tutorialspoint.com/computer_logical_organization/number_system_conversion.htm&lt;/a&gt;, where they give an example of converting from decimal to binary.&lt;/p&gt;&lt;p data-block-key=&quot;lvujx&quot;&gt;All we have to do, in our case, is use a base that differs for each digit of our combination vector. However, before we show this, we need a way to specify the options for each shader that we wish to consider. For now, we have done this by way of a simple JSON file. Here is an example applying all three of the above options to the fragment shader, but only the texture coordinates and alpha cut-off options to the vertex shader. This is just for illustration; in reality, the vertex shader has nothing to do with alpha cut-off, and our simple shaders do not use the blur tap option at all:&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;formatted-code&quot;&gt;
    &lt;pre&gt;&lt;code class=&quot;language-json  line-numbers &quot;&gt;{
    &amp;quot;options&amp;quot;: [
        {
            &amp;quot;name&amp;quot;: &amp;quot;hasTexCoords&amp;quot;,
            &amp;quot;define&amp;quot;: &amp;quot;TEXCOORD_0_ENABLED&amp;quot;
        },
        {
            &amp;quot;name&amp;quot;: &amp;quot;enableAlphaCutoff&amp;quot;,
            &amp;quot;define&amp;quot;: &amp;quot;ALPHA_CUTOFF_ENABLED&amp;quot;
        },
        {
            &amp;quot;name&amp;quot;: &amp;quot;taps&amp;quot;,
            &amp;quot;define&amp;quot;: &amp;quot;BLUR_TAPS&amp;quot;,
            &amp;quot;values&amp;quot;: [3, 5, 7, 9]
        }
    ],
    &amp;quot;shaders&amp;quot;: [
        {
            &amp;quot;filename&amp;quot;: &amp;quot;materials.vert&amp;quot;,
            &amp;quot;options&amp;quot;: [0, 1]
        },
        {
            &amp;quot;filename&amp;quot;: &amp;quot;materials.frag&amp;quot;,
            &amp;quot;options&amp;quot;: [0, 1, 2]
        }
    ]
}&lt;/code&gt;&lt;/pre&gt;
 &lt;/div&gt;


&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;8phex&quot;&gt;The default in our system, if no explicit values are provided for an option in the JSON file, is a simple on/off choice: the symbol is either defined (on) or not defined (off).&lt;/p&gt;&lt;p data-block-key=&quot;s4lrf&quot;&gt;Each input shader file section then specifies which of the options it cares about. So, in this example, the fragment shader considers all 3 options and will have 16 variants compiled.&lt;/p&gt;&lt;p data-block-key=&quot;g862s&quot;&gt;In order to generate the possible build combinations, we have written a small Ruby script to implement the necessary logic. Why Ruby? Because I couldn&amp;#x27;t face trying to do the necessary math in CMake&amp;#x27;s scripting language and Ruby is lovely!&lt;/p&gt;&lt;p data-block-key=&quot;3el1s&quot;&gt;The core of the script, which implements the conversion from a decimal number to a variable-base number (combination vector), is pretty simple:&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;formatted-code&quot;&gt;
    &lt;pre&gt;&lt;code class=&quot;language-ruby  line-numbers &quot;&gt;def calculate_digits(bases, index)
  digits = Array.new(bases.size, 0)
  base_index = digits.size - 1
  current_value = index
  while current_value != 0
    quotient, remainder = current_value.divmod(bases[base_index])
    digits[base_index] = remainder
    current_value = quotient
    base_index -= 1
  end
  return digits
end&lt;/code&gt;&lt;/pre&gt;
 &lt;/div&gt;
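As a quick sanity check, decoding a few indices with our example bases of [2, 2, 4] gives the expected combination vectors (the function is repeated here so the snippet runs standalone):

```ruby
# Repeated from above so this snippet is self-contained
def calculate_digits(bases, index)
  digits = Array.new(bases.size, 0)
  base_index = digits.size - 1
  current_value = index
  while current_value != 0
    # Peel off the least significant digit in that digit's own base
    quotient, remainder = current_value.divmod(bases[base_index])
    digits[base_index] = remainder
    current_value = quotient
    base_index -= 1
  end
  digits
end

bases = [2, 2, 4]  # texture coords, alpha cut-off, blur taps
puts calculate_digits(bases, 0).inspect   # => [0, 0, 0]
puts calculate_digits(bases, 6).inspect   # => [0, 1, 2]
puts calculate_digits(bases, 15).inspect  # => [1, 1, 3]
```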


&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;gcpnt&quot;&gt;In the above code, the &lt;code&gt;bases&lt;/code&gt; argument is a vector representing the base of each digit in the final combination vector. Here, bases = [2, 2, 4]. We then loop over the decimal number, performing the &lt;code&gt;divmod&lt;/code&gt; operation at each step to find the value of each digit in our combination vector. When we have reduced the input decimal number to 0, we are done. This is exactly analogous to the decimal-to-binary conversion linked above, but with a variable base at each digit.&lt;/p&gt;&lt;p data-block-key=&quot;znwvc&quot;&gt;With the resulting combination vector in hand, it is simple for us to then look up the corresponding compiler -D option for each selection and output that into a JSON string. Here is an example of the output of running the Ruby script against the above configuration file:&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;formatted-code&quot;&gt;
    &lt;pre&gt;&lt;code class=&quot;language-json  line-numbers &quot;&gt;{
  &amp;quot;variants&amp;quot;: [
    {
      &amp;quot;input&amp;quot;: &amp;quot;materials.vert&amp;quot;,
      &amp;quot;defines&amp;quot;: &amp;quot;&amp;quot;,
      &amp;quot;output&amp;quot;: &amp;quot;materials.vert.spv&amp;quot;
    },
    {
      &amp;quot;input&amp;quot;: &amp;quot;materials.vert&amp;quot;,
      &amp;quot;defines&amp;quot;: &amp;quot;-DALPHA_CUTOFF_ENABLED&amp;quot;,
      &amp;quot;output&amp;quot;: &amp;quot;materials_alpha_cutoff_enabled.vert.spv&amp;quot;
    },
    {
      &amp;quot;input&amp;quot;: &amp;quot;materials.vert&amp;quot;,
      &amp;quot;defines&amp;quot;: &amp;quot;-DTEXCOORD_0_ENABLED&amp;quot;,
      &amp;quot;output&amp;quot;: &amp;quot;materials_texcoord_0_enabled.vert.spv&amp;quot;
    },
    {
      &amp;quot;input&amp;quot;: &amp;quot;materials.vert&amp;quot;,
      &amp;quot;defines&amp;quot;: &amp;quot;-DTEXCOORD_0_ENABLED -DALPHA_CUTOFF_ENABLED&amp;quot;,
      &amp;quot;output&amp;quot;: &amp;quot;materials_texcoord_0_enabled_alpha_cutoff_enabled.vert.spv&amp;quot;
    },
    {
      &amp;quot;input&amp;quot;: &amp;quot;materials.frag&amp;quot;,
      &amp;quot;defines&amp;quot;: &amp;quot;-DBLUR_TAPS=3&amp;quot;,
      &amp;quot;output&amp;quot;: &amp;quot;materials_blur_taps_3.frag.spv&amp;quot;
    },
    {
      &amp;quot;input&amp;quot;: &amp;quot;materials.frag&amp;quot;,
      &amp;quot;defines&amp;quot;: &amp;quot;-DBLUR_TAPS=5&amp;quot;,
      &amp;quot;output&amp;quot;: &amp;quot;materials_blur_taps_5.frag.spv&amp;quot;
    },
    {
      &amp;quot;input&amp;quot;: &amp;quot;materials.frag&amp;quot;,
      &amp;quot;defines&amp;quot;: &amp;quot;-DBLUR_TAPS=7&amp;quot;,
      &amp;quot;output&amp;quot;: &amp;quot;materials_blur_taps_7.frag.spv&amp;quot;
    },
    {
      &amp;quot;input&amp;quot;: &amp;quot;materials.frag&amp;quot;,
      &amp;quot;defines&amp;quot;: &amp;quot;-DBLUR_TAPS=9&amp;quot;,
      &amp;quot;output&amp;quot;: &amp;quot;materials_blur_taps_9.frag.spv&amp;quot;
    },
    {
      &amp;quot;input&amp;quot;: &amp;quot;materials.frag&amp;quot;,
      &amp;quot;defines&amp;quot;: &amp;quot;-DALPHA_CUTOFF_ENABLED -DBLUR_TAPS=3&amp;quot;,
      &amp;quot;output&amp;quot;: &amp;quot;materials_alpha_cutoff_enabled_blur_taps_3.frag.spv&amp;quot;
    },
    {
      &amp;quot;input&amp;quot;: &amp;quot;materials.frag&amp;quot;,
      &amp;quot;defines&amp;quot;: &amp;quot;-DALPHA_CUTOFF_ENABLED -DBLUR_TAPS=5&amp;quot;,
      &amp;quot;output&amp;quot;: &amp;quot;materials_alpha_cutoff_enabled_blur_taps_5.frag.spv&amp;quot;
    },
    {
      &amp;quot;input&amp;quot;: &amp;quot;materials.frag&amp;quot;,
      &amp;quot;defines&amp;quot;: &amp;quot;-DALPHA_CUTOFF_ENABLED -DBLUR_TAPS=7&amp;quot;,
      &amp;quot;output&amp;quot;: &amp;quot;materials_alpha_cutoff_enabled_blur_taps_7.frag.spv&amp;quot;
    },
    {
      &amp;quot;input&amp;quot;: &amp;quot;materials.frag&amp;quot;,
      &amp;quot;defines&amp;quot;: &amp;quot;-DALPHA_CUTOFF_ENABLED -DBLUR_TAPS=9&amp;quot;,
      &amp;quot;output&amp;quot;: &amp;quot;materials_alpha_cutoff_enabled_blur_taps_9.frag.spv&amp;quot;
    },
    {
      &amp;quot;input&amp;quot;: &amp;quot;materials.frag&amp;quot;,
      &amp;quot;defines&amp;quot;: &amp;quot;-DTEXCOORD_0_ENABLED -DBLUR_TAPS=3&amp;quot;,
      &amp;quot;output&amp;quot;: &amp;quot;materials_texcoord_0_enabled_blur_taps_3.frag.spv&amp;quot;
    },
    {
      &amp;quot;input&amp;quot;: &amp;quot;materials.frag&amp;quot;,
      &amp;quot;defines&amp;quot;: &amp;quot;-DTEXCOORD_0_ENABLED -DBLUR_TAPS=5&amp;quot;,
      &amp;quot;output&amp;quot;: &amp;quot;materials_texcoord_0_enabled_blur_taps_5.frag.spv&amp;quot;
    },
    {
      &amp;quot;input&amp;quot;: &amp;quot;materials.frag&amp;quot;,
      &amp;quot;defines&amp;quot;: &amp;quot;-DTEXCOORD_0_ENABLED -DBLUR_TAPS=7&amp;quot;,
      &amp;quot;output&amp;quot;: &amp;quot;materials_texcoord_0_enabled_blur_taps_7.frag.spv&amp;quot;
    },
    {
      &amp;quot;input&amp;quot;: &amp;quot;materials.frag&amp;quot;,
      &amp;quot;defines&amp;quot;: &amp;quot;-DTEXCOORD_0_ENABLED -DBLUR_TAPS=9&amp;quot;,
      &amp;quot;output&amp;quot;: &amp;quot;materials_texcoord_0_enabled_blur_taps_9.frag.spv&amp;quot;
    },
    {
      &amp;quot;input&amp;quot;: &amp;quot;materials.frag&amp;quot;,
      &amp;quot;defines&amp;quot;: &amp;quot;-DTEXCOORD_0_ENABLED -DALPHA_CUTOFF_ENABLED -DBLUR_TAPS=3&amp;quot;,
      &amp;quot;output&amp;quot;: &amp;quot;materials_texcoord_0_enabled_alpha_cutoff_enabled_blur_taps_3.frag.spv&amp;quot;
    },
    {
      &amp;quot;input&amp;quot;: &amp;quot;materials.frag&amp;quot;,
      &amp;quot;defines&amp;quot;: &amp;quot;-DTEXCOORD_0_ENABLED -DALPHA_CUTOFF_ENABLED -DBLUR_TAPS=5&amp;quot;,
      &amp;quot;output&amp;quot;: &amp;quot;materials_texcoord_0_enabled_alpha_cutoff_enabled_blur_taps_5.frag.spv&amp;quot;
    },
    {
      &amp;quot;input&amp;quot;: &amp;quot;materials.frag&amp;quot;,
      &amp;quot;defines&amp;quot;: &amp;quot;-DTEXCOORD_0_ENABLED -DALPHA_CUTOFF_ENABLED -DBLUR_TAPS=7&amp;quot;,
      &amp;quot;output&amp;quot;: &amp;quot;materials_texcoord_0_enabled_alpha_cutoff_enabled_blur_taps_7.frag.spv&amp;quot;
    },
    {
      &amp;quot;input&amp;quot;: &amp;quot;materials.frag&amp;quot;,
      &amp;quot;defines&amp;quot;: &amp;quot;-DTEXCOORD_0_ENABLED -DALPHA_CUTOFF_ENABLED -DBLUR_TAPS=9&amp;quot;,
      &amp;quot;output&amp;quot;: &amp;quot;materials_texcoord_0_enabled_alpha_cutoff_enabled_blur_taps_9.frag.spv&amp;quot;
    }
  ]
}&lt;/code&gt;&lt;/pre&gt;
 &lt;/div&gt;
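Although we feed this JSON into CMake below, any build tool could consume it. As a hedged sketch (not part of the actual build system), a few lines of Ruby turn each variant entry into a glslangValidator command line of the form shown earlier:

```ruby
require 'json'

# A two-entry excerpt of the generated variants document, inlined here
# for illustration; in practice it would come from the generator script.
variants_json = <<~JSON
  {
    "variants": [
      { "input": "materials.vert", "defines": "", "output": "materials.vert.spv" },
      { "input": "materials.vert", "defines": "-DTEXCOORD_0_ENABLED", "output": "materials_texcoord_0_enabled.vert.spv" }
    ]
  }
JSON

data = JSON.parse(variants_json, symbolize_names: true)

# Build one compiler invocation per variant; an empty defines string
# simply contributes no -D options.
commands = data[:variants].map do |v|
  (['glslangValidator', '-o', v[:output]] + v[:defines].split + [v[:input]]).join(' ')
end
commands.each { |c| puts c }
```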


&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;ku2h9&quot;&gt;If you are interested, this is the full script:&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;formatted-code&quot;&gt;
    &lt;pre&gt;&lt;code class=&quot;language-ruby  line-numbers &quot;&gt;require &amp;#x27;json&amp;#x27;
require &amp;#x27;pp&amp;#x27;

def expand_options(data)
  # Expand the options so that if no explicit options are specified we default
  # to options where the #define symbol is defined or not
  data[:options].each do |option|
    if !option.has_key?(:values)
      option[:values] = [:nil, :defined]
    end
    option[:count] = option[:values].size
  end
end

def extract_options(data, shader)
  shader_options = Hash.new
  shader_options[:options] = Array.new
  shader[:options].each do |option_index|
    shader_options[:options].push data[:options][option_index]
  end
  # STDERR.puts &amp;quot;Options for shader:&amp;quot;
  # STDERR.puts shader_options
  return shader_options
end

def find_bases(data)
  bases = Array.new(data[:options].size)
  (0..(data[:options].size - 1)).each do |index|
    bases[index] = data[:options][index][:count]
  end
  return bases
end

def calculate_steps(bases)
  step_count = bases[0]
  (1..(bases.size - 1)).each do |index|
    step_count *= bases[index]
  end
  return step_count
end

# Calculate the number for &amp;quot;index&amp;quot; in our variable-bases counting system
def calculate_digits(bases, index)
  digits = Array.new(bases.size, 0)
  base_index = digits.size - 1
  current_value = index
  while current_value != 0
    quotient, remainder = current_value.divmod(bases[base_index])
    digits[base_index] = remainder
    current_value = quotient
    base_index -= 1
  end
  return digits
end

def build_options_string(data, selected_options)
  str = &amp;quot;&amp;quot;
  selected_options.each_with_index do |selected_option, index|
    # Don&amp;#x27;t add anything if option is disabled
    next if selected_option == :nil

    # If we have the special :defined option, then we add a -D option
    if selected_option == :defined
      str += &amp;quot; -D#{data[:options][index][:define]}&amp;quot;
    else
      str += &amp;quot; -D#{data[:options][index][:define]}=#{selected_option}&amp;quot;
    end
  end
  return str.strip
end

def build_filename(shader, data, selected_options)
  str = File.basename(shader[:filename], File.extname(shader[:filename]))
  selected_options.each_with_index do |selected_option, index|
    # Don&amp;#x27;t add anything if option is disabled
    next if selected_option == :nil

    # If we have the special :defined option, then we add a section for that option
    if selected_option == :defined
      str += &amp;quot;_#{data[:options][index][:define].downcase}&amp;quot;
    else
      str += &amp;quot;_#{data[:options][index][:define].downcase}_#{selected_option.to_s}&amp;quot;
    end
  end
  str += File.extname(shader[:filename]) + &amp;quot;.spv&amp;quot;
  return str
end

# Load the configuration data and expand default options
if ARGV.size != 1
  puts &amp;quot;No filename specified.&amp;quot;
  puts &amp;quot;  Usage: generate_shader_variants.rb &amp;lt;variants.json&amp;gt;&amp;quot;
  exit(1)
end

variants_filename = ARGV[0]
file = File.read(variants_filename)
data = JSON.parse(file, { symbolize_names: true })
expand_options(data)

# Prepare a hash to output as json at the end
output_data = Hash.new
output_data[:variants] = Array.new

data[:shaders].each do |shader|
  # STDERR.puts &amp;quot;Processing #{shader[:filename]}&amp;quot;

  # Copy over the options referenced by this shader to a local hash that we can operate on
  shader_options = extract_options(data, shader)

  # Create a &amp;quot;digits&amp;quot; array we can use for counting. Each element (digit) in the array
  # will correspond to an option in the loaded data configuration. The values each
  # digit can take are those specified in the &amp;quot;values&amp;quot; array for that option.
  #
  # The number of steps we need to take to count from &amp;quot;0&amp;quot; to the maximum value is the
  # product of the number of options for each &amp;quot;digit&amp;quot; (option).
  bases = find_bases(shader_options)
  # STDERR.puts &amp;quot;Bases = #{bases}&amp;quot;
  step_count = calculate_steps(bases)
  # STDERR.puts &amp;quot;There are #{step_count} combinations of options&amp;quot;

  # Count up through our range of options
  (0..(step_count - 1)).each do |index|
    digits = calculate_digits(bases, index)

    selected_options = Array.new(bases.size)
    (0..(bases.size - 1)).each do |digit_index|
      settings = shader_options[:options][digit_index]
      setting_index = digits[digit_index]
      selected_options[digit_index] = settings[:values][setting_index]
    end

    # Construct the options to pass to glslangValidator
    defines = build_options_string(shader_options, selected_options)
    output_filename = build_filename(shader, shader_options, selected_options)

    # STDERR.puts &amp;quot;  Step #{index}: #{digits}, selected_options = #{selected_options}, defines = #{defines}, output_filename = #{output_filename}&amp;quot;

    variant = { input: shader[:filename], defines: defines, output: output_filename }
    output_data[:variants].push variant
  end

  # STDERR.puts &amp;quot;&amp;quot;
end

puts output_data.to_json&lt;/code&gt;&lt;/pre&gt;
 &lt;/div&gt;


&lt;div class=&quot;rich-text&quot;&gt;&lt;h2 id=&quot;integrating-into-the-build-system&quot; anchor=&quot;integrating-into-the-build-system&quot; data-block-key=&quot;z5aub&quot;&gt;Integrating into the Build System&lt;/h2&gt;&lt;p data-block-key=&quot;iiwzc&quot;&gt;CMake is now able to &lt;a href=&quot;https://cmake.org/cmake/help/latest/command/string.html#json&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;read and parse JSON documents&lt;/a&gt; -- a fact that I didn&amp;#x27;t know at first. This means that we can quite conveniently ask our build system to execute our Ruby script as an external process at configure time, capture the JSON output as shown above, iterate over the generated combinations, and add a build target for each one.&lt;/p&gt;&lt;p data-block-key=&quot;9xmgr&quot;&gt;The cut-down code for doing this is:&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;formatted-code&quot;&gt;
    &lt;pre&gt;&lt;code class=&quot;language-cmake  line-numbers &quot;&gt;function(CompileShaderVariants target variants_filename)
    # Run the helper script to generate json data for all configured shader variants
    execute_process(
        COMMAND ruby ${CMAKE_SOURCE_DIR}/generate_shader_variants.rb ${variants_filename}
        WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}
        OUTPUT_VARIABLE SHADER_VARIANTS
        RESULT_VARIABLE SHADER_VARIANT_RESULT
    )

    if(NOT SHADER_VARIANT_RESULT EQUAL &amp;quot;0&amp;quot;)
        message(NOTICE ${SHADER_VARIANT_RESULT})
        message(FATAL_ERROR &amp;quot;Failed to generate shader variant build targets for &amp;quot; ${variants_filename})
    endif()

    string(JSON VARIANT_COUNT LENGTH ${SHADER_VARIANTS} variants)
    message(NOTICE &amp;quot;Generating &amp;quot; ${VARIANT_COUNT} &amp;quot; shader variants from &amp;quot; ${variants_filename})

    # Adjust count as loop index goes from 0 to N
    MATH(EXPR VARIANT_COUNT &amp;quot;${VARIANT_COUNT} - 1&amp;quot;)

    foreach(VARIANT_INDEX RANGE ${VARIANT_COUNT})
        string(JSON CURRENT_INPUT_FILENAME GET ${SHADER_VARIANTS} variants ${VARIANT_INDEX} input)
        string(JSON CURRENT_OUTPUT_FILENAME GET ${SHADER_VARIANTS} variants ${VARIANT_INDEX} output)
        string(JSON CURRENT_DEFINES GET ${SHADER_VARIANTS} variants ${VARIANT_INDEX} defines)

        set(SHADER_TARGET_NAME &amp;quot;${target}_${CURRENT_OUTPUT_FILENAME}&amp;quot;)
        CompileShader(${SHADER_TARGET_NAME} ${CURRENT_INPUT_FILENAME} ${CURRENT_OUTPUT_FILENAME} ${CURRENT_DEFINES})
    endforeach()
endfunction()&lt;/code&gt;&lt;/pre&gt;
 &lt;/div&gt;


&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;pdirf&quot;&gt;Here, the CompileShader() call is another helper function that just invokes the glslangValidator GLSL-&amp;gt;SPIR-V compiler with the specified options.&lt;/p&gt;&lt;p data-block-key=&quot;68pjc&quot;&gt;This nicely takes care of generating all of the required shader variants, compiled with correct dependencies on the source GLSL files. To ensure that the targets get updated if the input JSON configuration file changes, we can add the following snippet to the above function:&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;formatted-code&quot;&gt;
    &lt;pre&gt;&lt;code class=&quot;language-cmake  line-numbers &quot;&gt;# Re-run cmake configure step if the variants file changes
set_property(
    DIRECTORY
    APPEND
    PROPERTY CMAKE_CONFIGURE_DEPENDS ${variants_filename}
)&lt;/code&gt;&lt;/pre&gt;
 &lt;/div&gt;


&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;8bwrw&quot;&gt;Now, if we edit the JSON configuration file that contains the options, CMake will automatically re-run and generate the targets.&lt;/p&gt;&lt;p data-block-key=&quot;d0xj1&quot;&gt;On the C++ runtime side of things, we have some logic to construct the appropriate shader file name for the compiled SPIR-V shader matching the options needed by whatever model we are rendering.&lt;/p&gt;&lt;p data-block-key=&quot;cd3op&quot;&gt;In the future, we may make this part more reusable by making it read in the same JSON configuration file used to create the shader variants.&lt;/p&gt;&lt;h2 id=&quot;wrapping-up&quot; anchor=&quot;wrapping-up&quot; data-block-key=&quot;ngqid&quot;&gt;Wrapping Up&lt;/h2&gt;&lt;p data-block-key=&quot;t13ue&quot;&gt;So, going back to where we started: how does all of this tie into your PC&amp;#x27;s spending an hour compiling shaders when we have shown here how to compile them at application build time?&lt;/p&gt;&lt;p data-block-key=&quot;gg4u6&quot;&gt;It all goes back to SPIR-V&amp;#x27;s just being a bytecode intermediate representation. Before the GPU can execute these shaders, it needs to do a final compilation step to convert the SPIR-V to actual machine code. In a modern graphics API, this is done when we create a so-called &amp;quot;graphics pipeline.&amp;quot; At this point, we have to specify pretty much all GPU state, which then gets baked into a binary blob along with the shader code by the driver. This binary blob is both GPU-vendor and driver-version specific. So, it cannot be built at application build time but, rather, has to be done on the actual machine on which it will execute.&lt;/p&gt;&lt;p data-block-key=&quot;jezbh&quot;&gt;The first time you run such a game or other application, it will often loop through all of the shader variants and compile a graphics pipeline for each one. These then get cached to disk for use on subsequent runs. 
If you change your GPU or (more likely) the driver version, then this cache might get invalidated and you&amp;#x27;d have to sit through this process once again.&lt;/p&gt;&lt;p data-block-key=&quot;oyz55&quot;&gt;For systems with known hardware and drivers, this whole process can be performed as part of the build step. This is why consoles such as the PlayStation 5 do not make us sit and watch this lengthy shader-compiling step.&lt;/p&gt;&lt;p data-block-key=&quot;6gl78&quot;&gt;There is some work going on in Khronos at present, in the shape of &lt;a href=&quot;https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VK_EXT_shader_object.html&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;VK_EXT_shader_object&lt;/a&gt;, to try to get back to a more dynamic-shader-friendly way of doing things, in which the driver takes care of much of this compiling and caching for us. As with all things in computer science though, it will be a trade-off.&lt;/p&gt;&lt;p data-block-key=&quot;3tzcj&quot;&gt;Thank you for reading about what turned out to be a nice little excursion of simplifying a problem by changing it from recursive to linear and learning about converting between numbers of different bases.&lt;/p&gt;&lt;p data-block-key=&quot;p4xwr&quot;&gt;If you would like to learn more about modern 3D graphics or get some help on your own projects, then please get in touch.&lt;/p&gt;&lt;/div&gt;&lt;p&gt;The post &lt;a href=&quot;https://www.kdab.com/shader-variants/&quot;&gt;Shader Variants&lt;/a&gt; appeared first on &lt;a href=&quot;https://www.kdab.com&quot;&gt;KDAB&lt;/a&gt;.&lt;/p&gt;</content:encoded><dc:creator>Sean Harmer</dc:creator><category>3d</category><category>c++</category><category>performance</category></item><item><title>FMA Woes</title><link>https://www.kdab.com/fma-woes/</link><guid isPermaLink="true">https://www.kdab.com/fma-woes/</guid><description>&lt;p data-block-key=&quot;322yh&quot;&gt;Given a strictly
positive integer i, this code will calculate i+1 &amp;quot;equally spaced&amp;quot; values between 1 and 0: If you&amp;#x27;re looking for a trap, this does actually work for any i &amp;gt; 0. One can verify it experimentally; run the code with i from 1 to INT_MAX. For simplicity, just consider the case j = […]&lt;/p&gt;</description><pubDate>Fri, 24 Feb 2023 07:00:00 GMT</pubDate><content:encoded>&lt;h1&gt;FMA Woes&lt;/h1&gt;&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;fvwin&quot;&gt;Given a strictly positive integer &lt;code&gt;i&lt;/code&gt;, this code will calculate i+1 &amp;quot;equally spaced&amp;quot; values between 1 and 0:&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;formatted-code&quot;&gt;
    &lt;pre&gt;&lt;code class=&quot;language-cpp  line-numbers &quot;&gt;const double scale = 1.0 / i;

for (int j = 0; j &amp;lt;= i; ++j) {
    const double r = 1.0 - j * scale;
    assert(r &amp;gt;= 0);
}&lt;/code&gt;&lt;/pre&gt;
 &lt;/div&gt;


&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;xkyyi&quot;&gt;If you&amp;#x27;re looking for a trap, this does actually work for any i &amp;gt; 0. One can verify it experimentally; run the code with &lt;code&gt;i&lt;/code&gt; from 1 to &lt;code&gt;INT_MAX&lt;/code&gt;.&lt;/p&gt;&lt;p data-block-key=&quot;08lb2&quot;&gt;For simplicity, just consider the case &lt;code&gt;j = i&lt;/code&gt; (the maximum for &lt;code&gt;j&lt;/code&gt;, in the last loop of iteration above):&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;formatted-code&quot;&gt;
    &lt;pre&gt;&lt;code class=&quot;language-cpp  line-numbers &quot;&gt;const double scale = 1.0 / i;
const double r = 1.0 - i * scale;
assert(r &amp;gt;= 0);&lt;/code&gt;&lt;/pre&gt;
 &lt;/div&gt;


&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;p090e&quot;&gt;You can see it running &lt;a href=&quot;https://gcc.godbolt.org/z/zY7o57xYf&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;here on Compiler Explorer&lt;/a&gt;.&lt;/p&gt;&lt;p data-block-key=&quot;qgfed&quot;&gt;Some time later, you upgrade your compiler and the code doesn&amp;#x27;t work any more.&lt;/p&gt;&lt;p data-block-key=&quot;ry9cs&quot;&gt;Another example: given two double numbers &lt;code&gt;a&lt;/code&gt; and &lt;code&gt;b&lt;/code&gt; such that &lt;code&gt;abs(a) &amp;gt;= abs(b)&lt;/code&gt;, then this code,&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;formatted-code&quot;&gt;
    &lt;pre&gt;&lt;code class=&quot;language-cpp  line-numbers &quot;&gt;const double result = std::sqrt(a*a - b*b);&lt;/code&gt;&lt;/pre&gt;
 &lt;/div&gt;


&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;kxsmz&quot;&gt;will work; it will never pass a negative argument to &lt;code&gt;sqrt&lt;/code&gt; until you upgrade your compiler. Then, it will start failing...&lt;/p&gt;&lt;h2 id=&quot;what-did-the-compiler-do-to-you&quot; anchor=&quot;what-did-the-compiler-do-to-you&quot; data-block-key=&quot;21eu8&quot;&gt;What Did the Compiler Do to You?&lt;/h2&gt;&lt;p data-block-key=&quot;32qeu&quot;&gt;In at least one interesting case (&lt;a href=&quot;https://releases.llvm.org/14.0.0/tools/clang/docs/ReleaseNotes.html#floating-point-support-in-clang&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;Clang 14, now shipped with the latest Xcode on Apple platforms&lt;/a&gt;), the compiler started recognizing floating-point expressions, such as,&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;formatted-code&quot;&gt;
    &lt;pre&gt;&lt;code class=&quot;language-cpp  line-numbers &quot;&gt;x * y + z&lt;/code&gt;&lt;/pre&gt;
 &lt;/div&gt;


&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;s1cy6&quot;&gt;and started to automatically turn them into &lt;a href=&quot;https://en.wikipedia.org/wiki/Multiply%E2%80%93accumulate_operation&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;fused multiply-add (FMA) instructions&lt;/a&gt;:&lt;/p&gt;&lt;/div&gt;
&lt;h2&gt;&lt;/h2&gt;
&lt;p&gt;&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;yxh8n&quot;&gt;The MAC operation modifies an accumulator&lt;/p&gt;&lt;p data-block-key=&quot;9skp9&quot;&gt;&lt;b&gt;a: a ← a + ( b × c )&lt;/b&gt;&lt;/p&gt;&lt;p data-block-key=&quot;5f9qa&quot;&gt;When done with floating point numbers, it might be performed with two roundings (typical in many DSPs), or with a single rounding. When performed with a single rounding, it is called a &lt;b&gt;fused multiply–add (FMA) or fused multiply–accumulate (FMAC)&lt;/b&gt;.&lt;/p&gt;&lt;/div&gt;&lt;/p&gt;

&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;5798h&quot;&gt;An FMA instruction carries out the two operations in one step, performing them in &amp;quot;infinite precision&amp;quot;. Notably, an FMA does only one rounding at the end, instead of the sequence expressed by the source code: 1) multiply, 2) round the result, 3) add, 4) round the result. That sequence contains &lt;i&gt;two&lt;/i&gt; rounding steps.&lt;/p&gt;&lt;p data-block-key=&quot;kvbxy&quot;&gt;So not only is an FMA faster, but it&amp;#x27;s also more accurate.&lt;/p&gt;&lt;p data-block-key=&quot;irriz&quot;&gt;&lt;b&gt;However,&lt;/b&gt; one can easily encounter cases (like the two illustrated above) in which doing the operations &lt;i&gt;without&lt;/i&gt; the intermediate rounding step will give you trouble.&lt;/p&gt;&lt;p data-block-key=&quot;ih3zs&quot;&gt;Let&amp;#x27;s look again at the first example:&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;formatted-code&quot;&gt;
    &lt;pre&gt;&lt;code class=&quot;language-cpp  line-numbers &quot;&gt;const double scale = 1.0 / i;
const double r = 1.0 - i * scale;
assert(r &amp;gt;= 0);&lt;/code&gt;&lt;/pre&gt;
 &lt;/div&gt;


&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;v3bd1&quot;&gt;If &lt;code&gt;i = 5&lt;/code&gt;, then &lt;code&gt;scale&lt;/code&gt; is&lt;br/&gt; &lt;code&gt;0.200000000000000011102230246251565404236316680908203125&lt;/code&gt;, and &lt;code&gt;r&lt;/code&gt; is&lt;br/&gt; negative (about &lt;code&gt;-5.55e-17&lt;/code&gt;) when using an FMA. The point is that &lt;code&gt;i * scale&lt;/code&gt; &lt;b&gt;did not&lt;/b&gt; get rounded to an intermediate &lt;code&gt;double&lt;/code&gt;.&lt;/p&gt;&lt;p data-block-key=&quot;81ycr&quot;&gt;In the second example,&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;formatted-code&quot;&gt;
    &lt;pre&gt;&lt;code class=&quot;language-cpp  line-numbers &quot;&gt;const double result = std::sqrt(a*a - b*b);&lt;/code&gt;&lt;/pre&gt;
 &lt;/div&gt;


&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;8uj9e&quot;&gt;the argument to &lt;code&gt;sqrt(a*a - b*b)&lt;/code&gt; can be turned into &lt;code&gt;FMA(a, a, -b*b)&lt;/code&gt;. If &lt;code&gt;a == b&lt;/code&gt;, then this expression is equivalent to &lt;code&gt;FMA(a, a, -a*a)&lt;/code&gt;. The problem is that, if &lt;code&gt;a*a&lt;/code&gt; done in &amp;quot;infinite precision&amp;quot; is strictly less than the rounded product of &lt;code&gt;a*a&lt;/code&gt;, then the result will again be a negative number (not 0!) passed into &lt;code&gt;sqrt&lt;/code&gt;. This is very easy to trigger (&lt;a href=&quot;https://gcc.godbolt.org/z/M1WM3s7G8&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;example on CE&lt;/a&gt;)!&lt;/p&gt;&lt;h2 id=&quot;rounding-in-c&quot; anchor=&quot;rounding-in-c&quot; data-block-key=&quot;8uvtb&quot;&gt;Rounding in C++&lt;/h2&gt;&lt;p data-block-key=&quot;nkm0m&quot;&gt;For me, the interesting question is, &amp;quot;Is the compiler allowed to do these manipulations, since they affect the rounding as expressed by the source code?&amp;quot;&lt;/p&gt;&lt;p data-block-key=&quot;7rpaa&quot;&gt;Within the context of &lt;b&gt;one&lt;/b&gt; expression, compilers can use as much precision as they want. This is allowed by &lt;a href=&quot;https://eel.is/c++draft/expr.pre#6&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;[expr.pre/6]&lt;/a&gt; and similar paragraphs:&lt;/p&gt;&lt;/div&gt;
&lt;h2&gt;&lt;/h2&gt;
&lt;p&gt;&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;z0a4z&quot;&gt;The values of the floating-point operands and the results of floating-point expressions may be represented in greater precision and range than that required by the type; the types are not changed thereby.&lt;/p&gt;&lt;/div&gt;&lt;/p&gt;

&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;94dkv&quot;&gt;with a footnote that says:&lt;/p&gt;&lt;/div&gt;
&lt;h2&gt;&lt;/h2&gt;
&lt;p&gt;&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;vc4ab&quot;&gt;The cast and assignment operators must still perform their specific conversions as described in [expr.type.conv], [expr.cast], [expr.static.cast] and [expr.ass].&lt;/p&gt;&lt;/div&gt;&lt;/p&gt;

&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;gpx02&quot;&gt;Now suppose that one turns&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;formatted-code&quot;&gt;
    &lt;pre&gt;&lt;code class=&quot;language-cpp  line-numbers &quot;&gt;const double scale = 1.0 / i;
const double r = 1.0 - i * scale;&lt;/code&gt;&lt;/pre&gt;
 &lt;/div&gt;


&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;42o23&quot;&gt;into separate expressions and statements:&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;formatted-code&quot;&gt;
    &lt;pre&gt;&lt;code class=&quot;language-cpp  line-numbers &quot;&gt;const double scale = 1.0 / i;
const double tmp = i * scale;
const double r = 1.0 - tmp;&lt;/code&gt;&lt;/pre&gt;
 &lt;/div&gt;


&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;chs77&quot;&gt;Here, &lt;i&gt;in theory&lt;/i&gt;, the source code mandates that &lt;code&gt;tmp&lt;/code&gt; is rounded; so a compiler cannot use an FMA when calculating &lt;code&gt;r&lt;/code&gt;.&lt;/p&gt;&lt;p data-block-key=&quot;0qw30&quot;&gt;&lt;b&gt;In practice, compilers violate the standard and apply FMA. :-)&lt;/b&gt;&lt;/p&gt;&lt;p data-block-key=&quot;tk2fp&quot;&gt;In the literature, these substitutions are called &amp;quot;floating-point contractions.&amp;quot; Let&amp;#x27;s read what &lt;a href=&quot;https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;the GCC manual&lt;/a&gt; has to say about them:&lt;/p&gt;&lt;/div&gt;
&lt;h2&gt;&lt;/h2&gt;
&lt;p&gt;&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;zh9tk&quot;&gt;By default, -fexcess-precision=fast is in effect; this means that operations may be carried out in a wider precision than the types specified in the source if that would result in faster code, and &lt;i&gt;it is unpredictable when rounding to the types specified in the source code takes place&lt;/i&gt;.&lt;/p&gt;&lt;/div&gt;&lt;/p&gt;

&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;8d90c&quot;&gt;(Emph. mine)&lt;/p&gt;&lt;p data-block-key=&quot;6hqkh&quot;&gt;Hence, you can turn these optimizations off by compiling under &lt;code&gt;-std=c++XX&lt;/code&gt;, not &lt;code&gt;-std=gnu++XX&lt;/code&gt; (the default). If you try to use &lt;code&gt;-fexcess-precision=standard&lt;/code&gt;, then GCC lets you know that:&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;formatted-code&quot;&gt;
    &lt;pre&gt;&lt;code class=&quot;language-plain  line-numbers &quot;&gt;cc1plus: sorry, unimplemented: &amp;#x27;-fexcess-precision=standard&amp;#x27; for C++&lt;/code&gt;&lt;/pre&gt;
 &lt;/div&gt;


&lt;div class=&quot;rich-text&quot;&gt;&lt;h2 id=&quot;the-origin&quot; anchor=&quot;the-origin&quot; data-block-key=&quot;7c6lq&quot;&gt;The Origin&lt;/h2&gt;&lt;p data-block-key=&quot;4yp9j&quot;&gt;Where does all this nonsense come from? The first testcase is actually out of &lt;a href=&quot;https://codebrowser.dev/qt5/qtdeclarative/src/quick/scenegraph/coreapi/qsgbatchrenderer.cpp.html#3208&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;Qt Quick rendering code&lt;/a&gt;. It has been lingering around for a decade!&lt;/p&gt;&lt;p data-block-key=&quot;g7yew&quot;&gt;Qt Quick wants to give a unique Z value to each element of the scene. These Z values are then going to be used by the underlying graphics stack (GL, Vulkan, Metal) as the depth of the element. This allows Qt Quick to render a scene using the ordinary depth testing that a GPU provides.&lt;/p&gt;&lt;p data-block-key=&quot;sevwb&quot;&gt;The Z values themselves have no intrinsic meaning, as long as they establish an order. That&amp;#x27;s why they&amp;#x27;re simply picked to be equidistant in a given range (the simplest strategy that maximizes the available resolution).&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;image-variable-size-block&quot;&gt;
    &lt;div class=&quot;image-variable-positioning-block right-margin-auto left-margin-auto width-100 &quot; &gt;
            &lt;div class=&quot;image-variable-size-image&quot;&gt;
                
                
                
                &lt;img id=&quot;screenshot-1.png&quot; src=&quot;https://eu-central-1.linodeobjects.com/wagtail-production/images/screenshot-1.original.png&quot; class=&quot;screenshot-1.png&quot; alt=&quot;screenshot-1.png&quot;&gt;
                
                
        &lt;/div&gt;
        &lt;div class=&quot;image-variable-size-caption text-center&quot;&gt;&lt;div class=&quot;rich-text&quot;&gt;&lt;/div&gt;&lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;

&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;i4p1b&quot;&gt;Now, the underlying 3D APIs want a depth coordinate precisely in [0.0, 1.0]. So that&amp;#x27;s picked as the range and then inverted (going from 1.0 to 0.0) because, for various reasons, Qt Quick wants to render back-to-front (smaller depth means &amp;quot;closer to the camera&amp;quot;, i.e. on top in the Qt Quick scene).&lt;/p&gt;&lt;p data-block-key=&quot;8jjrn&quot;&gt;When the bug above gets triggered &lt;b&gt;the topmost element of the scene doesn&amp;#x27;t get rendered at all&lt;/b&gt;. That is because its calculated Z value is negative; instead of being the &amp;quot;closest to the camera&amp;quot; (it&amp;#x27;s the topmost element in the scene), the 3D API will think the object ended up being &lt;i&gt;behind the camera&lt;/i&gt; and will cull it away.&lt;/p&gt;&lt;p data-block-key=&quot;4mf65&quot;&gt;So why didn&amp;#x27;t anyone notice in the last 10 years? On one hand, it&amp;#x27;s because no one seems to compile Qt with aggressive compiler optimizations enabled. For instance, on x86-64 one needs to opt in to FMA instructions; on GCC, you need to pass &lt;code&gt;-march=haswell&lt;/code&gt; or higher. On ARM(64), this manifests more &amp;quot;out of the box&amp;quot; since ARMv7/v8 have FMA instructions.&lt;/p&gt;&lt;p data-block-key=&quot;94gvi&quot;&gt;On the other hand, because &lt;b&gt;by accident&lt;/b&gt; everything works fine on OpenGL. Unlike other 3D graphics APIs, on OpenGL the depth range in &lt;i&gt;Normalized Device Coordinates&lt;/i&gt; is from -1 to +1, and not 0 to +1. So even a (slightly) negative value for the topmost element is fine. If one peeks at an OpenGL call trace (using apitrace or similar tools), one can clearly see the negative Z being set.&lt;/p&gt;&lt;/div&gt;


&lt;div class=&quot;image-variable-size-block&quot;&gt;
    &lt;div class=&quot;image-variable-positioning-block right-margin-auto left-margin-auto width-75 &quot; &gt;
            &lt;div class=&quot;image-variable-size-image&quot;&gt;
                
                
                
                &lt;img id=&quot;gl_projectionmatrix01.png&quot; src=&quot;https://eu-central-1.linodeobjects.com/wagtail-production/images/gl_projectionmatrix01.original.png&quot; class=&quot;gl_projectionmatrix01.png&quot; alt=&quot;gl_projectionmatrix01.png&quot;&gt;
                
                
        &lt;/div&gt;
        &lt;div class=&quot;image-variable-size-caption text-center&quot;&gt;&lt;div class=&quot;rich-text&quot;&gt;&lt;/div&gt;&lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;

&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;sgoh1&quot;&gt;Only on a relatively recent combination of components does the bug manifest itself, for instance, on Mac:&lt;/p&gt;&lt;ul&gt;&lt;li data-block-key=&quot;0yfsr&quot;&gt;Qt 6 ⇒ Metal as graphics API (through RHI)&lt;/li&gt;&lt;li data-block-key=&quot;vap69&quot;&gt;ARMv8 ⇒ architecture with FMA&lt;/li&gt;&lt;li data-block-key=&quot;s00m1&quot;&gt;Clang 14 in the latest Xcode ⇒ enables FP contractions by default&lt;/li&gt;&lt;/ul&gt;&lt;p data-block-key=&quot;3fwb1&quot;&gt;Windows and Direct3D (again through RHI) are also, in theory, affected, but MSVC does not generate FMA instructions &lt;i&gt;at all&lt;/i&gt;. On Linux, including embedded Linux (e.g. running on ARM), most people still use OpenGL and not Vulkan. Therefore, although GCC has floating-point contractions enabled by default, the bug doesn&amp;#x27;t manifest itself.&lt;/p&gt;&lt;p data-block-key=&quot;xfpil&quot;&gt;Definitely an &lt;a href=&quot;https://bugreports.qt.io/browse/QTBUG-109739&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;interesting one&lt;/a&gt; to research; many kudos to the original reporter. The &lt;a href=&quot;https://codereview.qt-project.org/c/qt/qtdeclarative/+/451209&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;proposed fix&lt;/a&gt; was simply to clamp the values to the wanted range. 
I&amp;#x27;m not sure if one can find a numerical solution that works in all cases.&lt;/p&gt;&lt;/div&gt;&lt;p&gt;The post &lt;a href=&quot;https://www.kdab.com/fma-woes/&quot;&gt;FMA Woes&lt;/a&gt; appeared first on &lt;a href=&quot;https://www.kdab.com&quot;&gt;KDAB&lt;/a&gt;.&lt;/p&gt;</content:encoded><category>3d</category><category>c++</category><category>performance</category><category>qml</category><category>qt</category></item><item><title>The Top 100 QML Resources for Developers</title><link>https://www.kdab.com/top-100-qml-resources-kdab/</link><guid isPermaLink="true">https://www.kdab.com/top-100-qml-resources-kdab/</guid><description>&lt;p data-block-key=&quot;ol7mn&quot;&gt;If you’re a reader of this blog, you probably know that we have a huge amount of quality material on QML and Qt Quick, among other topics. In fact, there is so much material that it can be hard to find what you need. If that sounds familiar, you’ll want to bookmark this page! This […]&lt;/p&gt;</description><pubDate>Mon, 19 Dec 2022 10:03:00 GMT</pubDate><content:encoded>&lt;h1&gt;The Top 100 QML Resources for Developers&lt;/h1&gt;&lt;div class=&quot;rich-text&quot;&gt;&lt;p data-block-key=&quot;4fa8x&quot;&gt;If you’re a reader of this blog, you probably know that we have a huge amount of quality material on QML and Qt Quick, among other topics. In fact, there is so much material that it can be hard to find what you need.&lt;/p&gt;&lt;p data-block-key=&quot;4d7wk&quot;&gt;If that sounds familiar, you’ll want to bookmark this page! This blog captures a snapshot of the top 100 resources we offer on QML and Qt Quick. This mix of blogs, instructional videos, and other resources has been organized into simple, easy-to-understand categories with simple descriptions added when necessary.&lt;/p&gt;&lt;p data-block-key=&quot;hq2i8&quot;&gt;If you’re just getting started with Qt, you’ll want to begin with our training class. 
And if there’s a topic here you can’t find, you may also want to try using our &lt;a href=&quot;https://www.kdab.com/resources/&quot;&gt;search&lt;/a&gt; or visit our &lt;a href=&quot;https://www.youtube.com/c/KDABtv&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;YouTube channel&lt;/a&gt; for even more content.&lt;/p&gt;&lt;h2 id=&quot;qml-tutorials-for-beginners&quot; anchor=&quot;qml-tutorials-for-beginners&quot; data-block-key=&quot;3f2l7&quot;&gt;&lt;b&gt;QML Tutorials for Beginners&lt;/b&gt;&lt;/h2&gt;&lt;h3 id=&quot;introduction-to-qtqml-full-kdab-training-class&quot; anchor=&quot;introduction-to-qtqml-full-kdab-training-class&quot; data-block-key=&quot;wcn99&quot;&gt;Introduction to Qt/QML - Full KDAB Training Class&lt;/h3&gt;&lt;ul&gt;&lt;li data-block-key=&quot;0jyd3&quot;&gt;&lt;a href=&quot;https://www.youtube.com/playlist?list=PL6CJYn40gN6hUXoYdyn4DHUTXfcVDJcfB&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;Module 1: Introduction to Qt Quick&lt;/a&gt;&lt;/li&gt;&lt;li data-block-key=&quot;4o4zy&quot;&gt;&lt;a href=&quot;https://www.youtube.com/playlist?list=PL6CJYn40gN6hAypDlt2EHnPj8mtl7QueU&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;Module 2: User Interfaces basics&lt;/a&gt;&lt;/li&gt;&lt;li data-block-key=&quot;81wqu&quot;&gt;&lt;a href=&quot;https://www.youtube.com/playlist?list=PL6CJYn40gN6hMnhNCOIAx7jk0ZwRLW_Fa&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;Module 3: User Interaction&lt;/a&gt;&lt;/li&gt;&lt;li data-block-key=&quot;5zo95&quot;&gt;&lt;a href=&quot;https://www.youtube.com/playlist?list=PL6CJYn40gN6hfhS5o4X45cfPw7acPX6uJ&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;Module 4: Components and dynamic loading&lt;/a&gt;&lt;/li&gt;&lt;li data-block-key=&quot;3zy7v&quot;&gt;&lt;a href=&quot;https://www.youtube.com/playlist?list=PL6CJYn40gN6gx7N631Az7VjyxMOyyCpti&quot; rel=&quot;noopener noreferrer&quot; 
target=&quot;_blank&quot;&gt;Module 5: Animations and States&lt;/a&gt;&lt;/li&gt;&lt;li data-block-key=&quot;wbee3&quot;&gt;&lt;a href=&quot;https://www.youtube.com/playlist?list=PL6CJYn40gN6h_12NR620wUHvPZJ6isI3P&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;Module 6: Presenting Data&lt;/a&gt;&lt;/li&gt;&lt;li data-block-key=&quot;k19wk&quot;&gt;&lt;a href=&quot;https://www.youtube.com/playlist?list=PL6CJYn40gN6gdhv9f3F3rrRC8X52pKI1J&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;Module 7: The C++ machine room&lt;/a&gt;&lt;/li&gt;&lt;li data-block-key=&quot;5imuf&quot;&gt;&lt;a href=&quot;https://www.youtube.com/playlist?list=PL6CJYn40gN6h3usMQY3BSZJs08isz3jqa&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;Module 8: Integrating QML with C++&lt;/a&gt;&lt;/li&gt;&lt;li data-block-key=&quot;dmkxk&quot;&gt;&lt;a href=&quot;https://www.youtube.com/playlist?list=PL6CJYn40gN6hY4Lef4tqtjWLPO361tx9v&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;Module 9: Model/View from the C++ level&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;&lt;h3 id=&quot;how-to-tutorials&quot; anchor=&quot;how-to-tutorials&quot; data-block-key=&quot;at9zn&quot;&gt;How-To Tutorials&lt;/h3&gt;&lt;ul&gt;&lt;li data-block-key=&quot;hb6mv&quot;&gt;&lt;a href=&quot;https://www.kdab.com/3d-block-building-game/&quot;&gt;A 3D Block Building Game in QML&lt;/a&gt;&lt;/li&gt;&lt;li data-block-key=&quot;tmwny&quot;&gt;&lt;a href=&quot;https://www.kdab.com/qt-and-the-unu-dashboard/&quot;&gt;Qt and the unu dashboard&lt;/a&gt; &lt;i&gt;– using Redis&lt;/i&gt; &lt;i&gt;and pub/sub&lt;/i&gt;&lt;/li&gt;&lt;/ul&gt;&lt;h2 id=&quot;get-the-most-out-of-qt-creator&quot; anchor=&quot;get-the-most-out-of-qt-creator&quot; data-block-key=&quot;2y0n5&quot;&gt;&lt;b&gt;Get the most out of Qt Creator&lt;/b&gt;&lt;/h2&gt;&lt;h3 id=&quot;maximizing-your-ide-efficiency&quot; anchor=&quot;maximizing-your-ide-efficiency&quot; 
data-block-key=&quot;2ns3z&quot;&gt;Maximizing your IDE efficiency&lt;/h3&gt;&lt;ul&gt;&lt;li data-block-key=&quot;ka7cz&quot;&gt;&lt;a href=&quot;https://www.kdab.com/software-technologies/qt/qt-creator-reference-card/&quot;&gt;Qt Creator cheat-sheet&lt;/a&gt; – double-sided page of the best keyboard shortcuts&lt;/li&gt;&lt;li data-block-key=&quot;pute4&quot;&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=qmDU5xiP2x4&amp;amp;list=PL6CJYn40gN6jWHP5krsQrVGyYtKh3A3be&amp;amp;index=2&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;Sessions in Qt Creator&lt;/a&gt;&lt;/li&gt;&lt;li data-block-key=&quot;jusz0&quot;&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=3Dg5u1Mrj0I&amp;amp;list=PL6CJYn40gN6jWHP5krsQrVGyYtKh3A3be&amp;amp;index=4&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;Spell Checking in Qt Creator&lt;/a&gt;&lt;/li&gt;&lt;li data-block-key=&quot;lx6zp&quot;&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=7Zn6r9HYy6Y&amp;amp;list=PL6CJYn40gN6jWHP5krsQrVGyYtKh3A3be&amp;amp;index=8&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;Top 7 Shortcuts in Qt Creator&lt;/a&gt;&lt;/li&gt;&lt;li data-block-key=&quot;rmwl6&quot;&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=mgBBT7aUNco&amp;amp;list=PL6CJYn40gN6jWHP5krsQrVGyYtKh3A3be&amp;amp;index=22&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;Writing Qt Creator Debugging Helpers&lt;/a&gt; &lt;i&gt;–&lt;/i&gt; &lt;i&gt;straightforward variable examination&lt;/i&gt;&lt;/li&gt;&lt;li data-block-key=&quot;kobxh&quot;&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=bdHcwZsxiRo&amp;amp;list=PL6CJYn40gN6jWHP5krsQrVGyYtKh3A3be&amp;amp;index=23&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;Grepping in Qt Creator&lt;/a&gt; &lt;i&gt;–&lt;/i&gt; &lt;i&gt;getting the most out of&lt;/i&gt; &lt;i&gt;search&lt;/i&gt;&lt;/li&gt;&lt;li data-block-key=&quot;8lhcp&quot;&gt;&lt;a 
href=&quot;https://www.youtube.com/watch?v=OXAWLzx4np8&amp;amp;list=PL6CJYn40gN6jWHP5krsQrVGyYtKh3A3be&amp;amp;index=24&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;Qt Creator Refactoring Part 1&lt;/a&gt;&lt;/li&gt;&lt;li data-block-key=&quot;q0umq&quot;&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=XT53pEAOKbs&amp;amp;list=PL6CJYn40gN6jWHP5krsQrVGyYtKh3A3be&amp;amp;index=25&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;Qt Creator Refactoring Part 2&lt;/a&gt;&lt;/li&gt;&lt;li data-block-key=&quot;bze0o&quot;&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=8JfVpJg95Po&amp;amp;list=PL6CJYn40gN6jWHP5krsQrVGyYtKh3A3be&amp;amp;index=26&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;Using Bookmarks in Qt Creator&lt;/a&gt; &lt;i&gt;– moving through large code bases&lt;/i&gt;&lt;/li&gt;&lt;li data-block-key=&quot;k3ccx&quot;&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=2b4_R46OYeM&amp;amp;list=PL6CJYn40gN6jWHP5krsQrVGyYtKh3A3be&amp;amp;index=56&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;Why doesn&amp;#x27;t my Qt Creator find my files anymore&lt;/a&gt;&lt;/li&gt;&lt;li data-block-key=&quot;s5njd&quot;&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=Of-SfIzNARU&amp;amp;list=PL6CJYn40gN6jWHP5krsQrVGyYtKh3A3be&amp;amp;index=49&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;Mass Text Editing in Qt Creator Using Macros, Block Commands, and more&lt;/a&gt;&lt;/li&gt;&lt;li data-block-key=&quot;2njbu&quot;&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=Cx_m-qVnEjo&amp;amp;list=PL6CJYn40gN6jWHP5krsQrVGyYtKh3A3be&amp;amp;index=53&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;Save Re-compile Time - Include moc Files in Source Files&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;&lt;h3 id=&quot;customizing-the-ide&quot; anchor=&quot;customizing-the-ide&quot; data-block-key=&quot;h69l1&quot;&gt;Customizing the 
IDE&lt;/h3&gt;&lt;ul&gt;&lt;li data-block-key=&quot;ladzl&quot;&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=iBbWbJo4Xpw&amp;amp;list=PL6CJYn40gN6jWHP5krsQrVGyYtKh3A3be&amp;amp;index=7&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;Changing the Font to Jetbrains Mono in Qt Creator&lt;/a&gt;&lt;/li&gt;&lt;li data-block-key=&quot;ung16&quot;&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=rX2jXRU8Qho&amp;amp;list=PL6CJYn40gN6jWHP5krsQrVGyYtKh3A3be&amp;amp;index=10&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;Document Templates in Qt Creator - Part 1&lt;/a&gt; – &lt;i&gt;customizing new files&lt;/i&gt;&lt;/li&gt;&lt;li data-block-key=&quot;99409&quot;&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=ZpdFzro9KGI&amp;amp;list=PL6CJYn40gN6jWHP5krsQrVGyYtKh3A3be&amp;amp;index=11&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;Document Templates in Qt Creator - Part 2&lt;/a&gt;&lt;/li&gt;&lt;li data-block-key=&quot;jpa4i&quot;&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=KvB6S3RAc9k&amp;amp;list=PL6CJYn40gN6jWHP5krsQrVGyYtKh3A3be&amp;amp;index=12&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;Document Templates in Qt Creator - Part 3&lt;/a&gt;&lt;/li&gt;&lt;li data-block-key=&quot;d6ywb&quot;&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=m6VOl1a80EM&amp;amp;list=PL6CJYn40gN6jWHP5krsQrVGyYtKh3A3be&amp;amp;index=13&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;Document Templates in Qt Creator - Part 4&lt;/a&gt;&lt;/li&gt;&lt;li data-block-key=&quot;c03op&quot;&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=5ELbrOdRM2c&amp;amp;list=PL6CJYn40gN6jWHP5krsQrVGyYtKh3A3be&amp;amp;index=16&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;Adding CPPreference to Qt Creator&lt;/a&gt; &lt;i&gt;– extending the help system&lt;/i&gt;&lt;/li&gt;&lt;/ul&gt;&lt;h2 id=&quot;qml-development-best-practices&quot; 
anchor=&quot;qml-development-best-practices&quot; data-block-key=&quot;0cej4&quot;&gt;&lt;b&gt;QML Development Best Practices&lt;/b&gt;&lt;/h2&gt;&lt;h3 id=&quot;development-patterns&quot; anchor=&quot;development-patterns&quot; data-block-key=&quot;ybka0&quot;&gt;Development Patterns&lt;/h3&gt;&lt;ul&gt;&lt;li data-block-key=&quot;ileth&quot;&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=q6eOEz_UfTI&amp;amp;list=PL6CJYn40gN6jWHP5krsQrVGyYtKh3A3be&amp;amp;index=5&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;Communicating between a View/Delegate and a Model&lt;/a&gt;&lt;/li&gt;&lt;li data-block-key=&quot;8obgi&quot;&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=fPy0nuGWTZk&amp;amp;list=PL6CJYn40gN6jWHP5krsQrVGyYtKh3A3be&amp;amp;index=45&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;A Complete Proxy Model Implementation - Part 1&lt;/a&gt;&lt;/li&gt;&lt;li data-block-key=&quot;tvuo6&quot;&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=7wObo3LVcWA&amp;amp;list=PL6CJYn40gN6jWHP5krsQrVGyYtKh3A3be&amp;amp;index=46&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;A Complete Proxy Model Implementation - Part 2&lt;/a&gt;&lt;/li&gt;&lt;li data-block-key=&quot;mz1pi&quot;&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=EGhHlPDeyvg&amp;amp;list=PL6CJYn40gN6jWHP5krsQrVGyYtKh3A3be&amp;amp;index=38&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;No More Booleans!&lt;/a&gt; &lt;i&gt;– when enums and other means are better&lt;/i&gt;&lt;/li&gt;&lt;li data-block-key=&quot;89rzn&quot;&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=vZAvFqv5ZWs&amp;amp;list=PL6CJYn40gN6jWHP5krsQrVGyYtKh3A3be&amp;amp;index=39&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;Subclassing isn&amp;#x27;t always the solution!&lt;/a&gt; &lt;i&gt;– objects in C++ when they make sense&lt;/i&gt;&lt;/li&gt;&lt;li 
data-block-key=&quot;rzz73&quot;&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=brINmY_4X60&amp;amp;list=PL6CJYn40gN6jWHP5krsQrVGyYtKh3A3be&amp;amp;index=34&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;Checking Your QModelIndex(es)&lt;/a&gt; – &lt;i&gt;defensive programming&lt;/i&gt;&lt;/li&gt;&lt;li data-block-key=&quot;az22j&quot;&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=FQki8cWefJ4&amp;amp;list=PL6CJYn40gN6jWHP5krsQrVGyYtKh3A3be&amp;amp;index=32&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;Lazy Value&lt;/a&gt; – &lt;i&gt;the how and why of delayed computation&lt;/i&gt;&lt;/li&gt;&lt;li data-block-key=&quot;oq46o&quot;&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=V67MauRWDwI&amp;amp;list=PL6CJYn40gN6jWHP5krsQrVGyYtKh3A3be&amp;amp;index=19&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;Enum Class and Model/View&lt;/a&gt; – &lt;i&gt;static casting that’s safer and less typing&lt;/i&gt;&lt;/li&gt;&lt;/ul&gt;&lt;h3 id=&quot;development-workflow&quot; anchor=&quot;development-workflow&quot; data-block-key=&quot;ay2nd&quot;&gt;Development Workflow&lt;/h3&gt;&lt;ul&gt;&lt;li data-block-key=&quot;sl94v&quot;&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=EKOPlEP4YOY&amp;amp;list=PL6CJYn40gN6jWHP5krsQrVGyYtKh3A3be&amp;amp;index=33&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;My Git Workflow in the Shell and from Qt Creator&lt;/a&gt;&lt;/li&gt;&lt;li data-block-key=&quot;rlw9d&quot;&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=1yITO6dTqsU&amp;amp;list=PL6CJYn40gN6jWHP5krsQrVGyYtKh3A3be&amp;amp;index=30&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;Backing up Source Files Every 10 Minutes on Linux&lt;/a&gt;&lt;/li&gt;&lt;li data-block-key=&quot;wu3wb&quot;&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=e63NlKb5QdA&amp;amp;list=PL6CJYn40gN6jWHP5krsQrVGyYtKh3A3be&amp;amp;index=15&quot; rel=&quot;noopener 
noreferrer&quot; target=&quot;_blank&quot;&gt;Compile Just This File&lt;/a&gt; &lt;i&gt;– get it right before building the world&lt;/i&gt;&lt;/li&gt;&lt;li data-block-key=&quot;a20y7&quot;&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=JNcXUFsBq1Y&amp;amp;list=PL6CJYn40gN6jWHP5krsQrVGyYtKh3A3be&amp;amp;index=44&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;Git Switch and Restore&lt;/a&gt; &lt;i&gt;– recovering repository disasters&lt;/i&gt;&lt;/li&gt;&lt;li data-block-key=&quot;uoshr&quot;&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=Cz36YveDI2E&amp;amp;list=PL6CJYn40gN6jWHP5krsQrVGyYtKh3A3be&amp;amp;index=3&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;Using Clang-Format to Ensure Style Guidelines&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;&lt;h3 id=&quot;testing&quot; anchor=&quot;testing&quot; data-block-key=&quot;y4tzn&quot;&gt;Testing&lt;/h3&gt;&lt;ul&gt;&lt;li data-block-key=&quot;3c1eo&quot;&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=N4pvvCToogM&amp;amp;list=PL6CJYn40gN6jWHP5krsQrVGyYtKh3A3be&amp;amp;index=36&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;Unit Testing from Qt Creator&lt;/a&gt;&lt;/li&gt;&lt;li data-block-key=&quot;vb87a&quot;&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=4WY2veS28Kc&amp;amp;list=PL6CJYn40gN6jWHP5krsQrVGyYtKh3A3be&amp;amp;index=57&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;Checking for Regression via Screenshots&lt;/a&gt;&lt;/li&gt;&lt;li data-block-key=&quot;74lc4&quot;&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=Tm1nNyM6Upk&amp;amp;list=PL6CJYn40gN6jWHP5krsQrVGyYtKh3A3be&amp;amp;index=27&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;Testing QAbstractItemModels on the Fly&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;&lt;h2 id=&quot;improved-graphics-with-qml&quot; anchor=&quot;improved-graphics-with-qml&quot; data-block-key=&quot;lzsq4&quot;&gt;&lt;b&gt;Improved Graphics with 
QML&lt;/b&gt;&lt;/h2&gt;&lt;h3 id=&quot;graphics-sizing-and-scaling&quot; anchor=&quot;graphics-sizing-and-scaling&quot; data-block-key=&quot;cnf53&quot;&gt;Graphics Sizing and Scaling&lt;/h3&gt;&lt;ul&gt;&lt;li data-block-key=&quot;34ldp&quot;&gt;&lt;a href=&quot;https://www.kdab.com/borderimage-scaling-scalable-uis-2-2/&quot;&gt;BorderImage is for Scaling!&lt;/a&gt;&lt;/li&gt;&lt;li data-block-key=&quot;2c0il&quot;&gt;&lt;a href=&quot;https://www.kdab.com/scalable-uis-qml/&quot;&gt;Scalable UIs In QML&lt;/a&gt;&lt;/li&gt;&lt;li data-block-key=&quot;2693k&quot;&gt;&lt;a href=&quot;https://www.kdab.com/pixels-trust-scalable-uis-qml-part-2/&quot;&gt;In Pixels we trust&lt;/a&gt; &lt;i&gt;– scalable UIs in QML, part 2&lt;/i&gt;&lt;/li&gt;&lt;/ul&gt;&lt;h3 id=&quot;qml-and-3d&quot; anchor=&quot;qml-and-3d&quot; data-block-key=&quot;pqb8n&quot;&gt;QML and 3D&lt;/h3&gt;&lt;ul&gt;&lt;li data-block-key=&quot;30flu&quot;&gt;&lt;a href=&quot;https://www.kdab.com/integrating-opengl-with-qt-quick-2-applications-part-1/&quot;&gt;How to integrate OpenGL code with Qt Quick 2 applications (part 1)&lt;/a&gt;&lt;/li&gt;&lt;li data-block-key=&quot;jcmjl&quot;&gt;&lt;a href=&quot;https://www.kdab.com/integrate-opengl-code-qt-quick-2-applications-part-2/&quot;&gt;How to integrate OpenGL code with Qt Quick 2 applications (part 2)&lt;/a&gt;&lt;/li&gt;&lt;li data-block-key=&quot;n9e03&quot;&gt;&lt;a href=&quot;https://www.kdab.com/integrating-opengl-with-qt-quick-2-applications/&quot;&gt;Integrating OpenGL with Qt Quick 2 applications&lt;/a&gt;&lt;/li&gt;&lt;li data-block-key=&quot;7ki0h&quot;&gt;&lt;a href=&quot;https://www.kdab.com/integrating-qtquick-2-3d-renderers/&quot;&gt;Integrating QtQuick 2 with 3D renderers&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;&lt;h3 id=&quot;qml-components&quot; anchor=&quot;qml-components&quot; data-block-key=&quot;1s43f&quot;&gt;QML Components&lt;/h3&gt;&lt;ul&gt;&lt;li data-block-key=&quot;u7rr8&quot;&gt;&lt;a 
href=&quot;https://www.kdab.com/qml-component-design/&quot;&gt;QML Component Design&lt;/a&gt; &lt;i&gt;– creating unbreakable bindings&lt;/i&gt;&lt;/li&gt;&lt;li data-block-key=&quot;7zpny&quot;&gt;&lt;a href=&quot;https://www.kdab.com/declarative-widgets/&quot;&gt;Declarative Widgets&lt;/a&gt; &lt;i&gt;– adding Qt Widgets to QML&lt;/i&gt;&lt;/li&gt;&lt;li data-block-key=&quot;vijcj&quot;&gt;&lt;a href=&quot;https://www.kdab.com/efficient-custom-shapes-in-qt-quick/&quot;&gt;Efficient custom shapes in Qt Quick&lt;/a&gt; – &lt;i&gt;the perfect mix of triangles and shaders&lt;/i&gt;&lt;/li&gt;&lt;li data-block-key=&quot;sud8e&quot;&gt;&lt;a href=&quot;https://www.kdab.com/qtquick-shaders/&quot;&gt;Efficient Custom Shapes in QtQuick: Shaders&lt;/a&gt; – &lt;i&gt;coding the fragment shader&lt;/i&gt;&lt;/li&gt;&lt;li data-block-key=&quot;gaf6y&quot;&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=DFyjLYcxwU4&amp;amp;list=PL6CJYn40gN6gK8l5VXdt7WNRPmhbt0VoQ&amp;amp;index=12&amp;amp;t=235s&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;QtDD - Breathing new QML life into a QWidget-based app from the 2000s&lt;/a&gt;&lt;/li&gt;&lt;li data-block-key=&quot;qcetm&quot;&gt;&lt;a href=&quot;https://www.kdab.com/qtwidgets-qtquick-controls-comparison/&quot;&gt;QtWidgets and QtQuick Controls – A Comparison&lt;/a&gt;&lt;/li&gt;&lt;li data-block-key=&quot;kbbxf&quot;&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=9-8NkKBItCo&amp;amp;list=PL6CJYn40gN6jWHP5krsQrVGyYtKh3A3be&amp;amp;index=50&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;Migrating to Qt6 - QVariant&lt;/a&gt; – &lt;i&gt;solving combobox problems with QVariant&lt;/i&gt;&lt;/li&gt;&lt;li data-block-key=&quot;m3xvi&quot;&gt;&lt;a href=&quot;https://www.kdab.com/writing-custom-qt-quick-components-using-opengl/&quot;&gt;Writing custom Qt Quick components using OpenGL&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;&lt;h2 
id=&quot;qml-in-depth-for-advanced-developers&quot; anchor=&quot;qml-in-depth-for-advanced-developers&quot; data-block-key=&quot;e6j5k&quot;&gt;&lt;b&gt;QML In-Depth for Advanced Developers&lt;/b&gt;&lt;/h2&gt;&lt;h3 id=&quot;special-problems&quot; anchor=&quot;special-problems&quot; data-block-key=&quot;8t3qx&quot;&gt;Special Problems&lt;/h3&gt;&lt;ul&gt;&lt;li data-block-key=&quot;9q35k&quot;&gt;&lt;a href=&quot;https://www.kdab.com/fun-with-paths-urls-in-qml/&quot;&gt;Fun with Paths and URLs in QML&lt;/a&gt;&lt;/li&gt;&lt;li data-block-key=&quot;2rx90&quot;&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=Dy4EfRa7NjQ&amp;amp;list=PL6CJYn40gN6gK8l5VXdt7WNRPmhbt0VoQ&amp;amp;index=3&amp;amp;t=2s&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;QtDD - Insights From Building A Desktop Productivity App Using QML&lt;/a&gt;&lt;/li&gt;&lt;li data-block-key=&quot;l5afg&quot;&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=-CM45D0QBfc&amp;amp;list=PL6CJYn40gN6jWHP5krsQrVGyYtKh3A3be&amp;amp;index=43&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;Understanding qAsConst / std::as_const&lt;/a&gt;&lt;/li&gt;&lt;li data-block-key=&quot;tiu0b&quot;&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=GUdQ9u34HQI&amp;amp;list=PL6CJYn40gN6jWHP5krsQrVGyYtKh3A3be&amp;amp;index=48&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;The C++ Explicit Keyword and Qt&lt;/a&gt; – &lt;i&gt;understanding why and when it’s used&lt;/i&gt;&lt;/li&gt;&lt;li data-block-key=&quot;mobs6&quot;&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=zrz_AXpgkrE&amp;amp;list=PL6CJYn40gN6jWHP5krsQrVGyYtKh3A3be&amp;amp;index=40&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;Using strong_typedef with Qt for Improved Safety&lt;/a&gt;&lt;/li&gt;&lt;li data-block-key=&quot;9sqiy&quot;&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=caNufuHsUjs&amp;amp;list=PL6CJYn40gN6jWHP5krsQrVGyYtKh3A3be&amp;amp;index=9&quot; 
rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;Avoiding QVariant::fromValue around your Own Types&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;&lt;h3 id=&quot;strings&quot; anchor=&quot;strings&quot; data-block-key=&quot;uekpu&quot;&gt;Strings&lt;/h3&gt;&lt;ul&gt;&lt;li data-block-key=&quot;jhcvq&quot;&gt;&lt;a href=&quot;https://www.kdab.com/handling-a-lot-of-text-in-qml/&quot;&gt;Handling a Lot of Text in QML&lt;/a&gt;&lt;/li&gt;&lt;li data-block-key=&quot;zi8b2&quot;&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=GlP0JHUUP8A&amp;amp;list=PL6CJYn40gN6jWHP5krsQrVGyYtKh3A3be&amp;amp;index=51&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;Which String Class in Qt Should I Use?&lt;/a&gt;&lt;/li&gt;&lt;li data-block-key=&quot;0d0ld&quot;&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=tLFYa97Zds4&amp;amp;list=PL6CJYn40gN6jWHP5krsQrVGyYtKh3A3be&amp;amp;index=52&amp;amp;t=323s&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;QStringBuilder&lt;/a&gt; &lt;i&gt;– what it is and how to use it&lt;/i&gt;&lt;/li&gt;&lt;li data-block-key=&quot;av9bh&quot;&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=hWh_bA0-_KM&amp;amp;list=PL6CJYn40gN6jWHP5krsQrVGyYtKh3A3be&amp;amp;index=14&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;Converting Enums to and from Strings&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;&lt;h3 id=&quot;specific-environments&quot; anchor=&quot;specific-environments&quot; data-block-key=&quot;9wmgr&quot;&gt;Specific Environments&lt;/h3&gt;&lt;ul&gt;&lt;li data-block-key=&quot;mqyd6&quot;&gt;&lt;a href=&quot;https://www.kdab.com/maps-and-navigation/&quot;&gt;Automotive Grade Maps and Navigation for Everyone&lt;/a&gt;&lt;/li&gt;&lt;li data-block-key=&quot;57j98&quot;&gt;&lt;a href=&quot;https://www.kdab.com/embedding-qml-why-where-and-how/&quot;&gt;Embedding QML: Why, Where, and How&lt;/a&gt;&lt;/li&gt;&lt;li data-block-key=&quot;6no77&quot;&gt;&lt;a 
href=&quot;https://www.youtube.com/watch?v=fbN8bAxqtcM&amp;amp;list=PL6CJYn40gN6gK8l5VXdt7WNRPmhbt0VoQ&amp;amp;index=7&amp;amp;t=659s&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;QtDD - Using QtQuick Designer for Desktop Applications&lt;/a&gt;&lt;/li&gt;&lt;li data-block-key=&quot;pwm3m&quot;&gt;&lt;a href=&quot;https://www.kdab.com/qt-android-create-zero-copy-android-surfacetexture-qml-item/&quot;&gt;Qt on Android: How to create a zero-copy Android SurfaceTexture QML item&lt;/a&gt;&lt;/li&gt;&lt;li data-block-key=&quot;k2yvs&quot;&gt;&lt;a href=&quot;https://www.kdab.com/running-qtquick-applications-web/&quot;&gt;Running QtQuick Applications on the Web&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;&lt;h3 id=&quot;qml-internals&quot; anchor=&quot;qml-internals&quot; data-block-key=&quot;apmz4&quot;&gt;QML Internals&lt;/h3&gt;&lt;ul&gt;&lt;li data-block-key=&quot;juhyh&quot;&gt;&lt;a href=&quot;https://www.kdab.com/qml-engine-internals-part-1-qml-file-loading/&quot;&gt;QML Engine Internals, Part 1: QML File Loading&lt;/a&gt;&lt;/li&gt;&lt;li data-block-key=&quot;1vwjp&quot;&gt;&lt;a href=&quot;https://www.kdab.com/qml-engine-internals-part-2-bindings/&quot;&gt;QML Engine Internals, Part 2: Bindings&lt;/a&gt;&lt;/li&gt;&lt;li data-block-key=&quot;qfiby&quot;&gt;&lt;a href=&quot;https://www.kdab.com/qml-engine-internals-part-3-binding-types/&quot;&gt;QML Engine Internals, Part 3: Binding Types&lt;/a&gt;&lt;/li&gt;&lt;li data-block-key=&quot;tdjc3&quot;&gt;&lt;a href=&quot;https://www.kdab.com/qml-engine-internals-part-4-custom-parsers/&quot;&gt;QML Engine Internals, Part 4: Custom Parsers&lt;/a&gt;&lt;/li&gt;&lt;li data-block-key=&quot;j3fxk&quot;&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=-5l2ozI_6v0&amp;amp;list=PL6CJYn40gN6jWHP5krsQrVGyYtKh3A3be&amp;amp;index=35&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;Building Qt&lt;/a&gt; &lt;i&gt;– how to build Qt from source&lt;/i&gt;&lt;/li&gt;&lt;li 
data-block-key=&quot;b6cge&quot;&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=N89s0c8qRxc&amp;amp;list=PL6CJYn40gN6jWHP5krsQrVGyYtKh3A3be&amp;amp;index=55&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;Reading the Qt Source Code&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;&lt;h2 id=&quot;indispensable-tools-for-your-qml-projects&quot; anchor=&quot;indispensable-tools-for-your-qml-projects&quot; data-block-key=&quot;1oyjv&quot;&gt;&lt;b&gt;Indispensable Tools for Your QML Projects&lt;/b&gt;&lt;/h2&gt;&lt;h3 id=&quot;debugging&quot; anchor=&quot;debugging&quot; data-block-key=&quot;lyvkh&quot;&gt;Debugging&lt;/h3&gt;&lt;ul&gt;&lt;li data-block-key=&quot;xkkh3&quot;&gt;&lt;a href=&quot;https://www.kdab.com/full-stack-tracing-part-1/&quot;&gt;Full Stack Tracing Part 1&lt;/a&gt;&lt;/li&gt;&lt;li data-block-key=&quot;2msno&quot;&gt;&lt;a href=&quot;https://www.kdab.com/full-stack-tracing-part-2/&quot;&gt;Full Stack Tracing Part 2&lt;/a&gt;&lt;/li&gt;&lt;li data-block-key=&quot;zu9es&quot;&gt;&lt;a href=&quot;https://www.kdab.com/full-stack-tracing-part-3/&quot;&gt;Full Stack Tracing, Part 3&lt;/a&gt;&lt;/li&gt;&lt;li data-block-key=&quot;yc6i7&quot;&gt;&lt;a href=&quot;https://www.kdab.com/fixing-bugs-via-lateral-thinking/&quot;&gt;Fixing bugs via lateral thinking&lt;/a&gt; &lt;i&gt;– finding really hard bugs&lt;/i&gt;&lt;/li&gt;&lt;li data-block-key=&quot;4vt15&quot;&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=0uYsZEAQiLM&amp;amp;list=PL6CJYn40gN6jWHP5krsQrVGyYtKh3A3be&amp;amp;index=20&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;Reverse Debugging Using rr&lt;/a&gt; – &lt;i&gt;finding bug root causes faster&lt;/i&gt;&lt;/li&gt;&lt;li data-block-key=&quot;mos0e&quot;&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=2e2MGZKSvBY&amp;amp;list=PL6CJYn40gN6jWHP5krsQrVGyYtKh3A3be&amp;amp;index=21&quot; rel=&quot;noopener noreferrer&quot; 
target=&quot;_blank&quot;&gt;Speeding up the Start-up of GDB&lt;/a&gt;&lt;/li&gt;&lt;li data-block-key=&quot;ogsje&quot;&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=f12lUvbdj2U&amp;amp;list=PL6CJYn40gN6jWHP5krsQrVGyYtKh3A3be&amp;amp;index=31&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;qDebug - Power User&lt;/a&gt; – &lt;i&gt;making the best of simple console output&lt;/i&gt;&lt;/li&gt;&lt;/ul&gt;&lt;h3 id=&quot;profiling&quot; anchor=&quot;profiling&quot; data-block-key=&quot;pb8c8&quot;&gt;Profiling&lt;/h3&gt;&lt;ul&gt;&lt;li data-block-key=&quot;yftkq&quot;&gt;&lt;a href=&quot;https://www.kdab.com/new-qt-5-10-diagnostics-breaking-qml-bindings/&quot;&gt;New in Qt 5.10: Diagnostics when breaking QML bindings&lt;/a&gt;&lt;/li&gt;&lt;li data-block-key=&quot;4e79b&quot;&gt;&lt;a href=&quot;https://www.kdab.com/analyzing-performance-qtquick-applications/&quot;&gt;Analyzing Performance of QtQuick Applications&lt;/a&gt;&lt;/li&gt;&lt;li data-block-key=&quot;cpk1g&quot;&gt;&lt;a href=&quot;https://www.kdab.com/profiling-qtquick-performance/&quot;&gt;Profiling QtQuick HMI Performance on Embedded Linux&lt;/a&gt;&lt;/li&gt;&lt;li data-block-key=&quot;usud5&quot;&gt;&lt;a href=&quot;https://www.kdab.com/profile-qtquick-applications-mx-6-vanalyzer/&quot;&gt;How to Profile QtQuick applications on Freescale i.MX 6 with vAnalyzer&lt;/a&gt;&lt;/li&gt;&lt;li data-block-key=&quot;ntrnc&quot;&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=cKedzwAWBC0&amp;amp;list=PL6CJYn40gN6jWHP5krsQrVGyYtKh3A3be&amp;amp;index=17&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;Compiling (Part 1) - Speeding Up Compilation Using CCache, Ninja and Clang + Integration in Qt Creator&lt;/a&gt;&lt;/li&gt;&lt;li data-block-key=&quot;31rlv&quot;&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=a6PdfwVE9hA&amp;amp;list=PL6CJYn40gN6jWHP5krsQrVGyYtKh3A3be&amp;amp;index=18&quot; rel=&quot;noopener noreferrer&quot; 
target=&quot;_blank&quot;&gt;Compiling (Part 2) - Speeding Up Compilation Using PCH Support in CMake&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;&lt;h3 id=&quot;tools&quot; anchor=&quot;tools&quot; data-block-key=&quot;pd4kj&quot;&gt;Tools&lt;/h3&gt;&lt;ul&gt;&lt;li data-block-key=&quot;8052w&quot;&gt;&lt;a href=&quot;https://www.kdab.com/using-vsc-for-qt-apps-part-3/&quot;&gt;VS Code for Qt Applications – Part 3&lt;/a&gt;&lt;/li&gt;&lt;li data-block-key=&quot;l4ayn&quot;&gt;&lt;a href=&quot;https://www.kdab.com/kdab-contributions-qt-5-4-qmllint/&quot;&gt;KDAB contributions to Qt 5.4: qmllint&lt;/a&gt; &lt;i&gt;– identifying QML errors&lt;/i&gt;&lt;/li&gt;&lt;li data-block-key=&quot;mn9bz&quot;&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=KkJfkrIO29Y&amp;amp;list=PL6CJYn40gN6jWHP5krsQrVGyYtKh3A3be&amp;amp;index=6&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;Visualizing the Model Stack in GammaRay&lt;/a&gt;&lt;/li&gt;&lt;li data-block-key=&quot;5nfmz&quot;&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=gO3KCzdmcrQ&amp;amp;list=PL6CJYn40gN6jWHP5krsQrVGyYtKh3A3be&amp;amp;index=28&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;Running clang-tidy and clazy from Qt Creator&lt;/a&gt;&lt;/li&gt;&lt;li data-block-key=&quot;si6je&quot;&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=c0ie0xww7SA&amp;amp;list=PL6CJYn40gN6jWHP5krsQrVGyYtKh3A3be&amp;amp;index=29&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;Running clazy and clang-tidy from the Command Line&lt;/a&gt;&lt;/li&gt;&lt;li data-block-key=&quot;wug4v&quot;&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=akGr97hf3J8&amp;amp;list=PL6CJYn40gN6jWHP5krsQrVGyYtKh3A3be&amp;amp;index=54&quot; rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot;&gt;Improving My clang-tidy Checks&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;&lt;p&gt;The post &lt;a 
href=&quot;https://www.kdab.com/top-100-qml-resources-kdab/&quot;&gt;The Top 100 QML Resources for Developers&lt;/a&gt; appeared first on &lt;a href=&quot;https://www.kdab.com&quot;&gt;KDAB&lt;/a&gt;.&lt;/p&gt;</content:encoded><dc:creator>Editor Team</dc:creator><category>3d</category><category>performance</category><category>qml</category><category>qt</category></item></channel></rss>