Several best practices of WebGPU

Posted by tech0925 on Wed, 26 Jan 2022 08:48:08 +0100

Slides from the 2022 WebGL & WebGPU Meetup

1 Use the label attribute wherever you can

Every object in WebGPU has a label attribute. You can set it through the label field of the descriptor when creating the object, or assign it directly after creation. The label works much like an id: it makes the object easier to identify when debugging and inspecting. It costs almost nothing to write, and it pays off considerably when debugging.

const projectionMatrixBuffer = gpuDevice.createBuffer({
  label: 'Projection Matrix Buffer',
  size: 12 * Float32Array.BYTES_PER_ELEMENT, // Deliberately 12; a 4x4 matrix actually needs 16 floats
  usage: GPUBufferUsage.VERTEX | GPUBufferUsage.COPY_DST,
})
const projectionMatrixArray = new Float32Array(16)

gpuDevice.queue.writeBuffer(projectionMatrixBuffer, 0, projectionMatrixArray)

The buffer size in the code above is deliberately wrong for a matrix. When validation fails, the error message includes the label:

// console output 
Write range (bufferOffset: 0, size: 64) does not fit in [Buffer "Projection Matrix Buffer"] size (48).
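If you want to read such labelled messages in code rather than in the console, one option is an error scope. The following is a minimal sketch, not the original author's code: captureValidationError is a hypothetical helper name, and device is assumed to be a GPUDevice. It relies only on pushErrorScope() and popErrorScope().

```javascript
// Hypothetical helper (not part of WebGPU): runs `record` inside a
// validation error scope and resolves to the error message, or to null
// if no validation error was captured by the scope.
async function captureValidationError(device, record) {
  device.pushErrorScope('validation');
  let error;
  try {
    record();
  } finally {
    // Always pop the scope so the error scope stack stays balanced,
    // even if `record` throws.
    error = await device.popErrorScope();
  }
  return error ? error.message : null;
}

// Possible usage with a real device (assumed names from the snippet above):
// const message = await captureValidationError(gpuDevice, () => {
//   gpuDevice.queue.writeBuffer(projectionMatrixBuffer, 0, projectionMatrixArray);
// });
// if (message) console.warn(message);
```

Because the message text is produced by the browser, treat it as something to log or display, not something to parse.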

2 Use debug groups

Command encoders let you push and pop debug groups. A debug group is simply a string that indicates which part of the code is being recorded. When validation fails, the error message shows the debug group stack:

// --- First debug group: marks the current frame ---
commandEncoder.pushDebugGroup(`Frame ${frameIndex}`);
  // --- First nested debug group: marks the light update ---
  commandEncoder.pushDebugGroup('Clustered Light Compute Pass');
    // For example, update the light sources here
    updateClusteredLights(commandEncoder);
  commandEncoder.popDebugGroup();
  // --- End of the first nested debug group ---
  // --- Second nested debug group: marks the main render pass ---
  commandEncoder.pushDebugGroup('Main Render Pass');
    // Issue the draw calls
    renderScene(commandEncoder);
  commandEncoder.popDebugGroup();
  // --- End of the second nested debug group ---
commandEncoder.popDebugGroup();
// --- End of the first debug group ---

With this in place, an error message also reports where it happened:

// console output 
Binding sizes are too small for bind group [BindGroup] at index 0

Debug group stack:
> "Main Render Pass"
> "Frame 234"
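Every pushDebugGroup() needs a matching popDebugGroup(), which is easy to get wrong as the code grows. A small wrapper can keep the pairs balanced. This is only a sketch: withDebugGroup is a hypothetical helper, not a WebGPU API, and encoder is assumed to be any object exposing pushDebugGroup() and popDebugGroup(), such as a command encoder or a pass encoder.

```javascript
// Hypothetical helper: wraps `record` in a push/pop pair so the debug group
// is always closed, even if `record` throws.
function withDebugGroup(encoder, label, record) {
  encoder.pushDebugGroup(label);
  try {
    // Pass the encoder through so nested groups can reuse it.
    return record(encoder);
  } finally {
    encoder.popDebugGroup();
  }
}

// Possible usage, mirroring the frame structure shown above:
// withDebugGroup(commandEncoder, `Frame ${frameIndex}`, (encoder) => {
//   withDebugGroup(encoder, 'Clustered Light Compute Pass', updateClusteredLights);
//   withDebugGroup(encoder, 'Main Render Pass', renderScene);
// });
```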

3 Load texture images from a Blob

Creating an ImageBitmap from a Blob gives the best decoding performance for JPG/PNG textures.

/**
 * Asynchronously creates a texture from an image URL and copies the image data into it
 * @param {GPUDevice} gpuDevice Device object
 * @param {string} url URL of the texture image
 * @returns {Promise<GPUTexture>} The texture containing the decoded image
 */
async function createTextureFromImageUrl(gpuDevice, url) {
  const blob = await fetch(url).then((r) => r.blob())
  const source = await createImageBitmap(blob)
  
  const textureDescriptor = {
    label: `Image Texture ${url}`,
    size: {
      width: source.width,
      height: source.height,
    },
    format: 'rgba8unorm',
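    // copyExternalImageToTexture() requires RENDER_ATTACHMENT on the destination texture
    usage: GPUTextureUsage.TEXTURE_BINDING | GPUTextureUsage.COPY_DST | GPUTextureUsage.RENDER_ATTACHMENT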
  }
  const texture = gpuDevice.createTexture(textureDescriptor)
  gpuDevice.queue.copyExternalImageToTexture(
    { source },
    { texture },
    textureDescriptor.size,
  )
  
  return texture
}

Better still, use textures in a compressed format

Use them whenever you can.

WebGPU defines at least three compressed texture features:

texture-compression-bc
texture-compression-etc2
texture-compression-astc

Which of them are available depends on the hardware. According to the official discussion (GitHub Issue 2083), every platform should support either the BC formats (also known as DXT and S3TC) or the ETC2 and ASTC formats, so that texture compression is always available in some form.

A supercompressed texture format (for example Basis Universal) is highly recommended. Its advantage is that you do not have to care about the device: the data can be transcoded to whichever format the device supports, so you avoid preparing textures in two formats.
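Before picking a transcode target you need to know which of these features the adapter offers. The sketch below shows one way to query them and request a device with them enabled. The helper names and the order of the list are assumptions made for illustration; only adapter.features, its has() method, and the requiredFeatures field of requestDevice() come from WebGPU itself.

```javascript
// Feature names from the list above. The order carries no special meaning.
const COMPRESSION_FEATURES = [
  'texture-compression-bc',
  'texture-compression-etc2',
  'texture-compression-astc',
];

// Hypothetical helper: `features` is anything with a has() method,
// for example adapter.features.
function supportedCompressionFeatures(features) {
  return COMPRESSION_FEATURES.filter((name) => features.has(name));
}

// Hypothetical helper: requests a device with every compressed texture
// feature the adapter offers, and reports which ones were enabled.
async function requestDeviceWithCompression(adapter) {
  const requiredFeatures = supportedCompressionFeatures(adapter.features);
  const device = await adapter.requestDevice({ requiredFeatures });
  return { device, requiredFeatures };
}

// Possible usage in a browser:
// const adapter = await navigator.gpu.requestAdapter();
// const { device, requiredFeatures } = await requestDeviceWithCompression(adapter);
```

A feature has to be requested when the device is created; compressed formats belonging to a feature that was not requested cannot be used on that device.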

The original author wrote a library for loading compressed textures in WebGL and WebGPU: GitHub toji/web-texture-tool.

WebGL does not support compressed textures very well. WebGPU now supports them natively, so use them as much as possible!

4 Use the glTF processing library glTF-Transform

This is an open source library. You can find it on GitHub. It provides command-line tools.

For example, you can use it to compress glb textures:

> gltf-transform etc1s paddle.glb paddle2.glb
paddle.glb (11.92 MB) → paddle2.glb (1.73 MB)

The result is visually lossless, yet the model exported from Blender becomes much smaller. The textures of the original model are five 2048 x 2048 PNG images.

Besides compressing textures, the library can also resize textures, resample animations, apply Google Draco compression to geometry data, and much more. After all these optimizations, the glb is less than 5% of its original size.

> gltf-transform resize paddle.glb paddle2.glb --width 1024 --height 1024
> gltf-transform etc1s paddle2.glb paddle2.glb
> gltf-transform resample paddle2.glb paddle2.glb
> gltf-transform dedup paddle2.glb paddle2.glb
> gltf-transform draco paddle2.glb paddle2.glb

  paddle.glb (11.92 MB) → paddle2.glb (596.46 KB)

5 Uploading buffer data

There are many ways to get data into a buffer in WebGPU, and using writeBuffer() is not necessarily the wrong choice. When you call WebGPU from wasm, prefer the writeBuffer() API, which avoids an extra buffer copy.

const projectionMatrixBuffer = gpuDevice.createBuffer({
  label: 'Projection Matrix Buffer',
  size: 16 * Float32Array.BYTES_PER_ELEMENT,
  usage: GPUBufferUsage.VERTEX | GPUBufferUsage.COPY_DST,
});

// When the projection matrix changes (for example, the window changes size)
function updateProjectionMatrixBuffer(projectionMatrix) {
  const projectionMatrixArray = projectionMatrix.getAsFloat32Array();
  gpuDevice.queue.writeBuffer(projectionMatrixBuffer, 0, projectionMatrixArray);
}

The original author points out that setting mappedAtCreation when creating a buffer is not required. In some cases, such as loading buffers from a glTF file, the buffer does not need to be mapped at creation.
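For comparison with writeBuffer(), here is a minimal sketch of the mappedAtCreation path, which is typically used for data that is set once at creation. createBufferWithData is a hypothetical helper and data is assumed to be a typed array. A buffer created with mappedAtCreation must have a size that is a multiple of 4 bytes, so the helper rounds the size up.

```javascript
// Hypothetical helper: creates a buffer that is mapped at creation, copies
// `data` (a typed array) into it, and unmaps it so the GPU can use it.
function createBufferWithData(device, data, usage, label) {
  // Round the size up to the next multiple of 4 bytes.
  const size = Math.ceil(data.byteLength / 4) * 4;
  const buffer = device.createBuffer({ label, size, usage, mappedAtCreation: true });

  // Copy raw bytes so the helper works for any typed array element type.
  const source = new Uint8Array(data.buffer, data.byteOffset, data.byteLength);
  new Uint8Array(buffer.getMappedRange()).set(source);

  // The buffer cannot be used by the GPU until it is unmapped.
  buffer.unmap();
  return buffer;
}

// Possible usage with a real device (vertexData is an assumed Float32Array):
// const vertexBuffer = createBufferWithData(
//   gpuDevice, vertexData, GPUBufferUsage.VERTEX, 'Static Vertex Buffer');
```

Note that this path does not need COPY_DST or MAP_WRITE in the usage flags, whereas writeBuffer() requires COPY_DST.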

6 Prefer creating pipelines asynchronously

If you do not need a render or compute pipeline right away, use the createRenderPipelineAsync and createComputePipelineAsync APIs instead of the synchronous versions.

When a pipeline is created synchronously, the underlying implementation may have to compile the pipeline's resources on the spot, which can stall GPU-related work.

With asynchronous creation, the Promise does not resolve until the pipeline is ready. In other words, the GPU can first finish what it is currently doing, and the pipeline you asked for is prepared in the meantime.

Let's look at the comparison code:

// Create the compute pipeline synchronously
const computePipeline = gpuDevice.createComputePipeline({/* ... */})

computePass.setPipeline(computePipeline)
computePass.dispatchWorkgroups(32, 32) // The dispatch is issued here and may stall while the shader is still compiling

Now the asynchronous version:

// Create the compute pipeline asynchronously
const asyncComputePipeline = await gpuDevice.createComputePipelineAsync({/* ... */})

computePass.setPipeline(asyncComputePipeline)
computePass.dispatchWorkgroups(32, 32) // The shader has already been compiled by now, so there is no stutter
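When several pipelines are needed, the asynchronous API also lets you start all compilations at once instead of awaiting them one by one. The following sketch assumes a hypothetical helper, createComputePipelines, that takes an object mapping names to pipeline descriptors; the names are used as labels and as keys of the returned Map.

```javascript
// Hypothetical helper: starts all pipeline compilations up front and
// resolves to a Map from name to the finished pipeline.
async function createComputePipelines(device, descriptorsByName) {
  const names = Object.keys(descriptorsByName);
  const pipelines = await Promise.all(
    names.map((name) =>
      // Reuse the name as the label so error messages stay readable.
      device.createComputePipelineAsync({ label: name, ...descriptorsByName[name] }),
    ),
  );
  return new Map(names.map((name, index) => [name, pipelines[index]]));
}

// Possible usage (the descriptors are placeholders):
// const pipelines = await createComputePipelines(gpuDevice, {
//   'Light Culling': {/* ... */},
//   'Particle Update': {/* ... */},
// });
// computePass.setPipeline(pipelines.get('Light Culling'));
```

Promise.all rejects as soon as any one pipeline fails to compile, so a real application may want to handle failures per pipeline.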

7 Use implicit pipeline layouts with caution

Implicit pipeline layouts, especially for standalone compute pipelines, are convenient when writing JavaScript, but they bring two potential problems:

They break the sharing of bind groups between pipelines
Strange things can happen when the shader is updated

If your case is particularly simple, an implicit pipeline layout is fine, but whenever you can create the pipeline layout explicitly, do so.

An implicit pipeline layout is used as follows: first create the pipeline object, then call the pipeline's getBindGroupLayout() API to obtain the bind group layout inferred from the shader code.

const computePipeline = await gpuDevice.createComputePipelineAsync({
  // No explicit layout object is passed; 'auto' requests an implicit layout
  layout: 'auto',
  compute: {
    module: computeModule,
    entryPoint: 'computeMain'
  }
})

const computeBindGroup = gpuDevice.createBindGroup({
  // Get the implicitly created bind group layout
  layout: computePipeline.getBindGroupLayout(0),
  entries: [{
    binding: 0,
    resource: { buffer: storageBuffer },
  }]
})
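For contrast, the explicit counterpart might look like the sketch below. It assumes that the shader declares a single read-write storage buffer at binding 0 of group 0. The helper name and the labels are hypothetical, and the fallback value 0x4 for GPUShaderStage.COMPUTE is there only so the snippet can also be loaded outside a browser.

```javascript
// GPUShaderStage is a browser global; 0x4 is the value of COMPUTE.
const COMPUTE_STAGE = globalThis.GPUShaderStage?.COMPUTE ?? 0x4;

// Hypothetical helper: builds the bind group layout and pipeline layout by
// hand, then creates the compute pipeline with that explicit layout.
async function createComputePipelineWithExplicitLayout(device, computeModule) {
  const bindGroupLayout = device.createBindGroupLayout({
    label: 'Compute BindGroupLayout',
    entries: [{
      binding: 0,
      visibility: COMPUTE_STAGE,
      buffer: { type: 'storage' }, // Assumption: read-write storage buffer
    }],
  });

  const pipelineLayout = device.createPipelineLayout({
    label: 'Compute PipelineLayout',
    bindGroupLayouts: [bindGroupLayout],
  });

  const pipeline = await device.createComputePipelineAsync({
    label: 'Compute Pipeline',
    layout: pipelineLayout,
    compute: { module: computeModule, entryPoint: 'computeMain' },
  });

  // Return the layout as well, so bind groups can be created from it
  // and shared with any other pipeline that uses the same layout.
  return { pipeline, bindGroupLayout };
}
```

A bind group created from the returned bindGroupLayout does not depend on one particular pipeline object, which is what makes sharing possible.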

8 Share bind groups and bind group layouts

If some values change rarely but are used frequently during rendering or compute, you can put them in a simple bind group layout of their own. The resulting bind group can then be used with any pipeline whose layout includes that same bind group layout.

First, create the bind group layout and its bind group:

// Create the bind group layout for the camera uniform buffer, and the bind group itself
const cameraBindGroupLayout = gpuDevice.createBindGroupLayout({
  label: `Camera uniforms BindGroupLayout`,
  entries: [{
    binding: 0,
    visibility: GPUShaderStage.VERTEX | GPUShaderStage.FRAGMENT,
    buffer: {},
  }]
})

const cameraBindGroup = gpuDevice.createBindGroup({
  label: `Camera uniforms BindGroup`,
  layout: cameraBindGroupLayout,
  entries: [{
    binding: 0,
    resource: { buffer: cameraUniformsBuffer, },
  }],
})

Then, create two render pipelines. Note that both pipelines use two bind groups: they share the camera bind group and differ only in the material bind group:

const renderPipelineA = gpuDevice.createRenderPipeline({
  label: `Render Pipeline A`,
  layout: gpuDevice.createPipelineLayout({
    bindGroupLayouts: [cameraBindGroupLayout, materialBindGroupLayoutA],
  }),
  /* Etc... */
});

const renderPipelineB = gpuDevice.createRenderPipeline({
  label: `Render Pipeline B`,
  layout: gpuDevice.createPipelineLayout({
    bindGroupLayouts: [cameraBindGroupLayout, materialBindGroupLayoutB],
  }),
  /* Etc... */
});

Finally, in each frame of the render loop, the camera bind group needs to be set only once, which reduces the data sent from the CPU to the GPU:

const renderPass = commandEncoder.beginRenderPass({/* ... */});

// Set the camera bind group only once
renderPass.setBindGroup(0, cameraBindGroup);

for (const pipeline of activePipelines) {
  renderPass.setPipeline(pipeline.gpuRenderPipeline)
  for (const material of pipeline.materials) {
    // The material bind group is set separately for each material of the pipeline
    renderPass.setBindGroup(1, material.gpuBindGroup)

    // Set the vertex buffer and issue the draw call for each mesh
    for (const mesh of material.meshes) {
      renderPass.setVertexBuffer(0, mesh.gpuVertexBuffer)
      renderPass.draw(mesh.drawCount)
    }
  }
}

renderPass.end()

Information about the original

Author: Brandon Jones, Twitter @Tojiro
Original slide: https://docs.google.com/prese...
Additional reading: https://toji.github.io/webgpu...
A great native WebGPU tutorial (English): https://alain.xyz/blog/raw-we...
For texture contrast details: https://toji.github.io/webgpu...
For details on buffer uploads: https://toji.github.io/webgpu...
