Bindless Graphics Tutorial
Want to increase your application performance several fold?
Introduction
Bindless Graphics refers to changes to OpenGL that can enable close to an order of magnitude improvement for CPU-limited graphics applications. Recent improvements in programmability have focused on additional flexibility in shaders (expanding formats to include more float and integer types, better branching support, etc.) and on enabling new features (geometry programs, transform feedback, etc.). While some of these allow offloading parts of certain workloads to the GPU, they don't directly attack the issues that dominate CPU time.
The Modern CPU Bottleneck
OpenGL has evolved in a way that allows applications to replace many of the original state machine variables with blocks of user-defined data. For example, the current vertex state has been augmented by vertex buffer objects, and fixed-function shading state and parameters have been replaced by shaders/programs and constant buffers. Applications switch between coarse sets of state by binding objects to the context or to other container objects (e.g. vertex array objects) instead of manipulating state variables of the context. In terms of the number of GL commands required to draw an object, this enables applications to be an order of magnitude more efficient. However, this explosion of objects bound to other objects has led to a new bottleneck - pointer chasing and CPU L2 cache misses in the driver, and general L2 cache pollution.
Recent OpenGL graphics applications tend to change state at roughly these frequencies:
    for (...) { // cold
        data downloads, render target changes, etc.
        for (...) { // warm
            bind textures
            for (...) { // hot
                bind constants
                bind vertex buffers
                Draw();
            }
        }
    }
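To make the relative frequencies concrete, the loop nest above can be walked on the CPU while counting how often each class of state change is issued. This is only a counting sketch: the `FrameShape` and `BindCounts` types, the function name, and the assumption of exactly one render target change per outer iteration and one texture bind per middle iteration are all hypothetical and not taken from any real application.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of one frame; the field names are illustrative.
struct FrameShape {
    std::uint32_t passes;            // cold loop iterations
    std::uint32_t materialsPerPass;  // warm loop iterations per pass
    std::uint32_t drawsPerMaterial;  // hot loop iterations per material
};

struct BindCounts {
    std::uint64_t cold;   // render target changes, data downloads
    std::uint64_t warm;   // texture binds
    std::uint64_t hot;    // constant buffer + vertex buffer binds
    std::uint64_t draws;  // Draw() calls
};

// Walks the same three nested loops as the pseudocode and tallies
// how many state changes of each class a frame would issue.
BindCounts countStateChanges(const FrameShape& f) {
    assert(f.passes > 0 && f.materialsPerPass > 0 && f.drawsPerMaterial > 0);
    BindCounts c{0, 0, 0, 0};
    for (std::uint32_t p = 0; p < f.passes; ++p) {
        ++c.cold;  // assumed: one render target change per pass
        for (std::uint32_t m = 0; m < f.materialsPerPass; ++m) {
            ++c.warm;  // assumed: one texture bind per material
            for (std::uint32_t d = 0; d < f.drawsPerMaterial; ++d) {
                c.hot += 2;  // bind constants + bind vertex buffers
                ++c.draws;
            }
        }
    }
    return c;
}
```

Calling `countStateChanges(FrameShape{2, 10, 100})` tallies 2 cold changes, 20 warm binds, and 4000 hot binds for 2000 draws, which is why the cost of the innermost binds dominates.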
The most frequent state changes are binding vertex buffer objects (every draw), followed closely by binding constant buffers. Vertex buffer and constant buffer binds are significantly more expensive than one might expect. These binds require several reads from the driver internal object data structure to accomplish what the driver actually needs to do. In an OpenGL driver, it looks like this:
- name->obj (lookup object by name)
- obj->{refcount, GPU address, state, etc.} (dereference object to reference count it, to get its GPU virtual address, and validate its state).
Each of these dereferences has a high probability of causing a CPU L2 cache miss due to the inherently LRU-eviction-unfriendly nature of graphics applications (each frame starts over at the beginning). These L2 cache misses are a huge bottleneck in modern drivers, and a penalty paid for every frame rendered.
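The two dereferences described above can be modeled with a toy driver. This is a rough sketch, not actual driver code: `ToyDriver`, `BufferObject`, and the `memoryTouches` counter are hypothetical names, and the counter only tallies dependent reads that could each miss the cache. It does not measure real cache behavior.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Toy stand-in for a driver-side buffer object (hypothetical fields).
struct BufferObject {
    std::uint32_t refcount = 0;
    std::uint64_t gpuAddress = 0;
    bool valid = true;
};

class ToyDriver {
public:
    // Registers a buffer under an application-visible name.
    void createBuffer(std::uint32_t name, std::uint64_t gpuAddress) {
        assert(gpuAddress != 0);  // 0 is reserved here to signal failure
        BufferObject obj;
        obj.gpuAddress = gpuAddress;
        objects_[name] = obj;
    }

    // Bound path: name->obj lookup, then obj->{refcount, address, state}.
    // Returns 0 if the name is unknown or the object is invalid.
    std::uint64_t bindByName(std::uint32_t name) {
        auto it = objects_.find(name);  // name -> obj
        ++memoryTouches_;
        if (it == objects_.end()) return 0;
        BufferObject& obj = it->second;  // obj -> fields
        ++memoryTouches_;
        if (!obj.valid) return 0;
        ++obj.refcount;
        return obj.gpuAddress;
    }

    // Bindless path: the application already holds the GPU address,
    // so the toy driver has no object to look up or dereference.
    std::uint64_t useAddress(std::uint64_t gpuAddress) const {
        return gpuAddress;
    }

    // Number of dependent reads performed so far on the bound path.
    std::uint64_t memoryTouches() const { return memoryTouches_; }

private:
    std::unordered_map<std::uint32_t, BufferObject> objects_;
    std::uint64_t memoryTouches_ = 0;
};
```

In this model every `bindByName` call costs two dependent reads per draw, while `useAddress` costs none, mirroring the saving that bindless graphics aims for.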
Bindless Graphics has the following desirable properties:
- The driver need not dereference a vertex buffer or constant buffer on the CPU in order for the GPU to use it.
- Limits on how many buffer objects shaders can access at once are relieved.
- Buffer objects are accessed as C-style pointer dereferences in the shading language.
- Dependent pointer fetches are allowed, enabling more complex scene graph structures to be built into buffer objects and providing significant new flexibility in the use of shaders.
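The idea of dependent pointer fetches can be illustrated with a CPU-side analogy. In this sketch a byte array stands in for GPU memory, offsets into it stand in for 64-bit GPU addresses, and an ordinary function plays the role of shader code. All names (`ToyGpuMemory`, `Node`, `Material`, `shadeRed`) are hypothetical, the real extension API is not shown, and no bounds checking is done.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

using GpuAddress = std::uint64_t;  // stand-in for a GPU virtual address

struct Material { float r, g, b; };

struct Node {
    GpuAddress material;  // address of a Material stored elsewhere
    float scale;
};

// Byte array that plays the role of GPU-visible buffer memory.
class ToyGpuMemory {
public:
    // Copies a value into memory and returns the address it landed at.
    template <typename T>
    GpuAddress store(const T& value) {
        static_assert(std::is_trivially_copyable<T>::value, "raw copy only");
        GpuAddress addr = bytes_.size();
        bytes_.resize(bytes_.size() + sizeof(T));
        std::memcpy(bytes_.data() + addr, &value, sizeof(T));
        return addr;
    }

    // Reads a value back from an address (no bounds checking in this sketch).
    template <typename T>
    T load(GpuAddress addr) const {
        static_assert(std::is_trivially_copyable<T>::value, "raw copy only");
        T value;
        std::memcpy(&value, bytes_.data() + addr, sizeof(T));
        return value;
    }

private:
    std::vector<unsigned char> bytes_;
};

// Plays the role of shader code: fetch the node, then follow the
// address stored inside it to reach the material (a dependent fetch).
float shadeRed(const ToyGpuMemory& mem, GpuAddress nodeAddr) {
    Node node = mem.load<Node>(nodeAddr);
    Material m = mem.load<Material>(node.material);
    return m.r * node.scale;
}
```

Because the node carries the address of its material, the scene structure lives entirely in memory and no per-object bind is needed to reach the material data.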
Measurements have shown that bindless graphics can result in more than 7x speedup!
The Bindless Graphics presentation provides more detail, and explains the usage of the two OpenGL extensions GL_NV_shader_buffer_load and GL_NV_vertex_buffer_unified_memory.
Bindless Graphics is available starting with NVIDIA Release 185 drivers for hardware G80 and up.