Kevin Rogovin - FastUIDraw: a high-performance 2D renderer for GPUs (only)
We will discuss, and provide details on, the open-source project FastUIDraw (available at https://github.com/01org/fastuidraw).
FastUIDraw aims to be a high-performance canvas renderer targeted exclusively at GPUs. The renderer is a work in progress and currently supports the following standard features: clip-in, clip-out, path filling, path stroking, linear gradients, radial gradients, images and text rendering. This feature list is no different from that of other renderers available today. What is different is that FastUIDraw is written exclusively for GPUs and has an internal stack designed to make porting FastUIDraw to other 3D APIs relatively easy. Because it targets GPU rendering only, FastUIDraw achieves much higher performance for scenes of sufficient complexity. Indeed, the branch "with_ports_of_painter-cells" contains ports of the painter-cells demo from FastUIDraw to Qt, Cairo and Skia. The painter-cells demo/benchmark draws a table of cells; each cell contains a rectangle and text that move and rotate within it, and the contents of each cell are clipped to that cell. Individual cells can be rotated, as can the entire table. When the demo ports are run (via example-benchmark.sh within the branch) on my 3+ year old laptop with an Intel GPU, the performance numbers favor FastUIDraw quite heavily. Values are normalized so that Cairo CPU is 1.00 (higher is better).
- Cairo CPU: 1.00
- Cairo GL: 0.25
- Cairo Xlib: 1.60
- Qt Raster: 0.98
- Qt GL: 0.66
- Qt Native: 0.26
- Skia GL: 1.68
- FastUIDraw: 9.20
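Because every entry above is divided by the same Cairo CPU score, that baseline cancels when two entries are compared, so the ratio of two values is directly their relative speed. The short Python sketch below applies this to the listed numbers; the helper names `speedup` and `ranked` are illustrative only and are not part of example-benchmark.sh.

```python
# Painter-cells scores from the list above, normalized so that Cairo CPU = 1.00
# (higher is better).
SCORES = {
    "Cairo CPU": 1.00,
    "Cairo GL": 0.25,
    "Cairo Xlib": 1.60,
    "Qt Raster": 0.98,
    "Qt GL": 0.66,
    "Qt Native": 0.26,
    "Skia GL": 1.68,
    "FastUIDraw": 9.20,
}


def speedup(scores, a, b):
    """How many times faster renderer `a` ran than renderer `b`.

    Both scores were divided by the same baseline, so it cancels here.
    """
    return scores[a] / scores[b]


def ranked(scores):
    """Renderer names from fastest to slowest."""
    return sorted(scores, key=scores.get, reverse=True)


fastest, runner_up = ranked(SCORES)[:2]
print(f"{fastest} vs {runner_up}: {speedup(SCORES, fastest, runner_up):.2f}x")
```

With these numbers FastUIDraw comes out roughly 5.5 times faster than the next-best entry (Skia GL) and 9.2 times faster than Cairo CPU.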
FastUIDraw has the following main feature goals:
- Provide all features required to implement the HTML Canvas 2D Context (https://www.w3.org/TR/2dcontext/)
- Provide all the features needed by Blink's (and WebKit's) WebCore::GraphicsContext.
Together, these two goals mean that once FastUIDraw meets them, browser engines can use it as a backend for rendering web content. In addition, FastUIDraw is designed to support user-specified shaders, allowing exotic effects that are not practical to perform on a CPU.
Since FastUIDraw is built from the ground up for GPUs, it employs a number of very different techniques to render content. It also omits various rendering options that both map poorly to GPUs and are rarely used. The FastUIDraw stack is designed to support different GPU APIs (for example OpenGL, Vulkan or Metal) in a way that dramatically reduces GPU state thrashing. Even with newer APIs such as Vulkan, state thrashing can harm GPU utilization: roughly speaking, if there is not enough work between pipeline state changes, a GPU will not be fully utilized, even with very low-overhead APIs such as Vulkan. Applications typically vary both what they draw and how they draw it. In most GL-backed renderers, each combination of what and how corresponds to a different pipeline state (and, more often than not, a different shader). In contrast, FastUIDraw's GL backend needs only 3 pipeline states, which exist to support all of the Porter-Duff blend modes. Changing the properties of the brush (image, gradient) not only leaves the pipeline state unchanged, it does not even break a draw call.
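To make the draw-call argument concrete, here is a minimal Python sketch of a hypothetical renderer model, not FastUIDraw's actual API: a batcher starts a new draw call only when the required pipeline key changes. With one shader per brush kind, the key includes the brush kind; with a single shader that reads brush parameters as per-draw data, the key depends only on blending. The names `Draw`, `Batch`, `build_batches` and the blend/brush strings are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Draw:
    blend_mode: str           # e.g. "src_over"; maps to fixed-function blend state
    brush_kind: str           # e.g. "solid", "linear_gradient", "image"
    brush_params: tuple = ()  # colors, gradient end points, atlas location, ...


@dataclass
class Batch:
    pipeline_key: tuple
    # Per-draw brush records that one shader would read to decide how to
    # shade each primitive (hypothetical layout).
    brush_records: list = field(default_factory=list)


def build_batches(draws, key_fn):
    """Group draws in submission order; a new batch (draw call) starts only
    when key_fn(draw), i.e. the required pipeline state, changes."""
    batches = []
    for d in draws:
        key = key_fn(d)
        if not batches or batches[-1].pipeline_key != key:
            batches.append(Batch(key))
        batches[-1].brush_records.append((d.brush_kind, d.brush_params))
    return batches


def per_brush_shader_key(d):
    # Conventional approach: each kind of brush gets its own shader.
    return (d.blend_mode, d.brush_kind)


def brush_as_data_key(d):
    # Brush is data, not state: only blending selects the pipeline.
    return (d.blend_mode,)


scene = [
    Draw("src_over", "solid", (1.0, 0.0, 0.0, 1.0)),
    Draw("src_over", "image", (7,)),  # hypothetical atlas index
    Draw("src_over", "linear_gradient", (0.0, 0.0, 10.0, 0.0)),
    Draw("src_over", "solid", (0.0, 0.0, 1.0, 1.0)),
]
print(len(build_batches(scene, per_brush_shader_key)))
print(len(build_batches(scene, brush_as_data_key)))
```

Here the four draws need four draw calls when each brush kind has its own shader, but only one when the brush is data. Draws are kept in submission order because reordering overlapping draws would change the blended result.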
Images are placed into a global, dynamically resizable atlas that wastes no space: rather than being stored whole, each image is broken into tiles of equal size, so that deleting images (or, for that matter, the order in which images are created) causes no memory waste. In addition, images can be filtered either bilinearly or bicubically. Although an image is broken into pieces across a single large texture array, bilinear filtering is performed entirely by the GPU sampler, and bicubic filtering is optimized for GPUs by having the GPU sampler perform a significant portion of the work (see, for example, GPU Gems 2, Chapter 20).
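The fragmentation-free property follows from every tile having the same size: a freed tile fits any future request, so deletions and creation order cannot leave unusable holes. A minimal Python sketch of that bookkeeping follows; the class `TileAtlas`, its parameters and the grow-by-one-layer policy are assumptions for illustration, not FastUIDraw's data structures, and actual texel uploads are omitted.

```python
import math


class TileAtlas:
    """Images stored as equal-size tiles in a shared, growable pool.

    Texel storage is omitted; only slot bookkeeping is modeled.
    """

    def __init__(self, tile_size, tiles_per_layer):
        self.tile_size = tile_size
        self.tiles_per_layer = tiles_per_layer
        self.num_layers = 1
        self.free = list(range(tiles_per_layer))  # unused tile slots
        self.images = {}  # image id -> tile slots, row-major over the image

    def _grow(self):
        # Resize by appending one more layer of tiles (e.g. a texture-array layer).
        start = self.num_layers * self.tiles_per_layer
        self.free.extend(range(start, start + self.tiles_per_layer))
        self.num_layers += 1

    def add_image(self, image_id, width, height):
        cols = math.ceil(width / self.tile_size)
        rows = math.ceil(height / self.tile_size)
        slots = []
        for _ in range(cols * rows):
            if not self.free:
                self._grow()
            slots.append(self.free.pop())
        self.images[image_id] = slots
        return slots

    def remove_image(self, image_id):
        # Any freed slot can serve any later request because all slots have
        # the same size, so deletions never leave unusable holes.
        self.free.extend(self.images.pop(image_id))


atlas = TileAtlas(tile_size=64, tiles_per_layer=16)
atlas.add_image("a", 200, 100)  # 4 x 2 = 8 tiles
atlas.add_image("b", 64, 64)    # 1 tile
atlas.remove_image("a")
atlas.add_image("c", 128, 256)  # 2 x 4 = 8 tiles, reusing the slots of "a"
print(atlas.num_layers, len(atlas.free))
```

Image "c" has a different shape from "a" yet fits exactly into the slots "a" released, without the atlas growing.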
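The GPU Gems 2 idea can be checked numerically. Assuming a cubic B-spline kernel (the one used in that chapter; which cubic kernel FastUIDraw uses is not stated here), the four weighted texel reads along one axis collapse into two hardware linear fetches, because each pair of adjacent weights is non-negative and can be reproduced by one linearly interpolated read at a shifted coordinate. The Python sketch below emulates a linear sampler on a 1D texture and compares both forms; applied separably in 2D, the same trick turns 16 texel reads into 4 bilinear fetches. All function names are illustrative.

```python
import math


def _texel(texels, k):
    # Clamp-to-edge addressing.
    return texels[min(max(k, 0), len(texels) - 1)]


def linear_fetch(texels, x):
    """Emulates a hardware linear sampler on a 1D texture whose texel k
    sits at coordinate k."""
    i = math.floor(x)
    t = x - i
    return _texel(texels, i) * (1.0 - t) + _texel(texels, i + 1) * t


def bspline_weights(a):
    """Cubic B-spline weights for texels i-1, i, i+1, i+2 at fraction a in [0, 1)."""
    w0 = (1 - a) ** 3 / 6
    w1 = (3 * a**3 - 6 * a**2 + 4) / 6
    w2 = (-3 * a**3 + 3 * a**2 + 3 * a + 1) / 6
    w3 = a**3 / 6
    return w0, w1, w2, w3


def cubic_four_reads(texels, x):
    """Reference filter: four texel reads, weighted directly."""
    i = math.floor(x)
    weights = bspline_weights(x - i)
    return sum(w * _texel(texels, i - 1 + k) for k, w in enumerate(weights))


def cubic_two_fetches(texels, x):
    """Same value from two linear fetches: each adjacent weight pair is
    non-negative, so it equals one lerp at a shifted coordinate."""
    a = x - math.floor(x)
    w0, w1, w2, w3 = bspline_weights(a)
    g0, g1 = w0 + w1, w2 + w3
    h0 = 1 - w1 / g0 + a  # fetch at x - h0 = i - 1 + w1 / g0
    h1 = 1 + w3 / g1 - a  # fetch at x + h1 = i + 1 + w3 / g1
    return g0 * linear_fetch(texels, x - h0) + g1 * linear_fetch(texels, x + h1)


texels = [0.0, 2.0, 1.0, 5.0, 3.0, 4.0]
for x in (0.0, 1.3, 2.5, 3.9):
    print(x, cubic_four_reads(texels, x), cubic_two_fetches(texels, x))
```

The B-spline is chosen here because all four of its weights are non-negative, which is what lets each pair be folded into a single interpolated fetch; kernels with negative lobes need a different decomposition.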
Glyphs are rendered on the GPU with an algorithm by the author that preserves glyph corners (i.e. corners remain as sharp as the original geometry dictates). In addition, the renderer anti-aliases text, producing high-quality text at sufficiently high PPI (where hinting is no longer an issue).
Path stroking has a high-quality renderer that provides anti-aliasing without resorting to CPU rendering or MSAA. On another interesting front, stroke width is handled entirely in the shader, so changing the stroke width does NOT incur any CPU load to recompute the attribute data fed to the GPU.
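A minimal sketch of why the width can live in the shader, assuming a simple quad-per-segment stroke with no joins, caps or anti-aliasing (all of which a real stroker needs and this sketch omits, along with the triangle index data): the CPU stores each vertex as a path point, a unit normal and a side flag, and an emulated vertex shader offsets the point by half the width along the normal. `build_stroke_attributes` and `vertex_shader` are hypothetical names, not FastUIDraw functions.

```python
import math


def build_stroke_attributes(points):
    """CPU side, run once per path: each segment becomes a quad whose
    corners are stored as (point, unit normal, side). No width is stored."""
    attributes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        length = math.hypot(dx, dy)
        if length == 0:
            continue  # skip degenerate segments; they have no normal
        normal = (-dy / length, dx / length)
        for point in ((x0, y0), (x1, y1)):
            for side in (-1.0, 1.0):
                attributes.append((point, normal, side))
    return attributes


def vertex_shader(attribute, stroke_width):
    """Emulated GPU vertex stage: the width arrives as a per-draw value and
    only moves the vertex along its normal."""
    (px, py), (nx, ny), side = attribute
    offset = 0.5 * stroke_width * side
    return (px + offset * nx, py + offset * ny)


attributes = build_stroke_attributes([(0.0, 0.0), (10.0, 0.0)])
thin = [vertex_shader(a, 2.0) for a in attributes]
thick = [vertex_shader(a, 8.0) for a in attributes]  # same attributes, new width
print(thin)
print(thick)
```

Going from a width of 2 to a width of 8 reuses the identical attribute list; only the value passed to the shader changes.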
Lastly, the FastUIDraw interface dramatically reduces recomputation of attribute data from primitives: the attribute data can be reused when the transformation, brush or stroking style changes.
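The reuse pattern can be sketched as a path that builds its attribute data once and is then drawn with arbitrary per-draw transforms and brushes. This is a hypothetical Python model (`Path`, `draw` and the tuple stand-in for tessellation are assumptions), not FastUIDraw's interface.

```python
import math


class Path:
    """Path whose attribute data is built lazily and only once."""

    def __init__(self, points):
        self.points = tuple(points)
        self._attributes = None
        self.build_count = 0  # instrumentation for this sketch only

    def attributes(self):
        if self._attributes is None:
            self.build_count += 1
            # Stand-in for real tessellation; data stays in path coordinates,
            # so it does not depend on the transform or the brush.
            self._attributes = self.points
        return self._attributes


def draw(path, transform, brush, queue):
    """Queue a draw; transform and brush are per-draw values that the GPU
    applies, so they never invalidate the path's attribute data."""
    queue.append((path.attributes(), transform, brush))


shape = Path([(0.0, 0.0), (4.0, 1.0), (2.0, 3.0)])
queue = []
for frame in range(60):
    angle = frame * math.pi / 30
    c, s = math.cos(angle), math.sin(angle)
    transform = (c, -s, s, c, 0.0, 0.0)  # rotation matrix entries plus translation
    brush = "solid_red" if frame % 2 else "linear_gradient"
    draw(shape, transform, brush, queue)
print(shape.build_count, len(queue))
```

Sixty frames with a new rotation and an alternating brush each frame still trigger a single build of the attribute data.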