The irregular Nouveau Development Companion #40

Issue for October 29th, 2008

Greetings everyone, the long awaited TiNDC #40 is finally here.

It has been a very long time since the last TiNDC, over five months. The Nouveau wiki has also been fairly quiet, because no shiny new features have progressed to the point where we would like the general public to test them (and partly because we forgot to update it). This does not mean the project is dead, not at all: the difficult parts take time, and changes in the DRI/DRM development model and memory manager designs have not exactly pushed us forward. With our beloved koala_BR hijacked by real life, the public output of the project has been very low. Getting frustrated with the silence, I (pq) decided to write something, but I don't plan to become a regular TiNDC writer. So here is a review of the current state of things, though unfortunately I cannot cover everything from the past five months. Thanks to gQuigs for writing the Google Summer of Code section. Enjoy!

Topics:

The Nouveau Source Code Repositories
Renouveau, Mmiotrace and User Contributed Dumps
Displays and Suspend
Kernel Mode Setting
Gallium3D Progress
Memory Management
Fruits of The Google Summer of Code 2008
Short Topics

Highlight: NV50 2D is now on par with the rest! For details, see Short Topics.

The Nouveau Source Code Repositories

The Nouveau project uses several git repositories and many branches. In these quiet times, the best place to monitor Nouveau's progress is to watch the repositories. Here is a short introduction to most of them.

http://cgit.freedesktop.org/nouveau/xf86-video-nouveau/
- The home of the X.org driver part, the DDX. This is where Xv, EXA and Xrender acceleration is implemented, where Randr 1.2 support lives, and what normally sets the display modes. When you use Nouveau, you use the "master" branch. There is also a curious branch called "ng", which is covered together with darktama's work in the Memory Management section. The other branches here are abandoned.
http://cgit.freedesktop.org/mesa/drm/
- DRM, also known as the kernel part, manages the hardware and provides access and protection to user space. The "master" branch is the one to use, although there are very many other branches. The "drm-gem" branch, where the initial development of the GEM API was done, was merged into "master" in August. Nouveau KMS was prototyped in the branch "modesetting-101", which is now superseded by "modesetting-gem". Libdrm lives here, too.
http://cgit.freedesktop.org/nouveau/mesa/
- This is a new beast and off-limits to all but developers. It is a parallel Mesa repository for developing Nouveau Gallium3D support, i.e. what you have all been eagerly waiting for: 3D acceleration. The "master" branch has long been untouched, but as in upstream, there are the branches "gallium-0.1" and "gallium-0.2". In upstream, i.e. the main Mesa repository, the "gallium-0.2" branch is used to prepare Gallium3D for merging into the "master" branch. Cleanups and possibly some interface changes are done there before the merge. Meanwhile, "gallium-0.1" is the branch where drivers are developed, and it is kept more stable.

Renouveau, Mmiotrace and User Contributed Dumps

We know that the list at http://people.freedesktop.org/~jpakkane/ren/ has not really been updated for some time, even though the web page is regenerated. The automatic script that fetches Renouveau dumps from gmail and puts them online broke when freedesktop.org disabled all DSA keys due to the famous security bug in Debian's OpenSSL. No Renouveau dumps have been lost, though; they are still available to the developers in gmail.

Sometimes people drop by and ask whether they should make a Renouveau dump of their specific card. At this point we are not burning to get new dumps, but if you can make one, please do. We would prefer dumps of 9000-series and GTX200-based cards, so basically anything newer than the 8000-series. On the other hand, mmiotraces from all card generations are warmly welcome. At the time of writing there are 165 emails in the mmiotrace dump gmail account, totaling 1.5GB of compressed dumps.

Pmdata has improved Renouveau over the year, introducing an XML-based database of graphics commands for each card generation. This caused a major change in Renouveau: dumping and interpreting a dump are now two separate steps. This means that we do not need to dump again when a new command is identified; the command is added to the database, and existing dumps are reinterpreted. Pmdata has also written some more tests for Renouveau, and taking advantage of those requires new dumps.
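To illustrate why separating the two steps helps, here is a minimal Python sketch, assuming a dump can be reduced to raw (object class, method offset, value) records. The XML layout, the class and method numbers, and the names `load_database` and `interpret` are all hypothetical; this is not Renouveau's actual schema or code.

```python
import xml.etree.ElementTree as ET

# Hypothetical command database. Element/attribute names and the
# class/method numbers are placeholders, not Renouveau's real schema.
DB_XML = """
<database>
  <object class="0x1234">
    <method offset="0x100" name="METHOD_A"/>
  </object>
</database>
"""


def load_database(xml_text):
    """Build a {(object_class, method_offset): name} lookup table."""
    root = ET.fromstring(xml_text.strip())
    table = {}
    for obj in root.iter("object"):
        cls = int(obj.get("class"), 16)
        for method in obj.iter("method"):
            table[(cls, int(method.get("offset"), 16))] = method.get("name")
    return table


def interpret(raw_dump, table):
    """Step 2: decode raw (class, offset, value) records into text.

    The raw dump itself is never modified, so re-interpreting it after
    the database has grown is just a matter of calling this again.
    """
    lines = []
    for cls, offset, value in raw_dump:
        name = table.get((cls, offset), f"UNKNOWN_0x{offset:04x}")
        lines.append(f"{name} = 0x{value:08x}")
    return lines
```

When a new method is identified, only the database changes; the stored raw dumps are simply passed through `interpret` again, with no need to ask users for new dumps.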

After a long effort, pq has managed to get Mmiotrace into the mainline kernel. The first version of in-tree Mmiotrace is in 2.6.27, and in 2.6.28 it will be as fully functional as the out-of-tree version. If you use these kernels, follow the documentation that comes with them. The instructions in the wiki are for the out-of-tree versions of Mmiotrace. If you can choose, 2.6.28 is the preferred version.

Displays and Suspend

Malc0 has been working on perfecting the Randr 1.2 support and fixing bugs on the display mode setting front. He has also started on support for suspend, both suspend to disk and suspend to RAM. The current state of suspend is that there is a fairly good chance of it working on pre-NV50 hardware, but NV50 and later cards will not work. Various cards from NV05 to NV40 have been reported to work.

To try suspend, you will need to patch the DDX. If you suspend to RAM, then on resume you need to POST the card by hand (or with scripts) before switching back to X. The suspend support is still something of a hack, and more kernel work is needed. In the far future, when kernel mode setting becomes reality, suspending should become clean and robust.

In other news, malc0 has changed the Randr 1.2 model on pre-NV50 cards to be connector based instead of encoder based. (Think of connectors as the physical sockets, and encoders as signal sources that are routed to connectors.) So now you configure which connector you want active, and this should also ease the migration to the kernel mode setting world. Malc0 has also been tuning the video BIOS parser, which is essential for Randr 1.2 mode setting and for initializing the card.
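A minimal Python sketch of the connector-based model, under the assumption that each connector lists the encoders able to drive it: the user picks a connector, and the driver chooses a free encoder and routes a CRTC into it. The class and function names (`Encoder`, `Connector`, `enable_connector`) are hypothetical and do not come from the Nouveau DDX.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Encoder:
    """A signal source, e.g. a TMDS transmitter or an analog DAC."""
    name: str
    crtc: Optional[int] = None  # CRTC currently routed into this encoder


@dataclass
class Connector:
    """A physical socket on the card, e.g. a DVI-I or VGA port."""
    name: str
    possible_encoders: List[Encoder] = field(default_factory=list)
    active_encoder: Optional[Encoder] = None


def enable_connector(connector: Connector, crtc: int, busy: Set[str]) -> Encoder:
    """The user asks for a connector; the driver picks a free encoder for it."""
    for enc in connector.possible_encoders:
        if enc.name not in busy:
            enc.crtc = crtc
            connector.active_encoder = enc
            busy.add(enc.name)
            return enc
    raise RuntimeError(f"no free encoder can drive {connector.name}")
```

A DVI-I connector can be fed by either a digital (TMDS) or an analog (DAC) encoder, which is one reason to leave the encoder choice to the driver rather than expose it to the user.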

Kernel Mode Setting

Kernel mode setting (KMS) means moving the display mode setting from X server drivers into the kernel. This is a general development direction for Linux graphics, and some of the end-user visible effects will be flickerless boot, faster switching between virtual terminals and X sessions, high resolution virtual terminals, and the ability to see critical kernel messages even while running X. On the developer side, KMS should finally end the bloody battle between various kernel and user space drivers over control of the graphics hardware, and make the design a lot cleaner and easier to understand.

Stillunknown has made a prototype implementation of KMS for his NV50 class card to make sure the KMS API is workable before it is set in stone. The KMS work is led by projects other than Nouveau, and there were indeed issues, which are hopefully fixed now. Stillunknown plans to get back to it when we have a memory manager, and also sends thanks to Luc Verhaegen for some good ideas about modesetting in general and the need for abstraction. This is what stillunknown himself has to say about things:

The first signs of KMS date from February 2008, and around April 2008 malc0 made his first attempt, mostly by copying DDX/userspace code and modifying it as needed. In all likelihood this is what will happen for NV04-NV4E. In May-June 2008, the decision was made that it was time to make sure the API was usable. It soon became apparent that the existing infrastructure followed the same ideas as Randr 1.2, which enforces a very strange driver design that does not suit most hardware. The first few attempts to convince people that something else had to be done were not very successful. Fortunately, in early June some rather important changes were made. The cloned Randr 1.2 design was moved into so-called helper functions, and the driver could fill in a single "setconfig" hook that did all mode setting. This allowed drivers to abstract away the hardware in whatever way they liked. The other major change was the adoption of connectors as the user-visible objects. Before that, an output object was abused to carry connector properties. Most of the abuse went into the proper place, namely the connector. The rest (basically the link between a CRTC and a connector) went into the "encoder". That was enough to forget about making a second KMS API and proceed by writing a prototype for NV50 class hardware using the changed API. In the process some bugs were found, but nothing serious. Steps were also taken to standardize several connector properties, such as those that indicate and select which part of a DVI-I connector is in use (there will obviously be a default autodetect mode). Another advantage of KMS is that all connector names and properties are enumerations, with no string-type names attached. This should allow for more consistent naming on the user side, and maybe even applications that "understand" what a property means, but that's all future talk.
The KMS ordeal has illustrated a fairly typical "fight" between those who think that code should follow a well thought-out design and those who prefer to just hack away. To this day, the few drivers that exist seem to use the Randr 1.2 style helper functions, which indicates the intervention was necessary. The prototype for NV50 class hardware is not bad and will be finalized once a useful memory manager becomes available for Nouveau. Stillunknown has also taken a look at the modesetting equivalent of a fifo. It works much like a graphics fifo, just without multiple objects. He has looked at it several times, but getting it to work has proved to be a non-trivial effort. Interesting registers, including the PUT and GET registers, have been isolated, but the contents of some other registers unfortunately remain a mystery. There even seems to be more than one fifo available, though exactly how many is unknown. This functionality is not expected to work in the foreseeable future, but that is not a disaster, because the hardware offers indirect access via two MMIO registers. It is possible, however, that a functional tiled framebuffer depends on this fifo, but until proven otherwise this remains a guess. (Note: The lack of a tiled framebuffer is the reason why a compositing manager is needed on NV50: the 3D engine cannot render to a linear buffer, and the compositing manager ensures that windows have a backbuffer which can be rendered to.) For the moment, it is very useful to know that an mmiotrace dump does not tell everything about modesetting; instead, valgrind-mmt can be used to trace userspace fifos.
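The PUT/GET scheme mentioned above is the usual way a CPU and a GPU share a command ring. The sketch below is a generic software model in Python, assuming a ring of fixed size where the CPU advances PUT after writing commands and the hardware advances GET as it consumes them. It says nothing about the real register layout of the NV50 modesetting fifo, which is still partly unknown; `CommandRing` and its methods are hypothetical names.

```python
class CommandRing:
    """Toy model of a fifo ring shared between CPU and GPU.

    The CPU writes commands and advances PUT; the hardware consumes
    commands and advances GET. Both are indices into a fixed-size ring.
    One slot is always kept empty so that PUT == GET means "empty".
    """

    def __init__(self, size):
        self.buf = [0] * size
        self.size = size
        self.put = 0  # next slot the CPU will write
        self.get = 0  # next slot the hardware will read

    def free_space(self):
        # Python's % always returns a non-negative result here.
        return (self.get - self.put - 1) % self.size

    def push(self, words):
        if len(words) > self.free_space():
            raise BufferError("ring full: wait for GET to advance")
        for w in words:
            self.buf[self.put] = w
            self.put = (self.put + 1) % self.size
        # A real driver would now write the new PUT to the hardware register.

    def hw_consume(self):
        """Simulate the hardware draining everything between GET and PUT."""
        out = []
        while self.get != self.put:
            out.append(self.buf[self.get])
            self.get = (self.get + 1) % self.size
        return out
```

Keeping one slot empty is a common design choice: it lets PUT == GET unambiguously mean "empty" without a separate counter, at the cost of one unused slot.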

Gallium3D Progress

Some time ago there was the Mesa DRI driver model, where drivers were implemented directly between the OpenGL API and the hardware. This made the drivers big, complex and redundant. The Nouveau 3D driver was started for that old model, and it progressed to the point where most card generations could run at least glxgears. Then came the Gallium3D infrastructure, and the project was set back miles on 3D support. However, the Gallium3D model is far better than the old one. The user API, for instance OpenGL, is abstracted away and the drivers only need to implement a single core API, making the hardware drivers small and clean. Well, this statement is a simplification, but you can read about the Gallium3D design at http://www.tungstengraphics.com/technologies/gallium3d.html .

As you might know, NV40 Gallium3D is currently the most advanced part of Nouveau, and some people claim to have played Quake 3 Arena with it. Do not jump for joy now, because the mantra still holds: Gallium3D is not supported yet, and we do not want bug reports about it, unless, of course, the bug report comes with a patch that fixes the problem. Otherwise, trying to test it will likely lead to trouble, and we really do not want to waste developer time or nerves on discussing something that is known to be broken. On the other hand, if you plan to contribute code, come and talk with us already!

Pmdata has been developing NV30 Gallium3D, which Marcheu started. Pmdata followed what was happening with NV40 Gallium3D and made similar changes, because the Gallium3D APIs were still somewhat in flux and uncharted territory. Pmdata records his progress on his personal wiki page, so check that for news and images: http://nouveau.freedesktop.org/wiki/PatriceMandin . Judging from the xmoto screen capture, geometry processing is working, and there are even textures, although the textures are swizzled when they should not be (or vice versa). Uploading properly swizzled textures has proven to be a little harder than he first thought. He has also written an nv30_demo program, which pokes the card directly to try out rendering commands.
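Swizzled textures store texels in a tiled order rather than row by row, so an upload has to reorder the data. The sketch below assumes a Morton (Z-order) layout for power-of-two textures, interleaving the bits of the x and y coordinates. This is a common swizzling scheme, but the exact bit order NV30 expects is an assumption here, not something the text confirms; `swizzle_offset` and `swizzle` are hypothetical helper names.

```python
def swizzle_offset(x, y, width, height):
    """Texel index of (x, y) in a Morton (Z-order) swizzled texture.

    width and height must be powers of two. Bits of x and y are
    interleaved (x in the lower bit of each pair) while both
    dimensions still have bits left; the remaining high bits of the
    larger dimension are appended on top.
    """
    if width & (width - 1) or height & (height - 1):
        raise ValueError("texture dimensions must be powers of two")
    offset = 0
    shift = 0
    bit = 1
    while bit < width or bit < height:
        if bit < width:
            if x & bit:
                offset |= 1 << shift
            shift += 1
        if bit < height:
            if y & bit:
                offset |= 1 << shift
            shift += 1
        bit <<= 1
    return offset


def swizzle(texels, width, height):
    """Reorder a linear (row-major) texel list into swizzled order."""
    out = [None] * (width * height)
    for y in range(height):
        for x in range(width):
            out[swizzle_offset(x, y, width, height)] = texels[y * width + x]
    return out
```

Each 2x2 block of texels ends up contiguous, then each 4x4 block, and so on, which keeps neighbouring texels close in memory; getting exactly this mapping right is what makes the upload tricky.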

Development on NV10 Gallium has been quiet for some time, and so has NV04. It is worth recalling that the NV04-NV20 families do not have real fragment shaders, and the NV04-NV10 families do not have vertex shaders, but Gallium3D is built on the assumption that shaders exist. Marcheu has investigated how these fixed pipeline cards could be used with Gallium3D, and it seems possible, but he has yet to make up his mind about which approach is preferable. It may or may not mean changes throughout the whole Gallium3D stack.

No one has started NV20 Gallium3D work yet; there is currently nothing. It could be bootstrapped by copying in the NV10 Gallium bits and adding the NV30 Gallium vertex program bits.

Darktama has started NV50 Gallium3D, but it does not do anything useful yet. For instance, textures do not work. After doing the NV50 2D work, he says he now has a much better view of how to implement things, and he will get back to it when he can.

Marcheu has recently been working with LLVM (http://llvm.org/), which should optimize shader programs to the max. However, his work has been with x86 LLVM, specifically the SSE instruction set, for vertex programs, and not with GPU instruction sets. Why would one want to do vertex processing on the CPU? The answer is two-fold: early cards do not support vertex programs, and their fixed vertex pipelines are not worth the trouble to use, since modern CPUs do vertex processing faster. The trouble with using the fixed vertex pipelines is adapting Gallium3D to them, so if that can be skipped, so much the better. CPU vertex processing is also required when the vertex shader is so complex that it cannot be realized on a GPU that does support vertex shaders. On the other hand, Gallium3D must be adapted to make use of fixed fragment pipelines, since there is no point trying to do fragment processing on the CPU; it would not be fast. Marcheu says the LLVM/x86-SSE vertex processing works, but the software fragment pipeline (a.k.a. softpipe FP) he has to use at this point is a nest of bugs.
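To make "vertex processing on the CPU" concrete, here is a minimal Python sketch of the idea: a vertex program, reduced here to four DP4 instructions that transform a position by a 4x4 matrix, is turned into a host function and run on each vertex. The instruction tuple format and the names `build_vertex_function`, `TRANSFORM` and `process_vertices` are hypothetical; where Marcheu's approach has LLVM emit x86 SSE code at this step, the sketch merely builds a Python closure.

```python
def dp4(a, b):
    """Four-component dot product, the workhorse of vertex programs."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3]


def build_vertex_function(instructions):
    """Turn a toy vertex program into a callable run on the CPU.

    Each instruction is a hypothetical ("DP4", dst_component, const_row)
    tuple. A JIT would emit machine code here; this just returns a closure.
    """
    for op, _, _ in instructions:
        if op != "DP4":
            raise ValueError(f"unsupported opcode: {op}")

    def run(position, constants):
        out = [0.0, 0.0, 0.0, 0.0]
        for _, dst, row in instructions:
            out[dst] = dp4(constants[row], position)
        return out

    return run


# Classic position transform: output[i] = dot(matrix_row[i], input).
TRANSFORM = [("DP4", i, i) for i in range(4)]


def process_vertices(vertices, matrix):
    """Run the program over every vertex, as a software vertex stage would."""
    shader = build_vertex_function(TRANSFORM)
    return [shader(v, matrix) for v in vertices]
```

Opcodes are validated once, when the function is built, rather than per vertex: that mirrors the point of compiling the program, since the per-vertex loop then only does arithmetic.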

The major bottleneck in Nouveau's Gallium3D development is the lack of developer time. Granted, the current simple memory manager easily runs out of memory and falls over, but it is still enough to try and implement almost all 3D functionality.

Memory Management

The DRM has historically had a plethora of simple memory managers, and Nouveau currently has one too. However, a full-featured memory manager is required for efficient use of resources, and previously this was supposed to be TTM (Translation Table Maps). Darktama had been working toward that end when Intel developers came up with GEM (Graphics Execution Manager), after trying for a year to get TTM working well enough on integrated Intel graphics. In the process the TTM user API was removed, and now Nouveau, as well as Radeon, is going to use GEM. Actually, GEM is little more than an API, and it needs a backend, so TTM will still be used internally. Darktama says that not much work was wasted in the transition, since TTM is still around.

Darktama is practically our memory manager developer, and he puts all the time he can afford into making Nouveau use GEM (and TTM). The "ng" branch in the DDX git repository is part of his GEM playground, where he is trying things out and figuring out a proper design. Other parts of that playground are http://cgit.freedesktop.org/~darktama/drm/ branch "ng", which is based on the DRM branch "modesetting-gem", and the nouveau/mesa repository's "gallium-0.2-ng" branch. Darktama does not yet know whether that work will ever be merged into "master", or whether it will need to be rewritten once he has learnt what needs to be done. He has worked with NV40 and says that apart from rough edges, it is already working "fairly OK". Performance is not at the same level as with the simple memory manager in "master", but the new work should solve the various out-of-memory errors the current Gallium code triggers. It also makes the 3D code interact well with the 2D code, in that moving a 3D application around on screen does not leave trashy trails. All in all, darktama is fairly happy with it, but it is definitely not ready to be merged into Nouveau mainline.

Fruits of The Google Summer of Code 2008

Ymanton has been working on video decoding, especially XvMC, using shader instructions, creating the reference implementation on Gallium3D as his Google Summer of Code (GSoC) project. Ymanton comments:

GSoC has treated me very well. Marcheu is an excellent mentor, and the rest of the Nouveau contributors have also been a big help; the SoC project would not have gone as well as it did without all the good work that went into the project before I started.

Now that the summer is over, school has retaken most of ymanton's time, but he is still trying to determine why decoding is not as fast as expected. "Current consensus is that it's because we're using linear textures instead of tiled, so I've been trying to figure out how to get tiled textures and the related DMA functionality working." To do that, we need to look at how DMA is done in Xv and the 2D driver, and how it is done on NV50.

Ymanton wrote a simple OpenGL program that goes through the same steps as our decoding process and tested it on the blob. The blob "tears through 720p" at around 60 fps, so we know the hardware (NV40) is capable of doing this.

It has been tested not to crash on NV30, but without shader support it will not actually display anything. Ymanton does not have an NV50 card, so he has not tested that, and he does not believe it is finished enough yet.

Ymanton's Gallium3D video decoder currently gets 18-20 fps with the Nouveau driver on NV40. Xv is still the better option for now, because the decoder does not reach 24 fps and CPU usage spikes as a result. 1080p has memory issues and will be worked on once 720p achieves reasonable performance.
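As a rough worked example, assuming 24 fps source material as the target, the per-frame time budgets compare as follows (the 60 fps figure is the blob result from the OpenGL test program):

```latex
\begin{aligned}
t_{\text{target}}  &= \frac{1000\ \text{ms}}{24} \approx 41.7\ \text{ms per frame} \\
t_{\text{Nouveau}} &= \frac{1000\ \text{ms}}{20}\ \text{to}\ \frac{1000\ \text{ms}}{18} = 50\ \text{to}\ 55.6\ \text{ms per frame} \\
t_{\text{blob}}    &= \frac{1000\ \text{ms}}{60} \approx 16.7\ \text{ms per frame} \\
\text{speedup needed} &= \frac{24}{20}\ \text{to}\ \frac{24}{18} = 1.2\ \text{to}\ 1.33
\end{aligned}
```

So the decoder needs to become roughly 20-33% faster just to keep up with 24 fps playback, while the blob result suggests the hardware has well over twice that much headroom.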

Short Topics

Based on the DDX "ng" code, darktama has started libnouveau_drm (http://cgit.freedesktop.org/~darktama/libnouveau_drm/), a library that factors out all the parts common to the DDX and the Nouveau Gallium3D code for interfacing with the DRM and the hardware. All the nvXX_demo programs are also candidate users of this library.
Slowness with Nouveau that is not related to software-only 3D rendering can be due to an old version of the X.org server. Version 1.5.2 or later is recommended, and you can even try a git X server.
VT switching is nowadays more solid, but still relies on some assumptions that might not hold, especially when direct 3D acceleration comes into the picture. On NV50 cards and later, VT switching is still using real mode BIOS calls, which are not portable.
Nouveau's Randr 1.2 support is nearly perfect. Or it should be. If you find any problems with it, you should file a bug. Multi-card or multi-GPU setups, on the other hand, are not supported yet, and likely will not work even with luck. Another thing not yet supported at all is TV-out. Developers welcome :-)
NV50 class cards now have EXA acceleration and an Xv texture adapter, thanks to darktama. The catch is that you need to use a compositing window manager to have them working. The reason was briefly mentioned at the end of the KMS section: our framebuffer is not tiled yet. This milestone brings NV50 2D support feature-wise on par with all the older card families; all it lacks is testing by end users, that is, finding bugs so they can be fixed.
Please, if and when you file bugs, acknowledge the fact that you need to spend some time helping the poor developers to solve your issues. It will involve trying things and testing patches. The developers do not have your setup and might not be able to reproduce the problem themselves. Also be precise when describing the problem and do attach full kernel and X.org logs to the bug. Thank you.
There are no releases planned for Nouveau yet. Before a release can even be considered, Nouveau's DRM interface must stabilize. That means KMS has to be done first, and the memory manager work must be practically complete. There is still a long way to go even for a 2D-only release, but once that point is reached, a 3D-enabled release might not be far behind.

That's it for this time, folks! Thank you for your continued interest, and please, turn some stones and try to find us a couple more developers, will you? :-)
