MmioTrace information for developers and trace analysts.

How mmiotrace works inside

Kernel functions ioremap, ioremap_nocache and iounmap are replaced (for the driver module only) with wrappers that record MMIO areas. In ioremap, the pages of the MMIO area are marked as not present, so any access to those addresses generates a page fault. The page fault handler detects the mmio-traced addresses and records the attempted action. The page is then marked present and the page-faulting code is single-stepped to execute the instruction doing the MMIO. Afterwards, the page is marked as not present again.

The recording works by calling pre and post functions in mmio.ko before and after the single-stepping. Mmiotrace uses relayfs and debugfs to relay the data to user space.
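
The trap, single-step and re-arm cycle described above can be modelled in user space. The following is only a toy sketch in Python, not the kernel code: the class TracedPage, its methods and the dictionary standing in for device registers are all hypothetical names made up for illustration. It assumes that the pre step notes the access and the post step completes the record, which is one plausible split and not necessarily what mmio.ko does.

```python
class TracedPage:
    """Toy model of one MMIO page that has been armed for tracing."""

    def __init__(self, base, size=4096):
        self.base = base
        self.size = size
        self.present = False   # armed: every access "faults"
        self.registers = {}    # offset -> value, stands in for the device
        self.log = []          # completed trace records
        self._pending = None   # record started by pre, finished by post

    # What the traced driver does.
    def read(self, addr):
        return self._access("R", addr, None)

    def write(self, addr, value):
        self._access("W", addr, value)

    # What the tracer does.
    def _access(self, kind, addr, value):
        if not self.present:
            return self._fault_handler(kind, addr, value)
        return self._do_mmio(kind, addr, value)

    def _fault_handler(self, kind, addr, value):
        self._pre(kind, addr, value)                # record the attempt
        self.present = True                         # let the access through
        result = self._do_mmio(kind, addr, value)   # the "single step"
        self.present = False                        # re-arm the page
        self._post(kind, result)
        return result

    def _pre(self, kind, addr, value):
        self._pending = {"kind": kind, "addr": addr, "value": value}

    def _post(self, kind, result):
        if kind == "R":
            # A read value is only known after the instruction has run.
            self._pending["value"] = result
        self.log.append(self._pending)
        self._pending = None

    def _do_mmio(self, kind, addr, value):
        offset = addr - self.base
        if not 0 <= offset < self.size:
            raise ValueError("address outside the traced page")
        if kind == "W":
            self.registers[offset] = value
            return None
        return self.registers.get(offset, 0)


page = TracedPage(base=0xF0000000)
page.write(0xF0000010, 0x1234)
value = page.read(0xF0000010)
```

After the two accesses the page is armed again and page.log holds one write record and one read record, in that order.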

Unfortunately the legacy ISA address range 0xa0000 - 0x100000 cannot be traced this way because marking those pages as not present crashes the kernel. Some machine instructions may also not be decoded properly, but so far such cases have been rare.
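
To see whether a given mapping would touch the untraceable range, a simple overlap test is enough. This is a minimal sketch in Python; the function name is hypothetical, and treating 0x100000 as an exclusive upper bound is an assumption, since the text above only gives the range informally.

```python
ISA_START = 0xA0000
ISA_END = 0x100000  # assumed to be exclusive in this sketch


def overlaps_legacy_isa(phys_addr, length):
    """Return True if [phys_addr, phys_addr + length) touches the ISA range."""
    if length <= 0:
        raise ValueError("length must be positive")
    return phys_addr < ISA_END and phys_addr + length > ISA_START
```

A mapping for which this returns True is one that the unpatched tracer cannot safely arm.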

An alternative idea

While discussing x86 instruction emulation, Avi Kivity proposed the following:

  • However there is a simpler (for you) solution: run the driver-to-be-reverse-engineered in a kvm guest, and modify kvm userspace to log accesses to mmio regions. This requires the not-yet-merged pci passthrough support. You can reverse engineer Windows drivers with this as well. Reference: http://lkml.org/lkml/2008/4/5/13

Who would like to take that project?

Usage notes

  • You can inject markers (text lines) into the trace log by running echo 'X is running' > /sys/kernel/debug/tracing/trace_marker
  • Only one active CPU is supported; do not use mmiotrace on a multiprocessor system. You can disable the extra processors/cores at boot time with a kernel argument or at runtime via sysfs entries.
  • After tracing, check your kernel log for buffer overrun errors. If you have any, almost certainly some events were lost. You should redo the trace with bigger relay buffers, settable via /sys/kernel/debug/tracing/buffer_size_kb.
  • Low ISA range tracing (experimental): apply 0001-ioremap-do-not-handle-the-low-ISA-range-specially.patch to your kernel, load the module with insmod mmio.ko ISA_trace=1, and let PekkaPaalanen know what happened. If you do not patch your kernel, your machine will crash if the blob maps the ISA range. This is an untested feature.
  • Old Nvidia driver with 2.6.25: to build the nvidia driver with 2.6.25 you need the patches from nvnews, http://www.nvnews.net/vbulletin/showthread.php?t=110088. You will also probably see messages like NVRM: bad caching on address XXXXXXX: actual 0x173 != expected 0x17b in dmesg. This did not happen with 2.6.25-rc6 but does happen with 2.6.25-rc7 because of commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=d546b67a940eb42a99f56b86c5cd8d47c8348c2a.
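
When a trace run is scripted, the marker and buffer size notes above can be wrapped in small helpers. This is a hedged sketch in Python: the function names are made up, only the two file names trace_marker and buffer_size_kb come from the notes above, and the tracing_dir parameter exists so the helpers can be tried against a scratch directory. Writing to the real files normally requires root.

```python
from pathlib import Path

TRACING_DIR = Path("/sys/kernel/debug/tracing")


def inject_marker(text, tracing_dir=TRACING_DIR):
    """Write one text line to trace_marker, like echo 'text' > trace_marker."""
    if "\n" in text:
        raise ValueError("a marker must be a single line")
    with open(Path(tracing_dir) / "trace_marker", "w") as f:
        f.write(text + "\n")


def set_buffer_size_kb(size_kb, tracing_dir=TRACING_DIR):
    """Request a new buffer size by writing the number to buffer_size_kb."""
    if not isinstance(size_kb, int) or size_kb <= 0:
        raise ValueError("size_kb must be a positive integer")
    with open(Path(tracing_dir) / "buffer_size_kb", "w") as f:
        f.write(str(size_kb) + "\n")
```

For example, calling inject_marker("X is running") just before starting X makes it easy to find that point in the log later.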

rnndb connection

rnndb is a database definition describing hardware registers and includes a set of tools for using the database. One of the tools is demmio. This tool is designed to parse mmiotraces, detect which chipset is used, adjust all addresses, etc. It will look up all MMIO register accesses in its database and show "friendly" names for them.
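
The core idea of such a lookup, adjusting an address and mapping it to a name, can be sketched as follows. This is not demmio's code, database schema or output format; the register names, offsets and the annotate function are invented purely for illustration.

```python
# Hypothetical register table: offset within the BAR -> friendly name.
EXAMPLE_REGISTERS = {
    0x000000: "EXAMPLE_ID",
    0x000100: "EXAMPLE_INTR",
    0x000140: "EXAMPLE_INTR_EN",
}


def annotate(kind, phys_addr, value, bar_base, registers=EXAMPLE_REGISTERS):
    """Turn one read ("R") or write ("W") access into a readable line."""
    offset = phys_addr - bar_base            # adjust the address to the BAR
    name = registers.get(offset, "UNKNOWN")  # look up a friendly name
    arrow = "<=" if kind == "W" else "=>"
    return f"{kind} 0x{offset:06x} {name} {arrow} 0x{value:08x}"
```

Accesses to offsets that are not in the table come out as UNKNOWN, which is a quick way to spot registers that still need documenting.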

To do, suggestions and known issues

  • kprobes has a generic instruction decoding facility; use that instead of the homebrewed one (or KVM), and use emulation instead of page faulting
  • kmemcheck may grow per-cpu page table support thanks to the PaX team, copy that
  • copy other useful tricks from kmemcheck, like the P4 REP issue fix. "< vegard> you need to toggle a bit in IA32_MISC_MSR or something like that" Test Vegard's patch: 0001-x86-fix-REP-handling-for-mmiotrace.patch
  • support large pages
  • complete instruction support: get rid of "unknown type"
  • support tracing access from user space
  • event filtering based on device and BAR, maybe address ranges
  • Changes to the log format:
    • backtraces
    • cpu identifier?
  • PPC support?
  • think about how to trace ISA region; David suggests that commenting out the ISA region checks in arch/i386/mm/ioremap.c would be enough to enable ISA range tracing. Is it really so? Does it break something?