Samuel Thibault - StarPU: seamless computations among CPUs and GPUs

Heterogeneous accelerator-based parallel machines, featuring manycore CPUs and GPU accelerators, provide an unprecedented amount of processing power per node, shipped in very complex machinery. To fully tap into the potential of such machines, one has to deal with optimized task scheduling, multiple accelerators, overlapped data transfers, eviction of unused data, and more. Achieving all that by hand becomes harder and harder, and directive-based languages are often too explicit to really achieve optimized execution. This talk will show how introducing a dynamic run-time system makes it possible to achieve such optimized execution, under the guidance of the application, through a task-based programming model. It will also discuss what kind of low-level support is useful for optimizing execution, and how such a run-time can be integrated as a backend for high-level languages, which thus do not have to care about low-level execution.
