Multi-level Data Translocation for Faster Processing of Scattered Data on Shared-Memory Computers


Processing and analysis of scattered data samples, acquired or generated in various applications, typically involve iterative calculation of all-point interactions. Approximation or compression techniques to reduce the arithmetic complexity nearly minimize the ratio of arithmetic operations to memory accesses. However, the computation on scattered data becomes acutely memory bounded on computers with hierarchical memories, due to irregular memory access patterns. There is a great potential, and a daunting challenge, in further accelerating scattered data processing and analysis. We introduce a multi-level data translocation methodology, in adaptation to memory hierarchy, toward optimal data locality, minimal memory access latency, and maximal utilization of parallel resources and scheduling schemes. We illustrate the methodology with local data translation between scattered data points and the points on an externally specified or internally-determined grid. Such data translation is fundamental to approximation or compression-decompression of scatter data interaction. We make additional and novel use of the grid to improve data locality and explore parallelization potential. We present experimental results that show significant acceleration with our methodology.

SPPEXA-COLOC workshop, Euro-Par 19
Gottingen, Germany