
A Framework for Interconnection-Aware Domain-Specific Many-Accelerator Synthesis
 
E. Sotiriou-Xanthopoulos, S. Xydis, K. Siozios, G. Economakos and D. Soudris
ACM Transactions on Embedded Computing Systems (TECS), Vol. 16, No. 1, Article 8, 26 pages, November 2016.
 
Abstract:
Many-accelerator Systems-on-Chip (SoCs) have recently emerged as a promising platform paradigm that combines parallelization with heterogeneity to meet the increasing demands for high performance and energy efficiency. To exploit the full potential of many-accelerator systems, automated design verification and analysis frameworks are required, targeting both computational and interconnection optimization. Accurate simulation of interconnection schemes should use real stimuli, which are produced by fully functional nodes, requiring the prototyping of the processing elements and memories of the many-accelerator system. In this article, we argue that the Hierarchical Network-on-Chip (HNoC) scheme forms a very promising solution for many-accelerator systems in terms of scalability and data-congestion minimization. We present a parameterizable SystemC prototyping framework for HNoCs, targeted to domain-specific many-accelerator systems. The framework supports the prototyping of processing elements, memory modules, and the underlying interconnection infrastructure, and provides an API for their easy integration into the HNoC. Finally, it enables holistic system simulation using real node data. Using a many-accelerator system for an MRI pipeline as a case study, we present an analysis of the proposed framework to demonstrate the impact of the system parameters on the system. Through extensive experimental analysis, we show the superiority of HNoC schemes in comparison to typical interconnection architectures. Finally, we show that adopting the proposed many-accelerator design flow achieves significant performance improvements, from 1.2× up to 26×, compared to an x86 software implementation of the MRI pipeline.

Last update: 15 November 2017