High-Level-Synthesis extensions for scalable Single-Chip Many-Accelerators on FPGAs
D. Diamantopoulos, S. Xydis, K. Siozios and D. Soudris
International Conference on Field-Programmable Logic and Applications (FPL), pp. 1-2, Sept. 2015, London, England.
Accelerator-coupled systems have been introduced as a promising architectural paradigm that can boost the performance and improve the power efficiency of general-purpose computing platforms. This research focuses on the accelerator scalability problem caused by resource under-utilization in FPGA-based accelerator-coupled platforms. Static memory allocation, the de facto memory management mechanism supported by modern design techniques and synthesis tools, is the main source of memory-induced under-utilization, leading to up to 75% of dark silicon. Recognizing this, we propose the development of a) a Single-Chip Many-Accelerator (SCMA) architecture that reduces the energy budget by providing high-throughput processing nodes hooked under the same low-latency FPGA die, and b) a novel design framework that extends conventional RTL and High-Level Synthesis (HLS) design flows with dynamic memory management (DMM) features. The framework improves scalability by enabling accelerators to dynamically adapt their allocated memory to their runtime memory requirements, thus maximizing the overall accelerator count through effective sharing of the FPGA's memory resources. By applying these techniques in the state-of-the-art Vivado-HLS tool, we increased accelerator density by up to 3.8× for a Xilinx UltraScale device and delivered architectural solutions that trade off per-accelerator latency overhead (1.2× to 19.9×) against overall system throughput (2.6× to 23.1×) and performance-per-watt (0.09× to 21.7×).
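To illustrate the idea of accelerators drawing memory from a shared pool at runtime instead of each reserving a worst-case buffer statically, the following is a minimal sketch in HLS-style C++ (fixed-size arrays, no `malloc`). It is not the allocator from the paper: the names `SharedHeap`, `heap_alloc` and `heap_free`, the heap size, the block granularity and the first-fit policy are all assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sizes: a 1024-word array stands in for shared on-chip memory.
constexpr std::size_t kHeapWords  = 1024;
constexpr std::size_t kBlockWords = 64;   // allocation granularity (assumption)
constexpr std::size_t kNumBlocks  = kHeapWords / kBlockWords;

// One heap shared by several accelerators, with a per-block "in use" flag.
struct SharedHeap {
    std::uint32_t words[kHeapWords] = {};
    bool used[kNumBlocks] = {};
};

// Number of blocks needed to hold n_words (rounded up).
inline std::size_t blocks_for(std::size_t n_words) {
    return (n_words + kBlockWords - 1) / kBlockWords;
}

// First-fit allocation of a contiguous run of blocks.
// Returns the word offset into the heap, or -1 if no run is large enough.
long heap_alloc(SharedHeap& h, std::size_t n_words) {
    if (n_words == 0 || n_words > kHeapWords) return -1;
    const std::size_t need = blocks_for(n_words);
    std::size_t run = 0;  // length of the current run of free blocks
    for (std::size_t i = 0; i < kNumBlocks; ++i) {
        run = h.used[i] ? 0 : run + 1;
        if (run == need) {
            const std::size_t first = i + 1 - need;
            for (std::size_t j = first; j <= i; ++j) h.used[j] = true;
            return static_cast<long>(first * kBlockWords);
        }
    }
    return -1;
}

// Releases a region previously returned by heap_alloc with the same size.
void heap_free(SharedHeap& h, long offset, std::size_t n_words) {
    assert(offset >= 0 && static_cast<std::size_t>(offset) % kBlockWords == 0);
    const std::size_t first = static_cast<std::size_t>(offset) / kBlockWords;
    const std::size_t need  = blocks_for(n_words);
    assert(first + need <= kNumBlocks);
    for (std::size_t j = first; j < first + need; ++j) h.used[j] = false;
}
```

In this toy model, an accelerator that finishes early can call `heap_free` so that another accelerator's `heap_alloc` succeeds in the same region, which is the kind of sharing that static allocation rules out. A real design would also need arbitration between concurrent requesters, which this sketch omits.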

Last Update: 09 October 2016