Conference paper
JAROŠ Jiří and TYRALA Radek. GPU-accelerated Evolutionary Design of the Complete Exchange Communication on Wormhole Networks. In: GECCO '14 Proceedings of the 2014 conference on Genetic and evolutionary computation. New York, NY: Association for Computing Machinery, 2014, pp. 1023-1030. ISBN 978-1-4503-2662-9. Available from: http://dl.acm.org/citation.cfm?id=2576768.2598315
Publication language: | English |
---|
Original title: | GPU-accelerated Evolutionary Design of the Complete Exchange Communication on Wormhole Networks |
---|
Title (cs): | Akcelerace evolučního návrhu kolektivní komunikace Complete Exchange pomocí GPU |
---|
Pages: | 1023-1030 |
---|
Proceedings: | GECCO '14 Proceedings of the 2014 conference on Genetic and evolutionary computation |
---|
Conference: | Genetic and Evolutionary Computation Conference 2014 |
---|
Place: | New York, NY, US |
---|
Year: | 2014 |
---|
URL: | http://dl.acm.org/citation.cfm?id=2576768.2598315 |
---|
ISBN: | 978-1-4503-2662-9 |
---|
DOI: | 10.1145/2576768.2598315 |
---|
Publisher: | Association for Computing Machinery |
---|
| Keywords |
---|
Complete exchange communication, collective communications, communication scheduling, evolutionary design, GPU-based acceleration, multi-GPU systems |
Annotation |
---|
Communication overhead is one of the main challenges in the exascale era, where millions of compute cores are expected to collaborate on solving complex jobs. However, many algorithms will not scale since they require complex global communication and synchronisation. In order to perform the communication as fast as possible, contention, blocking, and deadlock must be avoided. Recently, we have developed an evolutionary tool producing fast and safe communication schedules reaching the lower bound of the theoretical time complexity. Unfortunately, the execution time of the evolution process rises to tens of hours, even when run on a multi-core processor. In this paper, we propose a revised implementation accelerated by a single Graphics Processing Unit (GPU), delivering a speed-up of 5 compared to a quad-core CPU. Subsequently, we introduce an extended version employing up to 8 GPUs in a shared memory environment, offering a speed-up of almost 30. This significantly extends the range of interconnection topologies we can cover.
|
BibTeX: |
---|
@INPROCEEDINGS{
author = {Ji{\v{r}}{\'{i}} Jaro{\v{s}} and Radek Tyrala},
title = {GPU-accelerated Evolutionary Design of the
Complete Exchange Communication on Wormhole
Networks},
pages = {1023--1030},
booktitle = {GECCO '14 Proceedings of the 2014 conference on Genetic and
evolutionary computation},
year = {2014},
location = {New York, NY, US},
publisher = {Association for Computing Machinery},
ISBN = {978-1-4503-2662-9},
doi = {10.1145/2576768.2598315},
language = {english},
url = {http://www.fit.vutbr.cz/research/view_pub.php?id=10523}
} |
|