From International Center for Computational Logic
The widely used Message Passing Interface (MPI) with its multitude of communication functions is prone to usage errors. Runtime error detection tools aid in the removal of these errors. We develop MUST as one such tool that provides a wide variety of automatic correctness checks. Its correctness checks can be run in a distributed mode, except for its deadlock detection. This limitation applies to a wide range of tools that either use centralized detection algorithms or a timeout approach. In order to provide scalable and distributed deadlock detection with detailed insight into deadlock situations, we propose a model for MPI blocking conditions that we use to formulate a distributed algorithm. This algorithm implements scalable MPI deadlock detection in MUST. Stress tests at up to 4,096 processes demonstrate the scalability of our approach. Finally, overhead results for a complex benchmark suite demonstrate an average runtime increase of 34% at 2,048 processes.
@inproceedings{HSNPBM2013,
author = {Tobias Hilbrich and Bronis R. de Supinski and Wolfgang E. Nagel
and Joachim Protze and Christel Baier and Matthias S.
M{\"{u}}ller},
title = {Distributed wait state tracking for runtime {MPI} deadlock
detection},
booktitle = {Proc. of the International Conference for High Performance
Computing, Networking, Storage and Analysis (SC)},
publisher = {ACM},
year = {2013},
pages = {16:1--12},
doi = {10.1145/2503210.2503237}
}
Proc. of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC)
Hilbrich
Tobias
Tobias Hilbrich, Bronis R. de Supinski, Wolfgang E. Nagel, Joachim Protze, Christel Baier, Matthias S. Müller. Distributed wait state tracking for runtime MPI deadlock detection. Proc. of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 16:1--12, 2013. ACM
Display title of
Distributed wait state tracking for runtime MPI deadlock detection
Modification date
5 March 2025, 13:41:20