Da-Yeh University Library

Fault-tolerance techniques for high-performance computing [electronic resource]

Record type:	Bibliographic - language material, printed : Monograph/item
Dewey class. no.:	004.2
Title/Author:	Fault-tolerance techniques for high-performance computing / edited by Thomas Herault, Yves Robert.
Other author:	Herault, Thomas.
Publisher:	Cham : Springer International Publishing, 2015.
Physical description:	ix, 320 p. : ill., digital ; 24 cm.
Contained By:	Springer eBooks
Subject:	Fault-tolerant computing.
Subject:	High performance computing.
Subject:	Computer Science.
Subject:	System Performance and Evaluation.
Subject:	Performance and Reliability.
Subject:	Numeric Computing.
ISBN:	9783319209432 (electronic bk.)
ISBN:	9783319209425 (paper)
Contents note:	Part I: General Overview -- Fault-Tolerance Techniques for High-Performance Computing -- Part II: Technical Contributions -- Errors and Faults -- Fault-Tolerant MPI -- Using Replication for Resilience on Exascale Systems -- Energy-Aware Checkpointing Strategies.
Summary note:	This timely text/reference presents a comprehensive overview of fault tolerance techniques for high-performance computing (HPC). The text opens with a detailed introduction to the concepts of checkpoint protocols and scheduling algorithms, prediction, replication, silent error detection and correction, together with some application-specific techniques such as algorithm-based fault tolerance. Emphasis is placed on analytical performance models. This is then followed by a review of general-purpose techniques, including several checkpoint and rollback recovery protocols. Relevant execution scenarios are also evaluated and compared through quantitative models. Topics and features: includes self-contained contributions from an international selection of preeminent experts; provides a survey of resilience methods and performance models; examines the various sources for errors and faults in large-scale systems, detailing their characteristics, with a focus on modeling, detection and prediction; reviews the spectrum of techniques that can be applied to design a fault-tolerant message passing interface; investigates different approaches to replication, comparing these to the traditional checkpoint-recovery approach; discusses the challenge of energy consumption of fault-tolerance methods in extreme-scale systems, proposing a methodology to estimate such energy consumption. This authoritative volume is essential reading for all researchers and graduate students involved in high-performance computing. Dr. Thomas Herault is a Research Scientist in the Innovative Computing Laboratory (ICL) at the University of Tennessee, Knoxville, TN, USA. Dr. Yves Robert is a Professor in the Laboratory of Parallel Computing at the Ecole Normale Superieure de Lyon, France, and a Visiting Research Scholar in the ICL.
Electronic resource:	http://dx.doi.org/10.1007/978-3-319-20943-2

Fault-tolerance techniques for high-performance computing [electronic resource] / edited by Thomas Herault, Yves Robert. - Cham : Springer International Publishing, 2015. - ix, 320 p. : ill., digital ; 24 cm. - (Computer communications and networks, 1617-7975).

ISBN: 9783319209432 (electronic bk.)

Standard No.: 10.1007/978-3-319-20943-2 (doi)

Subjects--Topical Terms:

342378
Fault-tolerant computing.

LC Class. No.: QA76.9.F38

Dewey Class. No.: 004.2

LDR 03089nam a2200325 a 4500
001 442930
003 DE-He213
005 20160223100031.0
006 m d
007 cr nn 008maaau
008 160715s2015 gw s 0 eng d
020 $a 9783319209432 (electronic bk.)
020 $a 9783319209425 (paper)
024 7 $a 10.1007/978-3-319-20943-2 $2 doi
035 $a 978-3-319-20943-2
040 $a GP $c GP
041 0 $a eng
050 4 $a QA76.9.F38
072 7 $a UYD $2 bicssc
072 7 $a COM074000 $2 bisacsh
082 0 4 $a 004.2 $2 23
090 $a QA76.9.F38 $b F263 2015
245 0 0 $a Fault-tolerance techniques for high-performance computing $h [electronic resource] / $c edited by Thomas Herault, Yves Robert.
260 $a Cham : $b Springer International Publishing : $b Imprint: Springer, $c 2015.
300 $a ix, 320 p. : $b ill., digital ; $c 24 cm.
490 1 $a Computer communications and networks, $x 1617-7975
505 0 $a Part I: General Overview -- Fault-Tolerance Techniques for High-Performance Computing -- Part II: Technical Contributions -- Errors and Faults -- Fault-Tolerant MPI -- Using Replication for Resilience on Exascale Systems -- Energy-Aware Checkpointing Strategies.
520 $a This timely text/reference presents a comprehensive overview of fault tolerance techniques for high-performance computing (HPC). The text opens with a detailed introduction to the concepts of checkpoint protocols and scheduling algorithms, prediction, replication, silent error detection and correction, together with some application-specific techniques such as algorithm-based fault tolerance. Emphasis is placed on analytical performance models. This is then followed by a review of general-purpose techniques, including several checkpoint and rollback recovery protocols. Relevant execution scenarios are also evaluated and compared through quantitative models. Topics and features: includes self-contained contributions from an international selection of preeminent experts; provides a survey of resilience methods and performance models; examines the various sources for errors and faults in large-scale systems, detailing their characteristics, with a focus on modeling, detection and prediction; reviews the spectrum of techniques that can be applied to design a fault-tolerant message passing interface; investigates different approaches to replication, comparing these to the traditional checkpoint-recovery approach; discusses the challenge of energy consumption of fault-tolerance methods in extreme-scale systems, proposing a methodology to estimate such energy consumption. This authoritative volume is essential reading for all researchers and graduate students involved in high-performance computing. Dr. Thomas Herault is a Research Scientist in the Innovative Computing Laboratory (ICL) at the University of Tennessee, Knoxville, TN, USA. Dr. Yves Robert is a Professor in the Laboratory of Parallel Computing at the Ecole Normale Superieure de Lyon, France, and a Visiting Research Scholar in the ICL.
650 0 $a Fault-tolerant computing. $3 342378
650 0 $a High performance computing. $3 386197
650 1 4 $a Computer Science. $3 423143
650 2 4 $a System Performance and Evaluation. $3 466941
650 2 4 $a Performance and Reliability. $3 466950
650 2 4 $a Numeric Computing. $3 466954
700 1 $a Herault, Thomas. $3 633177
700 1 $a Robert, Yves. $3 633178
710 2 $a SpringerLink (Online service) $3 463450
773 0 $t Springer eBooks
830 0 $a Computer communications and networks. $3 468098
856 4 0 $u http://dx.doi.org/10.1007/978-3-319-20943-2
950 $a Computer Science (Springer-11645)