Tutorial #1 - How to Use Wait-free Shared Memory Algorithms to Efficiently Program Multicore Chips
By: Prof. Matthieu Roy (LAAS-CNRS - Laboratory for Analysis and Architecture of Systems, Toulouse, France)
Talk Outline: Shared memory algorithms have been studied extensively in the distributed computing community for decades. With the advent of multicore chips, many concepts and algorithms from this line of research, such as lock-free or wait-free algorithms and simulations, shed light on how to produce efficient code for multicore chips and on where synchronisation bottlenecks reside.
After a brief introduction to the limits of performance scaling for programs on multicore architectures, we will introduce the wait-free class of algorithms and describe its properties, along with some of its characteristics that span theoretical and practical aspects. The tutorial will focus on how to construct efficient data structures and algorithms for resilient multicore programming.
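As a rough illustration of the wait-free property (every operation completes in a bounded number of its own steps, whatever the other threads do), here is a minimal sketch of an increment-only counter with one slot per thread. It is not taken from the tutorial: the names PerThreadCounter and run_demo are hypothetical, and the sketch assumes that reading or writing a single list slot behaves like an atomic register, which holds in CPython.

```python
import threading


class PerThreadCounter:
    """Increment-only counter built from one slot per thread (hypothetical name).

    Slot i is written only by thread i, so an increment never waits for,
    or retries because of, another thread.
    """

    def __init__(self, n_threads: int) -> None:
        self._slots = [0] * n_threads

    def increment(self, thread_id: int) -> None:
        # Single-writer slot: no lock and no retry loop are needed.
        # Assumption: a single list-slot read or write is atomic.
        self._slots[thread_id] += 1

    def read(self) -> int:
        # One pass over the slots: at most n_threads reads, whatever
        # the other threads are doing.
        return sum(self._slots)


def run_demo(n_threads: int = 4, increments: int = 1000) -> int:
    """Run n_threads workers and return the final counter value."""
    counter = PerThreadCounter(n_threads)

    def worker(thread_id: int) -> None:
        for _ in range(increments):
            counter.increment(thread_id)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.read()
```

By contrast, a counter protected by a lock can stall every thread if the lock holder is delayed, and a compare-and-swap retry loop is lock-free but gives no per-thread bound on the number of retries.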
Short Bio: Matthieu Roy holds an MSc from ENS Lyon and a PhD from IRISA/University of Rennes. He has been a CNRS researcher since 2004 at LAAS-CNRS (Laboratory for Analysis and Architecture of Systems) in Toulouse, in the Critical Systems department. His core research interests include the theory of distributed systems, embedded systems and real-time systems. On the experimental and practical side, he has strong ties with the automotive industry, particularly regarding adaptation in automotive systems, and has recently developed platforms to study human behavior in the context of human-carried distributed systems.
Tutorial #2 - Reliability and Availability Modeling in Practice
By: Prof. Kishor Trivedi (Duke University, USA)
Talk Outline: High reliability and availability are requirements for most technical systems. This tutorial addresses reliability and availability assurance methods based on probabilistic models. Non-state-space solution methods are often used to solve models based on reliability block diagrams, fault trees and reliability graphs. Relatively efficient algorithms are known to handle systems with hundreds of components and have been implemented in many software packages. Nevertheless, many practical problems cannot be handled by such algorithms, and bounding algorithms are then used instead, as was done for the Boeing 787. Non-state-space methods derive their efficiency from the independence assumption, which is often violated in practice. State space methods based on Markov chains, stochastic Petri nets, semi-Markov and Markov regenerative processes can be used to model various kinds of dependencies among system components. However, the resulting state space explosion severely restricts the size of the problem that can be solved. Hierarchical and fixed-point iterative methods provide a scalable alternative that combines the strengths of state space and non-state-space methods, and they have been used extensively to solve real-life problems.
We will take a journey through these model types via interesting real-world examples that the tutorial presenter has personally worked on. Examples include the availability model of the IBM BladeCenter and the high availability implementation of SIP (Session Initiation Protocol) on IBM WebSphere. The Cisco example is the availability model of one of their routers, while the Sun Microsystems example is the availability model of their high availability platform. The Boeing example shows the reliability analysis of the Current Return Network subsystem that was used for the FAA certification of the Boeing 787. All the techniques and case studies are drawn from a recent book by the tutorial presenter.
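To make the non-state-space idea concrete, the following minimal sketch evaluates a small reliability block diagram under the independence assumption mentioned in the outline. The function names and the availability figures are illustrative assumptions only; they are not taken from the tutorial, the book, or any of the case studies.

```python
from math import prod
from typing import Iterable


def series(availabilities: Iterable[float]) -> float:
    """Availability of blocks in series: all must be up (independence assumed)."""
    return prod(availabilities)


def parallel(availabilities: Iterable[float]) -> float:
    """Availability of redundant blocks: at least one must be up (independence assumed)."""
    return 1.0 - prod(1.0 - a for a in availabilities)


# Hypothetical system: two redundant servers behind a single switch.
# The numbers below are made up for illustration.
servers = parallel([0.99, 0.99])   # 1 - 0.01 * 0.01
system = series([servers, 0.999])  # the switch is a single point of failure

print(f"redundant servers: {servers:.6f}")
print(f"whole system:      {system:.6f}")
```

The two product formulas are only valid when component failures and repairs are independent. Once components share a repair crew or fail together, the dependencies have to be captured with a state space model such as a Markov chain.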
Tutorial Paper: download
Short Bio: Kishor Trivedi heads the Duke High Availability Assurance Laboratory (DHAAL) and holds the Hudson Chair in the Department of Electrical and Computer Engineering at Duke University. He is known as a leading international expert in the reliability and performability evaluation of dependable systems, and has made seminal contributions to stochastic modeling formalisms and their efficient solution. He is currently carrying out experimental research on software reliability during operation, where he is investigating software fault tolerance through environmental diversity. This work, which includes software bug classification, an empirical study of real failure data and an associated theory of affordable software fault tolerance, has already gained significant attention.