The 22nd anniversary of Ariane 5 Flight 501 offers an opportunity to reflect upon software defects, project errors, and the best principles and practices for solution delivery in the IT industry. In this blog and my upcoming book, Bugs: A Short History of Software Imperfection, I will chronicle some important past failures, explain how we arrived at the present, and discuss some ideas for improving the future of software quality. As information technology becomes increasingly woven into daily life, the quality of software affects our commerce, health, infrastructure, military, politics, science, security, and transportation. The Big Idea is that, with so much depending on software, we have no choice but to get better at delivering technology solutions.

On June 4, 1996, in Kourou, French Guiana, the maiden flight 501 of the Ariane 5 rocket ended almost as soon as it began. About 37 seconds after the start of the launch sequence (30 seconds after takeoff), at an altitude of about 4000 meters, the rocket deviated 90 degrees from its intended flight path due to a software failure. The resulting aerodynamic stress tore its boosters from the main stage, which triggered a controlled self-destruction, and the spacecraft exploded in a fireball of liquid hydrogen.

The European Space Agency (ESA) had ambitions to take a leadership role in the commercial space business and surpass Japan, Russia, and the USA. The Ariane 4 (A4) had been in service since 1988 and had built a strong record of successful launches. The new Ariane 5 (A5) rocket would carry larger satellite payloads than earlier versions, and flight 501 was carrying a payload of four satellites intended for researching the Earth’s magnetosphere. ESA had spent 10 years and $7 billion developing the A5, and flight 501 itself cost $370 million. The success of the Ariane 4, combined with ESA budget pressures, led the A5 program team to reuse A4 software, including its navigation system and flight path optimization libraries.

ESA organized an inquiry board immediately after the crash to investigate the disaster. Using flight data, optical observations (IR camera, film), inspection of recovered material, and review of the software code, the board identified the following sequence of events that resulted in the crash.

Ariane 5 Flight 501 @ T+ 39 seconds

The inquiry board further analyzed the SRI software and overall A5 program and arrived at several conclusions:

The inquiry board made a number of recommendations, and they can be generalized into lessons learned from this case study that are useful to IT professionals.

The failure of Flight 501 brought the risks of complex, costly computing systems to the attention of the general public, politicians, and business executives. It resulted in increased support for research on ensuring the reliability of safety-critical systems. Automated analysis of the Ariane code, written in Ada, was one of the first examples of large-scale static code analysis.
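
To make concrete the kind of defect that range analysis is meant to catch: the inquiry board's public report traces the failure to an unprotected conversion of a 64-bit floating-point value to a 16-bit signed integer in the inertial reference system software. The following is only a rough Python sketch of that class of error, not the flight code, which was written in Ada. The function names, the exception class, and the sample values are all hypothetical and chosen purely for illustration.

```python
# Illustrative sketch only: names and values are hypothetical,
# not taken from the Ariane flight software.

INT16_MIN, INT16_MAX = -32768, 32767


class OperandError(Exception):
    """Stand-in for the error raised by an out-of-range conversion."""


def to_int16_strict(value: float) -> int:
    """Convert to a 16-bit signed integer, raising if the value does not fit.

    If no caller handles the exception, the whole computation stops.
    """
    n = int(value)
    if not INT16_MIN <= n <= INT16_MAX:
        raise OperandError(f"{value} does not fit in a 16-bit signed integer")
    return n


def to_int16_saturating(value: float) -> int:
    """Protected alternative: clamp to the representable range before converting."""
    clamped = max(float(INT16_MIN), min(float(INT16_MAX), value))
    return int(clamped)


if __name__ == "__main__":
    in_range = 20000.0      # hypothetical value within the expected envelope
    out_of_range = 50000.0  # hypothetical value outside that envelope

    print(to_int16_strict(in_range))
    print(to_int16_saturating(out_of_range))
    try:
        to_int16_strict(out_of_range)
    except OperandError as exc:
        print("conversion failed:", exc)
```

Whether clamping, raising, or some other response is appropriate depends on the system. The point of the sketch is that the choice should be made deliberately for every conversion whose input range is not proven to be safe.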

Afterwards, four replacement Cluster satellites were built and launched in pairs aboard Soyuz-U/Fregat rockets in 2000. The Ariane 5 program resumed, had dozens of successful launches and hundreds of satellite deployments, and is still active. The successor vehicle, the Ariane 6, is under development, and there are plans for it to enter service in the 2020s.

References