August bank holiday mayhem blamed on lack of resilience and working from home. Ian Taylor reports
The interim report into the shutdown of the National Air Traffic Services (Nats) IT system last August bank holiday has attributed the scale of the disruption to engineers working from home.
The panel commissioned by the CAA to examine the failure, which led to flight cancellations or delays affecting 700,000 passengers, reported that the problem proved “more protracted than it might otherwise have been” because senior engineers were not available on site over the bank holiday.
It also noted a failure “to rehearse” for such an incident.
The ‘progress report’ published on March 14 found it took an engineer 90 minutes to arrive on site, the most senior engineer on duty was not called for more than three hours and Nats waited four hours to call the software supplier for help.
The issue arose when Nats’ systems could not process a flight plan for a Los Angeles to Paris service which included identical abbreviations for two ‘waypoints’ on its route – Deauville in France and Devil’s Lake, North Dakota, both designated DVL.
Nats’ primary and secondary systems both shut down, as designed, within 20 seconds of each other to prevent transfer of the ‘corrupt’ data.
But the problem flight plan remained in the queue for processing so each time the systems were restarted they immediately shut down again. This cycle only ended “with the assistance of [the] system supplier four hours after the event”.
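The restart cycle described above can be illustrated with a minimal Python sketch. It is not based on Nats’ software, which the report does not describe at code level. The function names, the sample flight plans and the rule that rejects any route repeating a waypoint code are all assumptions made for illustration. The sketch shows why restarting does not help while the bad plan stays at the head of the queue, and how setting it aside for manual review would let the rest of the queue proceed.

```python
from collections import deque


class AmbiguousWaypointError(Exception):
    """Hypothetical error: the plan cannot be processed safely."""


def process(plan):
    # Assumed rule for illustration: reject a route that repeats a code.
    route = plan["route"]
    if len(route) != len(set(route)):
        raise AmbiguousWaypointError(plan["id"])
    return plan["id"]


def run_until_failure(queue):
    """Process plans in order; on failure, stop and leave the plan queued."""
    done = []
    while queue:
        try:
            done.append(process(queue[0]))
        except AmbiguousWaypointError:
            return done, False  # fail-safe shutdown, bad plan still at head
        queue.popleft()
    return done, True


def restart_cycle(queue, max_restarts):
    """Restart after each shutdown without removing the bad plan."""
    processed, restarts = [], 0
    while restarts < max_restarts:
        done, finished = run_until_failure(queue)
        processed.extend(done)
        if finished:
            break
        restarts += 1
    return processed, restarts


def run_with_quarantine(queue):
    """Set a failing plan aside for manual review and carry on."""
    processed, quarantined = [], []
    while queue:
        plan = queue.popleft()
        try:
            processed.append(process(plan))
        except AmbiguousWaypointError:
            quarantined.append(plan["id"])
    return processed, quarantined


# Made-up sample data; only the repeated DVL code comes from the report.
PLANS = [
    {"id": "A1", "route": ["AAA", "BBB"]},
    {"id": "B2", "route": ["XXX", "DVL", "YYY", "DVL"]},
    {"id": "C3", "route": ["CCC", "DDD"]},
]

print(restart_cycle(deque(PLANS), 3))
print(run_with_quarantine(deque(PLANS)))
```

In the first call every restart stops at plan B2, so plan C3 is never reached. In the second, B2 is held back and the remaining plans go through. Whether such a quarantine step would be acceptable in a safety-critical air traffic system is a separate question that the report, as quoted here, does not address.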
The Airlines UK association declared the report “damning evidence that Nats’ basic resilience planning and procedures were wholly inadequate”.
However, the report also highlighted poor treatment of passengers by airlines, noting it’s “startling that an air traffic control problem which was fixed within seven hours caused so many cancellations and delays . . . for so long”.
The report found “a good deal of dissatisfaction with the speed, style and effectiveness of [Nats’] communications”, including delays in warnings and limited explanation of what was happening and how long the problem was likely to last, and it suggested: “This resulted in more severe impacts on passengers than was necessary.”
But it also found “evidence of some very poor examples of consumer care” and noted: “Some passengers were not repatriated until the end of the week.”
The CAA estimates more than 300,000 passengers suffered flight cancellations, 95,000 delays of more than three hours and 300,000 delays of up to three hours, with “the worst affected those already in airports”.
The report notes: “Many complained about the shortage of visible and informed staff at airports, and the absence of any clear airport announcements. There also seems to have been some misinformation about passenger rights.”
It concludes “several factors” delayed “identification and rectification of the failure”, including the fact that there was “no single post-holder with accountability for overall management of the incident”, and there was “a lack of clear documentation identifying system connectivity”.
Crucially, the review panel found it was “common practice on public holidays for staff to be available on standby” when “major operations, such as a full system restart, cannot be performed remotely”.
The report also identified “a significant lack of pre-planning and coordination for major events and incidents”.
It noted: “There does not appear to have been any multi-agency rehearsal of the management of an incident of this nature and scale.”
Although it suggested a recurrence is “highly unlikely”, it warned: “A different set of factors could create a similar scenario without improvements to resilience planning.”
The final report, due out later this year, will “consider how well” CAA guidance on the treatment of passengers was followed by airlines and airports. It is expected to recommend greater oversight of Nats by the CAA.