Incident Report - Network Outage

Here we post network problems, planned and unplanned downtime, restoration times and other network issues.


Post by matt » Thu May 29, 2014 8:27 pm

Incident Report

At 16:37 on 29 May 2014, some interfaces on the router edge-george-1 were disabled, isolating the router's supervisor and its redundant pair. As a result, a widespread outage affected a number of voice and IP-switched data services across the Spectrum Network.

The interfaces were disabled by the incorrect application of an access list during a configuration change to provision a new customer's IP address range. Spectrum staff were alerted to the error immediately, accessed the system using the out-of-band signaling procedure documented in Spectrum's emergency procedures, and rolled back the change, restoring the network at 16:40.

The error was ultimately traced to a typographical error in the access list. Spectrum staff will implement a new procedure for future changes on this system that will prevent a recurrence of this outage.
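The report does not say what the new change procedure involves. As one illustration only, a pre-change check could catch a mistyped customer range before it reaches the router's access list. This sketch assumes the new range is supplied as a CIDR prefix and that a list of allocated customer blocks is kept; ALLOCATED_BLOCKS and check_customer_prefix are hypothetical names, and the documentation ranges 203.0.113.0/24 and 198.51.100.0/24 stand in for real allocations. It only validates the address itself; it does not check which interfaces the access list is applied to.

```python
import ipaddress

# Hypothetical: customer address blocks the provider has allocated.
# Documentation ranges are used here in place of real allocations.
ALLOCATED_BLOCKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def check_customer_prefix(text):
    """Return a list of problems with a proposed customer prefix (empty if none found)."""
    try:
        # strict=True rejects prefixes with host bits set, e.g. "203.0.113.65/26".
        net = ipaddress.ip_network(text.strip(), strict=True)
    except ValueError as exc:
        return [f"not a valid prefix: {exc}"]

    # Only compare against blocks of the same IP version; subnet_of()
    # raises TypeError when mixing IPv4 and IPv6 networks.
    inside = any(
        net.subnet_of(block)
        for block in ALLOCATED_BLOCKS
        if block.version == net.version
    )
    if not inside:
        return [f"{net} is outside the allocated customer blocks"]
    return []
```

For example, check_customer_prefix("203.0.131.64/26") reports that the prefix lies outside the allocated blocks, the kind of transposed-digit mistake that is easy to miss when reading a configuration change by eye.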
matt
Site Admin
 
Posts: 325
Joined: Thu Apr 09, 2009 11:44 am
Location: George Street Sydney

Re: Incident Report - Network Outage

Post by matt » Fri Oct 17, 2014 4:08 pm

A network hazard will be in effect from 18:00 to 18:03 to remove the failed line card from edge-george-1.

We do not expect any problems or loss of connectivity during this hazard.

Matt

Re: Incident Report - Network Outage

Post by matt » Mon Oct 20, 2014 7:41 am

Following the removal of the card and a reset of the router's FIB (Forwarding Information Base), normal CPU load and normal operation of edge-george-1 were restored.

An analysis of the CPU load and packet forwarding data from the period when edge-george-1 was process switching traffic concluded that transit times and packet loss were within the SLA.

There had been no previous outages on this system, which was commissioned on 7 October 2010. The time from the first fault report to restoration was 48 minutes, which is also within the SLA.

We apologize for the inconvenience. At Spectrum we spend all our time striving to bring you the highest standard of personal service and connectivity, and when outages occur we do our best to ensure they do not recur. The Spectrum network is highly complex, with many moving parts, and from time to time, even with the utmost care and attention to detail, unexpected problems occur. In this case the router did not fail in a way that was easily detectable, and as a result it did not switch over to its redundant system. We have introduced a system to detect this type of failure, in the hope of pinpointing this type of fault more quickly.
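The report does not describe how the new detection system works. Because this fault showed up as abnormal CPU load while edge-george-1 was process switching traffic, one plausible approach is to alert when CPU load stays high for several consecutive polls rather than on a single spike. The sketch below is hypothetical: the class name, the 80% threshold and the five-sample window are assumptions, and it leaves out how CPU samples are collected from the router.

```python
from collections import deque


class SustainedHighCpuDetector:
    """Flag a router whose CPU stays above a threshold for several consecutive samples.

    Hypothetical illustration: the default threshold and window are assumptions,
    not values taken from Spectrum's monitoring.
    """

    def __init__(self, threshold_pct=80.0, window=5):
        self.threshold_pct = threshold_pct
        # Keeps only the most recent `window` samples.
        self.recent = deque(maxlen=window)

    def observe(self, cpu_pct):
        """Record one CPU sample.

        Returns True only when the window is full and every sample in it
        exceeds the threshold, so a single short spike does not raise an alert.
        """
        self.recent.append(cpu_pct)
        return len(self.recent) == self.recent.maxlen and all(
            sample > self.threshold_pct for sample in self.recent
        )
```

For example, with threshold_pct=80 and window=3, feeding the samples 50, 90, 95, 97 returns True only on the fourth sample, once three consecutive readings exceed 80%. Requiring a sustained run trades a slightly slower alert for fewer false alarms from brief load spikes.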

Your understanding is appreciated.

Matt
