Incident report - edge-george-1 Network outage

Postby matt » Fri Oct 17, 2014 12:57 pm

Incident report - edge-george-1

At 03:48 on October 16th, an unused line card (slot 12) in edge-george-1, one of Spectrum's primary edge routers, reported a fatal error. This router handles traffic for many users, as well as some voice and co-location services, in the Sydney Data Center in George Street.

The router attempted to reset the line card at 04:08, 04:41 and again at 05:01, at which time the line card appeared to cause spurious bus errors on edge-george-1. These lasted until approximately 05:05, when the router shut the slot down completely and the card was powered off. It appears, however, that the numerous bus errors caused instability across many services on edge-george-1.

At 08:30 a customer on another line card (slot 0) of edge-george-1 reported that packets exceeding 1k were being dropped before reaching their destination. Spectrum determined that all customers on that line card had the same problem. Spectrum staff began moving customers from the affected line card to a spare line card, which was completed at approximately 09:45. All customers on the affected line card were restored by 09:55.

At approximately 10:30 automated systems detected high CPU load on edge-george-1. Spectrum staff diagnosed the cause as a malfunction of the FIB, which fast switches traffic on the router. As a result, the router appears to be process switching all traffic, causing a resource deficit within the router.

At approximately 11:30 and 13:30, during peak load, higher than normal packet loss and latency were detected on packets transiting edge-george-1, caused by the resource deficit. Poor voice quality was observed for some customer voice traffic transiting the router.

A plan is being formulated to restore full line card FIB functionality during a planned Hazard, which will abate the resource deficit. Information to follow.

Matt.