Incident report - Node failure in L2-GNF

Here we will post network problems, planned and unplanned downtime, restoration times and other network issues.

Post by matt » Thu Oct 12, 2017 3:12 pm

Root Cause

Failure of l2-gnf-global-1-b and restart of node.

Timeline

At 12:12 on Thursday the 12th of October 2017, a 100 Gigabit switching node within the Spectrum Networks Layer 2 Global Network Fabric (L2-GNF) locked up and stopped routing traffic.

As a result, services within the fabric, including internet and some WAN services, were interrupted.

At 12:15 the fabric marked the node as faulty and the system self-healed to route around the failed node, restoring most WAN and point-to-point services.

Between 12:17 and 12:19 the core internet network reloaded its BGP peers, and normal services were restored at 12:19.

At 12:45 the failed node's automatic watchdog detected the fault and restarted the node, which reconfigured the L2-GNF and returned the node to service. This process is seamless, and no interruption was seen during the re-convergence.
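
For anyone wanting the elapsed times at a glance, here is a minimal sketch that works them out from the timestamps in the timeline above. The event labels and the function name are my own shorthand for illustration, not anything taken from the fabric or the vendor's tooling.

```python
from datetime import datetime

# Timestamps copied from the timeline above (12 October 2017).
# The labels are paraphrased shorthand, not log output.
EVENTS = [
    ("node locks up", "12:12"),
    ("fabric marks node faulty and routes around it", "12:15"),
    ("BGP peers reloaded, normal services restored", "12:19"),
    ("watchdog restarts node, node returns to service", "12:45"),
]


def minutes_since_start(events):
    """Return (label, whole minutes elapsed since the first event) pairs."""
    def parse(hhmm):
        return datetime.strptime(hhmm, "%H:%M")

    start = parse(events[0][1])
    return [
        (label, int((parse(hhmm) - start).total_seconds() // 60))
        for label, hhmm in events
    ]


for label, elapsed in minutes_since_start(EVENTS):
    print(f"+{elapsed:2d} min  {label}")
```

On these figures, the fabric routed around the node 3 minutes after the lockup, normal services were back after 7 minutes, and the watchdog restart came 33 minutes after the lockup.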

Resolution

A case has been logged with Cumulus, and logs relating to the cause of the node lockup have been identified. Further information to follow from Cumulus.

Re: Incident report - Node failure in L2-GNF

Post by matt » Thu Oct 19, 2017 8:45 am

We have isolated the cause of the outage to what appears to be the insertion of an SFP+ module. We are working with the vendor to identify and mitigate the exact cause of the outage.


Matt.

