Root Cause
Failure of l2-gnf-global-1-b and restart of node.
Timeline
At 12:12 on Thursday the 12th of October 2017, a 100 Gigabit switching node within the Spectrum Networks Layer 2 Global Network Fabric (L2-GNF) locked up and stopped routing traffic.
As a result, services within the fabric, including internet and some WAN services, were interrupted.
At 12:15 the fabric marked the node as faulty and the system self-healed, routing around the failed node and restoring most WAN and point-to-point services.
Between 12:17 and 12:19 the core internet network reloaded its BGP peers, and normal services were restored at 12:19.
At 12:45 the failed node's automatic watchdog detected the fault and restarted the node, which reconfigured the L2-GNF and returned the node to service. This process is seamless, and no interruption was seen during the reconfiguration.
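The impact windows implied by the timeline above can be worked out with a short sketch. The timestamps are taken directly from this report; the event names and the `minutes_between` helper are hypothetical and used only for illustration.

```python
from datetime import datetime

# Timestamps from the timeline in this report (12 October 2017).
# The dictionary keys are illustrative labels, not names from any monitoring system.
DAY = "2017-10-12"
EVENTS = {
    "node_lockup": "12:12",        # node stopped routing traffic
    "fabric_rerouted": "12:15",    # fabric routed around the failed node
    "internet_restored": "12:19",  # BGP peers reloaded, normal services restored
    "node_restarted": "12:45",     # watchdog restarted the node
}


def minutes_between(start: str, end: str) -> int:
    """Return the whole minutes between two HH:MM times on the incident day."""
    fmt = "%Y-%m-%d %H:%M"
    started = datetime.strptime(f"{DAY} {start}", fmt)
    ended = datetime.strptime(f"{DAY} {end}", fmt)
    return int((ended - started).total_seconds() // 60)


# Time until most WAN and point-to-point services were restored.
wan_impact = minutes_between(EVENTS["node_lockup"], EVENTS["fabric_rerouted"])
# Time until normal services (including internet) were restored.
internet_impact = minutes_between(EVENTS["node_lockup"], EVENTS["internet_restored"])
# Time the failed node itself was out of service.
node_outage = minutes_between(EVENTS["node_lockup"], EVENTS["node_restarted"])

print(f"WAN/point-to-point impact: {wan_impact} min")  # 3 min
print(f"Internet impact: {internet_impact} min")       # 7 min
print(f"Node out of service: {node_outage} min")       # 33 min
```

On these figures, customer-facing impact lasted at most 7 minutes, while the node itself was out of service for 33 minutes before the watchdog restarted it.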
Resolution
A case has been logged with Cumulus, and logs relating to the cause of the node lockup have been identified. Further information will follow from Cumulus.