Quis custodiet ipsos custodes?
Posted by qualityofservice on September 2, 2008
I’ve been playing around with a lot of bleeding-edge IOS releases lately. My newest trick is based on the Embedded Resource Manager (ERM) documentation in Cisco’s network management configuration guide: http://www.cisco.com/en/US/docs/ios/netmgmt/configuration/guide/nm_erm_resource_ps6441_TSD_Products_Configuration_Guide_Chapter.html
Ever wanted to immediately pinpoint a transient CPU spike? Consistently high utilization is one thing, but quick spikes are less obvious and tend not to show up on graphs averaged over 5-minute intervals. The following is running on an 1841 with IOS 12.4(15)T7:
resource policy
 policy HighCPU type iosprocess
  system
   cpu process
    critical rising 40 falling 25
    major rising 20 falling 10
   !
  !
 !
 user group Hogger type iosprocess
  instance "IP Input"
  policy HighCPU
 !
snmp-server enable traps resource-policy
The thresholds (percent CPU utilization) are arbitrary, but I’ll use them in production because our routers are heavily oversized for the kind of work they have to do. To test it out, I hammered the router’s control plane with 100 Mbps of ICMP aimed at the router’s interface. Here’s what immediately happened:
000247: *Sep 2 2008 16:38:28.991 UTC: %SYS-4-CPURESRISING: Resource group Hogger is seeing local cpu util 25% at process level more than the configured major limit 20 %
Then immediately after I stopped:
000278: *Sep 2 2008 16:39:33.971 UTC: %SYS-6-CPURESFALLING: Resource group Hogger is no longer seeing local high cpu at process level for the configured major limit 10%, current value 0%
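If you want to feed these into an alerting script, the two messages above can be picked apart with a regular expression. This is only a rough sketch: the pattern is written against the two sample lines in this post, the function name parse_cpu_res is made up for illustration, and other IOS releases (or group names containing spaces) may need a different pattern.

```python
import re

# Pattern derived only from the two sample messages in this post;
# treat it as an assumption, not a documented message format.
CPU_RES = re.compile(
    r"%SYS-(?P<severity>\d)-(?P<mnemonic>CPURES(?:RISING|FALLING)): "
    r"Resource group (?P<group>\S+) .*?"
    r"configured (?P<level>\w+) limit (?P<limit>\d+) ?%"
)

def parse_cpu_res(line):
    """Return a dict describing an ERM CPU threshold message, or None."""
    m = CPU_RES.search(line)
    if m is None:
        return None
    rising = m.group("mnemonic").endswith("RISING")
    return {
        "severity": int(m.group("severity")),  # 4 = warning, 6 = informational
        "event": "rising" if rising else "falling",
        "group": m.group("group"),
        "level": m.group("level"),
        "limit": int(m.group("limit")),
    }

RISING_LINE = (
    "000247: *Sep 2 2008 16:38:28.991 UTC: %SYS-4-CPURESRISING: "
    "Resource group Hogger is seeing local cpu util 25% at process level "
    "more than the configured major limit 20 %"
)
FALLING_LINE = (
    "000278: *Sep 2 2008 16:39:33.971 UTC: %SYS-6-CPURESFALLING: "
    "Resource group Hogger is no longer seeing local high cpu at process "
    "level for the configured major limit 10%, current value 0%"
)

if __name__ == "__main__":
    print(parse_cpu_res(RISING_LINE))
    print(parse_cpu_res(FALLING_LINE))
```

A syslog collector could call parse_cpu_res on each incoming line and page someone only when the event is "rising" and the level is one you care about.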
ERM is phenomenal; it basically lets your router monitor itself. The above will generate a warning-level (severity 4) syslog message when a threshold is crossed and an informational-level (severity 6) message when it resets.
Next on the list is Control-Plane Policing, or how to mitigate the effect of someone trying to blast your router’s interfaces just like I did for the purposes of this test (which isn’t to say that a sufficiently motivated user couldn’t simply point the firehose at your link to fill it full of DoS, but that’s a separate issue best dealt with by controls elsewhere). =P
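The rising/falling pairs in the policy act like a hysteresis band: the alert trips above the rising value and only clears once utilization drops below the falling value, so a reading that hovers around a single number doesn't flap. Here is a minimal sketch of that idea, not of how IOS actually implements it. The Threshold class is hypothetical, and the strict greater-than/less-than comparisons are my assumption (the log only says "more than").

```python
class Threshold:
    """Hypothetical model of one 'rising X falling Y' pair."""

    def __init__(self, rising, falling):
        if falling > rising:
            raise ValueError("falling must not exceed rising")
        self.rising = rising
        self.falling = falling
        self.active = False  # True while the alert is raised

    def update(self, util):
        """Feed one utilization sample; return 'raise', 'clear', or None."""
        if not self.active and util > self.rising:  # assumed strict comparison
            self.active = True
            return "raise"
        if self.active and util < self.falling:  # assumed strict comparison
            self.active = False
            return "clear"
        return None


if __name__ == "__main__":
    major = Threshold(rising=20, falling=10)  # values from the policy above
    for sample in [5, 25, 15, 22, 8, 3]:
        print(sample, major.update(sample))
```

Note that the samples 15 and 22 produce nothing: the alert is already raised and utilization hasn't yet dropped below 10, which is the whole point of having two numbers instead of one.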
Trolan said
So the switch ran into Hogger and died?
qualityofservice said
It’s only an 1841, after all. That’s like sending a Catalyst 2950 to do battle with an XMR lololol.
God I hate myself.