Quality of Service

Any sufficiently advanced incompetence is indistinguishable from malice.

Archive for the ‘Management’ Category

NetFlow

Posted by qualityofservice on June 20, 2009

NetFlow is a Cisco proprietary standard, soon to become (if it’s not already) an international standard in the form of IPFIX (http://www.ietf.org/html.charters/ipfix-charter.html). It tracks flows entering an interface and does accounting based on source/dest IP/port, TOS, originating autonomous system, and all manner of other cool things.  This info can be exported to central collectors, which can store the data in a DB and mangle it as they see fit.

NetFlow is supported on any and all recent IOS routers (read as: 1800/2800/3800 ISR series, 7200/7600, etc).  Alas, no support on Catalyst dumb Layer-2 and multilayer switches outside of the 4500/6500 line, and even then it requires special hardware in the form of proper line cards/Supervisor Engine(s).

However, you can do a poor-man’s NetFlow by building a “probe” that accepts mirrored traffic from a SPAN port on a switch, and crafts its own NetFlow data from the observed traffic (see also: nTop).  Your mileage may vary depending on your IOS version; this note’s test router uses 12.4(15)T7.

You don’t need a collector to get some use out of the feature, though; it maintains a local cache and that’s what this note’s going to be about.  Quite easy to turn on:

interface FastEthernet0/1
ip address x.x.x.x y.y.y.y
ip flow ingress

Verification:

TEST-VPN-Hub-01#sho ip flow interface
FastEthernet0/1
ip flow ingress

Then turn on the top-talkers feature:

TEST-VPN-Hub-01#conf t
TEST-VPN-Hub-01(config)#ip flow-top-talkers
TEST-VPN-Hub-01(config-flow-top-talkers)#top 100

Then we get the option of viewing un-aggregated cache data, or aggregated cache data:

TEST-VPN-Hub-01#sho ip flow top-talkers ?

Display aggregated top talkers:
<1-100>  Number of aggregated top talkers to show

Display unaggregated top flows:
verbose  Display extra information about unaggregated top flows
|        Output modifiers

Un-aggregated provides a very granular view of flows stored in cache; one flow per source/dest IP/port and IP Protocol number (with protocol number and src/dst ports reported in very obnoxious hex), and by default sorted by bytes ingress to the interface:

TEST-VPN-Hub-01#sho ip flow top-talkers

SrcIf         SrcIPaddress    DstIf         DstIPaddress    Pr SrcP DstP Bytes
Tu110232      10.0.30.63      Fa0/1         192.168.141.81  06 170C 88ED  2074K
Tu110232      10.0.30.40      Fa0/1         192.168.141.71  06 0FD9 F727  1519K
Tu110232      10.0.30.140     Fa0/1         192.168.141.144 06 0BFE DB22  1275K
Tu110232      10.0.30.140     Fa0/1         10.1.250.81     06 0BFE B70E  1243K
Tu110232      10.0.30.140     Fa0/1         10.1.250.81     06 0BFE B70F  1242K
Tu110232      10.0.30.62      Fa0/1         192.168.141.80  06 170C 83ED   532K
Tu110232      10.0.30.140     Fa0/1         10.1.250.81     06 0BFE A204   340K
Tu110232      10.0.30.140     Fa0/1         192.168.141.144 06 0BFE D3D8   251K
Fa0/1         192.168.141.81  Tu110232      10.0.30.63      06 88ED 170C    69K
Fa0/1         192.168.141.80  Tu110232      10.0.30.62      06 83ED 170C    60K
Fa0/1         192.168.141.144 Tu110232      10.0.30.140     06 D3D8 0BFE    38K

Useful if you have a source that’s just pounding away; you can easily see where it’s coming from (and the interface through which it enters) and where it’s going (and the interface through which it leaves).
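The hex fields are easy enough to convert by hand, but a few lines of Python save the trouble. This is a minimal sketch, assuming the eight-column row layout shown above; the function name `decode_flow_line` is made up for illustration and is not part of any Cisco tooling.

```python
# Hypothetical helper: decode the hex Pr/SrcP/DstP columns of one
# "show ip flow top-talkers" row into decimal. Assumes the 8-column
# layout shown above; other IOS versions may format rows differently.

def decode_flow_line(line):
    src_if, src_ip, dst_if, dst_ip, proto, sport, dport, size = line.split()
    return {
        "src_if": src_if,
        "src_ip": src_ip,
        "dst_if": dst_if,
        "dst_ip": dst_ip,
        "protocol": int(proto, 16),   # hex IP protocol number
        "src_port": int(sport, 16),   # hex source port
        "dst_port": int(dport, 16),   # hex destination port
        "bytes": size,                # left as text, e.g. "2074K"
    }

row = "Tu110232      10.0.30.63      Fa0/1         192.168.141.81  06 170C 88ED  2074K"
flow = decode_flow_line(row)
print(flow["protocol"], flow["src_port"], flow["dst_port"])
```

For the first row this gives protocol 6 (TCP), source port 5900 and destination port 35053, which lines up with the 35053 and 5900 entries in the aggregated destination-port output further down.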

Aggregated view allows you to aggregate the NetFlow data a whole bunch of different ways (I’ve cut a bunch of ways out for sake of brevity):

TEST-VPN-Hub-01#sho ip flow top-talkers 100 aggregate ?
bytes                  number of bytes
destination-address    Destination address
destination-interface  Destination interface
destination-port       Destination port
icmp                   ICMP type and code
ip-nexthop-address     IP nexthop address
max-packet-length      Maximum packet length
min-packet-length      Minimum packet length
packets                number of packets
source-address         Source address
source-interface       Source interface
source-port            Source port
tcp-flags              TCP flags

What follows are ways to find the hot destination ports from your router’s point of view:

TEST-VPN-Hub-01#sho ip flow top-talkers 100 aggregate destination-port sorted-by packets

There are 20 top talkers:

TRNS DST PORT       bytes        pkts       flows
=============  ==========  ==========  ==========
        35053     1638362        8922           1
        54232     1462512        4017           1
        33773      861529        3757           1
        63271     1161960        2904           1
        56098      950000        2609           1
        46862      916876        2518           1
        46863      916472        2516           1
         5900      110858        2226           2
            0      688278        1030          13
         2048      658800         549           1
         3070       12480         312           5
         4056        3492          70           1
         4057        2680          67           1
        57556        6804          67           1
        41476       15288          42           1
         3092        2860          35           3
          161        2556          35           3

Note that “Port 0” shows up in the above; I believe this may be related to packet fragmentation.  Non-initial fragments will not contain a transport-layer header; rather, they’ll simply have more transport-layer payload.  NetFlow can relate such a packet to a particular transport-layer protocol on account of the IP Protocol field of the IP packet (6 = TCP, 17 = UDP), but that’s as good as it can do without reassembling the entire packet.

Mind you, the traffic could also be IPsec, which uses IP Protocol 50 or 51 for ESP or AH, respectively, and does not have port numbers for NetFlow to count.  This test bed was also running EIGRP and GRE tunnels; this traffic may have also been counted as “Port 0” traffic.
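To make the “Port 0” reasoning concrete, here is a rough sketch of how a flow’s destination-port key could be modelled. It assumes NetFlow v5-style behaviour, where only TCP and UDP carry real ports and the ICMP type and code are packed into the destination-port field (type × 256 + code); `dst_port_key` is an illustrative name, not a Cisco API, and the fragment handling simply mirrors the guess made above.

```python
# Illustrative model only: what "destination port" a NetFlow-style cache
# would record for a packet. Assumes v5-style encoding, where ICMP
# type/code is packed into the port field and portless protocols get 0.

IP_PROTOCOLS = {1: "ICMP", 6: "TCP", 17: "UDP", 47: "GRE",
                50: "ESP", 51: "AH", 88: "EIGRP"}

def dst_port_key(protocol, dst_port=None, icmp_type=0, icmp_code=0,
                 non_initial_fragment=False):
    if non_initial_fragment:
        return 0  # no transport header present, so no port to read
    if protocol in (6, 17):
        return dst_port  # TCP and UDP carry real port numbers
    if protocol == 1:
        return icmp_type * 256 + icmp_code
    return 0  # GRE, ESP, AH, EIGRP, ... have no ports

print(dst_port_key(6, dst_port=5900))                    # TCP to port 5900
print(dst_port_key(1, icmp_type=8))                      # ICMP echo request
print(dst_port_key(1, icmp_type=0))                      # ICMP echo reply
print(dst_port_key(50))                                  # ESP
print(dst_port_key(17, 161, non_initial_fragment=True))  # UDP fragment
```

Under these assumptions the calls print 5900, 2048, 0, 0 and 0. If the model holds, the 2048 row in the output above would be ICMP echo requests, and echo replies would be one more contributor to the port 0 row.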

And to see some equally hot source hosts:

TEST-VPN-Hub-01#sho ip flow top-talkers 100 aggregate source-add sorted-by packets

There are 25 top talkers:

IPV4 SRC ADDR         bytes        pkts       flows
===============  ==========  ==========  ==========
10.0.30.63          1758749        9609           1
10.0.30.140         3161180        8681           5
10.0.30.62           996875        4319           1
10.0.30.40          1266040        3226           5
192.168.141.80       121738        2444           1
10.1.250.81           35960         899           3
192.168.139.66       990000         825           1
192.168.139.129      988800         824           1
192.168.141.144       24640         616           2
192.168.141.81        22520         451           2
192.168.141.71        12372         309           2
192.168.191.234       19008         288           1
192.168.191.242        9900         150           1
192.168.141.70         3944          81           2
192.168.141.66         3360          56           1
192.168.141.65         3300          55           1
192.168.191.238        2508          38           1
192.168.141.70         1680          28           1
192.168.141.76         1680          28           1
192.168.141.75         1680          28           1
192.168.141.71         1620          27           1
192.168.141.72         1620          27           1
192.168.191.230        1650          25           1
10.1.40.169              72           1           1

The command “show ip cache flow” also produces interesting results, including timers associated with the flow cache.

TEST-VPN-Hub-01#sho ip cache flow

IP packet size distribution (26090 total packets):
1-32   64   96  128  160  192  224  256  288  320  352  384  416  448  480
.001 .500 .155 .007 .005 .005 .006 .007 .007 .007 .006 .204 .045 .004 .004

512  544  576 1024 1536 2048 2560 3072 3584 4096 4608
.003 .002 .002 .012 .008 .000 .000 .000 .000 .000 .000

IP Flow Switching Cache, 278544 bytes
60 active, 4036 inactive, 675 added
29520 ager polls, 0 flow alloc failures
Active flows timeout in 30 minutes
Inactive flows timeout in 15 seconds
IP Sub Flow Cache, 25800 bytes
0 active, 1024 inactive, 0 added, 0 added to flow
0 alloc failures, 0 force free
1 chunk, 0 chunks added
last clearing of statistics 00:07:23

Protocol         Total    Flows   Packets Bytes  Packets Active(Sec) Idle(Sec)
--------         Flows     /Sec     /Flow  /Pkt     /Sec     /Flow     /Flow
TCP-other          338      0.7        29   155     22.6      18.6       9.6
UDP-NTP             37      0.0         1    76      0.0       0.0      15.4
UDP-other          184      0.4        13    75      5.8       6.2      15.5
ICMP               100      0.2         2   757      0.5       0.5      15.6
Total:             659      1.4        19   151     29.1      11.4      12.5

From the above output, you can see that flows will age out of the cache 15 seconds after data associated with the flow stops flowing.  You can test this by pinging something through the router (in my tests, locally-originated ICMP traffic was not counted by NetFlow, but there’s a chance I may have just been doing it wrong), and filtering the output of “show ip flow top-talkers” or “show ip cache flow”, until there’s been enough transferred data associated with the flow for it to work its way into the cache.

Then stop the ping.  15 seconds later, the flow won’t be there anymore; so anything still sitting in the cache is either live or was live within the last 15 seconds, and a flow that has accumulated a lot of traffic has been running without a break.  This technique is incredibly handy for tracking DoS activity; if you’re able to log into a terminal, you can work backwards to find the source address and input interface of potential DoS’ers, misbehaving hosts, etc.  Taken to its logical conclusion (assuming cooperation with a supportive and clueful ISP), you can even trace a spoofed IP address back to its real source. How this would be accomplished is left as an exercise for the reader.
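The aging behaviour described above can be sketched as a toy model. It uses the two timers from the “show ip cache flow” output (30 minutes active, 15 seconds inactive) and nothing else; `flow_expiry` is a hypothetical name, and real IOS ages flows on periodic ager polls, so actual expiry may lag these ideal times.

```python
# Simplified model of NetFlow cache aging, using the timers from the
# "show ip cache flow" output above. This is an assumption-laden sketch,
# not how IOS implements its flow ager.

INACTIVE_TIMEOUT = 15        # seconds idle before a flow is aged out
ACTIVE_TIMEOUT = 30 * 60     # seconds before a long-lived flow is expired

def flow_expiry(packet_times):
    """Return the time (in seconds) at which a flow with packets at the
    given sorted timestamps would leave the cache under this model."""
    start = packet_times[0]
    last = start
    for t in packet_times[1:]:
        if t - last >= INACTIVE_TIMEOUT:
            break                          # flow idled out before this packet
        if t - start >= ACTIVE_TIMEOUT:
            return start + ACTIVE_TIMEOUT  # long-lived flow hits active timer
        last = t
    return min(last + INACTIVE_TIMEOUT, start + ACTIVE_TIMEOUT)

# One ping per second for a minute, then stop.
print(flow_expiry(list(range(60))))
```

With a ping sent once a second for 60 seconds (timestamps 0 through 59), the model ages the flow out at 74 seconds, which is 15 seconds after the last packet.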

There’s also a packet-size histogram, where each column header is the upper bound of its bucket; from the above, you can deduce that 50% of the packets transiting the router are 33-64 bytes; 15.5% are 65-96 bytes; and about 20% are 353-384 bytes.
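Reading the histogram by eye gets tedious, so here is a small sketch that turns the two header rows and two fraction rows from the output above into byte ranges and percentages. It assumes each header value is the upper bound of its bucket; `size_buckets` is an illustrative name.

```python
# Sketch: turn the "IP packet size distribution" lines from
# "show ip cache flow" into (low, high, percent) buckets.
# Each header value is treated as the upper bound of its bucket in bytes.

HEADERS = ("1-32 64 96 128 160 192 224 256 288 320 352 384 416 448 480 "
           "512 544 576 1024 1536 2048 2560 3072 3584 4096 4608")
FRACTIONS = (".001 .500 .155 .007 .005 .005 .006 .007 .007 .007 .006 .204 "
             ".045 .004 .004 .003 .002 .002 .012 .008 .000 .000 .000 .000 "
             ".000 .000")

def size_buckets(headers, fractions):
    buckets = []
    low = 1
    for label, frac in zip(headers.split(), fractions.split()):
        high = int(label.split("-")[-1])   # "1-32" -> 32, "64" -> 64
        buckets.append((low, high, float(frac) * 100))
        low = high + 1
    return buckets

# Show only the buckets that hold at least 5% of the packets.
for low, high, pct in size_buckets(HEADERS, FRACTIONS):
    if pct >= 5:
        print(f"{low}-{high} bytes: {pct:.1f}%")
```

Only three buckets clear the 5% cut-off here: 33-64 bytes, 65-96 bytes and 353-384 bytes.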

Over at $dayJob, I use http://www.plixer.com/products/free-netflow.php to keep track of a day’s worth of NetFlow data; for a free tool, it’s incredible for providing point-in-time analysis of application use on my network.  As they say, in network analysis, there is no substitute for knowing your network.  While longer-term analysis would be ideal, I don’t have long-term enterprise NetFlow collection in my budget, nor the time to build out my own; though after you’ve kept a watchful eye on links for a few weeks, you start to see patterns, and deviations from those patterns should be either easily explained or quickly investigated.

Posted in Management, Security

Monitoring/managing logins and config changes with IOS

Posted by qualityofservice on June 9, 2009

For the purposes of this note, I’m going to pretend Telnet doesn’t exist.  Most of the stuff applies regardless of whether you use it or not, but I’m happier working under the assumption that all VTY configs look like this:

line vty 0 15
transport input ssh

I’m going to digress already and say that it’s a good idea to restrict access to certain networks:

line vty 0 15
access-class 101 in
transport input ssh

And that some go a step further and reserve the last VTY line as a last resort in the event that the other 15 are occupied by someone with less-than-benevolent purposes; that way, the host(s) specified in ACL 102 can still manage the router:

line vty 0 14
access-class 101 in
transport input ssh

line vty 15
access-class 102 in
transport input ssh

But back to the point.  In the early days of IOS 12.3, they introduced the “login” command-set for login security enhancement (http://www.cisco.com/en/US/docs/ios/12_3t/12_3t4/feature/guide/gt_login.html).

login block-for 60 attempts 3 within 60
login delay 3
login on-failure log
login on-success log

This gives you three chances to pass the test before the router blocks all logins for 60 seconds (this period is called “quiet mode”).  There’s also a 3-second delay between attempts.  This mitigates someone throwing the kitchen sink at your device; it takes them 9 seconds just to try three times, and they can only do so once a minute without changing IP addresses.  A “quiet mode” list can be configured to allow certain hosts to get around these restrictions; this is a good idea, because someone spamming login attempts can lock you out, and it’s a race to log in when quiet time ends.  Luckily, the “on-failure log” will tell you which IP address is responsible for the attack. Info on configuring quiet-mode bypass is in the documentation linked at the end of this note.

Of course, the problem with this is that by default, IOS will let you attempt four SSH logins before terminating the session.  You can fix that, too. I use this:

ip ssh authentication-retries 2
ip ssh logging events
ip ssh version 2

“authentication-retries” is, literally, retries.  It lets you make two additional attempts after the first failed attempt; hence three in total, which matches up with the three attempts before you’re locked out for a minute, configured above in the “login” section. “Version 2” forces the use of SSHv2 by the client side; SSHv1/v1.5 have been considered insecure and deprecated for well over a decade.

Finally, the built-in config-change archiver/logger:

archive
log config
logging enable
logging size 200
notify syslog
hidekeys

This will take any change made in config mode, save a copy to a small local buffer (200 entries, per “logging size”), and spit it out to syslog. “hidekeys” keeps sensitive info obscured (syslog packets being unencrypted and all).  How many times have you asked yourself “well, what’s changed?”  This lets you know in real time.

All this and more over at the IOS Security Configuration Guide and Command Reference, which can be found here for IOS 12.4: http://www.cisco.com/en/US/docs/ios/security/configuration/guide/12_4/sec_12_4_book.html

Whole bunch of examples below the cut!

Posted in Management, Security

Add a little flash (to your IOS router)

Posted by qualityofservice on May 19, 2009

Can’t believe I’ve never played with USB flash drives on routers before; they’re brilliant.  12.4T Advanced IP Services images are over 32MB in size and it’s not possible to store two different images on the same stock flash drive, which introduces a risk when remote upgrades are required.  If an upgrade goes bad, there are some sites where I can count on remote hands capable of solid support; others, I’m not so fortunate.  So all the remote sites are getting USB keys, now, which will do more for my ability to keep my sites consistent and stable than any other measure implemented in my three years in this position.

 The ISR routers come with a USB port.  Insert USB stick, router recognizes it immediately. 

Did a “format usbflash0:” and it was ready to go.  TFTP’d an image, set it to boot from the USB stick with “boot system usbflash0:[imagename]”, rebooted, and the router came back up on the upgraded image.  Removed the memory key, rebooted, and it ignored the “boot system” specification and booted back into the old image from flash.

Copied the old image from flash onto the USB stick (“copy flash:[oldimage] usbflash0:”), deleted the old image from flash, copied the new image to flash, and done.  Known working image in flash, and both old and new images stored on the USB stick.  In my case, an 1841 recognized a 4GB USB key, which provides 64x the image storage capacity of the default 64MB of Flash that ships with the ISR bundles I order.

No need to worry about a reboot leaving you high and dry mid-upgrade after you’ve removed an old image to make room for the new one, which should remove any reluctance to keep IOS images current.  Just copy to USB and boot from the stick first (caveat: it takes about 220 seconds to load a 36MB image from USB into RAM on an 1841, versus about 120 seconds to load the same image from flash).  Worst case, you fall back to a known good image in flash.

 For the security conscious, yes, this opens up the ability to have someone stick their own file onto the USB key and somehow get your router to load it; but if they have the physical access to permit them to do this in the first place, it’s simpler for them to just reboot into password recovery mode and do whatever they like.

Caveats: Cisco will sell you their own USB keys, but they’re about $300 after discount to add 256MB (part number: MEMUSB-64/128/256FT); I’d rather pay $10 to add 4GB.  I’ve only tested this with a Kingston DataTraveler stick; YMMV.  I also move the “new” image to Flash once I’m ready to go into production with it; the risk being that if you find yourself having to work through a TAC case and they notice that you’re booting from a non-Cisco flash, they may tell you to suck rocks, which is a risk I’m willing to take in order to be able to test and upgrade on my own terms.

Posted in Awesome, Management

Quis custodiet ipsos custodes?

Posted by qualityofservice on September 2, 2008

I’ve been playing around with a lot of bleeding-edge IOS releases lately, with my newest trick based on the Embedded Resource Manager (ERM) documentation from Cisco’s network management configuration guide: http://www.cisco.com/en/US/docs/ios/netmgmt/configuration/guide/nm_erm_resource_ps6441_TSD_Products_Configuration_Guide_Chapter.html

 

Ever want to immediately pinpoint a transient CPU spike?  Consistently high utilization is one thing, but quick spikes are less obvious and don’t tend to show up on graphs averaged over 5-minute intervals.  The following is employed on an 1841, using IOS 12.4(15)T7:

 

resource policy
  policy HighCPU type iosprocess
   system
    cpu process
     critical rising 40 falling 25
     major rising 20 falling 10
    !
   !
  !
  user group Hogger type iosprocess
   instance "IP Input"
   policy HighCPU
  !
snmp-server enable traps resource-policy

 

The (utilization percentage-based) numbers are arbitrary, but I’ll use those in production based on the fact that our resources are heavily oversized for the kind of work they have to do.  To test it out, I hammered the router’s control plane with 100 Mbps worth of ICMP to the router’s interface.  Here’s what immediately happened:

 

000247: *Sep  2 2008 16:38:28.991 UTC: %SYS-4-CPURESRISING: Resource group Hogger is seeing local cpu util 25% at process level more than the configured major limit 20 %

 

Then immediately after I stopped:

 

000278: *Sep  2 2008 16:39:33.971 UTC: %SYS-6-CPURESFALLING: Resource group Hogger is no longer seeing local high cpu at process level for the configured major limit 10%, current value 0%

 

Embedded Resource Manager (ERM) is phenomenal; it basically lets your router monitor itself.  The above will generate warning-level and informational-level syslog messages for alert triggering and reset, respectively.
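If those messages land on a syslog server, pulling the interesting fields out is straightforward. The sketch below is written against the two log lines quoted above and nothing else; `parse_cpu_alert` is a hypothetical helper, and other IOS releases may word these messages differently.

```python
import re

# Sketch: extract severity, direction, resource group and threshold from
# the CPURESRISING / CPURESFALLING messages quoted above. The pattern is
# based only on those two sample lines.
PATTERN = re.compile(
    r"%SYS-(?P<severity>\d)-CPURES(?P<direction>RISING|FALLING): "
    r"Resource group (?P<group>\S+) .*"
    r"configured (?P<level>\w+) limit (?P<limit>\d+) ?%"
)

def parse_cpu_alert(line):
    m = PATTERN.search(line)
    if m is None:
        return None  # not one of the two message types
    return {
        "severity": int(m["severity"]),       # 4 = warning, 6 = informational
        "direction": m["direction"].lower(),  # "rising" or "falling"
        "group": m["group"],
        "level": m["level"],
        "limit": int(m["limit"]),
    }

rising = ("000247: *Sep  2 2008 16:38:28.991 UTC: %SYS-4-CPURESRISING: "
          "Resource group Hogger is seeing local cpu util 25% at process "
          "level more than the configured major limit 20 %")
print(parse_cpu_alert(rising))
```

For the rising message this yields severity 4, direction "rising", group "Hogger", level "major" and limit 20; lines that match neither message type return None, so the function can be run over a whole log.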

 

Next on the list is Control-Plane Policing, or how to mitigate the effect of someone trying to blast your router’s interfaces just like I did for the purposes of this test (which isn’t to say that a sufficiently motivated user couldn’t simply point the firehose at your link to fill it full of DoS, but that’s a separate issue best dealt with by controls elsewhere).  =P

 

Posted in Management