Most of the OSPF documentation you’ll read states that OSPF uses multicast traffic to build adjacencies and flood Link State Advertisements. Turns out that at one key point in the building of neighbour relationships, it actually unicasts a packet (IF the media is broadcast; point-to-point media will use multicast for everything). One little packet out of millions can potentially bring down your network should you fail to account for it. Rare, but it can happen, and I’d like to document why (because in our case, it actually did happen; thankfully not to a network under my control, which made figuring out the cause all the more enjoyable). I am indebted to the excellent Cisco Press Troubleshooting IP Routing Protocols text for pointing me in the correct direction.
If you happen to block access to an interface’s IP address for security reasons (without explicitly permitting all OSPF traffic earlier in the ACL with something like “permit ospf any any”), you will find yourself in an outage situation if a previously-established neighbour relationship breaks and tries to re-form. This can easily happen in hastily-configured infrastructure ACLs.
This creates a “time-bomb” on your network: it could be stable for MONTHS…until the relationship is torn down. Hell breaks loose when it tries to come back up. Without going into too much detail, OSPF goes through a series of states while building the relationship. The circumstances described above will cause the router to become stuck in the EXSTART state. For more background on OSPF states, please see the following: http://www.cisco.com/warp/public/104/1.html#t20
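If you suspect this on a live router, a few standard IOS exec commands will usually confirm it. This is only a rough checklist, not output from the incident; the ACL name BlockOSPF and the interface are the ones used in the test config further down, so substitute your own:

```
! Is the neighbour stuck in EXSTART instead of reaching FULL?
show ip ospf neighbor
! Which ACL, if any, is applied inbound on the interface?
show ip interface Ethernet0/0
! Are the match counters on the deny entry climbing?
show ip access-lists BlockOSPF
```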
To test this, I set up OSPF between two routers, applied an access-list to block unicast packets destined to the IP address of one of the router interfaces (and permit everything else), and created another ACL to apply to “debug ip packet” that would monitor all OSPF traffic (IP Protocol #89).
All the major work was done on one router…important parts of test config are as follows:
interface Loopback0
 description OSPF Picks Loopback Address to Become Router ID
 ip address 20.20.20.20 255.255.255.255
!
interface Ethernet0/0
 ip address 172.20.28.1 255.255.255.0
 ip access-group BlockOSPF in
 ip ospf priority 100
 half-duplex
!
router ospf 1
 log-adjacency-changes
 passive-interface Loopback0
 network 20.20.20.20 0.0.0.0 area 0
 network 172.20.28.0 0.0.0.255 area 0
!
ip access-list extended BlockOSPF
 deny ip any host 172.20.28.1
 permit ip any any
!
access-list 198 permit ospf any any
Then I enabled debug logging to the VTY and started the debug:
conf t
logging monitor debugging
exit
terminal monitor
debug ip packet 198
debug ip ospf adjacency
When the ACL was applied, the neighbour adjacency remained up as expected (the issue only shows up during adjacency formation; after the relationship is established, it remains in the FULL state and only sends HELLO and LSA packets). So I shut the interface down and brought it back up to force it to go through adjacency setup again. Here’s what happened. “ip packet” debugs are indicated by “IP,” “ospf adj” debugs indicated by “OSPF”:
*Apr 22 02:36:42.684: OSPF: Interface Ethernet0/0 going Up
*Apr 22 02:36:42.684: IP: s=172.20.28.1 (local), d=224.0.0.5 (Ethernet0/0), len 76, sending broad/multicast
*Apr 22 02:36:42.688: IP: s=172.20.28.2 (Ethernet0/0), d=172.20.28.1, len 80, access denied
*Apr 22 02:36:44.296: IP: s=172.16.120.105 (local), d=224.0.0.5 (Serial1/0), len 120, sending broad/multicast
*Apr 22 02:36:44.296: OSPF: Build router LSA for area 0, router ID 20.20.20.20, seq 0x8000003B
*Apr 22 02:36:44.672: %LINK-3-UPDOWN: Interface Ethernet0/0, changed state to up
*Apr 22 02:36:45.672: %LINEPROTO-5-UPDOWN: Line protocol on Interface Ethernet0/0, changed state to up
*Apr 22 02:36:46.268: IP: s=172.16.120.105 (local), d=224.0.0.5 (Serial1/0), len 80, sending broad/multicast
*Apr 22 02:36:51.064: IP: s=172.20.28.2 (Ethernet0/0), d=224.0.0.5, len 80, rcvd 0
*Apr 22 02:36:51.064: OSPF: 2 Way Communication to 30.30.30.30 on Ethernet0/0, state 2WAY
*Apr 22 02:36:51.064: OSPF: Backup seen Event before WAIT timer on Ethernet0/0
*Apr 22 02:36:51.068: OSPF: DR/BDR election on Ethernet0/0
*Apr 22 02:36:51.068: OSPF: Elect BDR 20.20.20.20
*Apr 22 02:36:51.068: OSPF: Elect DR 30.30.30.30
*Apr 22 02:36:51.068: OSPF: Elect BDR 20.20.20.20
*Apr 22 02:36:51.068: OSPF: Elect DR 30.30.30.30
*Apr 22 02:36:51.068: DR: 30.30.30.30 (Id) BDR: 20.20.20.20 (Id)
*Apr 22 02:36:51.068: OSPF: Send DBD to 30.30.30.30 on Ethernet0/0 seq 0x1BEE opt 0x52 flag 0x7 len 32
*Apr 22 02:36:51.068: IP: s=172.20.28.1 (local), d=172.20.28.2 (Ethernet0/0), len 64, sending
*Apr 22 02:36:51.072: IP: s=172.20.28.1 (local), d=172.20.28.2 (Ethernet0/0), len 80, sending
*Apr 22 02:36:52.684: IP: s=172.20.28.1 (local), d=224.0.0.5 (Ethernet0/0), len 80, sending broad/multicast
*Apr 22 02:36:56.072: OSPF: Send DBD to 30.30.30.30 on Ethernet0/0 seq 0x1BEE opt 0x52 flag 0x7 len 32
*Apr 22 02:36:56.072: IP: s=172.20.28.1 (local), d=172.20.28.2 (Ethernet0/0), len 64, sending
*Apr 22 02:36:56.072: OSPF: Retransmitting DBD to 30.30.30.30 on Ethernet0/0 [1]
*Apr 22 02:36:56.072: IP: s=172.20.28.2 (Ethernet0/0), d=172.20.28.1, len 64, access denied
*Apr 22 02:36:56.268: IP: s=172.16.120.105 (local), d=224.0.0.5 (Serial1/0), len 80, sending broad/multicast
*Apr 22 02:37:01.064: IP: s=172.20.28.2 (Ethernet0/0), d=224.0.0.5, len 80, rcvd 0
*Apr 22 02:37:01.064: OSPF: Neighbor change Event on interface Ethernet0/0
*Apr 22 02:37:01.064: OSPF: DR/BDR election on Ethernet0/0
*Apr 22 02:37:01.064: OSPF: Elect BDR 20.20.20.20
*Apr 22 02:37:01.068: OSPF: Elect DR 30.30.30.30
*Apr 22 02:37:01.068: DR: 30.30.30.30 (Id) BDR: 20.20.20.20 (Id)
*Apr 22 02:37:01.072: OSPF: Send DBD to 30.30.30.30 on Ethernet0/0 seq 0x1BEE opt 0x52 flag 0x7 len 32
*Apr 22 02:37:01.072: IP: s=172.20.28.1 (local), d=172.20.28.2 (Ethernet0/0), len 64, sending
*Apr 22 02:37:01.072: OSPF: Retransmitting DBD to 30.30.30.30 on Ethernet0/0 [2]
*Apr 22 02:37:01.072: IP: s=172.20.28.2 (Ethernet0/0), d=172.20.28.1, len 64, access denied
*Apr 22 02:37:02.684: IP: s=172.20.28.1 (local), d=224.0.0.5 (Ethernet0/0), len 80, sending broad/multicast
The output on the neighbouring router is equally interesting: it will be sending and receiving DBD packets. But since the two are trying to establish which one will be the master and which will be the slave during the initial exchange, each will just keep retransmitting, because the test router never sees the neighbour’s DBD (the ACL drops it) and so can’t tell who the master will be. Flag 0x7 means “I am initializing the relationship, I have more to send, and I am the Master”; normally, both sides send DBD packets with this message and the one with the highest Router ID wins.
For reference, the DBD packet contains three flag bits: I (Initialization), M (More), and MS (Master/Slave). During the EXSTART stage, each router sets them all (giving the flag 0x7 seen above). During the actual exchange, only the M and MS bits will be set. The master will have a flag of 0x3 and the slave will have 0x2, assuming both have DBD packets to send. When they’re finished, the master will say so by setting its flag to 0x1, and the slave’s will be 0x0.
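The exact ACL change made for the second test isn’t shown here, so treat the following as a minimal sketch rather than the config that was actually used. It assumes the same named ACL (BlockOSPF) from the test config and simply puts an OSPF permit ahead of the deny:

```
! Rebuild the ACL so the OSPF permit is evaluated first
no ip access-list extended BlockOSPF
ip access-list extended BlockOSPF
 permit ospf any any
 deny ip any host 172.20.28.1
 permit ip any any
```

ACL entries are evaluated top-down and the first match wins, so the unicast DBD packets from 172.20.28.2 now match the permit before they ever reach the deny. In production you would probably tighten the source from “any” to the neighbour or its subnet.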
Now watch what happens when all OSPF traffic is explicitly permitted to hit the interface; note the state of the DBD flags as each device progresses through the EXCHANGE state:
*Apr 22 03:19:40.204: OSPF: Interface Ethernet0/0 going Up
*Apr 22 03:19:40.208: OSPF: 2 Way Communication to 30.30.30.30 on Ethernet0/0, state 2WAY
*Apr 22 03:19:40.208: OSPF: Backup seen Event before WAIT timer on Ethernet0/0
*Apr 22 03:19:40.208: OSPF: DR/BDR election on Ethernet0/0
*Apr 22 03:19:40.208: OSPF: Elect BDR 20.20.20.20
*Apr 22 03:19:40.208: OSPF: Elect DR 30.30.30.30
*Apr 22 03:19:40.208: OSPF: Elect BDR 20.20.20.20
*Apr 22 03:19:40.212: OSPF: Elect DR 30.30.30.30
*Apr 22 03:19:40.212: DR: 30.30.30.30 (Id) BDR: 20.20.20.20 (Id)
*Apr 22 03:19:40.212: OSPF: Send DBD to 30.30.30.30 on Ethernet0/0 seq 0x115E opt 0x52 flag 0x7 len 32
*Apr 22 03:19:40.704: OSPF: Build router LSA for area 0, router ID 20.20.20.20, seq 0x80000041
*Apr 22 03:19:45.212: OSPF: Send DBD to 30.30.30.30 on Ethernet0/0 seq 0x115E opt 0x52 flag 0x7 len 32
*Apr 22 03:19:45.212: OSPF: Retransmitting DBD to 30.30.30.30 on Ethernet0/0 [1]
*Apr 22 03:19:45.212: OSPF: Rcv DBD from 30.30.30.30 on Ethernet0/0 seq 0x2122 opt 0x52 flag 0x7 len 32 mtu 1500 state EXSTART
*Apr 22 03:19:45.216: OSPF: NBR Negotiation Done. We are the SLAVE
*Apr 22 03:19:45.216: OSPF: Send DBD to 30.30.30.30 on Ethernet0/0 seq 0x2122 opt 0x52 flag 0x2 len 112
*Apr 22 03:19:45.220: OSPF: Rcv DBD from 30.30.30.30 on Ethernet0/0 seq 0x2123 opt 0x52 flag 0x3 len 92 mtu 1500 state EXCHANGE
*Apr 22 03:19:45.220: OSPF: Send DBD to 30.30.30.30 on Ethernet0/0 seq 0x2123 opt 0x52 flag 0x0 len 32
*Apr 22 03:19:45.224: OSPF: Rcv DBD from 30.30.30.30 on Ethernet0/0 seq 0x2124 opt 0x52 flag 0x1 len 32 mtu 1500 state EXCHANGE
*Apr 22 03:19:45.224: OSPF: Exchange Done with 30.30.30.30 on Ethernet0/0
*Apr 22 03:19:45.224: OSPF: Send LS REQ to 30.30.30.30 length 12 LSA count 1
*Apr 22 03:19:45.224: OSPF: Send DBD to 30.30.30.30 on Ethernet0/0 seq 0x2124 opt 0x52 flag 0x0 len 32
*Apr 22 03:19:45.228: OSPF: Rcv LS REQ from 30.30.30.30 on Ethernet0/0 length 48 LSA count 2
*Apr 22 03:19:45.228: OSPF: Send UPD to 172.20.28.2 on Ethernet0/0 length 108 LSA count 2
*Apr 22 03:19:45.232: OSPF: Rcv LS UPD from 30.30.30.30 on Ethernet0/0 length 76 LSA count 1
*Apr 22 03:19:45.232: OSPF: Synchronized with 30.30.30.30 on Ethernet0/0, state FULL
*Apr 22 03:19:45.232: %OSPF-5-ADJCHG: Process 1, Nbr 30.30.30.30 on Ethernet0/0 from LOADING to FULL, Loading Done
After both sides agree that they are finished with the EXCHANGE state, they proceed to the LOADING state and request the LSAs they are missing (the LS REQ and LS UPD lines above). Once both sides agree that their link-state databases are the same, they proceed to the FULL state and stay there until disrupted. The state of a relationship can be viewed with “show ip ospf neighbor”:
Neighbor ID     Pri   State           Dead Time   Address         Interface
10.10.10.10       0   FULL/  -        00:00:31    172.16.120.120  Serial1/0
30.30.30.30       1   FULL/BDR        00:00:32    172.20.28.2     Ethernet0/0
Here we see that the neighbour at 30.30.30.30 is the Backup Designated Router for the segment. It has the higher Router ID (30.30.30.30 > 20.20.20.20), but note that I’ve configured the Eth0/0 interface of my test router with “ip ospf priority 100” to force the test router to be the DR for the segment; higher priority will always win the DR election**. The default priority is 1, and if both neighbours use the default, the higher Router ID will win. In Cisco IOS, the Router ID can be statically defined with a “router ospf”-level command; otherwise it will be the highest loopback address on the router. If there are no loopback addresses or statically defined IDs, OSPF will select the highest interface IP on the router to become the Router ID (which can lead to instabilities if that interface goes down).
Note that even though 20.20.20.20 is the DR, 30.30.30.30 still becomes the Master of the EXCHANGE state, since master/slave is determined by highest Router ID and has nothing to do with interface priority. Also note that the neighbour at 10.10.10.10 has no DR or BDR; as this link is a point-to-point serial link, no DR election occurs.
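To make the two knobs discussed above concrete, here is a hedged sketch for the test router. The priority line is taken from the test config; the explicit router-id line is an assumption (the test config relies on Loopback0 instead) and is shown only to illustrate the “router ospf”-level command:

```
router ospf 1
 ! Pin the Router ID rather than deriving it from an interface address
 router-id 20.20.20.20
!
interface Ethernet0/0
 ! Higher priority wins the DR election; 0 means never become DR or BDR
 ip ospf priority 100
```

The priority setting has no effect on the master/slave negotiation in EXSTART, which depends only on the Router ID, and a changed Router ID is typically only picked up once the OSPF process restarts.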
I’m venturing into new and exciting territory here, by putting my studies up to public scrutiny. If you’ve stumbled upon this page by accident (perhaps you were looking to read up on MQC or something) and you see a statement that strikes you as dumb or wrong, I welcome questions, comments, concerns and critiques.
*comma purposefully omitted
**In IOS at least, the DR is “sticky”; after it’s elected, you have to clear the OSPF process on the DR in order to force a re-election on the segment.