The IPsec protocol gives the IP layer a secure, trusted channel for packet transmission. Mature and stable open source projects such as strongSwan and Openswan handle the control plane (protocol negotiation), but in the end they all rely on the kernel's XFRM framework to encapsulate, send, receive and decapsulate packets. The kernel's forwarding entries (policies and SAs), however, are generated by these daemons.
XFRM is short for transform.
###### IPsec packet receive and decapsulation process
Process path: ip_rcv() --> ip_rcv_finish() --> ip_local_deliver() --> ip_local_deliver_finish()
The decapsulating side must be the destination of the IP packet, so the route found in ip_rcv_finish() must be a local route (RTCF_LOCAL), and ip_local_deliver() is called to process the packet.
Below is a picture posted on the Internet.
In ip_local_deliver_finish(), the handler is chosen according to the upper-layer protocol type in the IP header. The operation sets of the various protocols are registered in inet_protos. For AH or ESP the handler is xfrm4_rcv. With IPsec NAT-T it is the UDP handler udp_rcv, and the encapsulated IPsec packet sits inside the UDP payload (in practice ESP, since NAT-T UDP encapsulation is defined for ESP).
    static int ip_local_deliver_finish(struct sk_buff *skb)
    {
        ......
        hash = protocol & (MAX_INET_PROTOS - 1);
        ipprot = rcu_dereference(inet_protos[hash]);
        ......
        if (ipprot != NULL) {
            ......
            ret = ipprot->handler(skb);
            ......
        }
        ......
    }
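The dispatch step in the excerpt above can be modelled in user space as a table of function pointers indexed by the IP protocol number. This is only a minimal sketch: `proto_handler`, `register_handler`, `deliver`, `setup_handlers` and the two stub handlers are hypothetical names, not kernel APIs. The protocol numbers 17 (UDP), 50 (ESP) and 51 (AH) are the IANA-assigned values.

```c
#include <stddef.h>

#define MAX_PROTOS 256  /* power of two, so "& (MAX_PROTOS - 1)" acts as a modulo */

enum { PROTO_UDP = 17, PROTO_ESP = 50, PROTO_AH = 51 };

/* Hypothetical handler type; the return value only tags which handler ran. */
typedef int (*proto_handler)(const unsigned char *pkt, size_t len);

static proto_handler handlers[MAX_PROTOS];

/* Stand-in for xfrm4_rcv: used for both AH and ESP. */
static int ipsec_rcv_stub(const unsigned char *pkt, size_t len)
{
    (void)pkt; (void)len;
    return 1;
}

/* Stand-in for udp_rcv: NAT-T traffic would arrive here first. */
static int udp_rcv_stub(const unsigned char *pkt, size_t len)
{
    (void)pkt; (void)len;
    return 2;
}

static void register_handler(unsigned char proto, proto_handler h)
{
    handlers[proto & (MAX_PROTOS - 1)] = h;
}

/* Look up the handler by protocol number; -1 means no handler is registered. */
static int deliver(unsigned char proto, const unsigned char *pkt, size_t len)
{
    proto_handler h = handlers[proto & (MAX_PROTOS - 1)];
    return h ? h(pkt, len) : -1;
}

static void setup_handlers(void)
{
    register_handler(PROTO_ESP, ipsec_rcv_stub);
    register_handler(PROTO_AH, ipsec_rcv_stub);
    register_handler(PROTO_UDP, udp_rcv_stub);
}
```

After `setup_handlers()`, a call such as `deliver(PROTO_ESP, buf, len)` reaches the IPsec stub, while an unregistered protocol number returns -1.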
xfrm4_rcv --> xfrm4_rcv_spi --> xfrm4_rcv_encap --> xfrm_input
Finally, xfrm_input is called to decapsulate the packet:
1. Create a security path (secpath) for the skb;
2. Parse the packet to obtain daddr and spi. Together with the protocol type (esp, ah, etc.) these form the key used to look up the SA. A sample set of Linux IPsec states (SAs) and policies is shown below the list, so the key fields can be seen at a glance;
3. Call the input function of the protocol type that belongs to the SA to decode the packet and return the protocol type of the next layer. The SA type can be esp, ah, ipcomp, etc., with the corresponding handlers esp_input, ah_input, etc.;
4. After decoding, decapsulate according to the IPsec mode. Tunnel mode and transport mode are the common ones, handled by xfrm4_mode_tunnel_input and xfrm4_transport_input respectively. Both are fairly simple: tunnel mode removes the outer header, and transport mode only adjusts some skb fields.
5. Protocols can be nested in several layers, such as ESP+AH, so the inner protocol must be parsed again. If it is still AH, ESP or IPCOMP, parse the new spi and go back to step 2 to look up the next SA and process the packet.
6. After the steps above, the user data packet (IP packet) is exposed. What happens next depends on the IPsec mode:
- Tunnel mode: call netif_rx with the new IP header (the user packet) to re-enter the protocol stack;
- Transport mode: call xfrm4_transport_finish to re-enter part of the protocol stack. The packet first goes through the PREROUTING hook: although it has already passed PREROUTING and INPUT once, the decrypted packet has a new protocol type and port number and may still need NAT. It is then routed again and the route's input function is called (skb_dst(skb)->input(skb)), which may be ip_local_deliver() or ip_forward(), to complete the rest of the protocol stack.
    # ip xfrm state
    src 11.11.11.11 dst 12.12.12.12
        proto esp spi 0x0bd7c7c3 reqid 245 mode tunnel
        replay-window 0 flag af-unspec
        auth-trunc hmac(sha1) 0xf45ccce0353a76dbfd260902acb2d9b6a58140f2 96
        enc cbc(aes) 0xa8d0767a7a9c14046a83bc8d10b47d10b7b1d7f473e894c265b246f0b9e8096c
    src 12.12.12.12 dst 11.11.11.11
        proto esp spi 0xcdbc8e20 reqid 245 mode tunnel
        replay-window 0 flag af-unspec
        auth-trunc hmac(sha1) 0x582534112007077449011c1a6f955f1df9b14b04 96
        enc cbc(aes) 0xb6e944ba2f29ccea436aece32558173592fadf5fe61ae8c66f78d17046869ae1
    # ip xfrm policy
    src 0.0.0.0/0 dst 0.0.0.0/0
        dir out priority 399999 ptype main
        tmpl src 11.11.11.11 dst 12.12.12.12
            proto esp spi 0x0bc742d3 reqid 245 mode tunnel
    src 0.0.0.0/0 dst 0.0.0.0/0
        dir fwd priority 399999 ptype main
        tmpl src 12.12.12.12 dst 11.11.11.11
            proto esp reqid 245 mode tunnel
    src 0.0.0.0/0 dst 0.0.0.0/0
        dir in priority 399999 ptype main
        tmpl src 12.12.12.12 dst 11.11.11.11
            proto esp reqid 245 mode tunnel
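Step 2 of the list above keys the SA lookup on the destination address, the SPI and the protocol. The following is a rough user-space sketch of that idea using the two SAs from the `ip xfrm state` output above. It uses a linear search for brevity, whereas the kernel uses hash tables; `sa_entry`, `sa_lookup`, `sa_table` and the `IPV4` macro are hypothetical names introduced only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

enum { PROTO_ESP = 50, PROTO_AH = 51 };  /* IANA protocol numbers */

/* Build an IPv4 address as a host-order integer, for readability only. */
#define IPV4(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Hypothetical, heavily reduced stand-in for struct xfrm_state. */
struct sa_entry {
    uint32_t daddr;   /* destination address of the protected packet */
    uint32_t spi;     /* security parameter index from the AH/ESP header */
    uint8_t  proto;   /* PROTO_ESP or PROTO_AH */
    int      tunnel;  /* 1 = tunnel mode, 0 = transport mode */
};

/* The two SAs from the sample output, both ESP in tunnel mode. */
static const struct sa_entry sa_table[] = {
    { IPV4(12, 12, 12, 12), 0x0bd7c7c3u, PROTO_ESP, 1 },
    { IPV4(11, 11, 11, 11), 0xcdbc8e20u, PROTO_ESP, 1 },
};

/* Return the SA whose (daddr, spi, proto) triple matches, or NULL. */
static const struct sa_entry *sa_lookup(uint32_t daddr, uint32_t spi, uint8_t proto)
{
    for (size_t i = 0; i < sizeof(sa_table) / sizeof(sa_table[0]); i++) {
        const struct sa_entry *sa = &sa_table[i];
        if (sa->daddr == daddr && sa->spi == spi && sa->proto == proto)
            return sa;
    }
    return NULL;
}
```

All three fields have to match: the right SPI with the wrong destination or protocol finds nothing, which corresponds to the `x == NULL` drop path in xfrm_input.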
    int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
    {
        struct net *net = dev_net(skb->dev);
        int err;
        __be32 seq;
        struct xfrm_state *x;
        xfrm_address_t *daddr;
        struct xfrm_mode *inner_mode;
        unsigned int family;
        int decaps = 0;
        int async = 0;

        /* A negative encap_type indicates async resumption. */
        if (encap_type < 0) {
            async = 1;
            x = xfrm_input_state(skb);
            seq = XFRM_SKB_CB(skb)->seq.input;
            goto resume;
        }

        /* Allocate new secpath or COW existing one. */
        // The security path is created when the packet is received
        if (!skb->sp || atomic_read(&skb->sp->refcnt) != 1) {
            struct sec_path *sp;

            sp = secpath_dup(skb->sp);
            if (!sp) {
                XFRM_INC_STATS(net, LINUX_MIB_XFRMINERROR);
                goto drop;
            }
            if (skb->sp)
                secpath_put(skb->sp);
            skb->sp = sp;
        }

        daddr = (xfrm_address_t *)(skb_network_header(skb) + XFRM_SPI_SKB_CB(skb)->daddroff);
        family = XFRM_SPI_SKB_CB(skb)->family;

        seq = 0;
        if (!spi && (err = xfrm_parse_spi(skb, nexthdr, &spi, &seq)) != 0) {
            XFRM_INC_STATS(net, LINUX_MIB_XFRMINHDRERROR);
            goto drop;
        }

        do {
            if (skb->sp->len == XFRM_MAX_DEPTH) {
                XFRM_INC_STATS(net, LINUX_MIB_XFRMINBUFFERERROR);
                goto drop;
            }

            // Look up the SA
            x = xfrm_state_lookup(net, skb->mark, daddr, spi, nexthdr, family);
            if (x == NULL) {
                XFRM_INC_STATS(net, LINUX_MIB_XFRMINNOSTATES);
                xfrm_audit_state_notfound(skb, family, spi, seq);
                goto drop;
            }

            skb->sp->xvec[skb->sp->len++] = x;

            spin_lock(&x->lock);
            if (unlikely(x->km.state != XFRM_STATE_VALID)) {
                XFRM_INC_STATS(net, LINUX_MIB_XFRMINSTATEINVALID);
                goto drop_unlock;
            }

            if ((x->encap ? x->encap->encap_type : 0) != encap_type) {
                XFRM_INC_STATS(net, LINUX_MIB_XFRMINSTATEMISMATCH);
                goto drop_unlock;
            }

            if (x->props.replay_window && xfrm_replay_check(x, skb, seq)) {
                XFRM_INC_STATS(net, LINUX_MIB_XFRMINSTATESEQERROR);
                goto drop_unlock;
            }

            if (xfrm_state_check_expire(x)) {
                XFRM_INC_STATS(net, LINUX_MIB_XFRMINSTATEEXPIRED);
                goto drop_unlock;
            }

            spin_unlock(&x->lock);

            XFRM_SKB_CB(skb)->seq.input = seq;

            // Decode according to the SA's protocol type and return the protocol type
            // of the next layer. The type can be esp, ah, ipcomp, etc., handled by
            // esp_input, ah_input, etc.
            nexthdr = x->type->input(x, skb);

            if (nexthdr == -EINPROGRESS)
                return 0;

    resume:
            spin_lock(&x->lock);
            if (nexthdr <= 0) {
                if (nexthdr == -EBADMSG) {
                    xfrm_audit_state_icvfail(x, skb, x->type->proto);
                    x->stats.integrity_failed++;
                }
                XFRM_INC_STATS(net, LINUX_MIB_XFRMINSTATEPROTOERROR);
                goto drop_unlock;
            }

            /* only the first xfrm gets the encap type */
            encap_type = 0;

            if (x->props.replay_window)
                xfrm_replay_advance(x, seq);

            x->curlft.bytes += skb->len;
            x->curlft.packets++;

            spin_unlock(&x->lock);

            XFRM_MODE_SKB_CB(skb)->protocol = nexthdr;

            inner_mode = x->inner_mode;

            if (x->sel.family == AF_UNSPEC) {
                inner_mode = xfrm_ip2inner_mode(x, XFRM_MODE_SKB_CB(skb)->protocol);
                if (inner_mode == NULL)
                    goto drop;
            }

            // After decoding, decapsulate according to the mode
            if (inner_mode->input(x, skb)) {
                XFRM_INC_STATS(net, LINUX_MIB_XFRMINSTATEMODEERROR);
                goto drop;
            }

            if (x->outer_mode->flags & XFRM_MODE_FLAG_TUNNEL) {
                decaps = 1;
                break;
            }

            /*
             * We need the inner address. However, we only get here for
             * transport mode so the outer address is identical.
             */
            daddr = &x->id.daddr;
            family = x->outer_mode->afinfo->family;

            /*
             * Check whether the inner protocol needs another round of
             * decapsulation: 1 = no, 0 = yes, negative = error.
             * Protocols can be nested in several layers, such as ESP+AH.
             */
            err = xfrm_parse_spi(skb, nexthdr, &spi, &seq);
            if (err < 0) {
                XFRM_INC_STATS(net, LINUX_MIB_XFRMINHDRERROR);
                goto drop;
            }
        } while (!err);

        // Reset the netfilter state; the packet traverses the hooks again later
        nf_reset(skb);

        if (decaps) {
            // Tunnel mode: after decapsulation, re-enter the protocol stack with the new IP header
            skb_dst_drop(skb);
            netif_rx(skb);
            return 0;
        } else {
            /*
             * Transport mode: enter xfrm4_transport_finish, which re-enters
             * part of the protocol stack.
             * If netfilter is supported, the packet goes through the PREROUTING
             * hook and is routed again. Although it is already at the INPUT
             * point here, the decoded protocol type and port number have
             * changed, so NAT may be required.
             * If netfilter is not supported, the IP protocol must be processed
             * again because the protocol type has changed.
             */
            return x->inner_mode->afinfo->transport_finish(skb, async);
        }

    drop_unlock:
        spin_unlock(&x->lock);
    drop:
        kfree_skb(skb);
        return 0;
    }
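The `xfrm_replay_check` / `xfrm_replay_advance` pair in the function above implements anti-replay protection. The SAs shown earlier have `replay-window 0`, so for them the check is skipped. A rough user-space sketch of the classic sliding-window scheme with a 32-bit bitmap could look like the following. `replay_state`, `replay_check` and `replay_advance` are hypothetical names, and the sketch assumes a window of at most 32 packets; it is not the kernel's exact code.

```c
#include <stdint.h>

/* Hypothetical receiver-side replay state; window must be <= 32 here. */
struct replay_state {
    uint32_t last_seq;  /* highest sequence number accepted so far */
    uint32_t bitmap;    /* bit i set => (last_seq - i) was already received */
    uint32_t window;    /* window size in packets */
};

/* Return 0 if seq may be accepted, -1 if it is a replay or too old. */
static int replay_check(const struct replay_state *st, uint32_t seq)
{
    if (seq == 0)
        return -1;                      /* sequence number 0 is never valid */
    if (seq > st->last_seq)
        return 0;                       /* newer than anything seen so far */

    uint32_t diff = st->last_seq - seq;
    if (diff >= st->window)
        return -1;                      /* fell out of the window: too old */
    if (st->bitmap & (1u << diff))
        return -1;                      /* already received: replay */
    return 0;
}

/* Record seq as received; call only after the packet passed verification. */
static void replay_advance(struct replay_state *st, uint32_t seq)
{
    if (seq > st->last_seq) {
        uint32_t diff = seq - st->last_seq;
        if (diff < st->window)
            st->bitmap = (st->bitmap << diff) | 1u;  /* slide the window */
        else
            st->bitmap = 1u;                         /* everything older is out */
        st->last_seq = seq;
    } else {
        st->bitmap |= 1u << (st->last_seq - seq);    /* mark an in-window packet */
    }
}
```

The split into a check before decryption and an advance after it mirrors the order in xfrm_input, where the window is only moved once the protocol handler has returned successfully.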
###### IPsec packet send and encapsulation process
The process path is shown in the figure below. The forwarding path is used as the example here; the main path for packets sent by the local host is similar.
Forwarding path:
The ip_forward function calls xfrm4_route_forward, which:
1. Parses the user packet and finds the matching IPsec policy (__xfrm_policy_lookup);
2. Then finds the best matching SA for the policy's template tmpl (xfrm_tmpl_resolve). See the ip xfrm output posted above for the template contents and how they correspond to the SAs;
3. Finally generates a secure route from the SA and attaches it to the dst of the skb. One user flow can match several security policies and therefore several SAs, and each SA produces one secure route entry, a struct dst_entry (xfrm_resolve_and_create_bundle). These entries are linked into a list through the child pointer, and the output member of each entry points to the handler of the corresponding security protocol, so the packet can be processed step by step, for example compression, then ESP encapsulation, then AH encapsulation.
The last entry of the secure route chain must be an ordinary IP route entry, because the final packet has to be forwarded through the ordinary route. In tunnel mode, after the tunnel output has added the outer IP header, the route is looked up again and attached as the last entry of the chain.
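The chain described above can be pictured as a singly linked list that is walked until the plain route at the end is reached. The following is a simplified user-space model, not kernel code: `dst_node`, `packet` and `walk_bundle` are hypothetical names, transforms are represented by name only, and the tunnel-mode boundary handling of the real output loop is left out.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical packet: records the order in which transforms were applied. */
struct packet {
    char trace[64];
};

/* Hypothetical, reduced stand-in for struct dst_entry. */
struct dst_node {
    struct dst_node *child;  /* next entry in the bundle */
    const char *xfrm;        /* transform name, NULL for the plain IP route */
};

static void trace_append(struct packet *pkt, const char *s)
{
    strncat(pkt->trace, s, sizeof(pkt->trace) - strlen(pkt->trace) - 1);
}

/*
 * Apply every transform in the chain and stop at the plain route.
 * Returns the number of transforms applied, or -1 if the chain ends
 * without a plain route (the packet would be unroutable).
 */
static int walk_bundle(const struct dst_node *dst, struct packet *pkt)
{
    int applied = 0;

    while (dst && dst->xfrm) {
        trace_append(pkt, dst->xfrm);
        trace_append(pkt, ">");
        applied++;
        dst = dst->child;  /* move on to the next entry of the bundle */
    }
    return dst ? applied : -1;
}
```

A chain built as IPCOMP, then ESP, then AH, then the plain route yields the trace `IPCOMP>ESP>AH>` and a count of 3. A chain that runs out before reaching a plain route returns -1, which loosely corresponds to the `-EHOSTUNREACH` case in xfrm_output_one.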
Note: the SA (security association) is the foundation and essence of IPsec. An SA is the agreement between the communicating peers on parameters such as which protocol to use, the operating mode of the protocol, the encryption algorithm, the shared key that protects the data of a specific flow, and the lifetime of the SA.
Then, after the FORWARD hook, ip_forward_finish() --> dst_output is called, which finally calls skb_dst(skb)->output(skb). At this point that pointer is xfrm4_output.
    int ip_forward(struct sk_buff *skb)
    {
        ......
        // IPsec route lookup; replaces skb_dst(skb)
        if (!xfrm4_route_forward(skb))
            goto drop;
        ......
        return NF_HOOK(NFPROTO_IPV4, NF_INET_FORWARD, skb, skb->dev, rt->u.dst.dev, ip_forward_finish);
        ......
    }

    static int ip_forward_finish(struct sk_buff *skb)
    {
        struct ip_options * opt = &(IPCB(skb)->opt);

        IP_INC_STATS_BH(dev_net(skb_dst(skb)->dev), IPSTATS_MIB_OUTFORWDATAGRAMS);

        if (unlikely(opt->optlen))
            ip_forward_options(skb);

        return dst_output(skb);
    }

    int __xfrm_route_forward(struct sk_buff *skb, unsigned short family)
    {
        struct net *net = dev_net(skb->dev);
        struct flowi fl;
        struct dst_entry *dst;
        int res;

        if (xfrm_decode_session(skb, &fl, family) < 0) {
            XFRM_INC_STATS(net, LINUX_MIB_XFRMFWDHDRERROR);
            return 0;
        }

        skb_dst_force(skb);
        dst = skb_dst(skb);

        // Find the secure route and replace the dst of the skb
        res = xfrm_lookup(net, &dst, &fl, NULL, 0) == 0;
        skb_dst_set(skb, dst);
        return res;
    }
A brief note on the send path for locally generated packets, which works the same way as the forwarding path:
Secure route lookup: ip_queue_xmit --> ip_route_output_flow --> __xfrm_lookup
Encapsulate and send: ip_queue_xmit --> ip_local_out --> dst_output --> xfrm4_output
Note:
1). For both forwarding and local sending, the ordinary route is looked up before the secure route. If no ordinary route is found, the packet is dropped. This route does not have to point to the real next hop or outgoing interface, though. It only has to match the packet's destination IP, for example a default route through some other interface.
2). strongSwan is a widely used open source IPsec implementation. After negotiation you can see that it has created routing table 220. People often ask what this route is for and why it is sometimes present and sometimes not. Here is a test record: 1. A route for the interesting traffic (the negotiated traffic selectors) seems to be installed in table 220 only in tunnel mode and only when the traffic is initiated locally (the IP address of the interesting traffic is configured on the local host), and the route specifies the source address; 2. It does not matter if the route is missing. As mentioned in 1), any route that matches the interesting traffic is enough, but you then need to specify the source address when pinging, otherwise the interesting traffic may not be matched. So my impression is that the main purpose of table 220 is to ensure
    # The interesting traffic negotiated by strongSwan is 6.6.6.6-7.7.7.7
    [root@master conf.d]# swanctl -l
    gw-gw: #1, ESTABLISHED, IKEv2, 05696f6ebb126b03_i* b8b3e87df66f496f_r
      local 'moon.strongswan.org' @ 172.16.70.1[500]
      remote 'sun.strongswan.org' @ 172.16.70.2[500]
      AES_CBC-192/HMAC_SHA1_96/PRF_HMAC_SHA1/MODP_1024
      established 865s ago, reauth in 7887s
      net-net: #1, reqid 1, INSTALLED, TUNNEL, ESP:AES_CBC-192/HMAC_SHA1_96
        installed 865s ago, rekeying in 4305s, expires in 5075s
        in c1d3090f, 1260 bytes, 15 packets, 780s ago
        out cb124865, 1260 bytes, 15 packets, 780s ago
        local 6.6.6.6/32
        remote 7.7.7.7/32
    [root@master conf.d]# ip rule
    0: from all lookup local
    220: from all lookup 220
    32766: from all lookup main
    32767: from all lookup default
    [root@master conf.d]# ip route ls table 220
    7.7.7.7 via 172.16.70.2 dev br0 proto static src 6.6.6.6
    # Pinging with only the destination IP specified works
    [root@master conf.d]# ping 7.7.7.7
    PING 7.7.7.7 (7.7.7.7) 56(84) bytes of data.
    64 bytes from 7.7.7.7: icmp_seq=1 ttl=64 time=0.273 ms
    64 bytes from 7.7.7.7: icmp_seq=2 ttl=64 time=0.204 ms
    64 bytes from 7.7.7.7: icmp_seq=3 ttl=64 time=0.229 ms
    # Delete the route
    [root@master conf.d]# ip route del 7.7.7.7 via 172.16.70.2 table 220
    [root@master conf.d]# ip route ls table 220
    [root@master conf.d]#
    # With the source IP specified, ping works as before
    [root@master conf.d]# ping 7.7.7.7
    PING 7.7.7.7 (7.7.7.7) 56(84) bytes of data.
    ^C
    --- 7.7.7.7 ping statistics ---
    2 packets transmitted, 0 received, 100% packet loss, time 1048ms
    [root@master conf.d]# ping 7.7.7.7 -I 6.6.6.6
    PING 7.7.7.7 (7.7.7.7) from 6.6.6.6 : 56(84) bytes of data.
    64 bytes from 7.7.7.7: icmp_seq=1 ttl=64 time=0.241 ms
    64 bytes from 7.7.7.7: icmp_seq=2 ttl=64 time=0.189 ms
    64 bytes from 7.7.7.7: icmp_seq=3 ttl=64 time=0.200 ms
IPsec encapsulation and send path:
xfrm4_output --> xfrm4_output_finish --> xfrm_output --> xfrm_output2 --> xfrm_output_resume --> xfrm_output_one
The xfrm4_output function first passes through the POSTROUTING hook, so SNAT can be applied before encapsulation. Then xfrm_output_resume --> xfrm_output_one performs the IPsec encapsulation, and finally the packet follows the ordinary route and the normal IP output path.
    int xfrm4_output(struct sk_buff *skb)
    {
        return NF_HOOK_COND(NFPROTO_IPV4, NF_INET_POST_ROUTING, skb,
                            NULL, skb_dst(skb)->dev, xfrm4_output_finish,
                            !(IPCB(skb)->flags & IPSKB_REROUTED));
    }
    // Loop over the bundle and apply all transforms
    static int xfrm_output_one(struct sk_buff *skb, int err)
    {
        struct dst_entry *dst = skb_dst(skb);
        struct xfrm_state *x = dst->xfrm;
        struct net *net = xs_net(x);

        if (err <= 0)
            goto resume;

        do {
            // Make sure the skb has enough room for this SA's headers and trailers
            err = xfrm_state_check_space(x, skb);
            if (err) {
                XFRM_INC_STATS(net, LINUX_MIB_XFRMOUTERROR);
                goto error_nolock;
            }

            // Call the mode's output function. In tunnel mode, for example, this adds
            // the outer IP header with protocol type IPIP for now; it is replaced later
            err = x->outer_mode->output(x, skb);
            if (err) {
                XFRM_INC_STATS(net, LINUX_MIB_XFRMOUTSTATEMODEERROR);
                goto error_nolock;
            }

            spin_lock_bh(&x->lock);
            err = xfrm_state_check_expire(x);
            if (err) {
                XFRM_INC_STATS(net, LINUX_MIB_XFRMOUTSTATEEXPIRED);
                goto error;
            }

            if (x->type->flags & XFRM_TYPE_REPLAY_PROT) {
                XFRM_SKB_CB(skb)->seq.output = ++x->replay.oseq;
                if (unlikely(x->replay.oseq == 0)) {
                    XFRM_INC_STATS(net, LINUX_MIB_XFRMOUTSTATESEQERROR);
                    x->replay.oseq--;
                    xfrm_audit_state_replay_overflow(x, skb);
                    err = -EOVERFLOW;
                    goto error;
                }
                if (xfrm_aevent_is_on(net))
                    xfrm_replay_notify(x, XFRM_REPLAY_UPDATE);
            }

            x->curlft.bytes += skb->len;
            x->curlft.packets++;

            spin_unlock_bh(&x->lock);

            // Call the protocol's output function (ESP, AH, ...). For ESP this is
            // esp4_output, which adds the protocol header and sets the IP header
            // protocol type to ESP
            err = x->type->output(x, skb);
            if (err == -EINPROGRESS)
                goto out_exit;

    resume:
            if (err) {
                XFRM_INC_STATS(net, LINUX_MIB_XFRMOUTSTATEPROTOERROR);
                goto error_nolock;
            }

            // Move dst and SA to the next child secure route and its SA, then continue
            dst = skb_dst_pop(skb);
            if (!dst) {
                XFRM_INC_STATS(net, LINUX_MIB_XFRMOUTERROR);
                err = -EHOSTUNREACH;
                goto error_nolock;
            }
            skb_dst_set_noref(skb, dst);
            x = dst->xfrm;
        } while (x && !(x->outer_mode->flags & XFRM_MODE_FLAG_TUNNEL));

        err = 0;

    out_exit:
        return err;
    error:
        spin_unlock_bh(&x->lock);
    error_nolock:
        kfree_skb(skb);
        goto out_exit;
    }

    int xfrm_output_resume(struct sk_buff *skb, int err)
    {
        while (likely((err = xfrm_output_one(skb, err)) == 0)) {
            // Reset the netfilter state so it is built again
            nf_reset(skb);

            // Run the hooks at the LOCAL_OUT point
            err = skb_dst(skb)->ops->local_out(skb);
            if (unlikely(err != 1))
                goto out;

            // Is an SA still attached? If not, send via ip_output (POSTROUTING)
            if (!skb_dst(skb)->xfrm)
                return dst_output(skb);

            // Otherwise (an SA is still attached), go through POSTROUTING and
            // re-enter xfrm_output2
            err = nf_hook(skb_dst(skb)->ops->family,
                          NF_INET_POST_ROUTING, skb,
                          NULL, skb_dst(skb)->dev, xfrm_output2);
            if (unlikely(err != 1))
                goto out;
        }

        if (err == -EINPROGRESS)
            err = 0;

    out:
        return err;
    }
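One detail in xfrm_output_one that is easy to miss is the handling of the outgoing sequence number: it is incremented before use, and a wrap to 0 is treated as an error, so the SA has to be rekeyed rather than reuse sequence numbers. A minimal stand-alone sketch of that check follows; `next_output_seq` is a hypothetical helper name, not a kernel function.

```c
#include <stdint.h>

/*
 * Hand out the next outgoing sequence number for an SA.
 * Returns 0 and stores the number in *out on success.
 * Returns -1 if the 32-bit counter would wrap to 0; the counter is
 * left unchanged in that case, as in the oseq-- rollback above.
 */
static int next_output_seq(uint32_t *oseq, uint32_t *out)
{
    uint32_t next = *oseq + 1;  /* unsigned arithmetic wraps to 0 at the limit */

    if (next == 0)
        return -1;

    *oseq = next;
    *out = next;
    return 0;
}
```

The first packet on a fresh SA therefore carries sequence number 1, and the value 0 never appears on the wire, which matches the receiver side rejecting 0 in its replay check.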
Some data structure diagrams found on the Internet:
1. Secure routing
2. Policy related protocol processing structure
3. State dependent protocol processing structure