synproxy for connection tracking

Posted by john010117 on Fri, 11 Oct 2019 11:24:45 +0200

Principle analysis

Synproxy is a convenient feature for mitigating SYN flood attacks. Linux implements synproxy on top of a connection tracking extension.

The following patch description explains in detail how Linux synproxy is designed and why. It is quoted from https://lwn.net/Articles/563151/:

The following patches against nf-next.git implement a SYN proxy for
netfilter. The series applies on top of the patches I sent last week
and is split into five patches:

- a patch to split out sequence number adjustment from NAT and make it
  usable from other netfilter subsystems. This is used to translate
  sequence numbers from the server to the client once the full connection
  has been established.

  This patch contains a bit of churn, but the core is to simply move the
  code to a new file and move the sequence number adjustment data into a
  ct extend.

- a patch to extract the TCP stack independant parts of syncookie generation
  and validation and make the usable from netfilter

- the SYN proxy core and IPv4 SYNPROXY target. See below for more details.

- a similar patch to the second one for IPv6

- an IPv6 version of the SYNPROXY target


The SYNPROXY operates by marking the initial SYN from the client as UNTRACKED
and directing it to the SYNPROXY target. The target responds with a SYN/ACK
containing a cookie and encodes options such as window scaling factor, SACK
perm etc. into the timestamp, if timestamps are used (similar to TCP). The
window size is set to zero. The response is also sent as untracked packet.

When the final ACK is received the cookie is validated, the original options
extracted and a SYN to the original destination is generated. The SYN to the
original destination uses the avertised window from the final ACK and the
options from the initial SYN packet. The SYN is not sent as untracked, so
from a connection tracking POV it will look like the original packet from
the client and instantiate a new connection. When the server responds with
a SYN/ACK a final ACK for the server is generated and a window update with
the window size announced by the server is sent to the client. At this
point the connection is handed of to conntrack and the only thing the
target is still involved in is timestamp translation through the registerd
hooks.

Since the SYN proxy can't know the options the server supports, they have
to be specified as parameters to the SYNPROXY target. The assumption is that
these options are constant as long as you don't change settings on the
server. Since the SYN proxy can't know the initial sequence number and
timestamp values the server will use, both have to be translated in the
direction server->client. Sequence number translation is done using the
standard sequence number translation mechanisms originally only used for
NAT, timestamps are translated in a hook registered by the SYNPROXY target.

Martin Topholm made some performance measurements with an earlier version
(that should still be valid, the only difference was that the core and IPv4
parts were in the same file) and measured a load of about 7% on a 8 way
system with 2 million SYNs per second, which without the target basically
killed the server (Martin, please correct me if I'm wrong).

The iptables patches will follow in a seperate thread, testing can be done
by:

iptables -t raw -A PREROUTING -i eth0 -p tcp --dport 80 --syn -j NOTRACK
iptables -A INPUT -i eth0 -p tcp --dport 80 -m state --state UNTRACKED,INVALID \
    -j SYNPROXY --sack-perm --timestamp --mss 1480 --wscale 7 --ecn

echo 0 > /proc/sys/net/netfilter/nf_conntrack_tcp_loose

The second rule catches untracked packets and directs them to the target.
The purpose of disabling loose tracking is to have the final ACK from the
client not be picked up by conntrack, so it won't create a new conntrack
entry and will be marked INVALID and also get directed to the target.

Unfortunately I couldn't come up with a nicer way to catch just the
first SYN and final ACK from the client and not have any more packets
hit the target, but even though it doesn't look to nice, it works well.

Comments welcome.

synproxy Working Principle

When synproxy is enabled, it is transparent to the client. The three-way handshake is first performed between the client and synproxy:
The client sends a TCP SYN to server A.
When the packet reaches the firewall, the first rule above marks it UNTRACKED, so the SYN is not tracked by conntrack.
The UNTRACKED TCP SYN then hits the second rule, which executes the SYNPROXY target.
SYNPROXY captures the packet, records the relevant information from it, and sends a TCP SYN+ACK to the client on behalf of server A (the source IP is the server's IP). The SYN+ACK leaves through the OUTPUT hook. Because the SYN was not tracked and nf_conntrack_tcp_loose=0 is set, connection tracking marks the SYN+ACK INVALID (note: not UNTRACKED) and no CT entry is created.
The client replies with a TCP ACK. This packet is likewise marked INVALID, so it hits the second rule and is handled by the SYNPROXY target.
Once the client has completed the three-way handshake with SYNPROXY, SYNPROXY immediately performs a three-way handshake with the real server. It forges a SYN so that the real server believes the client is connecting to it:
SYNPROXY sends a TCP SYN to real server A. This is a new connection: the packet enters netfilter through the OUTPUT hook and creates a conntrack entry in state NEW. The source IP is the client's IP and the destination IP is the real server's IP.
Real server A sends a SYN+ACK to the client.
When SYNPROXY receives the SYN+ACK from the real server, it replies with an ACK. The CT entry is now marked ESTABLISHED.
Once the conntrack entry is ESTABLISHED, SYNPROXY lets the client and the real server communicate directly.

Therefore, SYNPROXY can handle any type of TCP traffic. It also works with encrypted traffic, because SYNPROXY does not look at the TCP payload.

In practice, the synproxy implementation differs slightly from the description above, because it also has to handle several TCP details, mainly:

syncookie
tcp-options
sequence number

Source Code Analysis of the Implementation

Implementation of the synproxy target

For the three handshake packets exchanged with the client, rule 1 above and the nf_conntrack_tcp_loose setting apply:

iptables -t raw -A PREROUTING -i eth0 -p tcp --dport 80 --syn -j NOTRACK
echo 0 > /proc/sys/net/netfilter/nf_conntrack_tcp_loose

Netfilter does not create a connection tracking entry for these packets. They hit the second rule and are processed by SYNPROXY.

iptables -A INPUT -i eth0 -p tcp --dport 80 -m state --state UNTRACKED,INVALID \
    -j SYNPROXY --sack-perm --timestamp --mss 1480 --wscale 7 --ecn

static struct xt_target synproxy_tg4_reg __read_mostly = {
    .name        = "SYNPROXY",
    .family        = NFPROTO_IPV4,
    .hooks        = (1 << NF_INET_LOCAL_IN) | (1 << NF_INET_FORWARD),
    .target        = synproxy_tg4,
    .targetsize    = sizeof(struct xt_synproxy_info),
    .checkentry    = synproxy_tg4_check,
    .destroy    = synproxy_tg4_destroy,
    .me        = THIS_MODULE,
};

Processing of TCP options

Because SYNPROXY does not know which TCP options the real server supports, they must be specified as parameters when configuring SYNPROXY. Once set, these parameters do not change, so they are effectively constants. Rule 2 sets five options, --sack-perm --timestamp --mss 1480 --wscale 7 --ecn, which are stored in the following structure:

#define XT_SYNPROXY_OPT_MSS        0x01
#define XT_SYNPROXY_OPT_WSCALE        0x02
#define XT_SYNPROXY_OPT_SACK_PERM    0x04
#define XT_SYNPROXY_OPT_TIMESTAMP    0x08
#define XT_SYNPROXY_OPT_ECN        0x10

struct xt_synproxy_info {
    __u8    options;//Flags of the supported options
    __u8    wscale;//Window scale factor
    __u16    mss;//Maximum segment size
};

Initialization

static int __net_init synproxy_net_init(struct net *net)
{
    struct synproxy_net *snet = synproxy_pernet(net);
    struct nf_conn *ct;
    int err = -ENOMEM;
    //Create a ct template that sets the IPS_TEMPLATE_BIT flag
    ct = nf_ct_tmpl_alloc(net, &nf_ct_zone_dflt, GFP_KERNEL);
    if (!ct)
        goto err1;
    //Add Sequence Number Adjustment Extended Control Block
    if (!nfct_seqadj_ext_add(ct))
        goto err2;
    //Add synproxy extension control block
    if (!nfct_synproxy_ext_add(ct))
        goto err2;
    //Setting IPS_CONFIRMED_BIT
    __set_bit(IPS_CONFIRMED_BIT, &ct->status);
    nf_conntrack_get(&ct->ct_general);
    snet->tmpl = ct;

    snet->stats = alloc_percpu(struct synproxy_stats);
    if (snet->stats == NULL)
        goto err2;

    err = synproxy_proc_init(net);
    if (err < 0)
        goto err3;

    return 0;

err3:
    free_percpu(snet->stats);
err2:
    nf_ct_tmpl_free(ct);
err1:
    return err;
}

synproxy_tg4

static unsigned int
synproxy_tg4(struct sk_buff *skb, const struct xt_action_param *par)
{
    //Get synproxy configuration parameters
    const struct xt_synproxy_info *info = par->targinfo;
    struct net *net = xt_net(par);
    struct synproxy_net *snet = synproxy_pernet(net);
    struct synproxy_options opts = {};
    struct tcphdr *th, _th;

    if (nf_ip_checksum(skb, xt_hooknum(par), par->thoff, IPPROTO_TCP))
        return NF_DROP;

    th = skb_header_pointer(skb, par->thoff, sizeof(_th), &_th);
    if (th == NULL)
        return NF_DROP;
    //Parse the TCP options of the packet
    if (!synproxy_parse_options(skb, par->thoff, th, &opts))
        return NF_DROP;
    
    if (th->syn && !(th->ack || th->fin || th->rst)) {//syn message
        /* Initial SYN from client */
        this_cpu_inc(snet->stats->syn_received);

        if (th->ece && th->cwr)
            opts.options |= XT_SYNPROXY_OPT_ECN;
        //Keep only the options that both the client and the rule (server) support
        opts.options &= info->options;
        //The remaining options can be preserved only if both sides support timestamps:
        //sack-perm, window scale and similar options appear only in the syn packet,
        //so synproxy encodes them into the timestamp of its syn+ack and recovers
        //them from the timestamp echoed in the client's ack.
        if (opts.options & XT_SYNPROXY_OPT_TIMESTAMP)
            synproxy_init_timestamp_cookie(info, &opts);
        else//Without timestamps these options cannot be preserved, so they are not announced.
            opts.options &= ~(XT_SYNPROXY_OPT_WSCALE |
                      XT_SYNPROXY_OPT_SACK_PERM |
                      XT_SYNPROXY_OPT_ECN);
        //Send synack message to client
        synproxy_send_client_synack(net, skb, th, &opts);
        consume_skb(skb);
        return NF_STOLEN;
    } else if (th->ack && !(th->fin || th->rst || th->syn)) {//ack message
        /* ACK from client */
        if (synproxy_recv_client_ack(net, skb, th, &opts, ntohl(th->seq))) {
            consume_skb(skb);
            return NF_STOLEN;
        } else {
            return NF_DROP;
        }
    }

    return XT_CONTINUE;
}
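
The branch selection and option negotiation in synproxy_tg4 above can be sketched in user space as follows. This is a minimal sketch, not kernel code: classify() and negotiate() are hypothetical helper names, the flag values are copied from the XT_SYNPROXY_OPT_* definitions shown earlier, and the ECN flag is simply passed in as part of client_opts (the kernel derives it from the ECE and CWR bits of the SYN).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values copied from the xt_SYNPROXY definitions shown in the article. */
#define XT_SYNPROXY_OPT_MSS        0x01
#define XT_SYNPROXY_OPT_WSCALE     0x02
#define XT_SYNPROXY_OPT_SACK_PERM  0x04
#define XT_SYNPROXY_OPT_TIMESTAMP  0x08
#define XT_SYNPROXY_OPT_ECN        0x10
#define XT_SYNPROXY_OPT_ALL        0x1f

/* Hypothetical names for the three outcomes of the flag test in synproxy_tg4. */
enum tg4_branch { TG4_CLIENT_SYN, TG4_CLIENT_ACK, TG4_CONTINUE };

/* Mirror of the two flag tests: a pure SYN, a pure ACK, or anything else. */
enum tg4_branch classify(bool syn, bool ack, bool fin, bool rst)
{
    if (syn && !(ack || fin || rst))
        return TG4_CLIENT_SYN;
    if (ack && !(fin || rst || syn))
        return TG4_CLIENT_ACK;
    return TG4_CONTINUE;
}

/* Options announced in the SYN/ACK: the intersection of what the client
 * offered and what the rule configured, minus everything that cannot be
 * carried through the cookie when timestamps are unavailable. */
uint8_t negotiate(uint8_t client_opts, uint8_t rule_opts)
{
    uint8_t opts;

    assert((rule_opts & ~XT_SYNPROXY_OPT_ALL) == 0); /* only known flags */
    opts = client_opts & rule_opts;
    if (!(opts & XT_SYNPROXY_OPT_TIMESTAMP))
        opts = (uint8_t)(opts & ~(XT_SYNPROXY_OPT_WSCALE |
                                  XT_SYNPROXY_OPT_SACK_PERM |
                                  XT_SYNPROXY_OPT_ECN));
    return opts;
}
```

With the rule from this article (all five options set), a client that offers MSS, window scaling and SACK but no timestamps ends up with only the MSS flag, because the other two options have nowhere to be stored until the final ACK arrives.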

Send syn+ack to client

static void
synproxy_send_client_synack(struct net *net,
                const struct sk_buff *skb, const struct tcphdr *th,
                const struct synproxy_options *opts)
{
    struct sk_buff *nskb;
    struct iphdr *iph, *niph;
    struct tcphdr *nth;
    unsigned int tcp_hdr_size;
    u16 mss = opts->mss;

    iph = ip_hdr(skb);

    tcp_hdr_size = sizeof(*nth) + synproxy_options_size(opts);
    nskb = alloc_skb(sizeof(*niph) + tcp_hdr_size + MAX_TCP_HEADER,
             GFP_ATOMIC);
    if (nskb == NULL)
        return;
    skb_reserve(nskb, MAX_TCP_HEADER);
    //Building IP header
    niph = synproxy_build_ip(net, nskb, iph->daddr, iph->saddr);

    skb_reset_transport_header(nskb);
    nth = skb_put(nskb, tcp_hdr_size);
    nth->source    = th->dest;
    nth->dest    = th->source;
    //Compute synproxy's initial sequence number with a syncookie. The mss is encoded in the initial sequence number
    //and recovered when the client's ack is received.
    nth->seq    = htonl(__cookie_v4_init_sequence(iph, th, &mss));//Compute the syncookie, with the mss as one input
    nth->ack_seq    = htonl(ntohl(th->seq) + 1);
    tcp_flag_word(nth) = TCP_FLAG_SYN | TCP_FLAG_ACK;
    if (opts->options & XT_SYNPROXY_OPT_ECN)
        tcp_flag_word(nth) |= TCP_FLAG_ECE;
    nth->doff    = tcp_hdr_size / 4;
    nth->window    = 0;
    nth->check    = 0;
    nth->urg_ptr    = 0;
    //Building tcp options
    synproxy_build_options(nth, opts);
    //The first packet is NOTRACK, so there is no ct
    synproxy_send_tcp(net, skb, nskb, skb_nfct(skb),
              IP_CT_ESTABLISHED_REPLY, niph, nth, tcp_hdr_size);
}
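
The idea of hiding the MSS in the initial sequence number can be illustrated with a toy cookie. The sketch below is deliberately NOT the kernel's algorithm: toy_hash(), toy_cookie_init(), toy_cookie_check() and the table values are all assumptions made for illustration, and a real syncookie also mixes in a time counter and uses a cryptographic hash. It only shows the principle: round the MSS down to a table entry, store the table index in a few bits of the ISN, and recover it later from the value the client echoes back.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative MSS table; the values are an assumption for this sketch. */
static const uint16_t toy_msstab[] = { 536, 1300, 1440, 1460 };
#define TOY_MSSTAB_LEN (sizeof(toy_msstab) / sizeof(toy_msstab[0]))

/* Toy keyed mixing function over the connection 4-tuple (not cryptographic). */
uint32_t toy_hash(uint32_t saddr, uint32_t daddr,
                  uint16_t sport, uint16_t dport, uint32_t secret)
{
    uint32_t h = secret;

    h = (h ^ saddr) * 2654435761u;
    h = (h ^ daddr) * 2654435761u;
    h = (h ^ (((uint32_t)sport << 16) | dport)) * 2654435761u;
    return h ^ (h >> 15);
}

/* Build a cookie ISN. *mss is rounded down to a table entry (or up to the
 * smallest entry) and its index is stored in the low two bits of the cookie. */
uint32_t toy_cookie_init(uint32_t saddr, uint32_t daddr,
                         uint16_t sport, uint16_t dport,
                         uint32_t secret, uint16_t *mss)
{
    size_t idx = TOY_MSSTAB_LEN - 1;

    assert(mss != NULL);
    while (idx > 0 && *mss < toy_msstab[idx])
        idx--;
    *mss = toy_msstab[idx];
    return (toy_hash(saddr, daddr, sport, dport, secret) & ~3u) | (uint32_t)idx;
}

/* Validate a cookie and return the MSS it encodes, or 0 if it is not ours. */
uint16_t toy_cookie_check(uint32_t saddr, uint32_t daddr,
                          uint16_t sport, uint16_t dport,
                          uint32_t secret, uint32_t cookie)
{
    uint32_t expect = toy_hash(saddr, daddr, sport, dport, secret) & ~3u;

    if ((cookie & ~3u) != expect)
        return 0;
    return toy_msstab[cookie & 3u];
}
```

On the wire the client acknowledges cookie + 1, so the receiver subtracts 1 from the ACK number before validating. No per-connection state is needed between the SYN and the final ACK, which is what makes the scheme resistant to SYN floods.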

Send syn message to real server

//Send a syn to the real server.
//recv_seq is the sequence number of the client's ack, which is the sequence number of the client's syn plus 1.
//recv_seq - 1 is used as the sequence number of the syn sent to the server, so sequence numbers in the
//request direction do not need to be adjusted.
static void
synproxy_send_server_syn(struct net *net,
             const struct sk_buff *skb, const struct tcphdr *th,
             const struct synproxy_options *opts, u32 recv_seq)
{
    struct synproxy_net *snet = synproxy_pernet(net);
    struct sk_buff *nskb;
    struct iphdr *iph, *niph;
    struct tcphdr *nth;
    unsigned int tcp_hdr_size;

    iph = ip_hdr(skb);
    //Calculate the header size of a message
    tcp_hdr_size = sizeof(*nth) + synproxy_options_size(opts);
    nskb = alloc_skb(sizeof(*niph) + tcp_hdr_size + MAX_TCP_HEADER,
             GFP_ATOMIC);
    if (nskb == NULL)
        return;
    skb_reserve(nskb, MAX_TCP_HEADER);
    //Building IP header
    niph = synproxy_build_ip(net, nskb, iph->saddr, iph->daddr);
    //Reset Transport Layer Head
    skb_reset_transport_header(nskb);
    nth = skb_put(nskb, tcp_hdr_size);
    nth->source    = th->source;
    nth->dest    = th->dest;
    nth->seq    = htonl(recv_seq - 1);//The ack's sequence number minus 1 is used as the sequence number of the syn
    /* ack_seq is used to relay our ISN to the synproxy hook to initialize
     * sequence number translation once a connection tracking entry exists.
     * Setting nth->ack_seq here is important: the ack number of the client's ack minus 1
     * (that is, synproxy's initial send sequence number) is stored in nth->ack_seq so that
     * the hook can record it in the synproxy extension.
     * See ipv4_synproxy_hook for details.
     */
    nth->ack_seq    = htonl(ntohl(th->ack_seq) - 1);
    tcp_flag_word(nth) = TCP_FLAG_SYN;//Setting syn flag
    if (opts->options & XT_SYNPROXY_OPT_ECN)//Setting ECN flag
        tcp_flag_word(nth) |= TCP_FLAG_ECE | TCP_FLAG_CWR;
    nth->doff    = tcp_hdr_size / 4;
    nth->window    = th->window;//Use the client's window
    nth->check    = 0;
    nth->urg_ptr    = 0;
    //Build the options
    synproxy_build_options(nth, opts);
    //This is a SYN, so it creates the ct in the request direction for the proxied connection. &snet->tmpl->ct_general,
    //the ct template, is passed to synproxy_send_tcp, which attaches it to the packet as its nfct. The real ct is later
    //created from this template when the packet passes conntrack at the OUTPUT hook.
    //The template carries the seqadj and synproxy extensions.
    synproxy_send_tcp(net, skb, nskb, &snet->tmpl->ct_general, IP_CT_NEW,
              niph, nth, tcp_hdr_size);
}

static void
synproxy_send_tcp(struct net *net,
          const struct sk_buff *skb, struct sk_buff *nskb,
          struct nf_conntrack *nfct, enum ip_conntrack_info ctinfo,
          struct iphdr *niph, struct tcphdr *nth,
          unsigned int tcp_hdr_size)
{
    nth->check = ~tcp_v4_check(tcp_hdr_size, niph->saddr, niph->daddr, 0);
    nskb->ip_summed   = CHECKSUM_PARTIAL;
    nskb->csum_start  = (unsigned char *)nth - nskb->head;
    nskb->csum_offset = offsetof(struct tcphdr, check);

    skb_dst_set_noref(nskb, skb_dst(skb));
    nskb->protocol = htons(ETH_P_IP);
    if (ip_route_me_harder(net, nskb, RTN_UNSPEC))
        goto free_nskb;

    if (nfct) {
        nf_ct_set(nskb, (struct nf_conn *)nfct, ctinfo);
        nf_conntrack_get(nfct);
    }

    ip_local_out(net, nskb->sk, nskb);
    return;

free_nskb:
    kfree_skb(nskb);
}
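
The sequence number bookkeeping in synproxy_send_server_syn above is easy to get wrong by one, so here is a small user-space sketch of it. The struct and function names are hypothetical and all values are in host byte order; only the two "minus 1" relations are taken from the kernel code shown above.

```c
#include <assert.h>
#include <stdint.h>

/* Sequence and acknowledgment numbers of one TCP segment (host byte order). */
struct tcp_seqs {
    uint32_t seq;
    uint32_t ack_seq;
};

/* Derive the fields of the forged SYN from the client's final ACK. */
struct tcp_seqs server_syn_from_client_ack(struct tcp_seqs client_ack)
{
    struct tcp_seqs syn;

    /* The ACK carries client ISN + 1, so this restores the client's ISN. */
    syn.seq = client_ack.seq - 1;
    /* The ACK acknowledges cookie ISN + 1; the cookie ISN is relayed in the
     * otherwise unused ack_seq field so the hook can record it as isn. */
    syn.ack_seq = client_ack.ack_seq - 1;
    return syn;
}

/* Worked example with made-up numbers, including a 32-bit wraparound. */
void server_syn_example(void)
{
    const uint32_t client_isn = 1000, cookie_isn = 0xffffffffu;
    struct tcp_seqs ack = { client_isn + 1, cookie_isn + 1 }; /* wraps to 0 */
    struct tcp_seqs syn = server_syn_from_client_ack(ack);

    assert(syn.seq == client_isn);
    assert(syn.ack_seq == cookie_isn);
}
```

Because the forged SYN reuses the client's own ISN, everything the client sends later already has the sequence numbers the server expects; only the server's side of the numbering has to be translated.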

The three-way handshake between synproxy and the client is completed in synproxy_tg4, which also starts the three-way handshake between synproxy and the server. The client's handshake packets are processed by synproxy_tg4; the hook function registered by synproxy does not process these three packets.

Hook functions registered with synproxy

static const struct nf_hook_ops ipv4_synproxy_ops[] = {
    {
        .hook        = ipv4_synproxy_hook,
        .pf        = NFPROTO_IPV4,
        .hooknum    = NF_INET_LOCAL_IN,
        .priority    = NF_IP_PRI_CONNTRACK_CONFIRM - 1,//Runs very late, just before conntrack confirm.
    },
    {
        .hook        = ipv4_synproxy_hook,
        .pf        = NFPROTO_IPV4,
        .hooknum    = NF_INET_POST_ROUTING,
        .priority    = NF_IP_PRI_CONNTRACK_CONFIRM - 1,//Runs very late, just before conntrack confirm.
    },
};

ipv4_synproxy_hook

When synproxy receives the ACK from the client, it sends a SYN to the server. At that point a CT entry in state NEW is created from the template. When the packet reaches the NF_INET_POST_ROUTING hook, it passes through the ipv4_synproxy_hook function.

static unsigned int ipv4_synproxy_hook(void *priv,
                       struct sk_buff *skb,
                       const struct nf_hook_state *nhs)
{
    struct net *net = nhs->net;
    struct synproxy_net *snet = synproxy_pernet(net);
    enum ip_conntrack_info ctinfo;
    struct nf_conn *ct;
    struct nf_conn_synproxy *synproxy;
    struct synproxy_options opts = {};
    const struct ip_ct_tcp *state;
    struct tcphdr *th, _th;
    unsigned int thoff;
    //The client-side syn, syn+ack and ack packets have no ct, so return immediately
    ct = nf_ct_get(skb, &ctinfo);
    if (ct == NULL)
        return NF_ACCEPT;
    //Get the synproxy control block for connection tracking
    synproxy = nfct_synproxy(ct);
    if (synproxy == NULL)
        return NF_ACCEPT;
    //Packets received on the lo interface and non-tcp packets are accepted without processing.
    if (nf_is_loopback_packet(skb) ||
        ip_hdr(skb)->protocol != IPPROTO_TCP)
        return NF_ACCEPT;
    //Get the tcp header address
    thoff = ip_hdrlen(skb);
    th = skb_header_pointer(skb, thoff, sizeof(_th), &_th);
    if (th == NULL)
        return NF_DROP;

    state = &ct->proto.tcp;
    switch (state->state) {
    case TCP_CONNTRACK_CLOSE:
        if (th->rst && !test_bit(IPS_SEEN_REPLY_BIT, &ct->status)) {
            nf_ct_seqadj_init(ct, ctinfo, synproxy->isn -
                              ntohl(th->seq) + 1);
            break;
        }

        if (!th->syn || th->ack ||
            CTINFO2DIR(ctinfo) != IP_CT_DIR_ORIGINAL)
            break;

        /* Reopened connection - reset the sequence number and timestamp
         * adjustments, they will get initialized once the connection is
         * reestablished.
         */
        nf_ct_seqadj_init(ct, ctinfo, 0);
        synproxy->tsoff = 0;
        this_cpu_inc(snet->stats->conn_reopened);

        /* fall through */
    case TCP_CONNTRACK_SYN_SENT:
        if (!synproxy_parse_options(skb, thoff, th, &opts))
            return NF_DROP;

        if (!th->syn && th->ack &&
            CTINFO2DIR(ctinfo) == IP_CT_DIR_ORIGINAL) {
            /* Keep-Alives are sent with SEG.SEQ = SND.NXT-1,
             * therefore we need to add 1 to make the SYN sequence
             * number match the one of first SYN.
             */
            if (synproxy_recv_client_ack(net, skb, th, &opts,
                             ntohl(th->seq) + 1)) {
                this_cpu_inc(snet->stats->cookie_retrans);
                consume_skb(skb);
                return NF_STOLEN;
            } else {
                return NF_DROP;
            }
        }
        //The ack_seq of the syn sent to the server was filled with synproxy's initial send sequence number when the packet was built.
        synproxy->isn = ntohl(th->ack_seq);
        //Record the initial timestamp.
        if (opts.options & XT_SYNPROXY_OPT_TIMESTAMP)
            synproxy->its = opts.tsecr;//The echoed timestamp is the initial timestamp of synproxy's syn+ack

        nf_conntrack_event_cache(IPCT_SYNPROXY, ct);
        break;
    case TCP_CONNTRACK_SYN_RECV://The ct is in this state once the server's syn+ack has passed through conntrack.
        if (!th->syn || !th->ack)
            break;
        //Parse the options
        if (!synproxy_parse_options(skb, thoff, th, &opts))
            return NF_DROP;
        //This code should not be here: it causes the timestamp of the ack sent to the server to be wrong.
        //It should be placed after synproxy_send_server_ack.
        if (opts.options & XT_SYNPROXY_OPT_TIMESTAMP) {
            //Record time stamp difference
            synproxy->tsoff = opts.tsval - synproxy->its;
            nf_conntrack_event_cache(IPCT_SYNPROXY, ct);
        }

        opts.options &= ~(XT_SYNPROXY_OPT_MSS |
                  XT_SYNPROXY_OPT_WSCALE |
                  XT_SYNPROXY_OPT_SACK_PERM);
        //opts.tsecr is synproxy's syn-cookie timestamp
        //opts.tsval is the timestamp of the server's syn+ack; an ack now has to be sent back to the server
        swap(opts.tsval, opts.tsecr);
        //Send the ack to the server. The timestamp of this packet gets modified, which is a bug:
        //this packet does not actually need to be modified.
        synproxy_send_server_ack(net, state, skb, th, &opts);
        //Initialize the sequence number adjustment context for this ct, in the reply direction.
        //The sequence numbers the server sends to the client must be adjusted, because during the
        //first handshake the client recorded the initial sequence number provided by synproxy. The offset
        //is the difference between synproxy's and the real server's initial sequence numbers.
        //The ack numbers that the client sends to the server must be adjusted by the same offset, which is recorded here.
        //th->seq here is the initial sequence number the server sent to the client.
        nf_ct_seqadj_init(ct, ctinfo, synproxy->isn - ntohl(th->seq));
        nf_conntrack_event_cache(IPCT_SEQADJ, ct);
        //Send an ack to the client to synchronize the timestamp, window and other options.
        //Swap the timestamps back, since this packet is not sent to the server. It leaves through OUTPUT,
        //re-enters this function at post-routing, and synproxy_tstamp_adjust is called there
        //to adjust the timestamp.
        swap(opts.tsval, opts.tsecr);
        synproxy_send_client_ack(net, skb, th, &opts);

        consume_skb(skb);
        return NF_STOLEN;
    default:
        break;
    }
    //The tcp window update sent to the client reaches this point, where its timestamp is adjusted.
    synproxy_tstamp_adjust(skb, thoff, th, ct, ctinfo, synproxy);
    return NF_ACCEPT;
}

Timestamp adjustment

The timestamp option serves the following purposes (taken from the Internet; for details see https://blog.csdn.net/u011130...):

(1) Round-trip time measurement

RTT is very important for congestion control (for example, for calculating the retransmission timeout). Typically, RTT is measured by sending a segment and recording the send time t1, then recording the time t2 when its acknowledgment arrives; t2 - t1 is the RTT. However, TCP uses delayed acknowledgments and ACKs can be lost, so when an ACK is received it may be impossible to tell which segment it acknowledges.

(2) Fast sequence number wraparound

TCP decides whether data is new or old by checking whether its sequence number lies in the range snd.una to snd.una + 2**31, while the whole sequence number space is 2**32 bytes, about 4.29 GB. On a 10 Gbit/s LAN, 4.29 GB of data wraps the sequence space within a few seconds, at which point TCP can no longer tell reliably whether data is old or new.

(3) Option information for SYN cookies

When SYN cookies are enabled, the server does not keep any connection state after receiving a SYN, so the options carried in the SYN (WScale, SACK) cannot be saved. When the SYN cookie is later verified and the connection is created, these options cannot be enabled.

The timestamp option solves all three problems.

Solution to problem (1): write the send time into the timestamp option of each segment. When an ACK arrives, the echoed value in its timestamp option tells when the acknowledged segment was sent. Subtracting the echoed time from the current time gives one RTT sample.

Solution to problem (2): on receiving a segment, record the timestamp value from the option; on receiving the next segment, compare its timestamp with the recorded one. How fast the timestamp wraps depends only on the host's clock frequency. Linux uses the local clock tick count (jiffies) as the timestamp value. Assuming the clock ticks once every 1 ms, it takes about 24.8 days to wrap halfway around. As long as a segment's lifetime is shorter than that, old and new data are not confused. This mechanism is called PAWS (Protect Against Wrapped Sequence numbers). It solves problem (2), but as clock frequencies increase the timestamp wraps faster too, so sooner or later using timestamps against sequence number wraparound will run into difficulties.

Solution to problem (3): encode the WScale and SACK option information into the 32-bit timestamp value. When the ACK that completes the connection arrives, decoding the echoed timestamp restores the WScale and SACK information (see 3.6 SYN Cookie for this part).
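
The wraparound figures quoted for problem (2) can be checked with a few lines of arithmetic. The helper names below are hypothetical; the comparison in ts_before() is the usual wrap-safe idiom for 32-bit counters (subtract, then look at the sign), which is valid only while the two values are less than 2**31 apart.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Seconds until 2**32 bytes of sequence space are used up at a given rate. */
double seq_wrap_seconds(double bits_per_second)
{
    assert(bits_per_second > 0.0);
    return 4294967296.0 * 8.0 / bits_per_second;
}

/* Days until a 32-bit timestamp clock ticking at tick_hz advances by 2**31,
 * which is the largest distance at which old and new can still be told apart. */
double ts_half_wrap_days(double tick_hz)
{
    assert(tick_hz > 0.0);
    return 2147483648.0 / tick_hz / 86400.0;
}

/* Wrap-safe "a is older than b" for 32-bit timestamps or sequence numbers. */
bool ts_before(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}
```

At 10 Gbit/s the sequence space wraps in roughly 3.4 seconds, while a 1 kHz timestamp clock needs about 24.86 days to advance by 2**31, which is why the timestamp is a far better tie-breaker than the sequence number itself.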

The first timestamp the client receives is the syn-cookie timestamp that synproxy sent in its SYN+ACK, whereas later packets carry timestamps generated by the server. The two differ (the server's timestamp may be smaller or larger than the initial one). The timestamp of the first SYN the server receives is the one in the timestamp option of the client's ACK.
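
How options can ride inside the syn-cookie timestamp is sketched below. The bit layout (window scale in the low four bits with 0xf meaning "not offered", SACK-permitted in bit 4, ECN in bit 5) follows my reading of the kernel's synproxy_init_timestamp_cookie and should be treated as an assumption; struct ts_opts and both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* The client options that must survive until the final ACK arrives. */
struct ts_opts {
    bool    wscale_ok;  /* client offered window scaling */
    uint8_t wscale;     /* client's shift count, 0..14 */
    bool    sack_perm;
    bool    ecn;
};

/* Build the TSval of the SYN/ACK: current clock with the low 6 bits
 * replaced by the encoded options. */
uint32_t ts_cookie_encode(uint32_t now, const struct ts_opts *o)
{
    uint32_t tsval = now & ~0x3fu;

    assert(o != NULL && o->wscale <= 14);
    tsval |= o->wscale_ok ? (uint32_t)o->wscale : 0xfu;
    if (o->sack_perm)
        tsval |= 1u << 4;
    if (o->ecn)
        tsval |= 1u << 5;
    return tsval;
}

/* Recover the options from the TSecr echoed in the client's ACK. */
struct ts_opts ts_cookie_decode(uint32_t tsecr)
{
    struct ts_opts o;
    uint8_t nibble = (uint8_t)(tsecr & 0xfu);

    o.wscale_ok = (nibble != 0xf);
    o.wscale = o.wscale_ok ? nibble : 0;
    o.sack_perm = (tsecr & (1u << 4)) != 0;
    o.ecn = (tsecr & (1u << 5)) != 0;
    return o;
}
```

The price of this trick is that the six low-order bits of the clock are lost in the SYN/ACK timestamp, which is harmless for a single packet but is one more reason why the proxy's timestamp and the server's later timestamps do not line up without translation.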

No.	Source	Destination	Message type	Timestamp (TSval)	Timestamp echo (TSecr)
1	client	synproxy	syn	client-timestamp1	0
2	synproxy	client	syn+ack	syn-cookie-timestamp	client-timestamp1
3	client	synproxy	ack	client-timestamp2	syn-cookie-timestamp
4	synproxy	server	syn	client-timestamp2	syn-cookie-timestamp
5	server	synproxy	syn+ack	server-timestamp1	client-timestamp2
6	synproxy	server	ack	client-timestamp2	server-timestamp1 (this timestamp gets adjusted although it does not need to be; this is a bug)
7	synproxy	client	ack (tcp window update)	server-timestamp1 (this timestamp gets adjusted)	client-timestamp2

In general, the echoed timestamp must be modified in the request direction, and the sent timestamp must be modified in the reply direction.

struct nf_conn_synproxy {
    u32    isn;//synproxy's initial send sequence number (reply direction), passed via ack_seq
    u32    its;//Initial timestamp: the syn-cookie timestamp generated by synproxy, taken from the echoed timestamp (tsecr)
    u32    tsoff;//Timestamp offset: the send timestamp of the server's syn+ack minus its
};

unsigned int synproxy_tstamp_adjust(struct sk_buff *skb,
                    unsigned int protoff,
                    struct tcphdr *th,
                    struct nf_conn *ct,
                    enum ip_conntrack_info ctinfo,
                    const struct nf_conn_synproxy *synproxy)
{
    unsigned int optoff, optend;
    __be32 *ptr, old;

    if (synproxy->tsoff == 0)
        return 1;

    optoff = protoff + sizeof(struct tcphdr);
    optend = protoff + th->doff * 4;

    if (!skb_make_writable(skb, optend))
        return 0;

    while (optoff < optend) {
        unsigned char *op = skb->data + optoff;

        switch (op[0]) {
        case TCPOPT_EOL:
            return 1;
        case TCPOPT_NOP:
            optoff++;
            continue;
        default:
            if (optoff + 1 == optend ||
                optoff + op[1] > optend ||
                op[1] < 2)
                return 0;
            if (op[0] == TCPOPT_TIMESTAMP &&
                op[1] == TCPOLEN_TIMESTAMP) {
                if (CTINFO2DIR(ctinfo) == IP_CT_DIR_REPLY) {//Reply direction: modify the sent timestamp
                    ptr = (__be32 *)&op[2];
                    old = *ptr;
                    *ptr = htonl(ntohl(*ptr) -
                             synproxy->tsoff);
                } else {//Request direction: modify the echoed timestamp
                    ptr = (__be32 *)&op[6];
                    old = *ptr;
                    *ptr = htonl(ntohl(*ptr) +
                             synproxy->tsoff);
                }
                inet_proto_csum_replace4(&th->check, skb,
                             old, *ptr, false);
                return 1;
            }
            optoff += op[1];
        }
    }
    return 1;
}
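
Stripped of the option parsing and checksum fix-up, the translation performed by synproxy_tstamp_adjust reduces to one subtraction and one addition. The sketch below uses hypothetical function names and made-up timestamp values; the formulas themselves (tsoff = server TSval minus its, reply direction subtracts, request direction adds) are taken from the kernel code shown above.

```c
#include <assert.h>
#include <stdint.h>

/* Offset recorded when the server's SYN/ACK is seen. */
uint32_t ts_offset(uint32_t server_tsval, uint32_t its)
{
    return server_tsval - its;
}

/* Reply direction (server to client): rewrite the sent timestamp (TSval). */
uint32_t adjust_reply_tsval(uint32_t tsval, uint32_t tsoff)
{
    return tsval - tsoff;
}

/* Request direction (client to server): rewrite the echoed timestamp (TSecr). */
uint32_t adjust_request_tsecr(uint32_t tsecr, uint32_t tsoff)
{
    return tsecr + tsoff;
}

/* Worked example with made-up clock values. */
void ts_adjust_example(void)
{
    const uint32_t its = 5000;             /* cookie timestamp sent to client */
    const uint32_t server_first = 900000;  /* TSval of the server's SYN/ACK */
    uint32_t tsoff = ts_offset(server_first, its);

    /* 250 ticks later the server sends data; the client sees its + 250. */
    assert(adjust_reply_tsval(server_first + 250, tsoff) == its + 250);
    /* The client echoes that value; the server gets its own TSval back. */
    assert(adjust_request_tsecr(its + 250, tsoff) == server_first + 250);
}
```

Because all arithmetic is modulo 2**32, the same two functions work whether the server's clock is ahead of or behind the proxy's, and also when either value wraps.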

seqadj adjustment

Because synproxy uses the same sequence number as the client when it sends the SYN to the server, seqadj only needs to modify the acknowledgment number in the request direction and the sequence number in the reply direction.

When synproxy answers the client, it cannot know the server's initial sequence number, so it generates its own with a syncookie. The initial sequence number that synproxy sent to the client therefore differs from the server's, which is why seqadj is needed.

When synproxy receives the server's SYN+ACK, it calls nf_ct_seqadj_init(ct, ctinfo, synproxy->isn - ntohl(th->seq)) to initialize the seqadj extension context; the adjustment itself is then applied in the ipv4_confirm function.

int nf_ct_seqadj_init(struct nf_conn *ct, enum ip_conntrack_info ctinfo,
              s32 off)
{
    enum ip_conntrack_dir dir = CTINFO2DIR(ctinfo);
    struct nf_conn_seqadj *seqadj;
    struct nf_ct_seqadj *this_way;

    if (off == 0)
        return 0;

    set_bit(IPS_SEQ_ADJUST_BIT, &ct->status);

    seqadj = nfct_seqadj(ct);
    this_way = &seqadj->seq[dir];
    this_way->offset_before     = off;//Both values are the same here, unlike the adjustments made by ALG helpers.
    this_way->offset_after     = off;
    return 0;
}

Role of nf_conntrack_tcp_loose

echo 0 > /proc/sys/net/netfilter/nf_conntrack_tcp_loose

Setting this value to 0 makes connection tracking strictly validate TCP state transitions, so packets that do not follow the three-way handshake order do not create a CT entry (in user space, these are the packets in state INVALID). It must be used together with the following command, which keeps SYN packets out of connection tracking and thereby breaks conntrack's view of the three-way handshake.

iptables -t raw -A PREROUTING -i eth0 -p tcp --dport 80 --syn -j NOTRACK

Because the SYN does not create a conntrack entry, the subsequent SYN+ACK enters connection tracking as a new connection, and the following function is called:

/* Called when a new connection for this protocol found. */
static bool tcp_new(struct nf_conn *ct, const struct sk_buff *skb,
            unsigned int dataoff, unsigned int *timeouts)
{
    ...

    /* Don't need lock here: this conntrack not in circulation yet */
    /* Get the next state, starting from TCP_CONNTRACK_NONE. */
    new_state = tcp_conntracks[0][get_conntrack_index(th)][TCP_CONNTRACK_NONE];
    // For a packet that cannot start a connection, new_state is TCP_CONNTRACK_MAX.
    /* Invalid: delete conntrack. The state is not acceptable, so tracking is abandoned. */
    if (new_state >= TCP_CONNTRACK_MAX) {
        pr_debug("nf_ct_tcp: invalid new deleting.\n");
        return false;
    }

    if (new_state == TCP_CONNTRACK_SYN_SENT) {/* First packet (SYN): initialize the per-connection state */
        memset(&ct->proto.tcp, 0, sizeof(ct->proto.tcp));
        /* SYN packet */
        ct->proto.tcp.seen[0].td_end =
            segment_seq_plus_len(ntohl(th->seq), skb->len,
                         dataoff, th);
        ct->proto.tcp.seen[0].td_maxwin = ntohs(th->window);
        if (ct->proto.tcp.seen[0].td_maxwin == 0)
            ct->proto.tcp.seen[0].td_maxwin = 1;
        ct->proto.tcp.seen[0].td_maxend =
            ct->proto.tcp.seen[0].td_end;

        tcp_options(skb, dataoff, th, &ct->proto.tcp.seen[0]);
    } else if (tn->tcp_loose == 0) {/* Strict mode: mid-stream pickup is not allowed. This is the effect of nf_conntrack_tcp_loose == 0. */
        /* Don't try to pick up connections. */
        return false;// Return false, so no conntrack entry is created.
    } else 
    ...
    return true;
}
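
The decision tcp_new takes for the first packet of a flow can be condensed into a small user-space model. This is a sketch under assumptions, not the kernel's state table: it covers only three packet types, the names `pkt_kind`, `first_state_for` and `tcp_new_model` are hypothetical, and the mapping from packet type to state is a simplification of what the table is understood to yield when starting from TCP_CONNTRACK_NONE.

```c
#include <assert.h>
#include <stdbool.h>

enum pkt_kind { PKT_SYN, PKT_SYNACK, PKT_ACK, PKT_KINDS };
enum first_state { ST_SYN_SENT, ST_ESTABLISHED, ST_INVALID };

/* Assumed state for a first packet seen in the original direction. */
static enum first_state first_state_for(enum pkt_kind p)
{
    assert(p < PKT_KINDS);

    switch (p) {
    case PKT_SYN:
        return ST_SYN_SENT;      /* a regular connection start */
    case PKT_ACK:
        return ST_ESTABLISHED;   /* looks like a connection already in progress */
    default:
        return ST_INVALID;       /* e.g. a SYN+ACK cannot open a connection */
    }
}

/* Returns true when a conntrack entry would be created for this packet. */
static bool tcp_new_model(enum pkt_kind p, bool loose)
{
    switch (first_state_for(p)) {
    case ST_SYN_SENT:
        return true;             /* always tracked */
    case ST_ESTABLISHED:
        return loose;            /* mid-stream pickup only in loose mode */
    default:
        return false;            /* invalid in either mode */
    }
}
```

With `loose` set to false, only a SYN creates an entry; a bare ACK is rejected and is reported as INVALID.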

synproxy Connection Tracking Processing

When synproxy sends the SYN to the server, the call below is made. A template ct is attached to the packet. This is the main difference from the other packets synproxy generates, and it directly affects how the conntrack entry is created when the packet enters connection tracking at the OUTPUT hook:

// Send the SYN with synproxy's template ct attached and ctinfo IP_CT_NEW, so that this packet creates a new ct in the original (request) direction.
synproxy_send_tcp(net, skb, nskb, &snet->tmpl->ct_general, IP_CT_NEW,
                  niph, nth, tcp_hdr_size);

unsigned int
nf_conntrack_in(struct net *net, u_int8_t pf, unsigned int hooknum,
        struct sk_buff *skb)
{
    const struct nf_conntrack_l3proto *l3proto;
    const struct nf_conntrack_l4proto *l4proto;
    struct nf_conn *ct, *tmpl;
    enum ip_conntrack_info ctinfo;
    unsigned int *timeouts;
    unsigned int dataoff;
    u_int8_t protonum;
    int ret;
    // The SYN sent by synproxy carries a template ct.
    tmpl = nf_ct_get(skb, &ctinfo);
    if (tmpl || ctinfo == IP_CT_UNTRACKED) {
        /* Previously seen (loopback or untracked)?  Ignore. */
        if ((tmpl && !nf_ct_is_template(tmpl)) ||// synproxy's ct has the template flag set, so this branch is not taken.
             ctinfo == IP_CT_UNTRACKED) {
            NF_CT_STAT_INC_ATOMIC(net, ignore);
            return NF_ACCEPT;
        }
        skb->_nfct = 0;// Clear the reference to the template.
    }

    ...

    /* It may be an special packet, error, unclean...
     * inverse of the return code tells to the netfilter
     * core what to do with the packet. */
    if (l4proto->error != NULL) {
        ret = l4proto->error(net, tmpl, skb, dataoff, pf, hooknum);
        if (ret <= 0) {
            NF_CT_STAT_INC_ATOMIC(net, error);
            NF_CT_STAT_INC_ATOMIC(net, invalid);
            ret = -ret;
            goto out;
        }
        /* ICMP[v6] protocol trackers may assign one conntrack. */
        if (skb->_nfct)
            goto out;
    }
repeat:
    // Look up the conntrack entry for this packet; if this is the first packet, create one.
    ret = resolve_normal_ct(net, tmpl, skb, dataoff, pf, protonum,
                l3proto, l4proto);
    ...

    return ret;
}

/* On success, returns 0, sets skb->_nfct | ctinfo */
static int
resolve_normal_ct(struct net *net, struct nf_conn *tmpl,
          struct sk_buff *skb,
          unsigned int dataoff,
          u_int16_t l3num,
          u_int8_t protonum,
          const struct nf_conntrack_l3proto *l3proto,
          const struct nf_conntrack_l4proto *l4proto)
{
    const struct nf_conntrack_zone *zone;
    struct nf_conntrack_tuple tuple;
    struct nf_conntrack_tuple_hash *h;
    enum ip_conntrack_info ctinfo;
    struct nf_conntrack_zone tmp;
    struct nf_conn *ct;
    u32 hash;
    ... 

    /* look for tuple match; the zone is taken from the template */
    zone = nf_ct_zone_tmpl(tmpl, skb, &tmp);
    hash = hash_conntrack_raw(&tuple, net);
    h = __nf_conntrack_find_get(net, zone, &tuple, hash);
    if (!h) {
        // Not found: initialize a new entry based on the template.
        h = init_conntrack(net, tmpl, &tuple, l3proto, l4proto,
                   skb, dataoff, hash);
        if (!h)
            return 0;
        if (IS_ERR(h))
            return PTR_ERR(h);
    }
    ...
    return 0;
}

/* Allocate a new conntrack: we return -ENOMEM if classification
   failed due to stress.  Otherwise it really is unclassifiable. */
static noinline struct nf_conntrack_tuple_hash *
init_conntrack(struct net *net, struct nf_conn *tmpl,
           const struct nf_conntrack_tuple *tuple,
           const struct nf_conntrack_l3proto *l3proto,
           const struct nf_conntrack_l4proto *l4proto,
           struct sk_buff *skb,
           unsigned int dataoff, u32 hash)
{
    struct nf_conn *ct;
    struct nf_conn_help *help;
    struct nf_conntrack_tuple repl_tuple;
    struct nf_conntrack_ecache *ecache;
    struct nf_conntrack_expect *exp = NULL;
    const struct nf_conntrack_zone *zone;
    struct nf_conn_timeout *timeout_ext;
    struct nf_conntrack_zone tmp;
    unsigned int *timeouts;

    if (!nf_ct_invert_tuple(&repl_tuple, tuple, l3proto, l4proto)) {
        pr_debug("Can't invert tuple.\n");
        return NULL;
    }

    zone = nf_ct_zone_tmpl(tmpl, skb, &tmp);
    // Allocate the conntrack entry.
    ct = __nf_conntrack_alloc(net, zone, tuple, &repl_tuple, GFP_ATOMIC,
                  hash);
    if (IS_ERR(ct))
        return (struct nf_conntrack_tuple_hash *)ct;
    // If the template carries the synproxy extension, add it to the new ct as well.
    if (!nf_ct_add_synproxy(ct, tmpl)) {
        nf_conntrack_free(ct);
        return ERR_PTR(-ENOMEM);
    }
    // Prefer the timeout policy attached to the template.
    timeout_ext = tmpl ? nf_ct_timeout_find(tmpl) : NULL;
    if (timeout_ext) {
        timeouts = nf_ct_timeout_data(timeout_ext);
        if (unlikely(!timeouts))
            timeouts = l4proto->get_timeouts(net);
    } else {
        timeouts = l4proto->get_timeouts(net);
    }

    /* Protocol-specific initialization. For TCP this checks the state transition; for example, with nf_conntrack_tcp_loose=0 a first packet that is not a SYN cannot create a connection. */
    if (!l4proto->new(ct, skb, dataoff, timeouts)) {
        nf_conntrack_free(ct);
        pr_debug("can't track with proto module\n");
        return NULL;
    }

    if (timeout_ext)
        nf_ct_timeout_ext_add(ct, rcu_dereference(timeout_ext->timeout),
                      GFP_ATOMIC);

    nf_ct_acct_ext_add(ct, GFP_ATOMIC);
    nf_ct_tstamp_ext_add(ct, GFP_ATOMIC);
    nf_ct_labels_ext_add(ct);

    ecache = tmpl ? nf_ct_ecache_find(tmpl) : NULL;
    nf_ct_ecache_ext_add(ct, ecache ? ecache->ctmask : 0,
                 ecache ? ecache->expmask : 0,
                 GFP_ATOMIC);
    ...

    return &ct->tuplehash[IP_CT_DIR_ORIGINAL];
}
// Adding the synproxy extension also adds the seqadj extension.
static inline bool nf_ct_add_synproxy(struct nf_conn *ct,
                      const struct nf_conn *tmpl)
{
    if (tmpl && nfct_synproxy(tmpl)) {
        if (!nfct_seqadj_ext_add(ct))
            return false;

        if (!nfct_synproxy_ext_add(ct))
            return false;
    }

    return true;
}
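
The path from template to new conntrack entry in the listings above can be summarized in a user-space model. It is a sketch only: `ct_model`, `verdict` and `conntrack_in_model` are hypothetical names, the lookup of an already existing entry is left out, and the extensions are reduced to boolean flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct ct_model {
    bool is_template;    /* set on synproxy's template ct */
    bool has_synproxy;   /* synproxy extension present */
    bool has_seqadj;     /* seqadj extension present */
};

enum verdict { V_IGNORED, V_CREATED };

/* attached:  ct already attached to the packet, or NULL
 * untracked: packet was marked untracked (e.g. by NOTRACK)
 * new_ct:    filled in when a new entry would be created */
static enum verdict conntrack_in_model(const struct ct_model *attached,
                                       bool untracked,
                                       struct ct_model *new_ct)
{
    const struct ct_model *tmpl = attached;

    assert(new_ct != NULL);

    /* A real (non-template) ct or an untracked packet is left alone. */
    if ((tmpl && !tmpl->is_template) || untracked)
        return V_IGNORED;

    /* First packet of a flow: create an entry, inheriting from the template. */
    new_ct->is_template = false;
    new_ct->has_synproxy = false;
    new_ct->has_seqadj = false;
    if (tmpl && tmpl->has_synproxy) {
        new_ct->has_seqadj = true;     /* seqadj always accompanies synproxy */
        new_ct->has_synproxy = true;
    }
    return V_CREATED;
}
```

In this model only a packet that carries the synproxy template ends up with an entry that has both extensions; an ordinary first packet gets a plain entry, and untracked packets get none.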

Computation and Function of syn-cookie

For details, see the following blog post, which covers the topic thoroughly:

https://blog.csdn.net/u011130...

Experiment

On a device running Ubuntu 18.04:

# On one terminal, run nginx in a container with IP 172.17.0.2
admin@ubuntu:~$ sudo docker run -it --name synproxynginx nginx bash
# Start nginx inside the container

# On another terminal, start a client container with IP 172.17.0.3
admin@ubuntu:~$ sudo docker run -it --name synproxyclient ubuntu bash
# Inside the container, run: curl 172.17.0.2

# On a third terminal, run:
admin@ubuntu:~$ sudo iptables -t raw -A PREROUTING -p tcp --dport 80 --syn -j NOTRACK
admin@ubuntu:~$ sudo iptables -A FORWARD -i eth0 -p tcp --dport 80 -m state --state UNTRACKED,INVALID \
                        -j SYNPROXY --sack-perm --timestamp --mss 1480 --wscale 7 --ecn

admin@ubuntu:~$ sudo sysctl -w net.netfilter.nf_conntrack_tcp_loose=0

# Capture packets on the docker0 bridge that connects the two containers and analyze them
