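Be conservative in what you send, be liberal in what you accept
Starting from a packetdrill script for the TCP three-way handshake, this article builds a simulated server-side scenario to study and test receive buffer auto-tuning.
Base script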
# cat tcp_rcvbuf_000.pkt
0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0
+0 < S 0:0(0) win 10000 <mss 1460>
+0 > S. 0:0(0) ack 1 <...>
+0.01 < . 1:1(0) ack 1 win 10000
+0 accept(3, ..., ...) = 4
#
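TCP receive buffer
The TCP receive buffer is a memory region that the operating system kernel maintains for each TCP connection to hold data received from the peer. When data arrives, the kernel first places it in the receive buffer. In-order data goes to the receive queue, where it waits for the application to read it. Out-of-order data is held in the out-of-order queue until the missing segments arrive, and only then is it handed to the application. The size of the receive buffer is usually controlled by system parameters or socket options such as SO_RCVBUF. It does not directly limit the size of a TCP segment, but it does determine how much unread data can be backlogged at once.
The receive buffer is closely tied to the TCP receive window. To a first approximation, window size = total buffer size − space already occupied. The receiver advertises this window in its ACK segments to tell the sender how much more data it can accept, which is how flow control works. If the application reads slowly and the buffer fills up, the window shrinks, possibly to zero, and the sender pauses until the application reads data and frees space.
Among the Linux system parameters, net.ipv4.tcp_rmem holds three values (min, default, max). The minimum is the receive buffer size each TCP socket is guaranteed even under memory pressure. The default is the initial receive buffer size of a TCP socket, which the kernel then adjusts automatically according to network conditions. The maximum is the upper limit to which auto-tuning can grow the receive buffer of a TCP socket. Auto-tuning (controlled by net.ipv4.tcp_moderate_rcvbuf, which usually defaults to 1, i.e. enabled) adjusts the buffer size dynamically within this range.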
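The relationship between buffer occupancy and advertised window can be sketched with a toy model. This is only an illustration of the formula above: the class name `ReceiveBuffer` and its methods are hypothetical, and the real Linux kernel additionally accounts for per-packet memory overhead and for net.ipv4.tcp_adv_win_scale.

```python
class ReceiveBuffer:
    """Toy model of a receive buffer; all sizes are in bytes."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity  # total buffer size
        self.used = 0             # bytes queued but not yet read

    def window(self) -> int:
        # window size = total buffer size - space already occupied
        return self.capacity - self.used

    def receive(self, nbytes: int) -> int:
        # Accept at most what the current window allows.
        accepted = min(nbytes, self.window())
        self.used += accepted
        return accepted

    def read(self, nbytes: int) -> int:
        # The application drains data, which frees buffer space.
        taken = min(nbytes, self.used)
        self.used -= taken
        return taken


buf = ReceiveBuffer(capacity=2920)
buf.receive(1460)
print(buf.window())  # 1460
buf.receive(1460)
print(buf.window())  # 0, the sender would have to pause here
buf.read(2920)
print(buf.window())  # 2920, the window reopens after the read
```

A sender that respects the advertised window never has more unacknowledged data in flight than `window()` returned in the latest ACK.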
# sysctl -a|grep -E 'tcp_rmem|rmem_max|moder'
net.core.rmem_max = 212992
net.ipv4.tcp_moderate_rcvbuf = 1
net.ipv4.tcp_rmem = 4096 131072 6291456
#
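For scripting experiments like the ones in this article, it can be handy to read the three tcp_rmem values programmatically. The sketch below parses the whitespace-separated format that /proc/sys/net/ipv4/tcp_rmem uses. The names `TcpRmem`, `parse_tcp_rmem` and `autotune_ceiling` are made up for this illustration, and the demo parses a literal string so that it runs on any system.

```python
from typing import NamedTuple


class TcpRmem(NamedTuple):
    minimum: int   # guaranteed even under memory pressure
    default: int   # initial receive buffer size of a TCP socket
    maximum: int   # upper limit for auto-tuning


def parse_tcp_rmem(text: str) -> TcpRmem:
    """Parse the 'min default max' triple, e.g. the content of
    /proc/sys/net/ipv4/tcp_rmem."""
    parts = text.split()
    if len(parts) != 3:
        raise ValueError(f"expected three values, got {len(parts)}")
    return TcpRmem(*(int(p) for p in parts))


def autotune_ceiling(wanted: int, rmem: TcpRmem) -> int:
    """Auto-tuning cannot grow the buffer past the maximum."""
    return min(wanted, rmem.maximum)


rmem = parse_tcp_rmem("4096\t131072\t6291456\n")
print(rmem.default)                       # 131072
print(autotune_ceiling(8_000_000, rmem))  # 6291456
```

On a Linux host the same function can be fed `open("/proc/sys/net/ipv4/tcp_rmem").read()` instead of the literal string.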
Basic test
Running the base script as is gives the capture below. The SYN/ACK shows a win value of 64240.
# packetdrill tcp_rcvbuf_000.pkt
#
# tcpdump -i any -nn port 8080
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
22:40:49.897799 ? In IP 192.0.2.1.58155 > 192.168.144.214.8080: Flags [S], seq 0, win 10000, options [mss 1460], length 0
22:40:49.897831 ? Out IP 192.168.144.214.8080 > 192.0.2.1.58155: Flags [S.], seq 3089256822, ack 1, win 64240, options [mss 1460], length 0
22:40:49.907963 ? In IP 192.0.2.1.58155 > 192.168.144.214.8080: Flags [.], ack 1, win 10000, length 0
22:40:49.908117 ? Out IP 192.168.144.214.8080 > 192.0.2.1.58155: Flags [F.], seq 1, ack 1, win 64240, length 0
22:40:49.908134 ? In IP 192.0.2.1.58155 > 192.168.144.214.8080: Flags [R.], seq 1, ack 1, win 10000, length 0
#
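In short, the win value is determined by *rcv_wnd in tcp_select_initial_window(). The listening socket starts with sk_rcvbuf = tcp_rmem[1] = 131072, and tcp_full_space() turns that into 65536 bytes of window space (assuming net.ipv4.tcp_adv_win_scale is 1, which is consistent with the observed values). From there:
space = rounddown(space, mss), which gives 64240 (44 × 1460)
(*rcv_wnd) = min_t(u32, space, U16_MAX), which gives 64240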
void tcp_init_sock(struct sock *sk)
{
...
    WRITE_ONCE(sk->sk_rcvbuf, READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_rmem[1]));
...
}

static inline int tcp_full_space(const struct sock *sk)
{
    return tcp_win_from_space(sk, READ_ONCE(sk->sk_rcvbuf));
}

static inline int tcp_win_from_space(const struct sock *sk, int space)
{
    int tcp_adv_win_scale = READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_adv_win_scale);

    return tcp_adv_win_scale <= 0 ?
        (space>>(-tcp_adv_win_scale)) :
        space - (space>>tcp_adv_win_scale);
}

void tcp_openreq_init_rwin(struct request_sock *req,
                           const struct sock *sk_listener,
                           const struct dst_entry *dst)
{
    struct inet_request_sock *ireq = inet_rsk(req);
    const struct tcp_sock *tp = tcp_sk(sk_listener);
    int full_space = tcp_full_space(sk_listener);
...
    /* tcp_full_space because it is guaranteed to be the first packet */
    tcp_select_initial_window(sk_listener, full_space,
        mss - (ireq->tstamp_ok ? TCPOLEN_TSTAMP_ALIGNED : 0),
        &req->rsk_rcv_wnd,
        &req->rsk_window_clamp,
        ireq->wscale_ok,
        &rcv_wscale,
        rcv_wnd);
...
}

/* Determine a window scaling and initial window to offer.
 * Based on the assumption that the given amount of space
 * will be offered. Store the results in the tp structure.
 * NOTE: for smooth operation initial space offering should
 * be a multiple of mss if possible. We assume here that mss >= 1.
 * This MUST be enforced by all callers.
 */
void tcp_select_initial_window(const struct sock *sk, int __space, __u32 mss,
                               __u32 *rcv_wnd, __u32 *window_clamp,
                               int wscale_ok, __u8 *rcv_wscale,
                               __u32 init_rcv_wnd)
{
    unsigned int space = (__space < 0 ? 0 : __space);

    /* If no clamp set the clamp to the max possible scaled window */
    if (*window_clamp == 0)
        (*window_clamp) = (U16_MAX << TCP_MAX_WSCALE);
    space = min(*window_clamp, space);

    /* Quantize space offering to a multiple of mss if possible. */
    if (space > mss)
        space = rounddown(space, mss);

    /* NOTE: offering an initial window larger than 32767
     * will break some buggy TCP stacks. If the admin tells us
     * it is likely we could be speaking with such a buggy stack
     * we will truncate our initial window offering to 32K-1
     * unless the remote has sent us a window scaling option,
     * which we interpret as a sign the remote TCP is not
     * misinterpreting the window field as a signed quantity.
     */
    if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_workaround_signed_windows))
        (*rcv_wnd) = min(space, MAX_TCP_WINDOW);
    else
        (*rcv_wnd) = min_t(u32, space, U16_MAX);

    if (init_rcv_wnd)
        *rcv_wnd = min(*rcv_wnd, init_rcv_wnd * mss);

    *rcv_wscale = 0;
    if (wscale_ok) {
        /* Set window scaling on max possible window */
        space = max_t(u32, space, READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_rmem[2]));
        space = max_t(u32, space, READ_ONCE(sysctl_rmem_max));
        space = min_t(u32, space, *window_clamp);
        *rcv_wscale = clamp_t(int, ilog2(space) - 15,
                              0, TCP_MAX_WSCALE);
    }
    /* Set the clamp no higher than max representable value */
    (*window_clamp) = min_t(__u32, U16_MAX << (*rcv_wscale), *window_clamp);
}
EXPORT_SYMBOL(tcp_select_initial_window);
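To check the arithmetic without reading kernel state, the two functions above can be mirrored in user space. The following is a rough Python sketch, not kernel code: the Python function signatures are my own, the sysctl values are passed in as plain arguments, and `adv_win_scale=1` is an assumption (the article does not show net.ipv4.tcp_adv_win_scale) chosen because it reproduces the captured values.

```python
U16_MAX = 0xFFFF
TCP_MAX_WSCALE = 14
MAX_TCP_WINDOW = 32767


def tcp_win_from_space(space: int, adv_win_scale: int = 1) -> int:
    # Mirrors the kernel helper; adv_win_scale=1 is an assumption.
    if adv_win_scale <= 0:
        return space >> (-adv_win_scale)
    return space - (space >> adv_win_scale)


def select_initial_window(space: int, mss: int, *, window_clamp: int = 0,
                          wscale_ok: bool = False,
                          tcp_rmem_max: int = 6291456,
                          core_rmem_max: int = 212992,
                          workaround_signed_windows: bool = False,
                          init_rcv_wnd: int = 0) -> tuple:
    """Return (rcv_wnd, rcv_wscale) following tcp_select_initial_window()."""
    space = max(space, 0)
    if window_clamp == 0:
        window_clamp = U16_MAX << TCP_MAX_WSCALE
    space = min(window_clamp, space)
    if space > mss:
        space -= space % mss            # rounddown(space, mss)
    limit = MAX_TCP_WINDOW if workaround_signed_windows else U16_MAX
    rcv_wnd = min(space, limit)
    if init_rcv_wnd:
        rcv_wnd = min(rcv_wnd, init_rcv_wnd * mss)
    rcv_wscale = 0
    if wscale_ok:
        space = max(space, tcp_rmem_max, core_rmem_max)
        space = min(space, window_clamp)
        ilog2 = space.bit_length() - 1  # integer log2, as in the kernel
        rcv_wscale = max(0, min(ilog2 - 15, TCP_MAX_WSCALE))
    return rcv_wnd, rcv_wscale


full_space = tcp_win_from_space(131072)         # tcp_rmem default shown above
print(full_space)                               # 65536
print(select_initial_window(full_space, 1460))  # (64240, 0)
```

The SYN in the test script carries no window scale option, so `wscale_ok` stays false and the scale is 0, as in the capture. Calling `tcp_win_from_space()` with a different receive buffer default shows how the initial window follows tcp_rmem[1].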
Lowering the tcp_rmem default lowers the initial window.
# sysctl -a|grep -E tcp_rmem
net.ipv4.tcp_rmem = 4096 131072 6291456
#
# sysctl -q net.ipv4.tcp_rmem="4096 6000 6291456"
#
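Run the script and capture the packets with tcpdump, as shown below. The Win in the SYN/ACK is now 2920: with a default of 6000, the usable window space is 3000 if net.ipv4.tcp_adv_win_scale is 1, and rounding down to a multiple of the MSS gives 2 × 1460 = 2920.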
# packetdrill tcp_rcvbuf_000.pkt
#
# tcpdump -i any -nn port 8080
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
10:25:46.937781 tun0 In IP 192.0.2.1.52843 > 192.168.88.219.8080: Flags [S], seq 0, win 10000, options [mss 1460], length 0
10:25:46.937809 tun0 Out IP 192.168.88.219.8080 > 192.0.2.1.52843: Flags [S.], seq 1671523994, ack 1, win 2920, options [mss 1460], length 0
10:25:46.948024 ? In IP 192.0.2.1.52843 > 192.168.88.219.8080: Flags [.], ack 1, win 10000, length 0
10:25:46.948492 ? Out IP 192.168.88.219.8080 > 192.0.2.1.52843: Flags [F.], seq 1, ack 1, win 2920, length 0
10:25:46.948551 ? In IP 192.0.2.1.52843 > 192.168.88.219.8080: Flags [R.], seq 1, ack 1, win 10000, length 0
#
Next, extend the script with data segments, as follows.
# cat tcp_rcvbuf_001.pkt
0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0
+0 < S 0:0(0) win 10000 <mss 1460>
+0 > S. 0:0(0) ack 1 <...>
+0.01 < . 1:1(0) ack 1 win 10000
+0 accept(3, ..., ...) = 4
+0.01 < P. 1:1461(1460) ack 1 win 10000
+0.01 < P. 1461:2921(1460) ack 1 win 10000
+0.01 read(4,...,2920) = 2920
+0.01 < P. 2921:4381(1460) ack 1 win 10000
+0.01 < P. 4381:5841(1460) ack 1 win 10000
+0.01 read(4,...,2920) = 2920
+0.01 < P. 5841:7301(1460) ack 1 win 10000
+0.01 < P. 7301:8761(1460) ack 1 win 10000
+0.01 read(4,...,2920) = 2920
+0 `sleep 1`
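The data part of this script follows a regular pattern: two MSS-sized segments, then one read() that drains both. For longer experiments the lines could be generated rather than typed. The helper below is a hypothetical convenience, not part of packetdrill; it only produces text in the same form as the script above.

```python
def data_segment_lines(count: int, mss: int = 1460, win: int = 10000,
                       gap: str = "+0.01", read_every: int = 2,
                       fd: int = 4) -> list:
    """Build packetdrill lines for `count` inbound data segments,
    inserting a read() after every `read_every` segments."""
    lines = []
    seq = 1  # first data byte follows the SYN
    for i in range(1, count + 1):
        end = seq + mss
        lines.append(f"{gap} < P. {seq}:{end}({mss}) ack 1 win {win}")
        seq = end
        if i % read_every == 0:
            nbytes = read_every * mss
            lines.append(f"{gap} read({fd},...,{nbytes}) = {nbytes}")
    return lines


for line in data_segment_lines(6):
    print(line)
```

With the default arguments, the six segments and three reads printed here match the nine data lines of tcp_rcvbuf_001.pkt.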
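The tcpdump capture below shows the Win value sent by the server changing from 2920 -> 5840 -> 8760, growing step by step under auto-tuning.
As described earlier, the kernel's auto-tuning (controlled by net.ipv4.tcp_moderate_rcvbuf, which usually defaults to 1, i.e. enabled) adjusts the buffer size dynamically within the tcp_rmem range to suit network conditions.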
# packetdrill tcp_rcvbuf_001.pkt
#
# tcpdump -i any -nn port 8080
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
10:34:27.997871 tun0 In IP 192.0.2.1.42373 > 192.168.192.52.8080: Flags [S], seq 0, win 10000, options [mss 1460], length 0
10:34:27.998131 tun0 Out IP 192.168.192.52.8080 > 192.0.2.1.42373: Flags [S.], seq 1834648171, ack 1, win 2920, options [mss 1460], length 0
10:34:28.008596 tun0 In IP 192.0.2.1.42373 > 192.168.192.52.8080: Flags [.], ack 1, win 10000, length 0
10:34:28.018674 tun0 In IP 192.0.2.1.42373 > 192.168.192.52.8080: Flags [P.], seq 1:1461, ack 1, win 10000, length 1460: HTTP
10:34:28.018704 tun0 Out IP 192.168.192.52.8080 > 192.0.2.1.42373: Flags [.], ack 1461, win 1460, length 0
10:34:28.028675 tun0 In IP 192.0.2.1.42373 > 192.168.192.52.8080: Flags [P.], seq 1461:2921, ack 1, win 10000, length 1460: HTTP
10:34:28.038740 tun0 Out IP 192.168.192.52.8080 > 192.0.2.1.42373: Flags [.], ack 2921, win 2920, length 0
10:34:28.048790 tun0 In IP 192.0.2.1.42373 > 192.168.192.52.8080: Flags [P.], seq 2921:4381, ack 1, win 10000, length 1460: HTTP
10:34:28.058835 tun0 In IP 192.0.2.1.42373 > 192.168.192.52.8080: Flags [P.], seq 4381:5841, ack 1, win 10000, length 1460: HTTP
10:34:28.068985 tun0 Out IP 192.168.192.52.8080 > 192.0.2.1.42373: Flags [.], ack 5841, win 2920, length 0
10:34:28.079130 tun0 In IP 192.0.2.1.42373 > 192.168.192.52.8080: Flags [P.], seq 5841:7301, ack 1, win 10000, length 1460: HTTP
10:34:28.079206 tun0 Out IP 192.168.192.52.8080 > 192.0.2.1.42373: Flags [.], ack 7301, win 5840, length 0
10:34:28.089037 tun0 In IP 192.0.2.1.42373 > 192.168.192.52.8080: Flags [P.], seq 7301:8761, ack 1, win 10000, length 1460: HTTP
10:34:28.089054 tun0 Out IP 192.168.192.52.8080 > 192.0.2.1.42373: Flags [.], ack 8761, win 8760, length 0
10:34:29.102294 ? Out IP 192.168.192.52.8080 > 192.0.2.1.42373: Flags [F.], seq 1, ack 8761, win 8760, length 0
10:34:29.102348 ? In IP 192.0.2.1.42373 > 192.168.192.52.8080: Flags [R.], seq 8761, ack 1, win 10000, length 0
#
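Picking the server's window values out of a capture by eye gets tedious with longer traces. A small parser can do it instead; the sketch below is one possible approach, with hypothetical function names and a regular expression that assumes the `tcpdump -i any` line format seen in this article (direction marked by `In` or `Out`). The sample embeds a few lines from the capture above.

```python
import re

CAPTURE = """\
10:34:27.998131 tun0 Out IP 192.168.192.52.8080 > 192.0.2.1.42373: Flags [S.], seq 1834648171, ack 1, win 2920, options [mss 1460], length 0
10:34:28.018674 tun0 In IP 192.0.2.1.42373 > 192.168.192.52.8080: Flags [P.], seq 1:1461, ack 1, win 10000, length 1460: HTTP
10:34:28.018704 tun0 Out IP 192.168.192.52.8080 > 192.0.2.1.42373: Flags [.], ack 1461, win 1460, length 0
10:34:28.038740 tun0 Out IP 192.168.192.52.8080 > 192.0.2.1.42373: Flags [.], ack 2921, win 2920, length 0
10:34:28.068985 tun0 Out IP 192.168.192.52.8080 > 192.0.2.1.42373: Flags [.], ack 5841, win 2920, length 0
10:34:28.079206 tun0 Out IP 192.168.192.52.8080 > 192.0.2.1.42373: Flags [.], ack 7301, win 5840, length 0
10:34:28.089054 tun0 Out IP 192.168.192.52.8080 > 192.0.2.1.42373: Flags [.], ack 8761, win 8760, length 0
"""

# Only outgoing packets carry the server's advertised window.
OUT_WIN = re.compile(r"\bOut\b.*\bwin (\d+)")


def advertised_windows(capture: str) -> list:
    """Return the win value of every outgoing packet, in order."""
    wins = []
    for line in capture.splitlines():
        match = OUT_WIN.search(line)
        if match:
            wins.append(int(match.group(1)))
    return wins


def growth_steps(wins: list) -> list:
    """Keep only the values that set a new maximum."""
    steps = []
    for w in wins:
        if not steps or w > steps[-1]:
            steps.append(w)
    return steps


wins = advertised_windows(CAPTURE)
print(wins)                # [2920, 1460, 2920, 2920, 5840, 8760]
print(growth_steps(wins))  # [2920, 5840, 8760]
```

The second list is the 2920 -> 5840 -> 8760 progression described above. The 1460 in the first list is the ACK for the first data segment, sent while that segment was still unread.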
Now set net.ipv4.tcp_moderate_rcvbuf to 0 to turn off receive buffer auto-tuning.
# sysctl -a|grep rcvbuf
net.ipv4.tcp_moderate_rcvbuf = 1
# sysctl -q net.ipv4.tcp_moderate_rcvbuf=0
#
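Run the script again. The behavior seen in the previous experiment, where the Win value sent by the server grew step by step from 2920 -> 5840 -> 8760, no longer appears. The Win value sent by the server never exceeds 2920.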
# packetdrill tcp_rcvbuf_001.pkt
#
# tcpdump -i any -nn port 8080
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
13:28:44.937784 tun0 In IP 192.0.2.1.32839 > 192.168.8.86.8080: Flags [S], seq 0, win 10000, options [mss 1460], length 0
13:28:44.937812 tun0 Out IP 192.168.8.86.8080 > 192.0.2.1.32839: Flags [S.], seq 818106914, ack 1, win 2920, options [mss 1460], length 0
13:28:44.948051 tun0 In IP 192.0.2.1.32839 > 192.168.8.86.8080: Flags [.], ack 1, win 10000, length 0
13:28:44.958142 tun0 In IP 192.0.2.1.32839 > 192.168.8.86.8080: Flags [P.], seq 1:1461, ack 1, win 10000, length 1460: HTTP
13:28:44.958164 tun0 Out IP 192.168.8.86.8080 > 192.0.2.1.32839: Flags [.], ack 1461, win 1460, length 0
13:28:44.968245 tun0 In IP 192.0.2.1.32839 > 192.168.8.86.8080: Flags [P.], seq 1461:2921, ack 1, win 10000, length 1460: HTTP
13:28:44.978364 tun0 Out IP 192.168.8.86.8080 > 192.0.2.1.32839: Flags [.], ack 2921, win 2920, length 0
13:28:44.988399 tun0 In IP 192.0.2.1.32839 > 192.168.8.86.8080: Flags [P.], seq 2921:4381, ack 1, win 10000, length 1460: HTTP
13:28:44.998399 tun0 In IP 192.0.2.1.32839 > 192.168.8.86.8080: Flags [P.], seq 4381:5841, ack 1, win 10000, length 1460: HTTP
13:28:45.008454 tun0 Out IP 192.168.8.86.8080 > 192.0.2.1.32839: Flags [.], ack 5841, win 2920, length 0
13:28:45.018483 tun0 In IP 192.0.2.1.32839 > 192.168.8.86.8080: Flags [P.], seq 5841:7301, ack 1, win 10000, length 1460: HTTP
13:28:45.028501 tun0 In IP 192.0.2.1.32839 > 192.168.8.86.8080: Flags [P.], seq 7301:8761, ack 1, win 10000, length 1460: HTTP
13:28:45.038659 tun0 Out IP 192.168.8.86.8080 > 192.0.2.1.32839: Flags [.], ack 8761, win 2920, length 0
13:28:46.041518 ? Out IP 192.168.8.86.8080 > 192.0.2.1.32839: Flags [F.], seq 1, ack 8761, win 2920, length 0
13:28:46.041571 ? In IP 192.0.2.1.32839 > 192.168.8.86.8080: Flags [R.], seq 8761, ack 1, win 10000, length 0
#