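Nothing is as simple as it looks on the surface
Starting from a packetdrill script for the TCP three-way handshake, this article builds simulated server-side scenarios to study and test TCP Tail Loss Probe (TLP) behavior.
Base script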
# cat tcp_tail_loss_probe_000.pkt
0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0
+0 < S 0:0(0) win 10000 <mss 1000>
+0 > S. 0:0(0) ack 1 <...>
+0.01 < . 1:1(0) ack 1 win 10000
+0 accept(3, ..., ...) = 4
#
TCP Tail Loss Probe
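TCP Tail Loss Probe (TLP) is a mechanism that improves TCP performance when the tail of a flight is lost. In short, when the sender has received no ACK for a period called the Probe Timeout (PTO, where PTO ≤ RTO) and tail loss is therefore suspected, it forces out one probe segment: either the last segment that has not yet been acknowledged or a new, previously unsent segment. The goal is to elicit SACK information from the receiver quickly so that fast retransmit of the lost segments starts early instead of waiting for the RTO to expire, which improves TCP performance under tail loss.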
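To make the relation PTO ≤ RTO concrete, here is a minimal Python sketch of how a probe timeout could be derived. It is modelled loosely on the PTO calculation described in RFC 8985 and is not the Linux kernel's code. The function name `probe_timeout_ms`, the 200 ms delayed-ACK allowance, and the 2 ms pad are assumptions chosen for illustration. Real kernel timers are coarser, so measured values will not match these numbers exactly.

```python
from typing import Optional

# Assumed constants for illustration only (not read from any kernel).
WC_DEL_ACK_MS = 200.0  # allowance for a delayed ACK when one segment is in flight
PAD_MS = 2.0           # small pad when several segments are in flight


def probe_timeout_ms(srtt_ms: Optional[float],
                     flight_segments: int,
                     rto_remaining_ms: float) -> float:
    """Return a probe timeout that never exceeds the time left until the RTO."""
    if srtt_ms is None:
        pto = 1000.0  # no RTT sample yet: fall back to one second
    else:
        pto = 2.0 * srtt_ms
        # A lone segment may be held by the receiver's delayed-ACK timer,
        # so wait longer before concluding that it was lost.
        pto += WC_DEL_ACK_MS if flight_segments == 1 else PAD_MS
    return min(pto, rto_remaining_ms)


if __name__ == "__main__":
    print(probe_timeout_ms(10.0, 2, 210.0))  # two segments in flight
    print(probe_timeout_ms(10.0, 1, 210.0))  # one segment in flight
```

With one segment in flight the result is clamped to the remaining RTO, so the probe and the first RTO retransmission end up spaced by similar intervals.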
On Linux the relevant TCP parameter is tcp_early_retrans, whose default value is 3. Its documentation reads as follows.
tcp_early_retrans - INTEGER
Enable Early Retransmit (ER), per RFC 5827. ER lowers the threshold for triggering fast retransmit when the amount of outstanding data is small and when no previously unsent data can be transmitted (such that limited transmit could be used). Also controls the use of Tail loss probe (TLP) that converts RTOs occurring due to tail losses into fast recovery (draft-dukkipati-tcpm-tcp-loss-probe-01).
Possible values:
0 disables ER
1 enables ER
2 enables ER but delays fast recovery and fast retransmit by a fourth of RTT. This mitigates connection falsely recovers when network has a small degree of reordering (less than 3 packets).
3 enables delayed ER and TLP.
4 enables TLP only.
Default: 3
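The five values quoted above can be read as a small lookup table. The Python sketch below only restates that documentation text. The names `EarlyRetransMode` and `decode_early_retrans` are hypothetical, and the mapping applies to kernels that document tcp_early_retrans this way.

```python
from typing import NamedTuple


class EarlyRetransMode(NamedTuple):
    early_retransmit: bool  # ER enabled
    delayed: bool           # ER delayed by a fourth of RTT
    tail_loss_probe: bool   # TLP enabled


# Mapping taken from the documentation text quoted above.
_MODES = {
    0: EarlyRetransMode(False, False, False),
    1: EarlyRetransMode(True, False, False),
    2: EarlyRetransMode(True, True, False),
    3: EarlyRetransMode(True, True, True),
    4: EarlyRetransMode(False, False, True),
}


def decode_early_retrans(value: int) -> EarlyRetransMode:
    """Translate a tcp_early_retrans value into the features it enables."""
    try:
        return _MODES[value]
    except KeyError:
        raise ValueError(f"unsupported tcp_early_retrans value: {value}") from None


if __name__ == "__main__":
    for v in range(5):
        print(v, decode_early_retrans(v))
```

Only the values 3 and 4 turn TLP on, which is why the tests in this article toggle between 3 and 0.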
Basic tests
First, reproduce the basic TLP behavior by writing two 1000-byte segments. The script is as follows.
# cat tcp_tail_loss_probe_001.pkt
0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0
+0 < S 0:0(0) win 10000 <mss 1000, nop, nop, sackOK>
+0 > S. 0:0(0) ack 1 <...>
+0.01 < . 1:1(0) ack 1 win 10000
+0 accept(3, ..., ...) = 4
+0.01 write(4, ..., 1000) = 1000
+0 write(4, ..., 1000) = 1000
+0 `sleep 1`
#
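The tcpdump capture below shows that after the sender transmits Seq 1:1001 and Seq 1001:2001 and receives no ACK, the PTO expires about 31 ms later and a TLP probe is sent. Because there is no new data to send at that point, the probe is the last unacknowledged segment, Seq 1001:2001.
With still no ACK after that, the RTO fires and Seq 1:1001 is retransmitted.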
# packetdrill tcp_tail_loss_probe_001.pkt
#
# tcpdump -i any -nn port 8080
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
22:35:35.478962 tun0 In IP 192.0.2.1.43251 > 192.168.192.249.8080: Flags [S], seq 0, win 10000, options [mss 1000,nop,nop,sackOK], length 0
22:35:35.478989 tun0 Out IP 192.168.192.249.8080 > 192.0.2.1.43251: Flags [S.], seq 976019219, ack 1, win 64240, options [mss 1460,nop,nop,sackOK], length 0
22:35:35.489138 tun0 In IP 192.0.2.1.43251 > 192.168.192.249.8080: Flags [.], ack 1, win 10000, length 0
22:35:35.499338 tun0 Out IP 192.168.192.249.8080 > 192.0.2.1.43251: Flags [P.], seq 1:1001, ack 1, win 64240, length 1000: HTTP
22:35:35.499354 tun0 Out IP 192.168.192.249.8080 > 192.0.2.1.43251: Flags [P.], seq 1001:2001, ack 1, win 64240, length 1000: HTTP
22:35:35.530689 tun0 Out IP 192.168.192.249.8080 > 192.0.2.1.43251: Flags [P.], seq 1001:2001, ack 1, win 64240, length 1000: HTTP
22:35:35.746685 tun0 Out IP 192.168.192.249.8080 > 192.0.2.1.43251: Flags [P.], seq 1:1001, ack 1, win 64240, length 1000: HTTP
22:35:36.174654 tun0 Out IP 192.168.192.249.8080 > 192.0.2.1.43251: Flags [P.], seq 1:1001, ack 1, win 64240, length 1000: HTTP
22:35:36.501711 ? Out IP 192.168.192.249.8080 > 192.0.2.1.43251: Flags [F.], seq 2001, ack 1, win 64240, length 0
22:35:36.501736 ? In IP 192.0.2.1.43251 > 192.168.192.249.8080: Flags [R.], seq 1, ack 1, win 10000, length 0
#
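The intervals discussed for this capture can be checked by subtracting the tcpdump timestamps. The following Python sketch does that for the four relevant sends (second data segment, TLP probe, first and second RTO retransmission). The helper name `send_gaps_us` is hypothetical, and the sketch assumes all timestamps fall on the same day.

```python
from datetime import datetime, timedelta
from typing import List


def send_gaps_us(timestamps: List[str]) -> List[int]:
    """Return the gaps between consecutive tcpdump timestamps in microseconds."""
    times = [datetime.strptime(t, "%H:%M:%S.%f") for t in timestamps]
    one_us = timedelta(microseconds=1)
    return [(later - earlier) // one_us for earlier, later in zip(times, times[1:])]


# Timestamps copied from the capture above.
sends = [
    "22:35:35.499354",  # Seq 1001:2001, original transmission
    "22:35:35.530689",  # Seq 1001:2001, TLP probe
    "22:35:35.746685",  # Seq 1:1001, first RTO retransmission
    "22:35:36.174654",  # Seq 1:1001, second RTO retransmission
]

if __name__ == "__main__":
    for gap in send_gaps_us(sends):
        print(f"{gap / 1000:.3f} ms")
```

The first gap is the PTO of roughly 31 ms; the next two gaps are roughly 216 ms and 428 ms, which is the RTO doubling.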
If tcp_early_retrans is now set to 0, TLP is disabled.
# sysctl -q net.ipv4.tcp_early_retrans=0
#
Running tcp_tail_loss_probe_001.pkt again and capturing with tcpdump, the TLP probe of Seq 1001:2001 seen in the previous test no longer appears; instead Seq 1:1001 is retransmitted only when the RTO expires.
# packetdrill tcp_tail_loss_probe_001.pkt
#
# tcpdump -i any -nn port 8080
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
22:44:38.346955 tun0 In IP 192.0.2.1.60927 > 192.168.64.210.8080: Flags [S], seq 0, win 10000, options [mss 1000,nop,nop,sackOK], length 0
22:44:38.346983 tun0 Out IP 192.168.64.210.8080 > 192.0.2.1.60927: Flags [S.], seq 3284483134, ack 1, win 64240, options [mss 1460,nop,nop,sackOK], length 0
22:44:38.357064 tun0 In IP 192.0.2.1.60927 > 192.168.64.210.8080: Flags [.], ack 1, win 10000, length 0
22:44:38.367316 tun0 Out IP 192.168.64.210.8080 > 192.0.2.1.60927: Flags [P.], seq 1:1001, ack 1, win 64240, length 1000: HTTP
22:44:38.367378 tun0 Out IP 192.168.64.210.8080 > 192.0.2.1.60927: Flags [P.], seq 1001:2001, ack 1, win 64240, length 1000: HTTP
22:44:38.582821 tun0 Out IP 192.168.64.210.8080 > 192.0.2.1.60927: Flags [P.], seq 1:1001, ack 1, win 64240, length 1000: HTTP
22:44:39.022662 tun0 Out IP 192.168.64.210.8080 > 192.0.2.1.60927: Flags [P.], seq 1:1001, ack 1, win 64240, length 1000: HTTP
22:44:39.371002 tun0 Out IP 192.168.64.210.8080 > 192.0.2.1.60927: Flags [F.], seq 2001, ack 1, win 64240, length 0
22:44:39.371027 tun0 In IP 192.0.2.1.60927 > 192.168.64.210.8080: Flags [R.], seq 1, ack 1, win 10000, length 0
#
TLP is also only supported when SACK is enabled. After restoring tcp_early_retrans to 3, modify the script as follows so that the SYN no longer carries sackOK.
# cat tcp_tail_loss_probe_002.pkt
0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0
+0 < S 0:0(0) win 10000 <mss 1000>
+0 > S. 0:0(0) ack 1 <...>
+0.01 < . 1:1(0) ack 1 win 10000
+0 accept(3, ..., ...) = 4
+0.01 write(4, ..., 1000) = 1000
+0 write(4, ..., 1000) = 1000
+0 `sleep 1`
#
The tcpdump capture shows that no TLP probe of Seq 1001:2001 is sent; again only the RTO retransmission of Seq 1:1001 appears, because SACK was not negotiated on this connection.
# packetdrill tcp_tail_loss_probe_002.pkt
#
# tcpdump -i any -nn port 8080
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
22:48:21.226949 tun0 In IP 192.0.2.1.52729 > 192.168.169.59.8080: Flags [S], seq 0, win 10000, options [mss 1000], length 0
22:48:21.226976 tun0 Out IP 192.168.169.59.8080 > 192.0.2.1.52729: Flags [S.], seq 3640135527, ack 1, win 64240, options [mss 1460], length 0
22:48:21.237197 tun0 In IP 192.0.2.1.52729 > 192.168.169.59.8080: Flags [.], ack 1, win 10000, length 0
22:48:21.247371 tun0 Out IP 192.168.169.59.8080 > 192.0.2.1.52729: Flags [P.], seq 1:1001, ack 1, win 64240, length 1000: HTTP
22:48:21.247385 tun0 Out IP 192.168.169.59.8080 > 192.0.2.1.52729: Flags [P.], seq 1001:2001, ack 1, win 64240, length 1000: HTTP
22:48:21.462666 tun0 Out IP 192.168.169.59.8080 > 192.0.2.1.52729: Flags [P.], seq 1:1001, ack 1, win 64240, length 1000: HTTP
22:48:21.902710 tun0 Out IP 192.168.169.59.8080 > 192.0.2.1.52729: Flags [P.], seq 1:1001, ack 1, win 64240, length 1000: HTTP
22:48:22.260962 tun0 Out IP 192.168.169.59.8080 > 192.0.2.1.52729: Flags [F.], seq 2001, ack 1, win 64240, length 0
22:48:22.260989 tun0 In IP 192.0.2.1.52729 > 192.168.169.59.8080: Flags [R.], seq 1, ack 1, win 10000, length 0
#
There is a subtler case as well: when only one segment is written, as follows.
# cat tcp_tail_loss_probe_003.pkt
0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0
+0 < S 0:0(0) win 10000 <mss 1000, nop, nop, sackOK>
+0 > S. 0:0(0) ack 1 <...>
+0.01 < . 1:1(0) ack 1 win 10000
+0 accept(3, ..., ...) = 4
+0.01 write(4, ..., 1000) = 1000
+0 `sleep 1`
#
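The tcpdump capture shows something odd: the second Seq 1:1001 packet (the first retransmission) follows the original Seq 1:1001 by about 215 ms, and the third Seq 1:1001 packet follows the second by about 216 ms. If both were RTO retransmissions, the second interval should have doubled, but it did not.
So in this test the second Seq 1:1001 packet is not the first RTO retransmission but a TLP probe. The third Seq 1:1001 packet is the first RTO retransmission, and the interval doubles from there on.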
# packetdrill tcp_tail_loss_probe_003.pkt
#
# tcpdump -i any -nn port 8080
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
22:57:22.218975 tun0 In IP 192.0.2.1.33927 > 192.168.139.53.8080: Flags [S], seq 0, win 10000, options [mss 1000,nop,nop,sackOK], length 0
22:57:22.219006 tun0 Out IP 192.168.139.53.8080 > 192.0.2.1.33927: Flags [S.], seq 2153236276, ack 1, win 64240, options [mss 1460,nop,nop,sackOK], length 0
22:57:22.229081 tun0 In IP 192.0.2.1.33927 > 192.168.139.53.8080: Flags [.], ack 1, win 10000, length 0
22:57:22.239259 tun0 Out IP 192.168.139.53.8080 > 192.0.2.1.33927: Flags [P.], seq 1:1001, ack 1, win 64240, length 1000: HTTP
22:57:22.454738 tun0 Out IP 192.168.139.53.8080 > 192.0.2.1.33927: Flags [P.], seq 1:1001, ack 1, win 64240, length 1000: HTTP
22:57:22.670708 tun0 Out IP 192.168.139.53.8080 > 192.0.2.1.33927: Flags [P.], seq 1:1001, ack 1, win 64240, length 1000: HTTP
22:57:23.118784 tun0 Out IP 192.168.139.53.8080 > 192.0.2.1.33927: Flags [P.], seq 1:1001, ack 1, win 64240, length 1000: HTTP
22:57:23.242989 tun0 Out IP 192.168.139.53.8080 > 192.0.2.1.33927: Flags [F.], seq 1001, ack 1, win 64240, length 0
22:57:23.243018 tun0 In IP 192.0.2.1.33927 > 192.168.139.53.8080: Flags [R.], seq 1, ack 1, win 10000, length 0
#
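The reasoning above (an interval that fails to double points to a probe rather than an RTO) can be written as a small heuristic. The Python sketch below looks for the first gap that is about twice its predecessor and treats the send that closes the predecessor gap as the first RTO retransmission. The function name `first_rto_send_index` and the 25% tolerance are assumptions for illustration; this is a reading aid for captures like the ones in this article, not a general classifier.

```python
from typing import List, Optional


def first_rto_send_index(gaps_ms: List[float], tolerance: float = 0.25) -> Optional[int]:
    """Guess which send is the first RTO retransmission.

    gaps_ms[i] is the time between send i and send i + 1, where send 0 is the
    original transmission. If gap i is roughly double gap i - 1, then gap i - 1
    was the base RTO interval and send i is the first RTO retransmission.
    Any retransmission before it is taken to be a probe.
    """
    for i in range(1, len(gaps_ms)):
        ratio = gaps_ms[i] / gaps_ms[i - 1]
        if abs(ratio - 2.0) <= tolerance:
            return i
    return None


# Gaps in milliseconds computed from the capture timestamps in this article.
single_segment = [215.479, 215.970, 448.076]  # one segment written, TLP enabled
tlp_disabled = [215.443, 439.841]             # two segments written, TLP disabled

if __name__ == "__main__":
    print(first_rto_send_index(single_segment))
    print(first_rto_send_index(tlp_disabled))
```

For the single-segment capture the heuristic points at the third Seq 1:1001 packet (send index 2), leaving the second packet as the TLP probe. With TLP disabled it points at the very first retransmission.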
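Extended tests
As noted above, TLP forces out one probe segment, which is either the last segment not yet acknowledged or a new, previously unsent segment.
The tests below cover the unsent-new-data case. The first scenario is new data that was previously held back by congestion control. The script is as follows.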
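The tcpdump capture shows that the earlier TLP retransmission of Seq 1001:2001 does not appear and Seq 2001:3001 is sent in its place. Note that with an initial cwnd of 2, Seq 2001:3001 cannot be sent at first. Only after the PTO expires does TLP trigger transmission of Seq 2001:3001, the new segment previously held back by congestion control.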
The second scenario is new data that was previously held back by Nagle's algorithm. The script is as follows.
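The tcpdump capture shows the following: because of Nagle's algorithm, the second small segment, Seq 101:201, cannot be sent at first. Only after the PTO expires does TLP trigger transmission of Seq 101:201, the new segment previously held back by Nagle.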
A third limit is the receive window advertised by the receiver. In this case TLP does not send the previously unsent new data. The script is as follows.
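The tcpdump capture shows the following: with the receiver's window limited to 1000, only Seq 1001:2001 can be sent and Seq 2001:3001 cannot. When the PTO expires and TLP takes effect, the unsent new segment Seq 2001:3001 exists but still cannot be sent, so the probe is instead a retransmission of the last unacknowledged segment, Seq 1001:2001.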
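The probe choice observed across these tests can be summarised as a small decision function. The Python sketch below is a simplified model of that observed behavior, not the kernel's code. `SenderState` and `choose_probe` are hypothetical names, and the receive-window example assumes that Seq 1:1001 had already been acknowledged (snd_una = 1001), which is inferred from the description rather than taken from a script.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Segment = Tuple[int, int]  # (start sequence, end sequence)


@dataclass
class SenderState:
    snd_una: int   # oldest unacknowledged sequence number
    rwnd: int      # receive window advertised by the peer
    in_flight: List[Segment] = field(default_factory=list)  # sent, not yet ACKed
    unsent: List[Segment] = field(default_factory=list)     # held back by cwnd or Nagle


def choose_probe(state: SenderState) -> Optional[Tuple[str, Segment]]:
    """Pick the segment a TLP probe would carry when the PTO fires."""
    if not state.in_flight:
        return None  # nothing outstanding, so there is no tail to probe
    if state.unsent:
        start, end = state.unsent[0]
        # New data is used only if it fits in the advertised receive window;
        # cwnd and Nagle limits do not block the probe.
        if end <= state.snd_una + state.rwnd:
            return ("new", (start, end))
    return ("retransmit", state.in_flight[-1])


if __name__ == "__main__":
    cwnd_limited = SenderState(1, 10000, [(1, 1001), (1001, 2001)], [(2001, 3001)])
    nagle_limited = SenderState(1, 10000, [(1, 101)], [(101, 201)])
    rwnd_limited = SenderState(1001, 1000, [(1001, 2001)], [(2001, 3001)])
    for name, st in [("cwnd", cwnd_limited), ("nagle", nagle_limited), ("rwnd", rwnd_limited)]:
        print(name, choose_probe(st))
```

In this model only the receive window can force the probe to fall back to a retransmission when unsent data is queued, matching the three scenarios described above.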