diff options
author | Eric Dumazet <eric.dumazet@gmail.com> | 2012-04-03 09:37:01 +0000 |
---|---|---|
committer | David S. Miller <davem@davemloft.net> | 2012-04-03 17:35:43 -0400 |
commit | 2f53384424251c06038ae612e56231b96ab610ee (patch) | |
tree | e3524210033983e5727c88e21bb3e6a766b20ae4 /net/ipv4/tcp.c | |
parent | e675f0cc9a872fd152edc0c77acfed19bf28b81e (diff) | |
download | linux-3.10-2f53384424251c06038ae612e56231b96ab610ee.tar.gz linux-3.10-2f53384424251c06038ae612e56231b96ab610ee.tar.bz2 linux-3.10-2f53384424251c06038ae612e56231b96ab610ee.zip |
tcp: allow splice() to build full TSO packets
vmsplice()/splice(pipe, socket) call do_tcp_sendpages() one page at a
time, adding at most 4096 bytes to an skb. (assuming PAGE_SIZE=4096)
The call to tcp_push() at the end of do_tcp_sendpages() forces an
immediate xmit when pipe is not already filled, and tso_fragment() try
to split these skb to MSS multiples.
4096 bytes are usually split in a skb with 2 MSS, and a remaining
sub-mss skb (assuming MTU=1500)
This makes slow start suboptimal because many small frames are sent to
qdisc/driver layers instead of big ones (constrained by cwnd and packets
in flight of course)
In fact, applications using sendmsg() (adding an additional memory copy)
instead of vmsplice()/splice()/sendfile() are a bit faster because of
this anomaly, especially if serving small files in environments with
large initial [c]wnd.
Call tcp_push() only if MSG_MORE is not set in the flags parameter.
This bit is automatically provided by splice() internals but for the
last page, or on all pages if user specified SPLICE_F_MORE splice()
flag.
In some workloads, this can reduce number of sent logical packets by an
order of magnitude, making zero-copy TCP actually faster than
one-copy :)
Reported-by: Tom Herbert <therbert@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: H.K. Jerry Chu <hkchu@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail>com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Diffstat (limited to 'net/ipv4/tcp.c')
-rw-r--r-- | net/ipv4/tcp.c | 2 |
1 files changed, 1 insertions, 1 deletions
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index cfd7edda0a8..2ff6f45a76f 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -860,7 +860,7 @@ wait_for_memory: } out: - if (copied) + if (copied && !(flags & MSG_MORE)) tcp_push(sk, flags, mss_now, tp->nonagle); return copied; |