aboutsummaryrefslogtreecommitdiff
Commit message (Collapse)AuthorAgeFilesLines
* conn: use CmsgSpace() for ancillary data buf sizingJordan Whited2023-03-161-5/+7
| | | | | | | | CmsgLen() does not account for data alignment. Reviewed-by: Adrian Dewhurst <adrian@tailscale.com> Signed-off-by: Jordan Whited <jordan@tailscale.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* global: buff -> bufJason A. Donenfeld2023-03-1318-189/+189
| | | | | | This always struck me as kind of weird and non-standard. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* conn: use right cmsghdr len types on 32-bit in sticky testJason A. Donenfeld2023-03-101-4/+4
| | | | | | | | Cmsghdr uses uint32 and uint64 on 32-bit and 64-bit respectively for the Len member, which makes assignments and comparisons slightly more irksome than usual. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* conn: make StdNetBind.BatchSize() return 1 for non-LinuxJordan Whited2023-03-101-1/+4
| | | | | | | | | | | This commit updates StdNetBind.BatchSize() to return 1 instead of IdealBatchSize for non-Linux platforms. Non-Linux platforms do not yet benefit from values > 1, which only serves to increase memory consumption. Reviewed-by: James Tucker <james@tailscale.com> Signed-off-by: Jordan Whited <jordan@tailscale.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* tun/netstack: enable TCP Selective AcknowledgementsJordan Whited2023-03-101-1/+6
| | | | | | | | | | Enable TCP SACK for the gVisor Stack used in tun/netstack. This can improve throughput by an order of magnitude in the presence of packet loss. Reviewed-by: James Tucker <james@tailscale.com> Signed-off-by: Jordan Whited <jordan@tailscale.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* conn: ensure control message size is respected in StdNetBindJordan Whited2023-03-101-2/+2
| | | | | | | | | | | | | This commit re-slices received control messages in StdNetBind to the value the OS reports on a successful read. Previously, the len of this slice would always be srcControlSize, which could result in control message values leaking through a sync.Pool round trip. This is unlikely with the IP_PKTINFO socket option set successfully, but should be guarded against. Reviewed-by: James Tucker <james@tailscale.com> Signed-off-by: Jordan Whited <jordan@tailscale.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* conn: fix StdNetBind fallback on WindowsJordan Whited2023-03-102-64/+150
| | | | | | | | | | | | | | | If RIO is unavailable, NewWinRingBind() falls back to StdNetBind. StdNetBind uses x/net/ipv{4,6}.PacketConn for sending and receiving datagrams, specifically via the {Read,Write}Batch methods. These methods are unimplemented on Windows and will return runtime errors as a result. Additionally, only Linux benefits from these x/net types for reading and writing, so we update StdNetBind to fall back to the standard library net package for all platforms other than Linux. Reviewed-by: James Tucker <james@tailscale.com> Signed-off-by: Jordan Whited <jordan@tailscale.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* conn: inch BatchSize toward being non-dynamicJason A. Donenfeld2023-03-108-23/+19
| | | | | | | | There's not really a use at the moment for making this configurable, and once bind_windows.go behaves like bind_std.go, we'll be able to use constants everywhere. So begin that simplification now. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* conn: set SO_{SND,RCV}BUF to 7MB on the Bind UDP socketJordan Whited2023-03-104-0/+52
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The conn.Bind UDP sockets' send and receive buffers are now being sized to 7MB, whereas they were previously inheriting the system defaults. The system defaults are considerably small and can result in dropped packets on high speed links. By increasing the size of these buffers we are able to achieve higher throughput in the aforementioned case. The iperf3 results below demonstrate the effect of this commit between two Linux computers with 32-core Xeon Platinum CPUs @ 2.9Ghz. There is roughly ~125us of round trip latency between them. The first result is from commit 792b49c which uses the system defaults, e.g. net.core.{r,w}mem_max = 212992. The TCP retransmits are correlated with buffer full drops on both sides. Starting Test: protocol: TCP, 1 streams, 131072 byte blocks [ ID] Interval Transfer Bitrate Retr Cwnd [ 5] 0.00-10.00 sec 4.74 GBytes 4.08 Gbits/sec 2742 285 KBytes - - - - - - - - - - - - - - - - - - - - - - - - - Test Complete. Summary Results: [ ID] Interval Transfer Bitrate Retr [ 5] 0.00-10.00 sec 4.74 GBytes 4.08 Gbits/sec 2742 sender [ 5] 0.00-10.04 sec 4.74 GBytes 4.06 Gbits/sec receiver The second result is after increasing SO_{SND,RCV}BUF to 7MB, i.e. applying this commit. Starting Test: protocol: TCP, 1 streams, 131072 byte blocks [ ID] Interval Transfer Bitrate Retr Cwnd [ 5] 0.00-10.00 sec 6.14 GBytes 5.27 Gbits/sec 0 3.15 MBytes - - - - - - - - - - - - - - - - - - - - - - - - - Test Complete. Summary Results: [ ID] Interval Transfer Bitrate Retr [ 5] 0.00-10.00 sec 6.14 GBytes 5.27 Gbits/sec 0 sender [ 5] 0.00-10.04 sec 6.14 GBytes 5.25 Gbits/sec receiver The specific value of 7MB is chosen as it is the max supported by a default configuration of macOS. A value greater than 7MB may further benefit throughput for environments with higher network latency and lower CPU clocks, but will also increase latency under load (bufferbloat). Some platforms will silently clamp the value to other maximums. On Linux, we use SO_{SND,RCV}BUFFORCE in case 7MB is beyond net.core.{r,w}mem_max. Co-authored-by: James Tucker <james@tailscale.com> Signed-off-by: James Tucker <james@tailscale.com> Signed-off-by: Jordan Whited <jordan@tailscale.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* go.mod: bump depsJason A. Donenfeld2023-03-102-9/+9
| | | | Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* conn, device, tun: implement vectorized I/O on LinuxJordan Whited2023-03-1024-787/+1870
| | | | | | | | | | | | | | | | | | | | Implement TCP offloading via TSO and GRO for the Linux tun.Device, which is made possible by virtio extensions in the kernel's TUN driver. Delete conn.LinuxSocketEndpoint in favor of a collapsed conn.StdNetBind. conn.StdNetBind makes use of recvmmsg() and sendmmsg() on Linux. All platforms now fall under conn.StdNetBind, except for Windows, which remains in conn.WinRingBind, which still needs to be adjusted to handle multiple packets. Also refactor sticky sockets support to eventually be applicable on platforms other than just Linux. However Linux remains the sole platform that fully implements it for now. Co-authored-by: James Tucker <james@tailscale.com> Signed-off-by: James Tucker <james@tailscale.com> Signed-off-by: Jordan Whited <jordan@tailscale.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* conn, device, tun: implement vectorized I/O plumbingJordan Whited2023-03-1025-494/+1026
| | | | | | | | | | | | | | | | | | | | Accept packet vectors for reading and writing in the tun.Device and conn.Bind interfaces, so that the internal plumbing between these interfaces now passes a vector of packets. Vectors move untouched between these interfaces, i.e. if 128 packets are received from conn.Bind.Read(), 128 packets are passed to tun.Device.Write(). There is no internal buffering. Currently, existing implementations are only adjusted to have vectors of length one. Subsequent patches will improve that. Also, as a related fixup, use the unix and windows packages rather than the syscall package when possible. Co-authored-by: James Tucker <james@tailscale.com> Signed-off-by: James Tucker <james@tailscale.com> Signed-off-by: Jordan Whited <jordan@tailscale.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* version: bump snapshot0.0.20230223Jason A. Donenfeld2023-02-231-1/+1
| | | | Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* device: uniformly check ECDH output for zerosJason A. Donenfeld2023-02-165-38/+45
| | | | | | | | For some reason, this was omitted for response messages. Reported-by: z <dzm@unexpl0.red> Fixes: 8c34c4c ("First set of code review patches") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* tun: guard Device.Events() against chan writesJordan Whited2023-02-098-11/+11
| | | | | Signed-off-by: Jordan Whited <jordan@tailscale.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* global: bump copyright yearJason A. Donenfeld2023-02-0775-75/+75
| | | | Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* tun/netstack: make http examples communicate with each otherSoren L. Hansen2023-02-072-9/+9
| | | | | | | | This seems like a much better demonstration as it removes the need for external components. Signed-off-by: Søren L. Hansen <sorenisanerd@gmail.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* tun/netstack: bump gvisorColin Adler2023-02-073-7/+7
| | | | | | | Bump gVisor to a recent known-good version. Signed-off-by: Colin Adler <colin1adler@gmail.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* global: bump copyright yearJason A. Donenfeld2022-09-2075-75/+75
| | | | Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* tun/netstack: ensure `(*netTun).incomingPacket` chan is closedColin Adler2022-09-201-0/+4
| | | | | | | Without this, `device.Close()` will deadlock. Signed-off-by: Colin Adler <colin1adler@gmail.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* all: use Go 1.19 and its atomic typesBrad Fitzpatrick2022-09-0420-288/+156
| | | | | Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* tun/netstack: remove separate moduleJason A. Donenfeld2022-08-294-33/+12
| | | | | | | Now that the gvisor deps aren't insane, we can just do this in the main module. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* tun/netstack: bump to latest gvisorShengjing Zhu2022-08-293-1031/+37
| | | | | | | | | | | To build with go1.19, gvisor needs 99325baf ("Bump gVisor build tags to go1.19"). However gvisor.dev/gvisor/pkg/tcpip/buffer is no longer available, so refactor to use gvisor.dev/gvisor/pkg/tcpip/link/channel directly. Signed-off-by: Shengjing Zhu <i@zhsj.me> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* conn, device, tun: set CLOEXEC on fdsBrad Fitzpatrick2022-07-046-24/+36
| | | | | Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* tun: use ByteSliceToString from golang.org/x/sys/unixTobias Klauser2022-06-011-6/+1
| | | | | | | | Use unix.ByteSliceToString in (*NativeTun).nameSlice to convert the TUNGETIFF ioctl result []byte to a string. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* conn: remove the final alloc per packet receiveJosh Bleecher Snyder2022-04-071-16/+37
| | | | | | | | | | | | | | | | | | | | | | | | | This does bind_std only; other platforms remain. The remaining alloc per iteration in the Throughput benchmark comes from the tuntest package, and should not appear in regular use. name old time/op new time/op delta Latency-10 25.2µs ± 1% 25.0µs ± 0% -0.58% (p=0.006 n=10+10) Throughput-10 2.44µs ± 3% 2.41µs ± 2% ~ (p=0.140 n=10+8) name old alloc/op new alloc/op delta Latency-10 854B ± 5% 741B ± 3% -13.22% (p=0.000 n=10+10) Throughput-10 265B ±34% 267B ±39% ~ (p=0.670 n=10+10) name old allocs/op new allocs/op delta Latency-10 16.0 ± 0% 14.0 ± 0% -12.50% (p=0.000 n=10+10) Throughput-10 2.00 ± 0% 1.00 ± 0% -50.00% (p=0.000 n=10+10) name old packet-loss new packet-loss delta Throughput-10 0.01 ±82% 0.01 ±282% ~ (p=0.321 n=9+8) Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* conn: use netip for std bindJason A. Donenfeld2022-03-171-26/+13
| | | | Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* version: bump snapshot0.0.20220316Jason A. Donenfeld2022-03-161-1/+1
| | | | Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* tun/netstack: bump modJason A. Donenfeld2022-03-162-24/+17
| | | | Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* mod: bump packages and remove compat netipJason A. Donenfeld2022-03-162-21/+9
| | | | Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* all: use any in place of interface{}Josh Bleecher Snyder2022-03-164-15/+15
| | | | | | Enabled by using Go 1.18. A bit less verbose. Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
* all: update to Go 1.18Josh Bleecher Snyder2022-03-1620-33/+23
| | | | | | | | | | Bump go.mod and README. Switch to upstream net/netip. Use strings.Cut. Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
* tun/netstack: check error returned by SetDeadline()Alexander Neumann2022-03-091-1/+4
| | | | | | Signed-off-by: Alexander Neumann <alexander.neumann@redteam-pentesting.de> [Jason: don't wrap deadline error.] Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* tun/netstack: update to latest wireguard-goAlexander Neumann2022-03-093-24/+36
| | | | | | | | | This commit fixes all callsites of netip.AddrFromSlice(), which has changed its signature and now returns two values. Signed-off-by: Alexander Neumann <alexander.neumann@redteam-pentesting.de> [Jason: remove error handling from AddrFromSlice.] Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* tun/netstack: simplify read timeout on ping socketJason A. Donenfeld2022-02-021-43/+14
| | | | | | I'm not 100% sure this is correct, but it certainly is a lot simpler. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* tun/netstack: implement ICMP pingThomas H. Ptacek2022-02-022-24/+343
| | | | | | | | | | Provide a PacketConn interface for netstack's ICMP endpoint; netstack currently only provides EchoRequest/EchoResponse ICMP support, so this code exposes only an interface for doing ping. Signed-off-by: Thomas Ptacek <thomas@sockpuppet.org> [Jason: rework structure, match std go interfaces, add example code] Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* version: bump snapshot0.0.20220117Jason A. Donenfeld2022-01-171-1/+1
| | | | Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* ipc: bsd: try again if kqueue returns EINTRJason A. Donenfeld2022-01-141-1/+1
| | | | | Reported-by: J. Michael McAtee <mmcatee@jumptrading.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* global: apply gofumptJason A. Donenfeld2021-12-0928-71/+56
| | | | Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* device: handle peer post config on blank lineJason A. Donenfeld2021-11-291-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We missed a function exit point. This was exacerbated by e3134bf ("device: defer state machine transitions until configuration is complete"), but the bug existed prior. Minus provided the following useful reproducer script: #!/usr/bin/env bash set -eux make wireguard-go || exit 125 ip netns del test-ns || true ip netns add test-ns ip link add test-kernel type wireguard wg set test-kernel listen-port 0 private-key <(echo "QMCfZcp1KU27kEkpcMCgASEjDnDZDYsfMLHPed7+538=") peer "eDPZJMdfnb8ZcA/VSUnLZvLB2k8HVH12ufCGa7Z7rHI=" allowed-ips 10.51.234.10/32 ip link set test-kernel netns test-ns up ip -n test-ns addr add 10.51.234.1/24 dev test-kernel port=$(ip netns exec test-ns wg show test-kernel listen-port) ip link del test-go || true ./wireguard-go test-go wg set test-go private-key <(echo "WBM7qimR3vFk1QtWNfH+F4ggy/hmO+5hfIHKxxI4nF4=") peer "+nj9Dkqpl4phsHo2dQliGm5aEiWJJgBtYKbh7XjeNjg=" allowed-ips 0.0.0.0/0 endpoint 127.0.0.1:$port ip addr add 10.51.234.10/24 dev test-go ip link set test-go up ping -c2 -W1 10.51.234.1 Reported-by: minus <minus@mnus.de> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* device: reduce peer lock critical section in UAPIJosh Bleecher Snyder2021-11-231-26/+28
| | | | | | | | | The deferred RUnlock calls weren't executing until all peers had been processed. Add an anonymous function so that each peer may be unlocked as soon as it is completed. Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* device: remove code using unsafeJosh Bleecher Snyder2021-11-231-33/+13
| | | | | | | | | | | | | There is no performance impact. name old time/op new time/op delta TrieIPv4Peers100Addresses1000-8 78.6ns ± 1% 79.4ns ± 3% ~ (p=0.604 n=10+9) TrieIPv4Peers10Addresses10-8 29.1ns ± 2% 28.8ns ± 1% -1.12% (p=0.014 n=10+9) TrieIPv6Peers100Addresses1000-8 78.9ns ± 1% 78.6ns ± 1% ~ (p=0.492 n=10+10) TrieIPv6Peers10Addresses10-8 29.3ns ± 2% 28.6ns ± 2% -2.16% (p=0.000 n=10+10) Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* global: use netip where possible nowJason A. Donenfeld2021-11-2322-285/+247
| | | | | | | | There are more places where we'll need to add it later, when Go 1.18 comes out with support for it in the "net" package. Also, allowedips still uses slices internally, which might be suboptimal. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* device: only propagate roaming value before peer is referenced elsewhereJason A. Donenfeld2021-11-161-1/+3
| | | | | | | | | A peer.endpoint never becomes nil after being not-nil, so creation is the only time we actually need to set this. This prevents a race from when the variable is actually used elsewhere, and allows us to avoid an expensive atomic. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* device: align 64-bit atomic member in DeviceJason A. Donenfeld2021-11-161-5/+6
| | | | Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* device: start peers before running handshake testJason A. Donenfeld2021-11-161-0/+2
| | | | Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* Makefile: don't use test -v because it hides failures in scrollbackJason A. Donenfeld2021-11-161-1/+1
| | | | Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* device: fix nil pointer dereference in uapi readDavid Anderson2021-11-161-2/+2
| | | | | Signed-off-by: David Anderson <danderson@tailscale.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* device: make new peers inherit broken mobile semanticsJason A. Donenfeld2021-11-153-0/+5
| | | | Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* device: defer state machine transitions until configuration is completeJason A. Donenfeld2021-11-153-15/+18
| | | | Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>