TL;DR: The current implementation uses a 32K buffer per direction, for a total of 64K of
buffers per connection, but each read/write is less than 2K according to my measurements.
# Background
The Snowflake proxy uses a particularly hot function, `copyLoop`
(proxy/lib/snowflake.go), to proxy data between a Tor relay and a connected client.
This is currently done using the `io.Copy` function to copy all incoming data in
both directions.
Looking at the `io.Copy` implementation, it internally calls `io.CopyBuffer`,
which in turn defaults to a 32K buffer for copying data (I checked, and
the current implementation allocates 32K every time).
Since `snowflake-proxy` is intended to be run in a very distributed manner, on
as many machines as possible, minimizing the CPU and memory footprint of each
proxied connection would be ideal, as would maximizing throughput for
clients.
# Hypothesis
There might exist a buffer size `X` that is more suitable for usage in `copyLoop` than 32K.
# Testing
## Using tcpdump
Assuming you use `-ephemeral-ports-range 50000:51000` for `snowflake-proxy`,
you can capture the UDP packets being proxied using
```sh
sudo tcpdump -i <interface> udp portrange 50000-51000
```
which will provide a `length` value for each packet captured. One good starting
value for `X` could then be slightly larger than the largest captured packet,
assuming one packet is copied at a time.
Experimentally I found this value to be 1265 bytes, which would make `X = 2K` a
possible starting point.
## Printing actual read sizes
The following snippet was added in `proxy/lib/snowflake.go`:
```go
// Taken straight from the standard library's io.copyBuffer,
// with one extra log line to record each read size.
func copyBuffer(dst io.Writer, src io.Reader, buf []byte) (written int64, err error) {
	// If the reader has a WriteTo method, use it to do the copy.
	// Avoids an allocation and a copy.
	if wt, ok := src.(io.WriterTo); ok {
		return wt.WriteTo(dst)
	}
	// Similarly, if the writer has a ReadFrom method, use it to do the copy.
	if rt, ok := dst.(io.ReaderFrom); ok {
		return rt.ReadFrom(src)
	}
	if buf == nil {
		size := 32 * 1024
		if l, ok := src.(*io.LimitedReader); ok && int64(size) > l.N {
			if l.N < 1 {
				size = 1
			} else {
				size = int(l.N)
			}
		}
		buf = make([]byte, size)
	}
	for {
		nr, er := src.Read(buf)
		if nr > 0 {
			log.Printf("Read: %d", nr) // THIS IS THE ONLY DIFFERENCE FROM io.CopyBuffer
			nw, ew := dst.Write(buf[0:nr])
			if nw < 0 || nr < nw {
				nw = 0
				if ew == nil {
					ew = errors.New("invalid write result")
				}
			}
			written += int64(nw)
			if ew != nil {
				err = ew
				break
			}
			if nr != nw {
				err = io.ErrShortWrite
				break
			}
		}
		if er != nil {
			if er != io.EOF {
				err = er
			}
			break
		}
	}
	return written, err
}
```
and `copyLoop` was amended to use this instead of `io.Copy`.
The `Read: BYTES` log lines were saved to a file using this command
```sh
./proxy -verbose -ephemeral-ports-range 50000:50010 2>&1 >/dev/null | awk '/Read: / { print $4 }' | tee read_sizes.txt
```
I got the result:
```
min: 8
max: 1402
median: 1402
average: 910.305
```
Suggested buffer size: 2K
Current buffer size: 32768 (32K, experimentally verified)
## Using a Snowflake proxy in Tor Browser with Wireshark
I also inspected the traffic with Wireshark and concluded that all packets sent were < 2K.
# Conclusion
As per the commit I suggest changing the buffer size to 2K. Some things I have not been able to answer:
1. Does this have a measurable impact on performance?
1. Are there any unforeseen consequences? What happens if a packet is > 2K (I
think the Go standard library just splits it across multiple reads, but someone please confirm).