...and adjust the `totalFailedConnections` metric name
and description.
This commit should make the periodic stats log messages
and the relevant metric look less scary to users:
P2P connection failures are relatively frequent and are usually
not indicative of the proxy operator having done something wrong.
So let's tone the wording down.
See the discussion: https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/merge_requests/516#note_3173677.
This migrates from calling `atomic.LoadInt64` on a plain `int64`
to making the `clients` field itself an `atomic.Int64`.
Also, `count` now takes `*tokens_t` by reference,
which fixes a linter warning.
It's not clear to me why it warned about this (presumably because
the `sync/atomic` struct types must not be copied after first use),
but I simplified it anyway.
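
A minimal sketch of the shape of the change, with illustrative names
(the real `tokens_t` differs):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Before (sketch): a plain int64 that every call site must remember
// to access through the sync/atomic package functions.
type tokensOld struct {
	clients int64
}

func (t *tokensOld) count() int64 {
	return atomic.LoadInt64(&t.clients)
}

// After (sketch): the field itself is atomic.Int64, so a plain,
// non-atomic read is no longer possible. go vet also flags any copy
// of the struct, since atomic.Int64 must not be copied after first
// use; hence count takes its receiver by reference.
type tokensNew struct {
	clients atomic.Int64
}

func (t *tokensNew) count() int64 {
	return t.clients.Load()
}

func main() {
	t := &tokensNew{}
	t.clients.Add(1)
	fmt.Println(t.count()) // 1
}
```

Making the field atomic moves the access discipline from every call
site into the type itself.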
And `failedConnectionCount`:
convert the `int` / `uint` counters to `atomic.Int32` / `atomic.Uint32`.
The race was discovered by running a proxy with the `-race` flag.
Although it does not look like there are situations
where the race would actually be harmful, because the value is
only used when performing rendezvous with a proxy, which doesn't
happen very often, let's still fix it just to be sure.
Some of the changes do not involve a potential race condition,
so those are purely a refactor,
while in others (e.g. in broker.go and in proxy/lib/snowflake.go)
the same variable really is accessed from multiple goroutines.
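
For illustration, this is roughly the pattern the race detector flags,
and the atomic fix; a hedged sketch, the real counters live elsewhere
in the proxy code:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// With a plain `var failedConnectionCount uint`, concurrent
// `failedConnectionCount++` from multiple goroutines is a data race
// that `-race` reports. The atomic version below is safe:
var failedConnectionCount atomic.Uint32

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			failedConnectionCount.Add(1) // one indivisible read-modify-write
		}()
	}
	wg.Wait()
	fmt.Println(failedConnectionCount.Load()) // always 100
}
```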
Our metrics were undercounting client polls by missing the case where
clients are matched with a snowflake but receive a timeout before the
snowflake responds with its answer. This change adds a new metric,
`client-snowflake-timeout-count`, to the 24-hour broker stats and a
new "timeout" status label for Prometheus metrics.
For the summary log and for Prometheus metrics.
Log output example:
> In the last 1h0m0s, there were 7 completed successful connections. 2 connections failed. Traffic Relayed ↓ 321 KB (0.10 KB/s), ↑ 123 KB (0.05 KB/s).
To test that the broker responds with a proxy answer when one is
available, use only one valid client offer, so that the binned
metrics always land in the first multiple of 8.
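
For reference, a sketch of the binning this relies on, under the
assumption that the broker rounds its published counts up to the next
multiple of 8:

```go
package main

import "fmt"

// binCount rounds a count up to the next multiple of 8 (assumption:
// this mirrors the binning applied to the broker's published stats).
// With only one valid client offer, any true count from 1 to 8
// therefore reports as 8.
func binCount(n uint) uint {
	return ((n + 7) / 8) * 8
}

func main() {
	fmt.Println(binCount(1), binCount(8), binCount(9)) // 8 8 16
}
```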
Fixes a bug where socksAcceptLoop was reusing the same client config
when processing arguments from multiple SOCKS connections, causing
different bridge lines to clobber each other.
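
The pattern of the bug and the fix, sketched with hypothetical names
(the real config struct and helpers differ):

```go
package main

import "net"

// Hypothetical types and helpers for illustration; the real client
// code differs, but the bug/fix pattern is the same.
type clientConfig struct {
	bridgeLine string
}

func parseBridgeLine(conn net.Conn) string       { return "<bridge line from SOCKS args>" }
func handleConn(conn net.Conn, cfg clientConfig) { _ = conn.Close() }

func socksAcceptLoop(ln net.Listener, base clientConfig) {
	for {
		conn, err := ln.Accept()
		if err != nil {
			return
		}
		// The bug: mutating one shared config value here meant
		// concurrent connections clobbered each other's bridge lines.
		// The fix: give each connection its own copy.
		cfg := base
		cfg.bridgeLine = parseBridgeLine(conn)
		go handleConn(conn, cfg)
	}
}

func main() {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	go socksAcceptLoop(ln, clientConfig{})
	_ = ln.Close()
}
```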
Our SQS tests were not concurrency safe and we hadn't noticed until now
because we were processing incoming SQS queue messages sequentially
rather than in parallel.
This fix removes the log output checks, which were prone to error
anyway, and relies instead on gomock's expected function calls and
strategic use of the context cancel function for each test.
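
The real tests rely on gomock's generated mocks, which are too large
to excerpt here; this hand-rolled fake sketches the same idea: assert
on the calls themselves rather than on log output, and use the context
cancel to stop the processing loop deterministically:

```go
package sqs

import (
	"context"
	"testing"
)

// fakeQueue stands in for the mocked SQS client; these names are
// illustrative, not the real interfaces.
type fakeQueue struct {
	deleted chan string
}

func (f *fakeQueue) DeleteMessage(ctx context.Context, receipt string) error {
	select {
	case f.deleted <- receipt:
	case <-ctx.Done():
	}
	return nil
}

// pollLoop stands in for the broker's message-processing loop.
func pollLoop(ctx context.Context, q *fakeQueue) {
	for {
		select {
		case <-ctx.Done():
			return
		default:
			_ = q.DeleteMessage(ctx, "receipt-1")
		}
	}
}

func TestDeleteIsCalled(t *testing.T) {
	ctx, cancel := context.WithCancel(context.Background())
	q := &fakeQueue{deleted: make(chan string, 1)}

	go pollLoop(ctx, q)

	got := <-q.deleted
	cancel() // end the loop deterministically; no sleeps, no timing checks
	if got != "receipt-1" {
		t.Errorf("deleted %q, want %q", got, "receipt-1")
	}
}
```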
We're losing a lot of messages from the broker SQS queue because they
are exceeding their maximum lifetime before being read and processed by
the broker. This change speeds that up by increasing the buffer size of
`messagesChn` and processing the messages within a goroutine.
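
Roughly the shape of the change; a sketch with an illustrative channel
capacity and message type, not the broker's actual code:

```go
package main

import (
	"fmt"
	"sync"
)

func process(msg string) { fmt.Println("processing", msg) }

func main() {
	// A larger buffer lets the SQS receive loop keep draining the
	// queue even while earlier messages are still being handled.
	messagesChn := make(chan string, 100)

	go func() {
		for i := 0; i < 5; i++ {
			messagesChn <- fmt.Sprintf("message-%d", i)
		}
		close(messagesChn)
	}()

	var wg sync.WaitGroup
	for msg := range messagesChn {
		wg.Add(1)
		go func(m string) { // handle each message concurrently,
			defer wg.Done() // not sequentially
			process(m)
		}(msg)
	}
	wg.Wait()
}
```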