In February I posted a series of blog posts defining Transparent Telemetry, and we had a lively discussion on #58409. In the original posts, the design was opt-out. Based on that discussion as well as private discussions with long-time contributors and users, I revised the design to be opt-in.
I propose that we add opt-in transparent telemetry to the Go toolchain as described in those posts, specifically “The Design of Transparent Telemetry.” Transparent Telemetry has the following key properties:
- The decisions about what metrics to collect are made in an open, public process.
- The collection configuration is automatically generated from the actively tracked metrics: no data is collected that isn’t needed for the metrics.
- The collection configuration is served using a tamper-evident transparent log, making it very difficult to serve different collection configurations to different systems.
- The collection configuration is a cacheable, proxied Go module, so any privacy-enhancing local Go proxy already in use for ordinary modules will automatically be used for collection configuration.
- Uploaded reports only include total event counts over a full week, not any kind of time-ordered event trace.
- Uploaded reports do not include user IDs, machine IDs, or any other kind of ID.
- Uploaded reports only contain strings that are already known to the collection server: counter names, program names, and version strings repeated from the collection configuration, along with the names of functions in specific, unmodified Go toolchain programs for stack traces. The only types of non-string data in the reports are event counts, dates, and line numbers.
- IP addresses exposed by the HTTP session that uploads the report are not recorded with the reports.
- Thanks to sampling, only a constant number of uploaded reports is needed to achieve a specific accuracy target, no matter how many installations exist. Specifically, only about 16,000 reports are needed for 1% accuracy at a 99% confidence level. This means that as more installations opt in, each installation reports less often. Exactly how often will depend on how many systems opt in.
- The aggregate computed metrics are made public in graphical and tabular form.
- The full raw data as collected is made public, so that project maintainers have no proprietary advantage or insights in their role as the direct data collector.
- The system is off by default and requires an explicit opt-in.
Please note, as described in the Why Telemetry? section of the intro post, that telemetry addresses a different kind of problem than bug reports and surveys. In particular, relying on bug reports is not sufficient to identify problems that don’t obviously impact functionality, such as performance problems, and surveys are not sufficient to identify the variety of usages and contexts in which Go is used, which would inform prioritization of effort.
There is good reason to believe that even with only tens of thousands of users opted in, we should be able to get helpful data. It will not be as complete as the opt-out system, but it should be good enough. As described in the Can We Still Make Good Decisions? section of the opt-in post, there will be certain biases in the data based on who is more likely to opt in. Once we have data, it would make sense to compare “technical demographics” like operating system and editor against the annual Go survey and Stack Overflow surveys. If there is significant skew, we could look into reweighting the data as standard polls do (https://en.wikipedia.org/wiki/Iterative_proportional_fitting).
For examples of the kinds of questions we’d use telemetry to answer and the kinds of decisions those answers would inform (but not decide directly), see “Use Cases for Transparent Telemetry”.