Sublime Forum

RFC: Default Package Control Channel and Package Telemetry

#1

Recently @gerry pointed out that the third most popular package in the Package Control default channel is apparently sending data to the startup Kite.

This seems to explain why a generic package providing extra menu items would be collecting info about how many minutes of time were spent editing Python code.

This has prompted some discussion about what, if anything, Package Control should (try to) do about privacy and dark patterns in packages that are part of the default channel. I’d like to get feedback from the community as to what users think should happen.

One proposed solution is to:

  1. Disallow opt-out telemetry from packages in the default channel. Any packages found collecting user info without having the user opt-in will be immediately removed from the channel and only permitted to be re-added if they have changed to opt-in.
  2. Keep Package Control itself as opt-out for install/upgrade/removal counts, so as not to break all of the anonymous statistics on https://packagecontrol.io. Add an explicit privacy policy in place of the current informal one (which is that I am the only person who has access to the server/data, and stats are only ever shared as anonymous counts of users installing, upgrading or removing packages, along with the Sublime Text version, operating system and date).

This would not prevent authors from creating their own PC channel (which has been supported from the beginning) and publishing opt-out packages; it would just prevent those packages from being included in the default channel. A simple channel with one or two packages could be hand-maintained without much effort. If someone wanted to set up an automated channel (one that crawls package info), that can be adapted from the code base at https://github.com/wbond/packagecontrol.io.
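As a rough illustration of how small a hand-maintained channel can be, here is a minimal sketch that writes out a channel file and the repository file it points to. The field names assume the 3.0.0 channel/repository schema as I understand it, and the package name and URLs are hypothetical placeholders, so check the Package Control documentation before relying on it.

```python
import json

# Hypothetical package and URLs, for illustration only.
# Field names assume Package Control's 3.0.0 schema.
repository = {
    "schema_version": "3.0.0",
    "packages": [
        {
            "name": "ExamplePackage",
            "details": "https://github.com/example-author/ExamplePackage",
            # Assumed meaning: any Sublime Text build, releases taken from tags.
            "releases": [{"sublime_text": "*", "tags": True}],
        }
    ],
}

# The channel only lists the repository files it aggregates.
channel = {
    "schema_version": "3.0.0",
    "repositories": ["https://example.com/repository.json"],
}


def render(doc):
    """Serialize a channel or repository document as readable JSON."""
    return json.dumps(doc, indent=4, sort_keys=True)


if __name__ == "__main__":
    print(render(channel))
    print(render(repository))
```

Both files would then be hosted somewhere users can reach, and the channel URL added to their Package Control settings.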

If you can share your thoughts, I’d like to gather feedback and then decide what, if any, action should be taken to address the privacy of Package Control users. Please also share this topic with any Package Control users you can so we can get as much feedback as possible.

18 Likes

#2

I think that requiring opt-in for data collection, backed by an official one-strike policy, would be a great thing to see.

That said, I think there should be an official policy on how packages are reported and reviewed, and on the deadlines involved.
I would recommend that reports go to a private system, but there should be a place to check which packages are under review.

I applaud you for taking privacy as a serious concern and looking at better ways to bake it into the community.

1 Like

#3

I vote for Solution 1. I, too, applaud you for taking privacy seriously. I have removed the Sidebar Enhancements package for that very reason. There is no reason for them to collect that information, and if they want to, they should make it opt-in.

4 Likes

#4

From a developer’s point of view, I find statistics about install counts and ST versions, as provided by Package Control, useful for judging how much effort a piece of work is worth or how much backward compatibility with older versions to worry about. They also show what features people like or need.

But I honestly love Sublime Text for its trustworthiness and the absence of any user tracking. You stand against the trend among big companies of mining data as a kind of oil. Keep it up!

I hacked SideBarEnhancements to disable that tracking feature right after learning about it several months ago, and even decided to stop using it altogether (for some other reasons, too). I felt really uncomfortable about all my packages when I heard about the tracking in SideBarEnhancements, but luckily I did not find any other installed package doing so.

Packages must not track user interaction.

If a package provides useful functions, many people will install it. That’s information enough. If a dev wants feedback, he can ask for it in the forum or by update messages.

A related question is how to handle and review closed-source packages.

5 Likes

#5

I see Sublime Text itself and Package Control as special cases. As such, I don’t think the rules that apply to other packages need to apply to them. Therefore:

  • Sublime opt-in
  • Package Control opt-in

It’s hard to restrict what a package can and cannot do, but Package Control can set rules about which packages it accepts into its default channel. Note that this is not saying that a package cannot do x or y, but rather that if it wants to be included in the default channel then it must follow the rules.

The policy I advocate for packages to be included in the channel is:

  • No data collection in any way or shape whatsoever, opt-in, opt-out or otherwise.

There are several reasons to prefer a no tolerance policy:

  • Instils confidence because it clearly defines the boundaries
  • Too small a percentage of users will opt in anyway
  • Makes it easier for the community to handle issues as they arise because, more often than not, there will be no ambiguity about them. When there is a grey area it takes forever to close out issues, and by then a lot of damage is already done.
  • Anything other than a no tolerance policy creates high cost and a complicated mess of what is acceptable and what is not, how it should be implemented and how it should not. It becomes very subjective and then impacts point three, because a user is far less likely to raise an issue if they don’t know whether it’s acceptable or not. They may feel they’re wasting their time reporting it because it’s probably an accepted data collection procedure, even though they themselves don’t think it should be. They may just uninstall the package, but that doesn’t help the community.

I think SE is an example of why anything other than a no tolerance policy fails. SE has been collecting data for more than five months. It’s about weighing the costs of a complicated, ambiguous ruleset against “it’s not allowed, full stop”.

On top of that, Package Control could open up its data to provide package authors with stats about their packages, e.g. ST versions, package versions, installed dependency versions, etc. These are all the things that would make it redundant for a package to want to collect data to “improve the plugin support”. Pretty much no package should need to collect data to know which operating systems it is installed on or which versions are installed.

6 Likes

#6

I think some definition of data collection needs to be made. Is it data collection at all, or data collection not related to the primary function of the package?

Let’s say Kite authors a Sublime Text plugin that sends your code to their servers for auto-completion support. Is that data collection, or is it ok since it is the express intent of the package? Can they not publish their package on the default channel since data is sent off of your machine?

I’m not trying to be difficult, I just want to make sure there is a clear, established set of rules if we are going that route.

0 Likes

#7

I agree with @gerry. I would like to see a policy that does not allow data collection on the default channel, but I understand that it may not be feasible.

I think a more practical solution would be requiring opt-in, but also requiring a privacy policy that details what is collected and who it is shared with. It would also be nice to show a badge on the Package Control website stating “this package may collect and share information with third parties”, with a link to the package maintainer’s privacy policy, similar to how the Readme is shown in Package Control.

1 Like

#8

I would prefer zero-tolerance. Not even opt-in. It’s easier to define what this means.

For Package Control itself, it’s obvious that this functionality is needed.

I wholeheartedly agree with what @gerry and @deathaxe said.

4 Likes

#9

It’s a good point. What I have in mind with data collection is just ordinary user tracking data.

If a package obviously provides information based on a web service, which needs some information to work properly, this is part of the core functionality and should be possible.

In such cases a package should:

  • inform a user about the usage of a web service backend
  • outline the data being sent to the server (I at least want to know what’s published, to decide whether to agree or not)

0 Likes

#10

I opt for option 1:

Disallow opt-out telemetry from packages in the default channel. Any packages found collecting user info without having the user opt-in will be immediately removed from the channel and only permitted to be re-added if they have changed to opt-in.

I did not know until now that there were packages collecting my data.

0 Likes

#11

Just to be clear, the two numbered list items I outlined in the original post were two parts to a single option. I think that if telemetry data is disallowed, the topic of PC collecting data needs to be addressed, which is what bullet 2 was covering.

0 Likes

#12

I understand where you’re coming from.

The Anaconda* packages are Kite based. They’re auto complete packages and they send the full contents of files to Kite: (Correction: the Anaconda packages are not based on Kite and do not send the full contents of files to Kite. That is not an accurate representation of the packages. I projected my own sensitivities around this issue onto the Anaconda packages.)

Here is a Sublime Text 2 Kite plugin: https://github.com/kiteco/sublimetext2-plugin/blob/a7a3376e041d04b22c477c5bf1209f498ee1f869/SublimeKite.py

Here is Kite’s privacy policy:

“Any source code files on your computer’s hard drive that you have explicitly allowed our services to access” https://kite.com/privacy/ (emphasis mine)

I think people are nuts installing this stuff.

But is it data collection? Personally I consider it data collection. I think the data is monetised. Maybe not today, but inevitably it will be. That’s what I believe. But is that just my opinion?

I’m honestly stuck on this one. Personally I don’t like it and wouldn’t allow someone to install it, but I can see how someone can argue that it’s a feature.

1 Like

#13

I don’t use such packages either.

The point is: is it clear to a user whether data is sent?

0 Likes

#14

Lurker chiming in: I’m in favor of required opt-in for packages distributed through Package Control. There are good reasons why some packages may want to record some user data, but users should be clearly informed of what information is being collected, why it is being collected, and who will have access to it before any is transmitted. That’s just common courtesy.

If package developers are concerned that this will result in fewer users sending their data… well, that’s the point. If you can’t make a convincing argument for why you need the data, and aren’t willing to ask before taking, don’t ask for it in the first place.

0 Likes

#15

The whole thing is a question of trust. @wbond, I believe your statement that Package Control stats are accessed by you only, with only anonymous stats being published.

The package install stats don’t reveal much that is harmful about user behavior in the end. Furthermore, you stated which data is collected, and in most cases it helps to find the most important packages. (There are also little-used packages with high value for me.)

You provided information about the telemetry data and its usage. With that in mind, everyone can decide whether to use Package Control or not. I can even disable telemetry. Therefore everything is fine with the telemetry feature of Package Control.

I agree with the others that Package Control should be handled as a special case, as it can be understood as an optional core part of Sublime Text provided by a core developer. Anyone who doesn’t trust that should perhaps not use Sublime Text at all; that would be paranoid.

1 Like

#16

Can you please elaborate on this? The Anaconda package seems to run a local Jedi-based JSON server.

Aside from that, the linked source code file connects to a local server (127.0.0.1:46624) and requires Kite to be installed and running (which may forward information).

0 Likes

#17

I’m sure there are. I think the good reasons are so negligible that I don’t see them as a good argument for allowing it.

PC can open up the stats collected about packages to provide more details about packages, which to me makes most data collection from packages redundant. Beyond that I don’t really know what the value of it is.

0 Likes

#18

Sorry, I haven’t tested those. I just saw that Kite is integrated when browsing through packages: https://github.com/DamnWidget/anaconda/blob/master/AnacondaKite.sublime-settings

It looks disabled by default.

0 Likes

#19

I agree that a policy on the default channel that does not permit data collection as described without an explicit opt-in would be the most preferable solution.

The opt-in should include what information is being collected, and for what purpose, to allow an informed decision.
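To make that concrete, here is a minimal sketch of an opt-in gate in plain Python. It is not tied to the Sublime Text plugin API, and every name in it (`TelemetryDisclosure`, `TelemetryGate`, `record`) is hypothetical. The idea is only that the disclosure travels with the setting and that nothing is recorded while the default of "off" is in place.

```python
from dataclasses import dataclass, field


@dataclass
class TelemetryDisclosure:
    """What the user is told before being asked to opt in (hypothetical shape)."""
    what: str  # which information is collected
    why: str   # the purpose of collecting it
    who: str   # who will have access to it


@dataclass
class TelemetryGate:
    disclosure: TelemetryDisclosure
    opted_in: bool = False  # off by default: the user must actively enable it
    sent: list = field(default_factory=list)  # stand-in for a network call

    def prompt_text(self):
        """Text to show the user when asking for consent."""
        d = self.disclosure
        return "Collects: {}\nPurpose: {}\nShared with: {}\nEnable? (default: no)".format(
            d.what, d.why, d.who
        )

    def record(self, event):
        """Record an event only if the user has opted in; report whether it was kept."""
        if not self.opted_in:
            return False
        self.sent.append(event)
        return True


if __name__ == "__main__":
    gate = TelemetryGate(
        TelemetryDisclosure("install counts", "prioritise compatibility work", "package author only")
    )
    print(gate.prompt_text())
    print(gate.record("install"))  # dropped, because the user has not opted in
```

A real package would replace the `sent` list with its own transport and persist `opted_in` in the user's settings, but the default would still have to be "off".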

1 Like

#20

I took a short look at the code and it seems to disable some features if the setting is activated and Kite is installed. It does not seem to interact with Kite except to check whether it is installed and running.

It may be worth monitoring in the future, but currently I don’t see a privacy violation.

0 Likes