Sublime Forum

RFC: Default Package Control Channel and Package Telemetry

#21

Sorry, I was confused, but your complete solution is still OK with me.

0 Likes

#22

I think the kind of data Package Control collects is to be expected for such a feature. As long as communication about it is clear, it’s fine, so to me part 2 of the proposal is fine.

What’s happening with SideBarEnhancements (and Minimap for Atom) is that the telemetry has no relation at all to the functionality of the package. It’s just a way to sneak in on as many developers as possible.

I feel there should be a clear value-add for the user of the package, and it should be there in the very first version that’s submitted to the default channel. If the telemetry is essential, it doesn’t make sense to add it on later. If it’s not essential and the package maintainer is simply curious, that’s too weak a use case to allow the telemetry. There should be zero tolerance for packages that add, after submission, anything that takes any kind of information from the end user and uploads it elsewhere. Sublime has a large user base so of course it’s enticing, but Package Control should not be treated as a commercial opportunity. It originated from the community and it exists so that the community can share, not as an entry point for Kite et al. to mine data.

Therefore, I would add the following to point 1 of the proposal:
If a new package is submitted that uses telemetry, it should be opt-in and clearly, very explicitly explained to end users. The pull request that adds the package to the default channel should clearly mention the telemetry, how it relates to the core functionality of the package, and the efforts undertaken to explain this to end users. If there is no strong relation between the telemetry and the features of the package, it will not be allowed. Any package that is found to add telemetry after being added to the default channel will be pulled immediately and only restored after the telemetry code is removed.

This still allows anyone to create a new version of an existing package with telemetry, but challenges them to create a new entry in Package Control and convince the user base to migrate based on the value-add of the telemetry. If SideBarEnhancements-with-Kite is much better than the old one, there should be no problem with this approach.

I would consider adding similar rules about advertising: it’s only a matter of time before Kite hijacks something else (the linter seems like a tasty target, I bet they’ve already been approached) to add links to their services like it did with Minimap for Atom. Any advertising that’s not a donation button or a link to a repository should follow the same rules: add value, improve the core functionality, add it on day one, don’t be sneaky.

Edit: When in doubt about the value-add, default to “no”. Something like Package Control cannot flourish if people don’t feel safe about it, but I doubt we’d miss out much if a package or two is rejected. Plus, you can always create a new channel.

5 Likes

#23

I would prefer a hard line against any data collection personally. While it could be useful for a developer to know how the tool is used, there are channels that can be used if the user so chooses to utilize them. If the user does not utilize them, I think we can easily say the user did not wish to utilize them.

2 Likes

#24

Hi everyone, member of Kite here. The SideBarEnhancements telemetry was originally added to gather data around what programming languages we should support next. We will ask Tito, the maintainer, to remove it.

We fully support a standardized telemetry consent mechanism for Sublime packages. An option for “Allow plugin authors to collect anonymous usage statistics” similar to Atom’s could be helpful.

0 Likes

#25

Oh, so it’s fine then :roll_eyes:. You needed to mine some data without people knowing, you got what you needed and now that you’re caught you just have to try to make it to the exit swiftly and silently before it blows up any further. Bravo.

You do understand that this discussion isn’t about this particular situation, but about preventing it from ever happening again.

An option for “Allow plugin authors to collect anonymous usage statistics” similar to Atom’s could be helpful.

A blanket opt-in is useless. Users should know exactly which package requests telemetry and should be able to opt in on a per-package basis.

9 Likes

#26

Plugins should make very clear that they are collecting information and be very clear about what they are collecting. I think it should never default to collecting without consent. It should also be per plugin. Just because I give consent to one, I may not want to give consent to all.

Spoiler alert, I don’t want to give consent to any plugin. I frankly don’t trust plugins with this kind of information.

6 Likes

#27

The question is, why did you try to hide who the data was being sent to? And why did you ask to capture activeNonBundledPackageNames? That bit of data seems like a very non-anonymous collection of information. You could be capturing internal package names and consequently exfiltrating the existence of competitors’ products under development.

I’m having a really hard time understanding why you’ve been so careful to stay anonymous while capturing data from the users of (at least) two popular text editors. Do you have your fingers in VS Code, VIM or Emacs packages also?

11 Likes

#28

With the realization of the massive privacy leak that SideBarEnhancements has been perpetrating (sending a list of every installed package to a commercial entity that has been hiding its involvement), I’ve made a judgement to remove it in an attempt at preventing further damage. It can be added back if/when we see a concern for the privacy of users.

10 Likes

#29

Also, why does it take exposure via a thread like this for you to take action? You say you’re staying open, but why didn’t you pull your involvement from SideBarEnhancements when you got caught on the Atom packages last week?

It’s plainly obvious you fail to understand the extent to which you abused and continue to abuse your target audience. So please don’t try to spin this or “engage with the community”, it’s insulting.

2 Likes

#30

@braver the truth is we didn’t remember. This was done the better part of a year ago and we haven’t looked at it in a while.

(We also didn’t remember this telemetry was in the kite-installer package linked to in OP. That will be removed later today.)

@wbond I believe we did not capture internal package names, or that we filtered them out in the code.

0 Likes

#31

I don’t see any filtering happening here: https://github.com/SideBarEnhancements-org/SideBarEnhancements/blob/1858330b71b682ec1f3265bd280dde65465bdf18/Stats.py#L86.
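
For illustration only, filtering of the kind being discussed could look like the minimal Python sketch below. This is not code from the linked repository. The function name `filter_public_packages`, the sample package names, and the idea of comparing against a set of publicly listed names are all assumptions made for this example.

```python
def filter_public_packages(installed, public_names):
    """Keep only package names that also appear in a known public list.

    Both inputs are hypothetical for this sketch: `installed` is the list
    a plugin would otherwise upload, and `public_names` is a collection of
    names known to be published publicly.
    """
    public = set(public_names)
    # Preserve the original order, dropping anything not publicly listed.
    return [name for name in installed if name in public]


# Hypothetical sample data: "AcmeInternalTools" stands in for a private package.
installed = ["SideBarEnhancements", "AcmeInternalTools", "SublimeLinter"]
public_names = {"SideBarEnhancements", "SublimeLinter", "Package Control"}

safe_to_report = filter_public_packages(installed, public_names)
print(safe_to_report)
```

Without a step like this, any private name in the list would be uploaded as-is.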

1 Like

#32

Incompetent much?

Edit: shouldn’t post when angry.

0 Likes

#33

@wbond I have some internal packages installed manually into ST3 but when I run sublime.load_settings('Package Control.sublime-settings').get('installed_packages') I don’t see them listed. (?)

0 Likes

#34

It depends on how you install them. If you git cloned or manually installed, they won’t be in the installed list. If you installed from a private channel file, or by pasting the git repo as a Package Control repository and used Package Control to install/update then they will be in the list.

If you wanted info about the popularity of packages, https://packagecontrol.io/browse/popular.json is probably what you should have used.
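
The distinction above can be sketched in plain Python. This is a simplified illustration, not Package Control's own logic: the function name `split_by_tracking` and the sample package names are hypothetical, and `on_disk` stands in for whatever package names are actually present in the editor's package folders.

```python
def split_by_tracking(on_disk, installed_packages):
    """Split package names into those tracked in the installed list and those not.

    `installed_packages` plays the role of the 'installed_packages' setting;
    `on_disk` is a hypothetical list of package names found on disk.
    """
    tracked = set(installed_packages)
    managed = sorted(name for name in on_disk if name in tracked)
    untracked = sorted(name for name in on_disk if name not in tracked)
    return managed, untracked


# Hypothetical sample: one package from the default channel, one installed
# through a private channel, and one cloned manually with git.
on_disk = ["SublimeLinter", "MyInternalTool", "PrivateChannelPkg"]
installed_packages = ["SublimeLinter", "PrivateChannelPkg"]

managed, untracked = split_by_tracking(on_disk, installed_packages)
print("in installed list:", managed)
print("not in installed list:", untracked)
```

In this sample the manually cloned package stays out of the list, while the one installed through a private channel is included, which is the case that matters for the privacy concern.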

2 Likes

#35

Actually, the same concern is also applicable to commercial packages and Package Control binary dependencies. A package developer can ship their packages with binary dependencies which somehow collect user information.

One of my packages requires PyWin32, and it seems that there is no good way to prove that the binary files are from a credible source.

0 Likes

#36

You make a good point. It seems like something along the lines of Little Snitch would be required to tell if binary packages are making web requests.

Honestly, that’s probably the only feasible way to tell anyway, since it won’t be possible to review the source of every package.

I think we’ll probably have to have a rule and consequences for breaking the rule, but rely on the community for identification.

8 Likes

#37

I’d be interested to see something like the manifest.json format in Chrome Applications.

It’d be hard to limit actual behavior, but as different permissions are brought up, like “collect file data” or “collect other package data”, packages can explicitly tell the user what they are doing.

Ideally you’d also be able to deny new permissions from this manifest and have those particular features turned off.

If a package doesn’t list tracking/network traffic or it ignores the user’s choice when asking for permissions, I see that as a one-strike offense. The package is removed and any other packages associated with that author and/or corporation are reviewed.

This will be harder since, unlike iOS, Chrome, or Android, ST and Package Control don’t have as much of a sandbox around plugins.
But keeping things to an honor system with clear consequences should, I think, keep the channel fairly clear.
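
To make the idea concrete, here is a rough Python sketch of what a declared-permissions check might look like. No such manifest exists in Package Control today; the JSON layout, the permission names, and the helper functions `granted_permissions` and `is_allowed` are all invented for this example, and nothing here would actually stop a plugin from ignoring it.

```python
import json

# Hypothetical manifest a package might ship; the format is an assumption.
MANIFEST_JSON = """
{
  "name": "ExamplePackage",
  "permissions": ["network", "collect_package_list"]
}
"""


def granted_permissions(manifest, denied_by_user):
    """Return the declared permissions minus those the user has denied."""
    declared = set(manifest.get("permissions", []))
    return declared - set(denied_by_user)


def is_allowed(action, manifest, denied_by_user):
    """An action is allowed only if it was declared and not denied."""
    return action in granted_permissions(manifest, denied_by_user)


manifest = json.loads(MANIFEST_JSON)
denied = {"collect_package_list"}  # the user turned this one off

print(is_allowed("network", manifest, denied))               # declared, not denied
print(is_allowed("collect_package_list", manifest, denied))  # declared, but denied
print(is_allowed("collect_file_data", manifest, denied))     # never declared
```

An undeclared action is refused by default, which matches the honor-system idea: the manifest documents intent, and a package caught doing something it did not declare has clearly broken the rule.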

0 Likes

#38

@wbond

This would not prevent authors from creating their own PC channel

Just so I understand correctly, this means something like SublimeLinter’s separate channel for all its plugins? If so, this would be a huge cut into how we currently operate our review process, and I’m sure would increase the burden on Package Control maintainers to work around our approval system.

To make a statement about this issue: SublimeLinter cares about your privacy and does not tolerate data collection as shown by Kite or other parties. We do not, and will never, give away or sell your information without prior approval as long as I am involved. That being said, our contributor plugins are not fully under the SublimeLinter Community jurisdiction and we cannot promise that same level of service. We will work with users to help remedy the issue if one of our contributors is found collecting information.

4 Likes

#39

Just to be clear on the SublimeLinter topic: the SublimeLinter “channel” is actually a “repository” and is included in the default channel. This is mostly a relic from old times and, as far as I’m aware, the different ways of specifying packages could be unified (and the terminology is kind of confusing), but it does not matter for this topic. I believe that I created an issue about this on the PC repo a while ago.

I will comment on the actual topic of this thread later with more time at hand.

2 Likes

#40

Okay, back onto the topic at hand:

I am very much in favor of strictly requiring that telemetry and other user data be sent only when the user has opted in. This may occur via a popup on first installation asking whether data should be collected, or by having to specify an API key in some settings file, which should be enough user action to determine that the user is indeed willing to have their data shared with some remote. Of course, Package Control would be excluded from this with the privacy policy mentioned in OP.
This should be the first step.
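
As a rough sketch of what “opt-in only” could mean in code, consider the following Python. The setting names `telemetry_opt_in` and `api_key` and both functions are hypothetical; `settings` is a plain dict standing in for a package’s own settings, so consent is tracked per package, and `send` is whatever upload function a package would use.

```python
def telemetry_allowed(settings):
    """Return True only when the user has taken an explicit opt-in action."""
    # Only the literal value True counts; a missing key, None, or a stray
    # truthy string is treated as "no".
    if settings.get("telemetry_opt_in") is True:
        return True
    # Alternatively, a user-supplied, non-empty API key counts as deliberate action.
    api_key = settings.get("api_key")
    return isinstance(api_key, str) and api_key.strip() != ""


def send_if_allowed(settings, payload, send):
    """Call `send(payload)` only with consent; report whether anything was sent."""
    if not telemetry_allowed(settings):
        return False
    send(payload)
    return True


sent = []  # stands in for a network upload in this example
fresh_install = {}                          # user was never asked: nothing is sent
opted_in = {"telemetry_opt_in": True}       # user said yes in the popup
with_key = {"api_key": "user-entered-key"}  # user configured a key themselves

print(send_if_allowed(fresh_install, {"event": "start"}, sent.append))
print(send_if_allowed(opted_in, {"event": "start"}, sent.append))
print(send_if_allowed(with_key, {"event": "start"}, sent.append))
print(len(sent))
```

The important property is the default: a fresh install with no recorded choice sends nothing.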

Any package found violating this rule will be purged from the channel immediately, probably with a one-strike system, and re-added once the violation is removed. Since there is no reporting system on the site (yet), I suggest contacting me or Will directly when you find a package like this. You can find us on the Discord server, for example (just mention us/me in the #general channel with a URL to the offending package). Note that we will only review packages on their initial submission and when specifically reported to us, as we can’t possibly review each and every update.

In a similar manner, packages doing other malicious things to the system (Python doesn’t run in a sandbox) should be removed immediately, with all of the author’s other packages purged as well. No strikes here.


Now, onto whether data collection should be banned entirely: I believe this is not possible. There are a couple of packages that specifically send data to some remote as their designated feature, such as time-tracking packages like Wakatime or packages that upload snippets to pastebin sites. These would be affected by this policy, too, even though they are certainly useful for some. A privacy policy statement from these packages about what data is collected and how it is used would certainly be appreciated, though I don’t currently see how this can be implemented reasonably. For now, I say the readme is enough and we require this to be present (for new packages).


@braver also suggested permitting user data to be sent only if that behavior exists in the package from the beginning and was not added later. I don’t support this stance, because there may very well be scenarios where you add an additional feature that is related to your package but does not need to be in its own package for architectural or topical reasons. I don’t want to force too many restrictions on developers when there don’t need to be. The feature must be opt-in either way, so it doesn’t matter whether the user explicitly installs the “new separate package” with that feature or enables it manually through some action.

From a review standpoint, this doesn’t matter either, except that it would be easier on us as we don’t have to try and track down the state in which the package was submitted. In order to find out whether a package is violating policies, it has to be reviewed either way.

7 Likes