Leaked tokens in VSCode extensions


The problem

To package a VSCode extension into a .vsix file, you run vsce package. This command validates package.json (among other things) and includes every file in the current directory except those listed in .vscodeignore. It does not look at .gitignore. This means that if, having forgotten about this, you later do the following:

echo 'SECRET' > .secret
echo '.secret' >> .gitignore
vsce package
vsce publish

...then congrats! You've just successfully released SECRET to the public without ever noticing.
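To make the failure mode concrete, here is a minimal Python sketch of the inclusion rule described above: a file is packaged unless a .vscodeignore pattern matches it, and .gitignore is never consulted. It is a simplification, assuming plain fnmatch-style globs; vsce's real matcher supports richer glob syntax, and the function name packaged_files is hypothetical.

```python
from fnmatch import fnmatch


def packaged_files(files, vscodeignore, gitignore):
    """Hypothetical model of the packaging rule: include every file unless a
    .vscodeignore pattern matches it. The gitignore argument is accepted but
    never used, mirroring the behaviour described above."""
    return [f for f in files
            if not any(fnmatch(f, pattern) for pattern in vscodeignore)]


files = ["package.json", "extension.js", "README.md", ".secret"]
vscodeignore = ["*.ts", "node_modules/*"]
gitignore = [".secret"]  # has no effect on packaging

print(packaged_files(files, vscodeignore, gitignore))
# ['package.json', 'extension.js', 'README.md', '.secret']
```

In this model, only adding .secret to the .vscodeignore patterns keeps it out of the package; listing it in .gitignore changes nothing.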

This is pretty scary. Admittedly, it takes a certain amount of sloppiness not to notice what is going on, but it happened to me (with non-sensitive data), so I suspected there must be more hidden data to discover across all the .vsix files the marketplace has to offer. Well, it turns out there is.

Finding hidden files

I won't bore you with the technical details: I simply wrote a few scripts that processed the latest version of every existing VSCode extension. To determine whether a file is hidden, I compared each package's file listing with that of its linked GitHub repository, so I only targeted extensions with a valid GitHub repository link (65%, many more than I expected).
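The core comparison could look roughly like the sketch below. It is not the actual script used here; it assumes a .vsix is a ZIP archive whose payload sits under an extension/ prefix, and that the repository's top-level names have already been fetched elsewhere (for example from the GitHub API). The helper name hidden_top_level_files is hypothetical.

```python
import io
import zipfile


def hidden_top_level_files(vsix_bytes, repo_top_level):
    """Return top-level entries packaged in the .vsix but absent from the repo.

    Assumes the extension payload lives under the 'extension/' prefix.
    """
    prefix = "extension/"
    packaged = set()
    with zipfile.ZipFile(io.BytesIO(vsix_bytes)) as zf:
        for name in zf.namelist():
            if name.startswith(prefix) and len(name) > len(prefix):
                # Keep only the first path component below 'extension/'.
                packaged.add(name[len(prefix):].split("/", 1)[0])
    return sorted(packaged - set(repo_top_level))


# Build a tiny fake .vsix in memory for demonstration.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("extension.vsixmanifest", "<PackageManifest/>")
    zf.writestr("extension/package.json", "{}")
    zf.writestr("extension/out/extension.js", "")
    zf.writestr("extension/.secret", "SECRET")

print(hidden_top_level_files(buf.getvalue(), ["package.json", "src", "out"]))
# ['.secret']
```

Restricting the comparison to top-level entries keeps it robust against build output (such as out/) that legitimately differs from the repository's source tree below the first level.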

Here's the gist:

Of all 36,684 extensions, 28% have no GitHub repository linked, 7% have one but the link is dead, and I skipped everything above 4 MB (8%). The remaining 20,606 (56%) were inspected. Of these, 3,219 (16%) contain at least one top-level file that is not present in the respective Git repo, for 6,848 hidden files in total.
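The percentages follow directly from the raw counts; a quick sanity check using only the numbers stated above:

```python
total = 36_684        # all extensions
inspected = 20_606    # extensions actually compared against their repo
with_hidden = 3_219   # inspected extensions with at least one hidden file
hidden_files = 6_848  # hidden files across those extensions

print(f"inspected:   {inspected / total:.1%}")        # 56.2%
print(f"with hidden: {with_hidden / inspected:.1%}")  # 15.6%
print(f"hidden files per affected extension: {hidden_files / with_hidden:.2f}")  # 2.13
```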

Most of these "hidden" files are really boring. Here they are, by category:
Total: 6,848
1,175 package.json
1,282 Image
932 Changelog
439 Readme
422 other unlisted source file
314 package.*
311 js
267 other .json files
223 language-configuration.json
174 md
145 package-lock.json
106 tsconfig/jsconfig.json
104 yarn-error.log
104 other txt
98 logfile
97 extension.js
80 filenames matching "token" or "secret"
74 other
65 quickstart
55 TODO
53 shell script
52 ThirdPartyNotices
51 tslint.json
43 ts
39 html
35 yarn.lock
31 archive (mostly vsix)
28 snippets.json
26 yml
16 binary executable
3 video
3 pdf
1 gpg key
But:

Leaked tokens

...99 extensions (3.0% of those with hidden files) ship a hidden file containing a definite 52-character Azure DevOps personal access token (PAT). That's about 0.5% of all inspected extensions. PATs can be used to publish extension updates. They have a configurable expiration date of at most one year. Once a malicious actor has obtained a PAT, they can use it right away. Judge for yourself.
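A rough way to flag such tokens is a regular expression for isolated 52-character runs. This is a heuristic sketch, not necessarily how the scan above worked: it assumes a classic PAT consists of exactly 52 lowercase letters and digits, and every hit still needs manual confirmation (long hashes or IDs can match too).

```python
import re

# Assumption: a classic Azure DevOps PAT is exactly 52 lowercase letters/digits,
# not embedded in a longer alphanumeric string.
PAT_CANDIDATE = re.compile(r"(?<![A-Za-z0-9])[a-z0-9]{52}(?![A-Za-z0-9])")


def find_pat_candidates(text):
    """Return every isolated 52-character substring that looks like a PAT."""
    return PAT_CANDIDATE.findall(text)


sample = "token=" + "a" * 26 + "7" * 26 + "\nversion=1.0.0\n"
print(find_pat_candidates(sample))
```

The lookbehind and lookahead reject substrings of longer alphanumeric runs, so a 60-character blob does not produce a spurious 52-character match.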

Of course, I didn't abuse any of these PATs (who knows how many of them have expired anyway); instead, I contacted the respective extension authors. All of their reactions have been wonderfully positive and constructive. To put things into perspective, the 99 extensions have a combined all-time download count of 685,966, which allegedly means "The number of unique installations, not including updates".

What should change

One expedient measure against token abuse would be 2FA at CLI login time; npm does this, for example. This measure alone would be worth ten times more than all of Microsoft's unnecessarily complicated token management steps combined.

Besides that, vsce really should respect .gitignore, or at least warn the user, check for common secret filenames, integrate with Git, and so on. Commonly used software must be smarter than that.
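Until vsce does something like this itself, a pre-publish check could cross-reference the files about to be packaged (for instance, the listing printed by vsce ls) with .gitignore patterns and suspicious names. The sketch below is hypothetical: publish_warnings and the SUSPICIOUS list are illustrative, and fnmatch only approximates gitignore syntax (no negation, anchoring, or directory rules).

```python
from fnmatch import fnmatch

# Hypothetical name fragments that deserve a second look before publishing.
SUSPICIOUS = ("secret", "token", ".env", "id_rsa")


def publish_warnings(packaged, gitignore_patterns):
    """Return human-readable warnings for files that probably shouldn't ship."""
    warnings = []
    for path in packaged:
        name = path.rsplit("/", 1)[-1]
        # Match the pattern against the full path or just the file name.
        if any(fnmatch(path, p) or fnmatch(name, p) for p in gitignore_patterns):
            warnings.append(f"{path}: ignored by .gitignore but packaged")
        elif any(s in name.lower() for s in SUSPICIOUS):
            warnings.append(f"{path}: suspicious filename")
    return warnings


for w in publish_warnings(["package.json", ".secret", "out/api_token.txt"], [".secret"]):
    print("WARNING:", w)
```

A non-empty result could simply abort a publish script, which is cheap insurance compared to rotating a leaked token.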

And finally, there is VSCode's atrocious security model. VSCode is awesome, but extensions have full control over your computer, yet they are one-click installs built and released by anyone, with no security precautions. This can be abused both by malicious extension authors and by black hat hackers, as outlined above, and it is a disaster waiting to happen if it doesn't get fixed. It is difficult to fix now, since it was largely a design decision (e.g. LSP servers), but I think we can do better than that. Deno's permission-based mindset, at least, is a great step in the right direction. Is it too late for Microsoft to improve VSCode sandboxing? I don't know, but perhaps let's stop pretending it is not an issue.


Appendix after some feedback: