Clarify what is allowed and what is considered malicious. #381

Open: wants to merge 4 commits into base: main
82 changes: 72 additions & 10 deletions README.md
@@ -29,30 +29,85 @@ These public reports help protect the open source community, and provide a data
source for the security community to improve their ability to find and detect
new open source malware.

### Scope
## Scope

What is in scope?

- any package that belongs to an ecosystem supported by the
[OSV Schema](https://ossf.github.io/osv-schema/)
- malicious packages published under typosquatting or dependency
confusion type attacks
- malicious packages published under typosquatting type attacks
- malicious packages published through account takeover
- prebuilt binaries for a package that are malicious
- malicious prebuilt binaries downloaded or installed with a package
- security researcher activity
- dependency and manifest confusion

Borderline:

- typosquatting or spam packages that are empty or trivial are not
malicious, but are allowed to be present in the dataset

Out-of-scope:

- non-malicious packages
- vulnerability reports
- compromised infrastructure
- non-malicious packages

### Prior Work
## Definition of a Malicious Package

- GitHub's [Advisory Database (filtered by malware)](https://github.com/advisories?query=type%3Amalware), for the NPM ecosystem.
- https://github.com/lxyeternal/pypi_malregistry (PyPI)
- https://dasfreak.github.io/Backstabbers-Knife-Collection/ (PyPI and npm), by Marc Ohm et al.
- https://github.com/datadog/malicious-software-packages-dataset (PyPI), by Datadog
- an open source package publicly available in a package registry
- and either:
  - when installed or used, would require some sort of incident response; or
  - exfiltrates an identifier that can be directly used to launch an attack
    against the victim (e.g. username for phishing or password brute-forcing)
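
The definition above can be read as a simple predicate. The following is a minimal sketch, not part of the repository's tooling; the `PackageReport` type and its field names are hypothetical and only mirror the wording of the list.

```python
from dataclasses import dataclass


@dataclass
class PackageReport:
    """Hypothetical summary of what an analyst observed about a package."""

    # The package is publicly available in a package registry.
    publicly_available: bool
    # Installing or using the package would require some incident response.
    requires_incident_response: bool
    # The package exfiltrates an identifier usable to attack the victim,
    # e.g. a username for phishing or password brute-forcing.
    exfiltrates_attack_identifier: bool


def is_malicious(report: PackageReport) -> bool:
    """Public package AND (incident response needed OR identifier exfiltrated)."""
    if not report.publicly_available:
        return False
    return (
        report.requires_incident_response
        or report.exfiltrates_attack_identifier
    )
```

Under this reading, a public package that only exfiltrates a username still counts as malicious, while a package that is not published in a registry is never covered by the definition.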

### Dependency and manifest confusion

[Dependency confusion](https://medium.com/@alex.birsan/dependency-confusion-4a5d60fec610)
and [manifest confusion](https://blog.vlt.sh/blog/the-massive-hole-in-the-npm-ecosystem)
are techniques that exploit quirks in the behavior of package systems and how
they are used within organizations. Packages using these attacks are malicious.

Someone may unintentionally encounter these quirks, but this is rare.

Manifest confusion requires someone to bypass the npm command-line tool and
deliberately provide an altered manifest.

Dependency confusion is effectively the same as an account takeover where an
attacker replaces a package's code with their own. This means even trivial or
empty dependency confusion packages would require incident response.
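
To illustrate why a dependency confusion package behaves like a takeover, here is a simplified model of one such quirk: a resolver that consults an internal and a public index and prefers the highest version. This is a sketch under stated assumptions. The `resolve` function, the index names, and the package `acme-utils` are hypothetical, and real package managers differ in how they combine indexes.

```python
from typing import Dict, List, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted numeric version such as '1.2.0' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def resolve(name: str, indexes: Dict[str, Dict[str, List[str]]]) -> Tuple[str, str]:
    """Naive resolver: pick the highest version of `name` across all indexes.

    Returns (index_name, version). No index is preferred over another,
    which is the assumption that makes the confusion possible.
    """
    candidates = [
        (parse_version(version), index_name)
        for index_name, packages in indexes.items()
        for version in packages.get(name, [])
    ]
    if not candidates:
        raise LookupError(f"no index provides {name!r}")
    version, index_name = max(candidates)
    return index_name, ".".join(str(part) for part in version)


# Hypothetical setup: an internal package name is also registered publicly
# by an attacker, with a much higher version number.
example_indexes = {
    "internal": {"acme-utils": ["1.2.0"]},
    "public": {"acme-utils": ["99.0.0"]},
}
selected = resolve("acme-utils", example_indexes)
```

In this model `selected` is `("public", "99.0.0")`: the attacker's code replaces the internal package without touching the internal index, so even an empty public package displaces trusted code.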

### Spam and typosquatting

Spam and typosquatting packages are not malicious unless the package itself
exhibits malicious behavior as per the definition above.

These types of packages are often empty, or consist of only useless trivial
functionality. While these packages are not malicious, they are a nuisance and
generally unwanted.

Typosquatting packages may be hard to distinguish from dependency confusion. As
a result, these reports are allowed to be present in the malicious packages
repository.

### Reverse engineering protection (e.g. obfuscation)

Reverse engineering protections are not malicious unless the package exhibits
malicious behavior as per the definition above.

Obfuscation, debugger evasion, and other reverse engineering protection
techniques are used by both developers seeking to protect their source code
and attackers seeking to evade detection.

### Telemetry

Telemetry, on its own, is not malicious.

Many open source packages use telemetry to track installs or the behavior and
performance of the package.

However, if telemetry is abused to exfiltrate and steal sensitive data, or
to provide remote access, it can be considered malicious.
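
The per-category rules in the subsections above can be summarized as a small triage function. This is an illustrative sketch only. The category strings, the `Verdict` enum, and the `meets_definition` flag (standing in for the outcome of applying the definition of a malicious package) are assumptions made for this example.

```python
from enum import Enum


class Verdict(Enum):
    MALICIOUS = "malicious"
    BORDERLINE = "not malicious, but allowed in the dataset"
    NOT_MALICIOUS = "not malicious"


# Techniques the text treats as malicious in themselves.
ALWAYS_MALICIOUS = {"dependency_confusion", "manifest_confusion"}
# Nuisance categories that are allowed in the dataset even when benign.
BORDERLINE = {"spam", "typosquatting"}


def triage(category: str, meets_definition: bool) -> Verdict:
    """Map a report category to a verdict following the rules described above.

    `meets_definition` is True when the package exhibits malicious behavior
    as per the definition of a malicious package.
    """
    if category in ALWAYS_MALICIOUS or meets_definition:
        return Verdict.MALICIOUS
    if category in BORDERLINE:
        return Verdict.BORDERLINE
    # Obfuscation, telemetry, and anything else are not malicious on their own.
    return Verdict.NOT_MALICIOUS
```

For example, `triage("telemetry", meets_definition=False)` gives `Verdict.NOT_MALICIOUS`, while the same category with `meets_definition=True` (telemetry abused to steal sensitive data) gives `Verdict.MALICIOUS`.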

## Get Involved

@@ -109,6 +164,13 @@ We will then either:

**Note:** support for handling false positives is TBC.

## Prior and Related Work

- GitHub's [Advisory Database (filtered by malware)](https://github.com/advisories?query=type%3Amalware), for the NPM ecosystem.
- https://github.com/lxyeternal/pypi_malregistry (PyPI)
- https://dasfreak.github.io/Backstabbers-Knife-Collection/ (PyPI and npm), by Marc Ohm et al.
- https://github.com/datadog/malicious-software-packages-dataset (PyPI), by Datadog

## Governance

This work is associated with the