Have a SBOM for Node.js? #1115

Open
marco-ippolito opened this issue Sep 20, 2023 · 19 comments

@marco-ippolito
Member

marco-ippolito commented Sep 20, 2023

I think it would be great to have an SBOM for the project now that we are working on the dependency build audit.
We should probably investigate how we can achieve this, given that we have different types of dependencies, and which format to use.

@RafaelGSS
Member

IIRC, besides the SLSA + Sigstore work, I think @BethGriggs has also looked at SBOMs, right? Would you mind sharing your point of view?

@BethGriggs
Member

I had some initial thoughts, but didn't get too far. Some of them:

  • GitHub produces an SBOM, downloadable from Insights on the repository. This is incomplete, and I believe it only shows the more easily discoverable npm and Actions dependencies.
    • I know in security the stance is often that small mitigations are better than no mitigations. But in this case, I feel an incomplete SBOM is probably worse than no SBOM.
  • Typical SBOM tools tend to assume you're building an application using a specific runtime/language. For example, they'll just traverse node_modules and generate the SBOM from that. The mixture of runtimes/languages we use complicates things.
  • The information that would be really useful to ship in an SBOM alongside our binaries is the dependencies we build directly against and where their source came from. That way, users can easily see that the version of Node.js they're using depends directly on dependency x from this source, and can feed that into tools that monitor their SBOMs, etc.
  • maintaining-dependencies.md is a good start, but I came to the conclusion the SBOM should really be generated at build time. This is because some of our dependencies can be externalised or swapped out during the build step (for example, building against a system OpenSSL). What is in the deps directory in our sources may not match what is actually built.
  • Another approach I thought about was just using the values that get built into process.versions. It could be a reasonable interim step. It felt a bit odd (even risky?) to rely on executing the software to determine what's being used in it. I feel doing it at the build stage would allow us to gather more detail (which source was used) and verification rather than just reporting versions.
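As a rough illustration of the interim `process.versions` idea (and of why build-time knowledge still matters), here is a minimal Python sketch that turns a versions mapping into CycloneDX component entries. It assumes the mapping was dumped with `node -p "JSON.stringify(process.versions)"`; the `NON_DEPENDENCY_KEYS` set, the `shared` parameter and the `nodejs:linkage` property name are hypothetical choices for this sketch, and the `pkg:generic` purls are placeholders for more precise purl types.

```python
from urllib.parse import quote

# Hypothetical: process.versions keys that describe ABI/data versions rather
# than third-party code. Review against the actual release before use.
NON_DEPENDENCY_KEYS = {"node", "modules", "napi", "unicode", "cldr", "tz"}


def components_from_versions(versions, shared=frozenset()):
    """Turn a process.versions-style mapping into CycloneDX component dicts.

    `shared` names dependencies linked against a system copy (e.g. a build
    configured with --shared-openssl); this cannot be inferred from the
    versions alone and must come from the build configuration.
    """
    components = []
    for name in sorted(versions):
        if name in NON_DEPENDENCY_KEYS:
            continue
        version = versions[name]
        components.append({
            "type": "library",
            "name": name,
            "version": version,
            # Placeholder purl; characters such as "+" are percent-encoded.
            "purl": f"pkg:generic/{quote(name, safe='')}@{quote(version, safe='')}",
            # Hypothetical property recording where the shipped code came from.
            "properties": [{
                "name": "nodejs:linkage",
                "value": "system" if name in shared else "bundled",
            }],
        })
    return components
```

The linkage flag has to be supplied from the build configuration: `process.versions` reports a version either way, so on its own it cannot distinguish a bundled dependency from a system one.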

@marco-ippolito
Member Author

marco-ippolito commented Oct 3, 2023

I see that CycloneDX is quite popular, should we give it a try? What kind of tool should we use?

@mhdawson
Member

mhdawson commented Oct 3, 2023

+1 for CycloneDX

@marco-ippolito
Member Author

So I gave it a try on my machine and unfortunately my MacBook ran out of memory (OOM) and crashed.
Since Node.js is a fairly large project, this is an expensive operation that falls into the case described in the documentation: https://github.com/CycloneDX/cdxgen/blob/master/ADVANCED.md#use-atom-in-java-mode

I was wondering if it would be possible to get access to a machine with 32/64 GB of RAM to run it.

@mhdawson
Copy link
Member

mhdawson commented Oct 4, 2023

I think the only machines we have that have that much memory might be:

test-nearform_intel-ubuntu2204-x64-1 test-nearform_intel-ubuntu2204-x64-2

I'd suggest you open an issue in the build repo to request access to one of those.

@marco-ippolito
Copy link
Member Author

marco-ippolito commented Oct 17, 2023

The ideal goal is to ship an SBOM for every executable we release, since every platform might have slightly different settings, tools, and dependencies (? I'm not sure this is true). I guess it should eventually be included at build time as a release step, @RafaelGSS.

It is also possible to generate the SBOM manually starting from a CSV file, which might be easier and less expensive in terms of computing, but it would be hard to maintain; I'm not a big fan of this idea.
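For comparison, the manual CSV route could look roughly like this minimal Python sketch. The column layout (`name,version,purl,license`) is a hypothetical convention, the `license` column is assumed to hold SPDX identifiers, and only a small subset of the CycloneDX 1.5 JSON fields is emitted.

```python
import csv
import io
import uuid


def bom_from_csv(csv_text):
    """Build a minimal CycloneDX 1.5 JSON BOM (as a dict) from CSV text.

    Assumed (hypothetical) header: name,version,purl,license
    """
    components = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        component = {
            "type": "library",
            "name": row["name"],
            "version": row["version"],
            "purl": row["purl"],
        }
        if row.get("license"):
            # Assumes the column holds an SPDX license identifier.
            component["licenses"] = [{"license": {"id": row["license"]}}]
        components.append(component)
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.5",
        "serialNumber": f"urn:uuid:{uuid.uuid4()}",
        "version": 1,
        "components": components,
    }
```

Serialising the result with `json.dumps(..., indent=2)` gives a standalone BOM file, but the maintenance concern remains: every dependency update means editing the CSV by hand.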

We should also define an end goal for the project in terms of SBOM quality: https://scvs.owasp.org/scvs/v2-software-bill-of-materials/. I assume we start from the basic level.

My idea is to start quickly with https://github.com/CycloneDX/cdxgen, which is a general-purpose tool, and then refine and improve quality with further development and more specific tools.

@richardlau
Copy link
Member

So I gave it a try on my machine and unfortunately my MacBook ran out of memory (OOM) and crashed. Since Node.js is a fairly large project, this is an expensive operation that falls into the case described in the documentation: https://github.com/CycloneDX/cdxgen/blob/master/ADVANCED.md#use-atom-in-java-mode

I was wondering if it would be possible to get access to a machine with 32/64 GB of RAM to run it.

The ideal goal is to ship an SBOM for every executable we release, since every platform might have slightly different settings, tools, and dependencies (? I'm not sure this is true). I guess it should eventually be included at build time as a release step.

Dependencies are the same for the platforms we currently release. Tooling (compilers, Python, etc.) does differ.

If CycloneDX requires that amount of RAM to run for Node.js, it's not going to be realistic to run it on every platform we release on. Most of the release machines have 4GB RAM (some have 2GB+swap and a small number have 8GB).

@pombredanne
Copy link

@marco-ippolito repasting my post from the CycloneDX chat:

cdxgen is a good start! For a large codebase like Node.js, here are my extra 2 cents:

  1. IMHO your problem is not so much npm or PyPI packages, which are easy to inventory because they have package manifests, but the rest of the C/C++ code and its dependencies, vendored or not, that have no manifest, like zlib, c-ares, and similar, and their nested and bundled deps all the way down (like in V8).

  2. You may document their origin and licenses in the codebase. I use small YAML files for this; you could use a small CycloneDX SBOM to the same effect. Conceptually, something like this https://github.com/nodejs/node/blob/main/deps/zlib/README.chromium#L1 but improved to have proper Package URLs (purls). This will get you an explicit list that you can then have scanners collect in addition to the simpler npm or Python packages.

  3. Or you might want to match these against a reference index of C/C++ packages too, in which case you need a code matching tool and a reference DB. Or do a combination of 2. and 3., which is best IMHO. Then eventually you will need to craft and run a custom pipeline to assemble data from a few different tools and origins to get something that is tailored to Node.js.

  4. You may also want to consider analyzing the deployed (debug) binaries rather than the source code, to craft an SBOM that is based on the subset of the sources effectively used. This is what users and security teams will care about, not the (many) other development-only packages that are not deployed.

  5. You really want proper Package URLs (purls) in your CycloneDX output for this to be useful to downstream users when querying for vulnerabilities in modern databases. If you have a few CPEs, that will not hurt either!

  6. This is a process. Do not expect any open source or commercial tool to get you the correct results out of the box. This will require tuning and a custom pipeline to automate all this, and the output of running this pipeline will require regular review for accuracy.

I have some experience in the domain and I may be able to help modestly.
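Points 2 and 5 could be combined by reading the existing README.chromium-style metadata and emitting components with purls. The sketch below is a minimal Python illustration that assumes the `Key: value` header block is separated from the free-form description by a blank line; the field names (`Name`, `Version`, `URL`, `License`, `CPEPrefix`) follow the Chromium convention as we understand it, and the `pkg:generic` purl type is only a fallback, since a more specific type (e.g. `pkg:github/...`) is preferable when the upstream is known.

```python
from urllib.parse import quote


def parse_readme_chromium(text):
    """Parse the 'Key: value' header of a README.chromium-style file.

    Stops at the first blank line, so the free-form description that
    follows (which may itself contain colons) is ignored.
    """
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            break
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def component_from_metadata(fields):
    """Build a CycloneDX component dict with a purl (and CPE, if known)."""
    name = fields["Name"]
    version = fields.get("Version")
    # Fallback purl type; prefer a specific type when the upstream is known.
    purl = f"pkg:generic/{quote(name, safe='')}"
    component = {"type": "library", "name": name}
    if version:
        purl += f"@{quote(version, safe='')}"
        component["version"] = version
    component["purl"] = purl
    if "License" in fields:
        # Chromium license values are not always SPDX ids, so use "name".
        component["licenses"] = [{"license": {"name": fields["License"]}}]
    if "CPEPrefix" in fields:
        component["cpe"] = fields["CPEPrefix"]
    if "URL" in fields:
        component["externalReferences"] = [{"type": "website", "url": fields["URL"]}]
    return component
```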

@pombredanne

@BethGriggs re: #1115 (comment)

I came to the conclusion the SBOM should really be generated at build time. This is because some of our dependencies can be externalised or swapped out during the build step (for example, building against a system OpenSSL). What is in the deps directory in our sources may not match what is actually built.

💯 ... if you can instrument your build to collect the subset of third-party code that you effectively include (and possibly external deps that may be expected at runtime), then this is IMHO the best possible case and something that I would always recommend.
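One way to instrument the build, sketched below in Python, is to look at which vendored sources were actually compiled. This assumes the build can emit a compilation database (`compile_commands.json`, a list of entries with `directory` and `file`); we have not checked whether Node.js's gyp-based release builds produce one, and `deps_used_in_build` is a hypothetical helper.

```python
import os
from collections import Counter


def deps_used_in_build(compile_commands, source_root):
    """Count compiled translation units per top-level deps/<name> directory.

    `compile_commands` is the parsed JSON list of a compilation database.
    Only sources that were actually compiled appear, so a dependency
    replaced by a system copy (e.g. --shared-openssl) is simply absent.
    """
    deps_root = os.path.normpath(os.path.join(source_root, "deps"))
    counts = Counter()
    for entry in compile_commands:
        # "file" may be relative to the entry's working "directory".
        path = os.path.normpath(os.path.join(entry["directory"], entry["file"]))
        rel = os.path.relpath(path, deps_root)
        if rel == os.pardir or rel.startswith(os.pardir + os.sep):
            continue  # compiled, but not vendored under deps/
        counts[rel.split(os.sep)[0]] += 1
    return counts
```

The resulting set of directory names could then be cross-checked against the per-dependency metadata to decide which components belong in the SBOM for a given build.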

@marco-ippolito
Copy link
Member Author

@pombredanne so my idea to get started is:

  1. run cdxgen on each npm package in the /deps folder,
  2. run cdxgen for tools and GitHub Actions,
  3. document the origin and licenses of V8, OpenSSL, and other C++ dependencies.

Could you suggest some tools or references for your points 3 and 4?
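Step 1 of this plan could be scripted along these lines: a minimal Python sketch that finds npm package roots under `deps/` (skipping `node_modules` trees and packages nested inside another package) and prepares one cdxgen command line per root without running it. The output naming scheme is hypothetical; `-t` and `-o` are cdxgen's project-type and output options, but the exact arguments should be checked against `cdxgen --help`.

```python
from pathlib import Path


def npm_package_roots(deps_dir):
    """Directories under deps_dir holding a package.json, excluding
    node_modules trees and packages nested inside an already-found one."""
    deps = Path(deps_dir)
    roots = []
    # Shallowest manifests first, so outer packages are found before inner ones.
    for manifest in sorted(deps.rglob("package.json"), key=lambda p: len(p.parts)):
        pkg_dir = manifest.parent
        if "node_modules" in pkg_dir.relative_to(deps).parts:
            continue
        if any(root in pkg_dir.parents for root in roots):
            continue
        roots.append(pkg_dir)
    return sorted(roots)


def cdxgen_commands(deps_dir, out_dir="sbom"):
    """One cdxgen command line per npm package root (not executed here)."""
    commands = []
    for root in npm_package_roots(deps_dir):
        # Hypothetical naming: deps/acorn/acorn -> sbom/acorn-acorn.cdx.json
        slug = "-".join(root.relative_to(deps_dir).parts)
        commands.append(["cdxgen", "-t", "npm", "-o", f"{out_dir}/{slug}.cdx.json", str(root)])
    return commands
```

Each command list could then be passed to `subprocess.run(cmd, check=True)` once cdxgen is installed.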

@prabhu
Copy link

prabhu commented Nov 1, 2023

I will work on improving the performance of cdxgen/atom for C/C++ codebases. It has to be done regardless of whether Node.js becomes a user or not. My initial focus will be on reducing the run time for V8 to less than an hour. Reducing the memory footprint enough to run in a CI agent for such large codebases is impossible, so it is not going to be my priority this year.

@marco-ippolito I like your proposal to generate individual SBOMs per folder in deps. CycloneDX supports linking SBOMs using BOM-Link under external references.

I have created this ticket to automate this process a bit. Once all the performance tickets are done, I am happy to share an example workflow with the right arguments needed to generate these.
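Linking per-directory SBOMs could look roughly like this minimal Python sketch, which builds an aggregate BOM whose `externalReferences` of type `bom` point at each child BOM via a BOM-Link (`urn:cdx:<serial-number-uuid>/<version>`). The aggregate layout and the use of the `comment` field as a label are assumptions for illustration.

```python
import uuid


def bom_link(serial_number, version=1):
    """CycloneDX BOM-Link to a whole BOM: urn:cdx:<serial-uuid>/<version>."""
    serial = serial_number.removeprefix("urn:uuid:")
    return f"urn:cdx:{serial}/{version}"


def parent_bom(children):
    """Aggregate BOM that references child BOMs via BOM-Link.

    `children` maps a label (e.g. "deps/undici") to that child's parsed BOM,
    which must carry a "serialNumber" such as "urn:uuid:...".
    """
    refs = [
        {
            "type": "bom",
            "url": bom_link(child["serialNumber"], child.get("version", 1)),
            "comment": label,  # hypothetical use of comment as a label
        }
        for label, child in sorted(children.items())
    ]
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.5",
        "serialNumber": f"urn:uuid:{uuid.uuid4()}",
        "version": 1,
        "externalReferences": refs,
    }
```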

@pombredanne
Copy link

pombredanne commented Nov 6, 2023

@marco-ippolito you wrote:

so my idea to get started is:

1. run cdxgen on each npm package in the `/deps` folder,
2. run cdxgen for tools and GitHub Actions,
3. document the origin and licenses of V8, OpenSSL, and other C++ dependencies.

Could you suggest some tools or references for your points 3 and 4?

I suggest you get something started first with your plan.

For 3 and 4, I have some bits that are a work in progress at https://github.com/nexb/elf-inspector and https://github.com/nexB/purldb/ ... scancode-toolkit also has some code to collect metadata from the README.chromium files used to document this metadata.

BTW, are there debug builds with debug symbols available? (with DWARF debug info for Linux and macOS and PDBs for Windows)

@prabhu
Copy link

prabhu commented Nov 6, 2023

cdxgen 9.9.2 was released with the required improvements. I will share an example workflow that does both 1 and 2 (aiming for a single invocation). For 3, cdxgen currently supports the vcpkg.json format for sharing additional metadata. You can create this file within the various folders, and the information will be used in the generated SBOM. I will share some examples of this as well.
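A minimal Python sketch of writing such a per-folder manifest is below. Which vcpkg.json fields cdxgen actually maps into the SBOM is an assumption here (the sketch only sets `name`, `version-string`, `description`, `homepage` and `license`), and `write_vcpkg_manifest` is a hypothetical helper.

```python
import json
import re
from pathlib import Path


def write_vcpkg_manifest(dep_dir, name, version, license_id=None,
                         homepage=None, description=None):
    """Write a minimal vcpkg.json manifest into a vendored dependency folder."""
    # vcpkg names: lowercase alphanumeric runs separated by single hyphens.
    if not re.fullmatch(r"[a-z0-9]+(-[a-z0-9]+)*", name):
        raise ValueError(f"invalid vcpkg name: {name!r}")
    # "version-string" is vcpkg's scheme for versions with no ordering rules.
    manifest = {"name": name, "version-string": version}
    if description:
        manifest["description"] = description
    if homepage:
        manifest["homepage"] = homepage
    if license_id:
        manifest["license"] = license_id  # SPDX expression
    path = Path(dep_dir) / "vcpkg.json"
    path.write_text(json.dumps(manifest, indent=2) + "\n")
    return path
```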

@marco-ippolito
Copy link
Member Author

I'm wondering which installation method we should use on our machines (link to guide).

@prabhu
Copy link

prabhu commented Nov 21, 2023

I'm wondering which installation method we should use on our machines (link to guide).

npm install with Java 21 must work. For CI, we can have a workflow that sets up the prereqs.
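For the CI workflow, a pre-flight check for the Java requirement might look like this hypothetical Python helper. It parses the banner printed by `java -version` (which goes to stderr) and handles both the current `21.0.1` style and the legacy `1.8.0_x` style; cdxgen itself would then be installed with `npm install -g @cyclonedx/cdxgen`.

```python
import re
import shutil
import subprocess


def _leading_int(text):
    match = re.match(r"\d+", text)
    return int(match.group()) if match else None


def java_major_version(version_output):
    """Major Java version from `java -version` output, or None if not found.

    Handles the modern scheme ("21.0.1", "17-ea") and the legacy "1.8.0_392".
    """
    match = re.search(r'version "([^"]+)"', version_output)
    if not match:
        return None
    parts = match.group(1).split(".")
    major = _leading_int(parts[0])
    if major == 1 and len(parts) > 1:
        major = _leading_int(parts[1])  # legacy "1.x" numbering
    return major


def java_is_at_least(minimum=21):
    """True if a `java` executable on PATH reports at least `minimum`."""
    if shutil.which("java") is None:
        return False
    # `java -version` prints its banner to stderr, not stdout.
    proc = subprocess.run(["java", "-version"], capture_output=True, text=True)
    major = java_major_version(proc.stderr)
    return major is not None and major >= minimum
```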

@UlisesGascon
Copy link
Member

I was reading about the possibilities for using SBOMs with Docker images, and it seems that this is possible using `docker sbom` or `docker buildx build --sbom=true -t <myorg>/<myimage> --push .`. This might be a good option for the Docker Official Images. What do you think?


@mrutkows
Copy link

mrutkows commented Jan 8, 2024

IMO, CycloneDX is the way to go (it is becoming an Ecma standard, and hopefully an ISO standard, with v1.6 due in February), and we will eventually need its specified ability to declare (quantum) crypto information and actual attestations as consumers become able to produce them.

Contributor

This issue is stale because it has been open many days with no activity. It will be closed soon unless the stale label is removed or a comment is made.


9 participants