Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding Analyze command to guacone #1809

Open
wants to merge 37 commits into
base: main
Choose a base branch
from

Conversation

arorasoham9
Copy link
Contributor

@arorasoham9 arorasoham9 commented Apr 4, 2024

Description of the PR

Analyze command for guacone

Comparing largely similar SBOMs
guacone analyze diff --uri --sboms=https://anchore.com/syft/image/k8s.gcr.io/kube-apiserver-v1.24.1-583a02ce-8f7e-4794-91af-35f27ffeb73d,https://anchore.com/syft/image/k8s.gcr.io/kube-apiserver-v1.24.2-ee7e0a81-87de-4761-9689-4f7162d81e44
Screenshot 2024-05-16 at 7 40 44 PM

The diff is a 4 step process:

  • Create graph for SBOMs
  • Find all paths in both graphs
  • Find the paths that are present in one graph and not in the other, do this for both graphs.
  • Pair paths from one graph to the other that are closest, then find the difference between these pairs.

Each step has tests to ensure that it is stable. Meaning, if the steps are repeated for the same Input SBOM, the results are the same. The JSON you see are the test SBOMS I use to carry out the tests for each step of the diff. These JSON files are marshalled HasSBOM nodes for SBOMs I ingested into GUAC, pulled using the Node function.

PR Checklist

  • All commits have a Developer Certificate of Origin (DCO) -- they are generated using -s flag to git commit.
  • All new changes are covered by tests
  • If GraphQL schema is changed, make generate has been run
  • If OpenAPI spec is changed, make generate has been run
  • If collectsub protobuf has been changed, make proto has been run
  • All CI checks are passing (tests and formatting)
  • All dependent PRs have already been merged

@arorasoham9
Copy link
Contributor Author

@pxp928 ready for review.

@pxp928
Copy link
Collaborator

pxp928 commented Apr 4, 2024

Cool, @arorasoham9 can you add some screenshots to what the output will look like for the CLI?

// See the License for the specific language governing permissions and
// limitations under the License.

package guacanalyze
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one major task would be to add unit tests to this package. Some things like the graph, getting data from graphQL can be abstracted away but do you have some tests you can add here to test the functionality?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have any at the moment. I can write a few. Could they come in a future PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added unit tests as requested.

@arorasoham9
Copy link
Contributor Author

arorasoham9 commented Apr 4, 2024

Cool, @arorasoham9 can you add some screenshots to what the output will look like for the CLI?

Added some screenshots. The Prettify function errored out at many instances so I improvised to print the same details that function does but taken from the graph DS itself, not graphQL.

node *Node
)
if node, err = g.Vertex(ID); err != nil {
fmt.Println("Error setting node attribute", err)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to return this as an error.

)
if node, err = g.Vertex(ID); err != nil {
fmt.Println("Error setting node attribute", err)
os.Exit(1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't use os.Exit(1) should return error


func GetNodeAttribute(g graph.Graph[string, *Node],ID, key string) interface{} {
var (
err error
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to instantiate this here, can just it on line 72?

node.Attributes[key] = value
}

func GetNodeAttribute(g graph.Graph[string, *Node],ID, key string) interface{} {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this returning an interface{}

foundHasSBOMPkg, err = model.HasSBOMs(ctx, gqlclient, model.HasSBOMSpec{Subject: &model.PackageOrArtifactSpec{Package: &model.PkgSpec{Id: &pkgResponse.Packages[0].Namespaces[0].Names[0].Versions[0].Id}},
})
if err != nil {
fmt.Printf("(purl)failed getting hasSBOM with error :%v", err)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dont use fmt.Printf, you should return an error via fmt.Errorf

func verifyAnalyzeFlags(slsas, sboms []string, errSlsa, errSbom error, uri, purl, id bool) {

if (errSlsa != nil && errSbom != nil) || (len(slsas) ==0 && len(sboms) == 0 ){
fmt.Println("Must specify slsa or sboms ")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return a proper error and dont use os.Exit

namespaceTwo, okTwo := GetNodeAttribute(analysisGraph,diffList.MissingAddedRemovedLinks[i][1], "Namespace[0]").(string)

if !okOne || !okTwo {
fmt.Println("Error getting node namespace attribute")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clean up and pass out proper error

table.SetAlignment(tablewriter.ALIGN_LEFT)
table.Render()
if (!all && max > maxprint){
fmt.Println("Run with --all to see full list")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should user logger here: https://github.com/guacsec/guac/blob/main/pkg/handler/processor/process/process.go#L117

It should be able to be pulled from the context:

logger := logging.FromContext(ctx)

}

func HasSBOMToGraph(cmd *cobra.Command, ctx context.Context, gqlclient graphql.Client) ( []graph.Graph[string, *Node]){
slsas, errSlsa := cmd.Flags().GetStringSlice("slsa")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pass in options or something similar and not the actual cmd. See: https://github.com/guacsec/guac/blob/main/cmd/guacone/cmd/s3.go#L80-L93

if uri {
hasSBOMResponseOne, err = FindHasSBOMBy(model.HasSBOMSpec{} ,sboms[0],"", "", ctx, gqlclient)
if err != nil {
fmt.Println("(uri)failed to lookup sbom:", sboms[0], err)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix error

@@ -0,0 +1,66 @@
package guacanalyze_test
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing header

Copy link
Collaborator

@jeffmendoza jeffmendoza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly code organization and naming comments.

"github.com/guacsec/guac/pkg/logging"
"github.com/spf13/cobra"
"github.com/spf13/viper"
analysis "github.com/guacsec/guac/pkg/analyzer"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason for the import rename here?

if diff && intersect && union || diff && intersect || diff && union || intersect && union {
fmt.Println("Must specify only one of --diff, --intersect, --union")
return
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a positional argument if required, not a flag/option.

}

//create graphs
graphs := analysis.HasSBOMToGraph(cmd, ctx, gqlclient)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't pass cmd to a package under pkg/. Those shouldn't have a dependency on cli commands or cobra. Explicitly pass the parameters this needs.


func init() {

rootCmd.PersistentFlags().Bool("intersect", false, "compute intesection of given sboms")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't add flags to rootCmd, this will show up on all guacone cmds if so. Use this analyzeCmd you are creating.
Also, cli options are normally under pkg/cli/store.go but this command has so many it might make sense to just leave them here.

rootCmd.PersistentFlags().Bool("purl", false, "input is a pURL")
rootCmd.PersistentFlags().Bool("id", false, "input is an Id")
rootCmd.PersistentFlags().Bool("metadata", false, "Compare SBOM metadata")
rootCmd.PersistentFlags().Bool("inclSoft", false, "Compare Included Softwares")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change to incl-soft and similar for others. All cli options use hyphen-case.

rootCmd.PersistentFlags().Int("maxprint", PRINT_MAX, "max number of similar sboms to print")
rootCmd.AddCommand(analyzeCmd)

}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a newline to the end of the file

)
if node, err = g.Vertex(ID); err != nil {
fmt.Println("Error setting node attribute", err)
os.Exit(1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't have any os.Exit in a package under pkg/, assume this is a library that can be called from anywhere. Propagate errors back to the caller.

return foundHasSBOMPkg, nil
}

func verifyAnalyzeFlags(slsas, sboms []string, errSlsa, errSbom error, uri, purl, id bool) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All this should be under the file in cmd/, shouldn't have any flag validation here in pkg/.

//print to stdout
printHighlightedAnalysis(dot, diffList, all, maxprint, action, analysisGraph )
}
func max(nums []int) int {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add an empty line between functions.

//Create dot file
createGraphDotFile(dot, analysisGraph)
//print to stdout
printHighlightedAnalysis(dot, diffList, all, maxprint, action, analysisGraph )
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anything under pkg/ should not print anything to stdout. Any logic should result in a datastructure that returns back to cmd/ and is then printed. This helps for testability. Looks like much of this may need to be moved back to cmd/

@pxp928
Copy link
Collaborator

pxp928 commented Apr 18, 2024

@arorasoham9 did the node functionality get fixed?

Signed-off-by: Soham Arora <arorasoham9@gmail.com>
Signed-off-by: Soham Arora <arorasoham9@gmail.com>
Signed-off-by: Soham Arora <arorasoham9@gmail.com>
Signed-off-by: Soham Arora <arorasoham9@gmail.com>
Signed-off-by: Soham Arora <arorasoham9@gmail.com>
Signed-off-by: Soham Arora <arorasoham9@gmail.com>
Signed-off-by: Soham Arora <arorasoham9@gmail.com>
Signed-off-by: Soham Arora <arorasoham9@gmail.com>
Signed-off-by: Soham Arora <arorasoham9@gmail.com>
Signed-off-by: Soham Arora <arorasoham9@gmail.com>
Signed-off-by: Soham Arora <arorasoham9@gmail.com>
Signed-off-by: Soham Arora <arorasoham9@gmail.com>
Signed-off-by: Soham Arora <arorasoham9@gmail.com>
Signed-off-by: Soham Arora <arorasoham9@gmail.com>
Signed-off-by: Soham Arora <arorasoham9@gmail.com>
Signed-off-by: Soham Arora <arorasoham9@gmail.com>
Signed-off-by: Soham Arora <arorasoham9@gmail.com>
Signed-off-by: Soham Arora <arorasoham9@gmail.com>
Signed-off-by: Soham Arora <arorasoham9@gmail.com>
Signed-off-by: Soham Arora <arorasoham9@gmail.com>
Signed-off-by: Soham Arora <arorasoham9@gmail.com>
Signed-off-by: Soham Arora <arorasoham9@gmail.com>
Signed-off-by: Soham Arora <arorasoham9@gmail.com>
Signed-off-by: Soham Arora <arorasoham9@gmail.com>
Signed-off-by: Soham Arora <arorasoham9@gmail.com>
Signed-off-by: Soham Arora <arorasoham9@gmail.com>
@arorasoham9
Copy link
Contributor Author

@pxp928 Ready for review. Please let me know what changes you require to the output presentation.

Copy link
Contributor

@nathannaveen nathannaveen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, @arorasoham9, I have a suggestion. Could you move the test data to https://github.com/guacsec/guac-test?

There are over 1,000,000 lines of code in this PR, making it virtually impossible to review, and without adequate reviews, security bugs could be introduced into the code.

If the test files were to be moved into guac-test, then the majority of this code wouldn’t be in this PR so that we can properly review it.

If the test files were moved to guac-test, this code would not be built and intern would not affect the end user.

I am not entirely sure whether the JSON files are part of executable code. Even if they aren't right now I am not sure whether we will in the future.

While JSON files are typically safe, there's a potential risk that they could contain encoded scripts, which might inadvertently introduce security vulnerabilities.

arorasoham9 and others added 4 commits May 23, 2024 19:02
Signed-off-by: Soham Arora <arorasoham9@gmail.com>
Signed-off-by: Soham Arora <arorasoham9@gmail.com>
Signed-off-by: Soham Arora <arorasoham9@gmail.com>
@arorasoham9
Copy link
Contributor Author

@nathannaveen Done! I have moved the HasSBOM JSON test files for gaucanalyze to guac-test. I have also added some explanation for having these JSON files to the description of this PR.

I will require this PR merged before I can send a final commit to reflect the code changes to accept the JSON test from guac-test.

@pxp928
Copy link
Collaborator

pxp928 commented May 24, 2024

@arorasoham9 any luck in improving the output? It is very hard to understand. Like if you can show less information and just show only what is relevant, that would be better.

@arorasoham9
Copy link
Contributor Author

arorasoham9 commented May 24, 2024

@pxp928 I am still thinking of ways to show only the relevant information. This is a better explanation for why you see so much even with just a small difference between two sboms. The diff works at the granularity of a path, not a node, so when it finds even one node different it classifies all paths with that participating node as different (and prints them). I believe the original aim was to find missing/different paths bw two sboms was to also offer a graph "structural" difference as well. I am trying to think how we can still offer the same differences in an easier to understand manner.

Copy link
Contributor

@lumjjb lumjjb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @arorasoham9 for working on this big feature!! definitely going to be very useful, would you be able to help break this up into a few separate PRs, that would be very helpful in getting this merged in faster!

maybe start with the analyzer package, and then add a table formatter pkg, and then the CLI options and the CLI itself?

uri, _ := cmd.Flags().GetBool("uri")
purl, _ := cmd.Flags().GetBool("purl")

metadata, _ := cmd.Flags().GetBool("metadata")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these flags should be initialized in https://github.com/guacsec/guac/blob/main/pkg/cli/store.go, and preferably prefixed in a way that shows its for guacanalyze


if test == "" {
if err = verifyAnalyzeFlags(slsas, sboms, errSlsa, errSbom, uri, purl, id); err != nil {
fmt.Fprintf(os.Stderr, "Error: %s", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: golang errors style guide is lowercase first characters.


//create graphs
graphs, err = hasSBOMToGraph(ctx, gqlclient, sboms, AnalyzeOpts{
Metadata: metadata, InclSoft: inclSoft, InclDeps: inclDeps, InclOccur: inclOccur, Namespaces: namespaces, URI: uri, PURL: purl, ID: id})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make it into different lines for readability.

os.Exit(1)
}

//create graphs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the comment doesn't add much context to the code, either remove or expand.

}
}

if args[0] == "diff" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it make sense for these to be subcommands? otherwise this should be a flag maybe?

return n.ID
}

func SetNodeAttribute(g graph.Graph[string, *Node], ID, key string, value interface{}) bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you provide additional context and documentation around the functions? it is not clear what should be in value, unless there's a good reason, i would highly recommend not having interface{} as the type of value. If anything it can be a hidden type that can be minted by the analyzer but must provide sufficient constructors to make sense of it.


switch action {
//0 is diff
case 0:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use an enum :)

return hex.EncodeToString(hash[:])
}

func AddGraphNode(g graph.Graph[string, *Node], _ID, color string) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is color used for? Is it the identity of an edge of just for analysis purpose? can it be passed it as an option on the node for extensibility?

_ID should be id to follow golang style?

},
}

func printDiffedPathTable(diffs []analyzer.DiffedPath) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be helpful to move all printing stuff to either a different package or a different file

return nil
}

func readTwoSBOM(filename string) ([]graph.Graph[string, *analyzer.Node], error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why read two sboms from a file? i don't think that's a common pattern?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants