
Design of the REST API #1544

Open
mdeicas opened this issue Nov 30, 2023 · 10 comments

@mdeicas
Collaborator

mdeicas commented Nov 30, 2023

#1326 and the design doc provide the motivation for developing a new REST API in GUAC. A proof-of-concept API server has already been added to GUAC. This issue is to discuss how to move toward a more production-ready solution.

Motivation

In general, the new API can address the limitations of the GQL API (i.e. the ontology API). To reiterate the points made in the linked issue and the design doc, these limitations include:

  • [1] The output of simple questions isn’t easily readable by humans or other systems.
    • E.g., listing the dependencies of a package outputs trees, while purls are desired.
  • [2] Can’t easily express complex graph queries, or questions that require multiple queries.
    • E.g., list all packages without SBOMs.
    • E.g., find all packages three hops away from package x.
  • [3] Has high overhead for simple questions (requires learning GQL and the GUAC ontology).

A REST API addresses [3], adding parsing/formatting capabilities (such as returning results in purl format) addresses [1], and adding analysis endpoints (such as those in Guacone) addresses [2].

Vision

The proposed vision for the server is that it will be a grab-bag of capabilities contributed by the community, motivated by specific use cases, and not a reimplementation of the ontology API in REST. Users will be able to query the ontology via the GQL API or use-case-specific endpoints via the REST API.

The alternative to this is that the API also serves the ontology. However, the GQL API already exists and I don't think there is any motivation to add this now -- it could always be added in the future.

Requirements

With the above vision in mind, here are some requirements for the new API and server.

  • Schema-first
  • Server code stub can be generated from the schema
  • Client code can be generated from the schema, in various languages
  • It is easy to add new endpoints
  • The logic (parsing and analysis) component, the server component, and the data-fetching (from the GQL server) component are decoupled, to enable reuse of components between Guacone and the REST API and to facilitate testing.

Additionally, the frameworks used should be fast, modern, and use acceptable licenses.

Some questions to resolve are:

  • Should the API support pagination from the start?
  • Should we consider authentication?

I’ll follow up with some thoughts on what frameworks and code generators should be used.

@pxp928
Collaborator

pxp928 commented Dec 1, 2023

Thanks @mdeicas!

For the questions:

  1. Pagination is being introduced on the GQL side via [feature] Enable pagination for GUAC GraphQL APIs #1525. If the REST API implements a query requiring multiple backend queries (and depending on the output), it would make sense to introduce pagination on REST API side from the beginning so that we do not run into re-work down the line.

  2. I would assume, based on the framework, that authentication should already be part of its implementation. We could start (like we have with the experimental one) without it and have the ability to add it later as needed.

@mdeicas
Collaborator Author

mdeicas commented Dec 11, 2023

Standard Rest API

The standard approach is to define the schema with OpenAPI and generate server and client code.

Two code generators seemed the most promising: deepmap/oapi-codegen and OpenAPITools/openapi-generator.

For various reasons, I ruled out

In short, deepmap/oapi-codegen is better than OpenAPITools/openapi-generator because it:

  • (most importantly) generates a more strongly typed server interface to implement
  • supports a superset of the web frameworks that OpenAPITools/openapi-generator does
  • is more easily integrated into GUAC (it is written in Go; OpenAPITools/openapi-generator is not)
  • generates more query validation code than OpenAPITools/openapi-generator

The drawback of deepmap/oapi-codegen is that it can only generate client code in Go. To generate client code for other languages, OpenAPITools/openapi-generator would need to be used. Both of these tools use the Apache license.

For reference, here is some of the generated code for an OpenAPI endpoint, SearchPackageNames, that takes a single string parameter and outputs a list of purls.

The server interface from deepmap/oapi-codegen looks like:

type SearchPackageNamesResponseObject interface {
	VisitSearchPackageNamesResponse(w http.ResponseWriter) error
}

// generated code implements the above interface
type SearchPackageNames200JSONResponse PurlList

// generated code implements the above interface
type SearchPackageNamesdefaultJSONResponse struct {
	Body       Error
	StatusCode int
}

type StrictServerInterface interface {
	// (GET /search/packages/names)
	SearchPackageNames(ctx context.Context, request SearchPackageNamesRequestObject) (SearchPackageNamesResponseObject, error)
}

And the server interface from OpenAPITools/openapi-generator looks like:

type ImplResponse struct {
	Code int
	Body interface{}
}

type DefaultAPIServicer interface {
	SearchPackageNames(context.Context, string) (ImplResponse, error)
}

Web framework

Oapi-codegen supports echo by default, but chi, gin, mux, fiber, and iris have been added by the community. However, oapi-codegen can only generate the strongly typed interface shown above for chi, gin, and echo.

Chi, gin, and echo are all fast and commonly used. Echo and gin are more full-featured web frameworks, while chi is an improved version of the net/http router.

Gin and echo provide their own context types, which make the Guac style of passing the logger through the context (i.e. logger := logging.FromContext(ctx)) a bit more complicated. They both provide ways to configure and use loggers, but it might be better and more consistent to log in the same way as the rest of the codebase.

In short, this can be done with Echo by passing the logger through the http.Request context that is nested in the echo.Context, but this doesn’t work with Gin. It can be made to work, but only if oapi-codegen is configured to generate a less strongly typed interface.

Chi is more lightweight, but still serves all of our use cases. It is compatible with go’s http package, so the server could always be modified to use another framework in the future without reimplementing the middleware.

So I think either echo or chi would be good options.

gRPC Gateway

There is another technique used to expose both a REST API and an RPC API with one service implementation. It involves implementing a gRPC API as usual, and then using protoc with the grpc-gateway plugin to generate a REST API server from an annotated protobuf API specification. This generated server simply forwards requests to the gRPC server. grpc-gateway can also generate an OpenAPI v2 schema for the generated server. There is a helpful diagram of this design in the grpc-gateway readme.

Client code, generated with protoc, makes requests directly to the gRPC server. The REST HTTP server would then only be used to serve a webpage or by individual users (e.g., with curl), since programs could use the gRPC client code directly. In the absence of such use cases, the REST HTTP server does not need to run.

The protobuf service definition needs to be annotated with the mapping of RPC methods to HTTP endpoints by adding a google.api.http option. There is some documentation on this transcoding at https://google.aip.dev/127 and in the linked protobuf. It looks like this:

message PackageName {
  string name = 1;
}

message PurlList {
  repeated string value = 1;
}

service Guac {
  rpc SearchPackageNames (PackageName) returns (PurlList) {
    option (google.api.http) = {
      get:"/search/packages/names"
    };
  }
}

The interface to implement is standard, as generated by protoc:

type GuacServer interface {
	SearchPackageNames(context.Context, *PackageName) (*PurlList, error)
}

The main drawback of this approach is the overhead of running two new servers, which grows especially in a production environment. The benefit is staying in the gRPC ecosystem, which GUAC is already familiar with. Another note is that gRPC natively supports returning large lists of results via server streaming, which could eliminate the need to add pagination to the API.

REST API Server with gRPC Handler

Grpc-gateway provides another way to serve a Rest API once a gRPC API has been implemented, but with a single server instead of two. It generates boilerplate code that adapts each incoming request so that it can be handled by the gRPC handler instead of a regular http handler. This approach results in a single REST HTTP server, implemented by way of a gRPC service handler. As before, an OpenAPI schema that specifies the server can also be generated.

This approach is a bit awkward because it mixes paradigms. The API is specified in protobuf, but all of the middleware is implemented as HTTP server middleware. Furthermore, gRPC features such as streaming RPCs are not supported, and only a single HTTP web framework, mux, is supported. Finally, as the running server is a REST HTTP server, client code cannot be generated by protoc; another code generator for OpenAPI, such as those discussed in the first approach, would need to be adopted.

Conclusion

I think the decision comes down to how much overhead results from running two servers versus how much benefit is gained from staying in the gRPC + protobuf ecosystem, as implementing both (choosing oapi-codegen) seems to be fairly equivalent. In my opinion, the simpler approach of directly implementing the REST HTTP server seems better.

@lumjjb
Contributor

lumjjb commented Dec 14, 2023

Thanks for writing this detailed analysis @mdeicas !

I think having a proto definition, which we already use in GUAC, would be nice, but like you said it mixes the paradigms, and I think it introduces a world where you'd want to use gRPC for some things and REST for others, which ends up with neither being "first-class". Since most policy engines will likely use REST (due to the overhead of creating a gRPC client), native support there should be the priority; we would thus end up having to add pagination on top of a gRPC streaming implementation anyway.

RE: REST API

The drawback of deepmap/oapi-codegen is that it can only generate client code in Go.

I think I misunderstood this the first time round so want to clarify - this means that no other language has a client code generator, not that go only implements the client codegen, but not the server codegen.

Given this is REST, i don't think there's a tight coupling for clients, so it should be ok for other languages to use a different codegen - it should still work yea?

It looks like both projects are widely used and maintained. It also looks like the cost of switching frameworks will likely be low if for some reason we have to do it, so I think we should go ahead with whichever we feel most comfortable with.

RE: Web frameworks

I don't think we have too many requirements here; our bottleneck will likely be the backend, so IMO the simpler the better. If we do have to switch, we should pick one that is supported by the other preferred codegen, OpenAPITools/openapi-generator, which seems to support net/http, Gin, and Echo.

Would love to see comments from other folks who have maybe used some of these libraries/generators before.

@mdeicas
Collaborator Author

mdeicas commented Dec 14, 2023

The drawback of deepmap/oapi-codegen is that it can only generate client code in Go.

I think I misunderstood this the first time round so want to clarify - this means that no other language has a client code generator, not that go only implements the client codegen, but not the server codegen.

That's right, sorry for the confusion. deepmap/oapi-codegen generates Go server code and Go client code, but not Java or Rust client code.

Given this is REST, i don't think there's a tight coupling for clients, so it should be ok for other languages to use a different codegen - it should still work yea?

Yup I think this would be fine.

@mdeicas
Collaborator Author

mdeicas commented Dec 18, 2023

Adding the OpenAPI Spec I used to generate the examples above for reference:

openapi: "3.0.0"
paths:
  "/search/packages/names":
    get:
      summary: Search packages by package name
      operationId: searchPackageNames
      parameters:
        - name: name
          in: query
          description: the name to search for
          required: true
          style: form
          schema:
            type: string
      responses:
        "200":
          description: A list of purls that match the search
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/PurlList"
        default:
          description: unexpected error
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/Error"

components:
  schemas:
    Purl:
      type: string
    PurlList:
      type: array
      items:
        $ref: "#/components/schemas/Purl"
    Error:
      type: object
      required:
        - code
        - message
      properties:
        code:
          type: integer
          format: int32
        message:
          type: string

@mdeicas
Collaborator Author

mdeicas commented Jan 12, 2024

We're going to go with a standard REST API because it is the simplest option, and use oapi-codegen over OpenAPITools/openapi-generator for the reasons outlined in a previous comment. Either echo or chi would be a good option, but we'll go with chi for now because it uses the standard library context and http handlers, which makes things a bit simpler.

@mrizzi
Collaborator

mrizzi commented Jan 17, 2024

@mdeicas thanks a lot for this analysis.

@dejanb and I have been investing in creating some new (GQL) endpoints in a fork of GUAC for some specific use cases we tackled.

One piece of feedback I can share is that our initial approach was exactly what is proposed here, i.e., connecting to the GQL ontology endpoints so that our new endpoints work no matter which backend GUAC is running (and can then easily be contributed upstream).

The issue with this approach, at least for us, has been performance: loading all the data to run the correlations needed to build the response was heavily memory- and time-consuming with the Ent backend.

In the end, we had to abandon the "GQL ontology endpoints" approach and instead write specific Ent queries directly to cover the requirements of our use cases.

The drawback of optimizing new endpoints by letting them interact directly with the backend is that you have to provide an implementation of each REST endpoint for each backend, but I think that's an expected consequence of having multiple backends.

If having the requirement for REST endpoints to interact only with GQL is mandatory, I can see two options when implementing a new REST endpoint:

  1. leveraging only the available GQL ontology endpoints (as required)
  2. creating a new GQL endpoint (implemented in at least one backend) that the REST endpoint can query to get the data

If having the requirement for REST endpoints to interact only with GQL is NOT mandatory, then each REST endpoint could be allowed to leverage GQL ontology endpoints OR directly connect to the backend.

I would like to collect everyone's feedback on this (@mdeicas @pxp928 @lumjjb)

@dejanb
Contributor

dejanb commented Jan 17, 2024

By allowing REST endpoints to directly use the backend, we can open it up for experimenting with new use cases and advanced queries. Once those use cases gain traction and prove generally useful, we can turn them into GQL queries that should be implemented by the other backends.

@mdeicas
Collaborator Author

mdeicas commented Jan 17, 2024

Yup I think these are good points. Using the ontology API as a level of indirection also limits the use of capabilities that each datastore may have, such as native graph traversal queries.

I think we should support a default implementation of most or all REST API endpoints using the ontology API, to serve as a reference and for the inmem backend.

After that, to support endpoints that depend directly on datastores, I agree with @dejanb: first add them in the REST API, with a way to swap in an optimized implementation depending on which backend is being used. This avoids the GQL schema changes that would be necessary in the alternative approach, which is beneficial because schema changes are not as lightweight. And yes, if one of these becomes generally useful, we can decide then whether to promote it to the GQL API.

@lumjjb
Contributor

lumjjb commented Jan 29, 2024

Yea, +1. Ideally I think it should have a standard interface, although given the experimental nature of this, and to better understand the optimization issues, I think it would make sense to work directly with some backends only, and then slowly evolve toward a v2 of the ontology interface (GQL or otherwise) that will work effectively for this.
