Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix handling of cached values in Rack::Request. #2054

Merged
merged 4 commits into from Mar 15, 2023

Conversation

ioquatix
Copy link
Member

@ioquatix ioquatix commented Mar 13, 2023

This PR introduces two private cache helpers for Rack::Request, based on a proposal from @matthewd:

  1. cache_for: Things which are expected to be per-request, and are cached in env, and depend on a specific key matching a current input value.
  2. class_cache_for: Things which are expected to be per-request-class, and are cached in env using a per-class cache key, and depend on a specific key matching a current input value.

It changes the structure of the env cache keys slightly when class_cache_for is used, as this uses self.cache_key as a key in the cache lookup.

With this change, the bespoke cache logic is now centralised in these two methods, which also unify error handling.

While I'm slightly against this level of complexity, it is required for sub-classes of request which change the behaviour of def GET and def POST to behave correctly. I have confirmed this works with Rails which essentially sub-classes Rack::Request with the following addition:

class MyRequest < Rack::Request
  class QueryParser < ::Rack::QueryParser
    def missing_value
      nil
    end
  end

  QUERY_PARSER = QueryParser.make_default(32)

  def cache_key
    :my_request
  end

  def query_parser
    QUERY_PARSER
  end
end

I'm mostly happy that this makes sense and is a reasonable implementation.

Some thoughts, in no particular order:

  • A mutable env makes caching harder, e.g. since at any time QUERY_STRING can change. I feel like this is a poor design, and might be a strong signal that we need to rethink how this cache works so that it can be more efficient and/or predictable. For example, I don't understand what the use case is for mutating QUERY_STRING. It's also true that we key on rack.input - and this also doesn't make a huge amount of sense (why is this being updated).
  • Creating instances of Rack::Request for the same env is inefficient, and it would be good to avoid doing this. Since some things should be cached per-env, e.g. query parameter list/body parameter list, but other things should be cached per request instance, e.g. the interpretation of these fields and potentially other fields. IIRC, there was some level of memoization of Rack::Request in env, I don't recall if this existed or why it was removed.
  • I'm generally against this level of complexity in Rack::Request but it seems like we don't have much choice given the circumstances. I would like to think that Rails could adopt a more standards based approach to query parsing and/or we can simplify the cache model by some of the other proposed changes, e.g. an additional interface which uses passes a single Rack::Request instance through the middleware or perhaps some changes to how env mutability is considered and/or instances of Rack::Request are cached.
  • Maybe some of this falls out of @matthewd's desire to simplify the query parser interface, perhaps splitting it up or reworking it (probably more of a Rack 4 thing since it sounds like a major breaking change to the interface).

Potential solution to #2053.

@ioquatix ioquatix force-pushed the rack-request-cache-keys branch 2 times, most recently from 888ac32 to 337c70d Compare March 14, 2023 01:56
@ioquatix ioquatix marked this pull request as ready for review March 14, 2023 03:18
@ioquatix ioquatix changed the title Add per-instance cache keys. Fix handling of cached values in Rack::Request. Mar 14, 2023
@ioquatix
Copy link
Member Author

ioquatix commented Mar 14, 2023

We will also want to back port this to 3-0-stable... this should be the last change required to get Rails over the finish line. IOW, without this PR, the current cache implementation isn't sufficient at best, and at worst is buggy.

@ioquatix
Copy link
Member Author

See rails/rails#47652 for the Rails PR.

Copy link
Contributor

@jeremyevans jeremyevans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class-based caching seems like unnecessary complexity to me. The underlying issue seems to be that you don't want Rails using the same key as Rack uses, so why not just allow for easily using a separate key?:

diff --git a/lib/rack/request.rb b/lib/rack/request.rb
index e6969645..c3f8caa1 100644
--- a/lib/rack/request.rb
+++ b/lib/rack/request.rb
@@ -483,12 +483,12 @@ module Rack
       # Returns the data received in the query string.
       def GET
         if get_header(RACK_REQUEST_QUERY_STRING) == query_string
-          if query_hash = get_header(RACK_REQUEST_QUERY_HASH)
+          if query_hash = get_header(query_hash_key)
             return query_hash
           end
         end

-        set_header(RACK_REQUEST_QUERY_HASH, expand_params(query_param_list))
+        set_header(query_hash_key, expand_params(query_param_list))
       end

       def query_param_list
@@ -508,12 +508,12 @@ module Rack
       # multipart/form-data.
       def POST
         if get_header(RACK_REQUEST_FORM_INPUT).equal?(get_header(RACK_INPUT))
-          if form_hash = get_header(RACK_REQUEST_FORM_HASH)
+          if form_hash = get_header(form_hash_key)
             return form_hash
           end
         end

-        set_header(RACK_REQUEST_FORM_HASH, expand_params(body_param_list))
+        set_header(form_hash_key, expand_params(body_param_list))
       end

       def body_param_list
@@ -649,6 +649,14 @@ module Rack

       private

+      def query_hash_key
+        RACK_REQUEST_QUERY_HASH
+      end
+
+      def form_hash_key
+        RACK_REQUEST_FORM_HASH
+      end
+
       def default_session; {}; end

       # Assist with compatibility when processing `X-Forwarded-For`.

Rails (and other frameworks who want to differ from Rack's default behavior) would then override #query_hash_key and #form_hash_key to use a framework-specific key.

@matthewd
Copy link
Contributor

The class_cache_for structure is definitely added complexity... I have mixed feelings on whether it's unnecessary, though.

There is an existing known issue in Rack, in that Rack::Request is designed to allow subclasses to customize how the parameters are parsed, but then all instances, regardless of their implementation, will use the same env-cached value that is stored by whoever happens to run first.

Rails in particular can avoid the problem by setting a different key (plus a bit more -- either using or implementing cache_for, because the existing Rack behaviour depends upon the set_header(..., nil) lines your diff isn't touching). The trouble there is that Rack is claiming this is a customizable behaviour, but you're actually obliged to override several additional methods (like GET and POST) to prevent the base class performing API-breaking caching.

@ioquatix
Copy link
Member Author

@jeremyevans We tried your proposed approach already, but it doesn't work because we don't have the full set of keys to invalidate, if say, QUERY_STRING is changed. That's solved by using a hash of class => value because invalidation is just clearing that key and it clears all classes with cached state.

@ioquatix
Copy link
Member Author

FYI, here are the invalidations:

set_header RACK_REQUEST_FORM_HASH, nil

set_header RACK_REQUEST_QUERY_HASH, nil

This cache key defaults to `:rack`. Sub-classes need to explicitly change
this if they alter the way cached values are generated.
Copy link
Contributor

@jeremyevans jeremyevans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to have more documentation for the added methods explaining why the complexity is needed, but that doesn't need to block this.

@tenderlove
Copy link
Member

Would be nice to have more documentation for the added methods explaining why the complexity is needed, but that doesn't need to block this.

I personally would like some comments before merging this. The added complexity isn't really self-evident from reading the code. Also if we're doing this to support subclasses, shouldn't we be adding tests that use subclasses?

There is only one test changed in this PR and it's not very clear to me how/why that drives the rest of the changes in the PR.

@ioquatix ioquatix merged commit d25fedd into rack:main Mar 15, 2023
@ioquatix ioquatix deleted the rack-request-cache-keys branch March 15, 2023 03:37
ioquatix added a commit to ioquatix/rack that referenced this pull request Mar 15, 2023
* Per-class cache keys for cached query/body parameters.

* Use the query parser class as the default cache key.
ioquatix added a commit to ioquatix/rack that referenced this pull request Mar 15, 2023
* Per-class cache keys for cached query/body parameters.

* Use the query parser class as the default cache key.
ioquatix added a commit to ioquatix/rack that referenced this pull request Mar 15, 2023
* Per-class cache keys for cached query/body parameters.

* Use the query parser class as the default cache key.
ioquatix added a commit to ioquatix/rack that referenced this pull request Mar 15, 2023
* Per-class cache keys for cached query/body parameters.

* Use the query parser class as the default cache key.
jeremyevans added a commit to jeremyevans/rack that referenced this pull request Mar 16, 2023
ioquatix pushed a commit that referenced this pull request Mar 16, 2023
* Revert "Prefer to use `query_parser` itself as the cache key. (#2058)"

This reverts commit 5f90c33.

* Revert "Fix handling of cached values in `Rack::Request`. (#2054)"

This reverts commit d25fedd.

* Revert "Add `QueryParser#missing_value` for handling missing values + tests. (#2052)"

This reverts commit 59d9ba9.

* Revert "Split form/query parsing into two steps (#2038)"

This reverts commit 9f059d1.

* Make query parameters without = have nil values

This was Rack's historical behavior.  While it doesn't match
URL spec section 5.1.3.3, keeping the historical behavior avoids
all of the complexity required to support the URL spec standard
by default, but also support frameworks that want to be backwards
compatible.

This keeps as much of the specs added by the recently reverted
commits that make sense.
ioquatix added a commit to ioquatix/rack that referenced this pull request Mar 16, 2023
* Revert "Prefer to use `query_parser` itself as the cache key. (rack#2058)"

This reverts commit 5f90c33.

* Revert "Fix handling of cached values in `Rack::Request`. (rack#2054)"

This reverts commit d25fedd.

* Revert "Add `QueryParser#missing_value` for handling missing values + tests. (rack#2052)"

This reverts commit 59d9ba9.

* Revert "Split form/query parsing into two steps (rack#2038)"

This reverts commit 9f059d1.

* Make query parameters without = have nil values

This was Rack's historical behavior.  While it doesn't match
URL spec section 5.1.3.3, keeping the historical behavior avoids
all of the complexity required to support the URL spec standard
by default, but also support frameworks that want to be backwards
compatible.

This keeps as much of the specs added by the recently reverted
commits that make sense.
# Conflicts:
#	lib/rack/multipart.rb
#	lib/rack/request.rb
#	test/spec_request.rb
ioquatix added a commit that referenced this pull request Mar 16, 2023
* Revert "Prefer to use `query_parser` itself as the cache key. (#2058)"

This reverts commit 5f90c33.

* Revert "Fix handling of cached values in `Rack::Request`. (#2054)"

This reverts commit d25fedd.

* Revert "Add `QueryParser#missing_value` for handling missing values + tests. (#2052)"

This reverts commit 59d9ba9.

* Revert "Split form/query parsing into two steps (#2038)"

This reverts commit 9f059d1.

* Make query parameters without = have nil values

This was Rack's historical behavior.  While it doesn't match
URL spec section 5.1.3.3, keeping the historical behavior avoids
all of the complexity required to support the URL spec standard
by default, but also support frameworks that want to be backwards
compatible.

This keeps as much of the specs added by the recently reverted
commits that make sense.
# Conflicts:
#	lib/rack/multipart.rb
#	lib/rack/request.rb
#	test/spec_request.rb
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants