New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Split form/query parsing into two steps #2038
Conversation
Additional context: this is intended in aid of rails/rails#47080, allowing Rails to re-use Rack's [consuming] parse of the input stream, reading the same syntactically-defined set of key-value pairs, but applying slightly different rules in the higher-level transformation into the deeply-nested params hash format. |
If the entire point of this is just for Rails to keep compatibility (no diff --git a/lib/rack/query_parser.rb b/lib/rack/query_parser.rb
index 1592a01e..4e169788 100644
--- a/lib/rack/query_parser.rb
+++ b/lib/rack/query_parser.rb
@@ -128,7 +128,7 @@ module Rack
return if k.empty?
- v ||= String.new
+ v ||= missing_param_value_default
if after == ''
if k == '[]' && depth != 0
@@ -174,6 +174,10 @@ module Rack
private
+ def missing_param_value_default
+ String.new # return nil in Rails
+ end
+
def params_hash_type?(obj)
obj.kind_of?(@params_class)
end @matthewd What are your thoughts on that approach? I'm not sure whether Rails uses a custom QueryParser subclass already or not, but that's an easy change to make if not. |
@jeremyevans The problem that @tenderlove had with making a custom query parser is that there is no "custom query builder" that matches it. We already copied the Rack 2.x query parser to rails but that PR looks like it will be rejected. Rails also has "deep munge" which modifies the hash generated by the query parser, so it's multi-layered. The problem with modify the behaviour of Rack is that it bifurcates the testing surface area for middleware. For example, one way for Rack, and a possibly different way for Rails (query parameter handling). A simple way would just be to revert Rails query handling to the standard, but I'm also not sure that's acceptable. |
I find I wonder if we can avoid replicating them into more method names and instead perhaps be more descriptive at the same time, e.g. |
If this is an issue, it suggests that we would want to add a QueryBuilder class to address it. That shouldn't affect how we implement the query parsing change.
Not surprising, as that approach sounds awful to me.
Assuming it is operating on the hash generated, it doesn't matter how the hash is generated, so the much simpler fix I am proposing should still work with deep munge.
This is going to be true if Rails and Rack behavior is different, regardless of how the difference is implemented.
That's the approach I chose with Roda, and nobody has brought it up as an issue. Obviously Roda usage is a small fraction of Rails usage, but it's not like this issue is going to affect the majority of applications. How small a percentage, I'm not quite sure. I think there are a few gems designed around use with Rails that rely on the old behavior and would need to be updated if they haven't already been. |
I don't have a strong opinion about any of this. I'm just trying to be a gap filler. We will ultimately have to make a decision that lands somewhere on:
|
@ioquatix Is it intentional that you are ignoring the approach I proposed in #2038 (comment), which allows Rails to easily get backwards compatibility with minimal changes to Rack itself? In terms of building queries, note that the behavior didn't change between Rack 2 and Rack 3: # In both Rack 2.2 and 3.0
build_nested_query("a"=>nil)
# => "a"
build_nested_query("a"=>"")
# => "a=" There is no way to get symmetric behavior for query parsing and query building, there are multiple cases where separate hashes passed to the query builder will result in the same query string, and obviously you can't have the same query string parse to multiple hashes if you want deterministic behavior. |
I'm fine with your suggested approach but it doesn't seem materially different from "We already copied the Rack 2.x query parser to rails but that PR looks like it will be rejected." I agree the implementation would be significantly better if we implement your proposed approach, but it will still require having a custom query parser sub-class in Rails. If @tenderlove and @matthewd are fine with that, then I have no problem with it. Otherwise, as it stands, that approach was already rejected. That being said, I don't think your implementation is a bad addition :) |
I think there is huge difference from forking the entire parser to overriding a single one-line method.
True, but it's a simple subclass with one method override, plus an override of
"that approach was already rejected" only applies if you consider my approach the same as forking the parser. As stated above, I think the approaches are completely different. Anyway, let's see what @matthewd and @tenderlove think. |
IMO it is not acceptable for a non-Rails middleware (or even a co-mounted Sinatra or Roda application) to see a different parse result depending on whether Rails is loaded -- and right now, that would be necessary with the proposed change: the parser consumes the (non-rewindable) input. My intention here is to separate the "smart" part of the parsing, which is a nonstandard but convenient application-level behaviour and applies regardless of how the k-v pairs were encoded, from the two low-level mechanical and protocol-specific parts. This proposal is about making it possible to have more than one QueryParser read the input pairs... after making that possible, I do think the change you describe could make for a much nicer way for Rails to implement a custom QueryParser rather than a full copy. At the moment, I argue that Rack implies that you might e.g. have two different Request options, with different QueryParser behaviours, examine the same Beyond the technicality of naming, I would like us and other future libraries to retain the ability to implement additional parsing behaviour downstream, without affecting independent receivers of the same |
IMO, the approach in this PR adds a lot of unnecessary complexity, just because Rails does not want to deal with a very minimal behavior change to conform with the standard.
Rails generally controls its own middleware stack (https://guides.rubyonrails.org/rails_on_rack.html#configuring-middleware-stack), and it could add a middleware for Rails-specific parsing at the top of the stack, using the much simpler approach I propose. Are you worried about users adding middleware to
The only reason this PR does what you want is you are implicitly encoding the old behavior (
This results in N+1 additional array allocations, with N being the number of elements being parsed. That's a significant cost for something with no benefit for the vast majority of users, and that makes this approach a poor-tradeoff in my opinion.
That's still a problem with your approach. What happens if a middleware sets
For a query string, you can always reparse. With the change to non-rewindable bodies, there is no longer a way to ensure that is the case for bodies. However, since Rails generally controls its middleware stack, I'm not sure why this is an issue for Rails. Implementation-wise, I also don't think it's a good idea to change the return type of |
Fair. My theory is that this API would aim not to present the most externally-compliant interface, but the least internally-lossy one, allowing the consumer to read as much or as little into that distinction as they choose.
It adds one array allocation for query parsing (for each of GET and POST, when applicable). It does add N+1 for multiparts, but if we care I think that's potentially recoverable -- that parser does a complicated job, so it's totally understandable, but it does not seem especially allocation-conservative at the moment.
That would be an API violation: the point of separating them is to have them available as a common reference point for any subclass to adapt into its unique parameter presentation. It's a new rack-namespaced key, so should be in no more compatibility danger than someone setting any other key to their own mangled value (read: it's not prevented, but it's also not necessary, and thus their fault when it very obviously breaks). On a related note, though, while subclasses would be free to make their own interpretation of
If Rails, or more generally an ActionPack-based controller, is no longer able to be mounted into an existing Rack application [and interpret the user's input as it otherwise would], or is no longer able to itself mount another Rack application [without affecting that application's understanding of its inputs], then as far as I can see, Rails is no longer Rack compatible. Globally reconfiguring and/or overriding Rack behaviour can obviously fix any problem Rails has that isn't "my mounted Rack application behaves differently when Rails is present". It's an option, but one I see as the very last resort, as we'd then be running in an environment functionally equivalent to a fork of Rack.
There is no standard that says However this also goes deeper: the no-value case is the one we're currently looking at, because it's immediate, but it's revealing a more general issue that Rack has currently defined-away the ability for a library to invent a new parameter-interpretation methodology without imposing that upon the entire [in-process] Rack ecosystem. i.e., the change I'm actually seeking to work around is the removal of rewindability on inputs (which has reasonable technical justification, but does create this new situation where you only get one shot at parsing). The Moreover, after Rack has consumed the input stream (e.g. in a middleware), it would currently be impossible for a mounted application to produce the all-flat-strings structure dictated by the previously-referenced standard. More pointedly, I believe this is supporting [the spirit of] the "subclass
It's certainly code-additive, in large part because it's trying to slot into/over the existing APIs, which are all public -- this is especially obvious in the introduction of If this is deviating from your vision of how these classes' interaction would evolve, I'm happy to do the work to explore alternative arrangements... I really don't think teasing the QueryParser away from the Multipart::Parser is exactly a simple code -> complex code transition, though. With the benefit of a deprecation brush and a bit more rearrangement, even setting aside how it might help downstream and looking at Rack's implementation in isolation, I think the separate code pieces will be more focused and better contained with separation into three independent classes, for parsing of querystrings, parsing of multipart streams, and inflation of complex parameter names, respectively. |
If we accepted this, we would want the
If a Rack middleware creates a Rack::Request object with a custom query parser, then Rack::Request will write the resulting (non-blessed?) params to RACK_REQUEST_FORM_HASH. That's how it's been since Rack 2, I think.
The original reasoning given was that this PR would allow Rails to use
I agree with most of what you wrote. I think this comes back to the question, what problem are you trying to solve? If you are trying to solve the
If you are trying to add the ability for applications/middleware/frameworks using Rack to more easily support arbitrary parameter interpretations (e.g. |
I'm not really seeing the complexity here. The proposed PR seems like a straightforward "extract method" refactor such that subclasses can do their own thing. The way I see this is we're keeping an abstract representation of the query (similar to storing a parse tree). Then downstream consumers can choose to interpret the AST as they see fit ( I think it's a pretty elegant approach. Since we cannot provide rewindable input bodies, I think we need to provide something downstream that doesn't lose information.
This is an abstract representation, and
Maybe |
I personally think that using There is nothing about |
|
That seems reasonable to me, I'd also suggest avoiding abbreviations, i.e. |
The current API has |
However, it's not referred to as "query string" in any modern RFC that I'm aware of. Usually just referred to as the "query" or "query part of the URI", e.g. https://www.rfc-editor.org/rfc/rfc3986#section-3.4 And if you go looking for modern wording, it's often referred to as "query parameters", e.g. https://www.rfc-editor.org/rfc/rfc8820#section-2.4 In always thought the usage of |
Interesting! RFC 1738 uses "query string", but by 2396 it's "query" and "<query> component". Technically Rack does already use "query parser" /
If anything I'd personally lean the other way, suggesting that the formality of I'll tidy this up with something that fits, then leave final naming up to y'all. 👍 |
Apologies for the delay on this! As it's (currently?1) intended as a semi-internal API, I've gone light on the testing, and just added an assertion into the existing Request parsing spec. I've retained the private Request#parse_query and Request#parse_multipart as-is, purely for the benefit of downstream subclasses. I've also taken a maximally-conservative approach to compatibility on the new QueryParser#split_query call, first trying Request#query_parser, then Utils.default_query_parser, and if necessary falling back to creating a new instance of the built-in class directly. As a pretty pure extract-method refactoring, this should be safe for backporting to allow consistent compatibility. I do think this change creates, and in some places exacerbates, some mismatches in the relationships between the mostly-internal classes (especially around QueryParser and its callers); I suspect there's value in some follow-up deprecations and external-API narrowing, but I'm deferring proper exploration of that because it would be clearly ineligible for backporting to the 3.0 series, and smoothing that transition is my current priority. For the method names, I've currently gone with Footnotes
|
Rebased on main. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This mostly looks fine to me.
First we parse the raw input into a stream of [key, value] pairs, and only after that do we expand that into the deep params hash. This allows a user to operate directly on the pair stream if they need to apply different semantics, without needing to rewind the input, and without creating a conflict with anything else (like a middleware) that wants to use Rack's standard GET / POST hash format.
489aa51
to
736fb3f
Compare
* Split form/query parsing into two steps First we parse the raw input into a stream of [key, value] pairs, and only after that do we expand that into the deep params hash. This allows a user to operate directly on the pair stream if they need to apply different semantics, without needing to rewind the input, and without creating a conflict with anything else (like a middleware) that wants to use Rack's standard GET / POST hash format.
This reverts commit 9f059d1.
* Revert "Prefer to use `query_parser` itself as the cache key. (#2058)" This reverts commit 5f90c33. * Revert "Fix handling of cached values in `Rack::Request`. (#2054)" This reverts commit d25fedd. * Revert "Add `QueryParser#missing_value` for handling missing values + tests. (#2052)" This reverts commit 59d9ba9. * Revert "Split form/query parsing into two steps (#2038)" This reverts commit 9f059d1. * Make query parameters without = have nil values This was Rack's historical behavior. While it doesn't match URL spec section 5.1.3.3, keeping the historical behavior avoids all of the complexity required to support the URL spec standard by default, but also support frameworks that want to be backwards compatible. This keeps as much of the specs added by the recently reverted commits that make sense.
* Revert "Prefer to use `query_parser` itself as the cache key. (rack#2058)" This reverts commit 5f90c33. * Revert "Fix handling of cached values in `Rack::Request`. (rack#2054)" This reverts commit d25fedd. * Revert "Add `QueryParser#missing_value` for handling missing values + tests. (rack#2052)" This reverts commit 59d9ba9. * Revert "Split form/query parsing into two steps (rack#2038)" This reverts commit 9f059d1. * Make query parameters without = have nil values This was Rack's historical behavior. While it doesn't match URL spec section 5.1.3.3, keeping the historical behavior avoids all of the complexity required to support the URL spec standard by default, but also support frameworks that want to be backwards compatible. This keeps as much of the specs added by the recently reverted commits that make sense. # Conflicts: # lib/rack/multipart.rb # lib/rack/request.rb # test/spec_request.rb
* Revert "Prefer to use `query_parser` itself as the cache key. (#2058)" This reverts commit 5f90c33. * Revert "Fix handling of cached values in `Rack::Request`. (#2054)" This reverts commit d25fedd. * Revert "Add `QueryParser#missing_value` for handling missing values + tests. (#2052)" This reverts commit 59d9ba9. * Revert "Split form/query parsing into two steps (#2038)" This reverts commit 9f059d1. * Make query parameters without = have nil values This was Rack's historical behavior. While it doesn't match URL spec section 5.1.3.3, keeping the historical behavior avoids all of the complexity required to support the URL spec standard by default, but also support frameworks that want to be backwards compatible. This keeps as much of the specs added by the recently reverted commits that make sense. # Conflicts: # lib/rack/multipart.rb # lib/rack/request.rb # test/spec_request.rb
Rack v3.0.7 rolled back this change with rack/rack#2059 Other references: - rails/rails#47652 - rack/rack#2052 - rack/rack#2038
Rack v3.0.7 rolled back this change with rack/rack#2059 Other references: - rails/rails#47652 - rack/rack#2052 - rack/rack#2038
Rack v3.0.7 rolled back this change with rack/rack#2059 Other references: - rails/rails#47652 - rack/rack#2052 - rack/rack#2038
Rack v3.0.7 rolled back this change with rack/rack#2059 Other references: - rails/rails#47652 - rack/rack#2052 - rack/rack#2038
Rack v3.0.7 rolled back this change with rack/rack#2059 Other references: - rails/rails#47652 - rack/rack#2052 - rack/rack#2038
Rack v3.0.7 rolled back this change with rack/rack#2059 Other references: - rails/rails#47652 - rack/rack#2052 - rack/rack#2038
Rack v3.0.7 rolled back this change with rack/rack#2059 Other references: - rails/rails#47652 - rack/rack#2052 - rack/rack#2038
Rack v3.0.7 rolled back this change with rack/rack#2059 Other references: - rails/rails#47652 - rack/rack#2052 - rack/rack#2038
First we parse the raw input into a stream of [key, value] pairs, and only after that do we expand that into the deep params hash.
This allows a user (or framework) to operate directly on the pair stream if they need to apply different semantics -- without needing to rewind the input, and without creating a conflict with anything else (like a middleware) that wants to use Rack's standard GET / POST hash format.
The names (
flat_GET
/flat_POST
) are terrible, and definitely need better alternatives for this to be seriously considered.This is currently presented as a minimal and additive change, just extracting part of the existing functionality into a separately-exposed inner parsing step, for ease of review (and possible backporting to the 3.0 series).
That being said, it feels like this sets up some possible longer-term refactoring to more cleanly separate the two parsing phases, and thus avoid some currently awkward redundancy between the separated parsing of queries and multiparts. I'm avoiding that for now because it likely leads to distracting conversations about how public Rack::QueryParser and Rack::Multipart::Parser are / what steps are needed to safely reduce their APIs.
This change adds public methods, which will need their own tests... but for the purposes of initial discussion, I think it's probably sufficient to assert they're working because the already-tested functionality is now built atop them.
cc @ioquatix I think this matches the approach we discussed the other day