Split form/query parsing into two steps #2038

Merged: 2 commits merged into rack:main on Mar 11, 2023

matthewd
Contributor

First we parse the raw input into a stream of [key, value] pairs, and only after that do we expand that into the deep params hash.

This allows a user (or framework) to operate directly on the pair stream if they need to apply different semantics -- without needing to rewind the input, and without creating a conflict with anything else (like a middleware) that wants to use Rack's standard GET / POST hash format.
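To make the two phases concrete, here is a minimal sketch in plain Ruby. The helper names `split_pairs` and `expand_pairs` are hypothetical and are not Rack's API. The expansion step handles only `a[b][c]`-style hash keys (no `[]` arrays, no depth limits, no conflict handling), so it illustrates the shape of the idea rather than Rack's parser.

```ruby
require 'uri'

# Phase 1 (hypothetical helper): split a raw query string into [key, value]
# pairs. A key with no "=" keeps a nil value, so no information is lost.
def split_pairs(query)
  query.split('&').reject(&:empty?).map do |part|
    key, value = part.split('=', 2)
    [URI.decode_www_form_component(key),
     value && URI.decode_www_form_component(value)]
  end
end

# Phase 2 (hypothetical helper): expand flat pairs into a nested hash.
# In this sketch a missing value becomes '' during expansion.
def expand_pairs(pairs)
  pairs.each_with_object({}) do |(key, value), params|
    path = key.scan(/[^\[\]]+/)
    next if path.empty?
    leaf = path.pop
    target = path.inject(params) { |hash, name| hash[name] ||= {} }
    target[leaf] = value || ''
  end
end

pairs = split_pairs('a[b][c]&x=1')
p pairs               # the flat pair list, with nil for the "=" that was absent
p expand_pairs(pairs) # the nested hash built from those same pairs
```

A consumer that wants different semantics can read `pairs` directly and skip `expand_pairs` without touching the raw input again.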

The names (flat_GET / flat_POST) are terrible, and definitely need better alternatives for this to be seriously considered.

This is currently presented as a minimal and additive change, just extracting part of the existing functionality into a separately-exposed inner parsing step, for ease of review (and possible backporting to the 3.0 series).

That being said, it feels like this sets up some possible longer-term refactoring to more cleanly separate the two parsing phases, and thus avoid some currently awkward redundancy between the separated parsing of queries and multiparts. I'm avoiding that for now because it likely leads to distracting conversations about how public Rack::QueryParser and Rack::Multipart::Parser are / what steps are needed to safely reduce their APIs.

This change adds public methods, which will need their own tests... but for the purposes of initial discussion, I think it's probably sufficient to assert they're working because the already-tested functionality is now built atop them.

cc @ioquatix I think this matches the approach we discussed the other day

@matthewd
Contributor Author

Additional context: this is intended in aid of rails/rails#47080, allowing Rails to re-use Rack's [consuming] parse of the input stream, reading the same syntactically-defined set of key-value pairs, but applying slightly different rules in the higher-level transformation into the deeply-nested params hash format.

@jeremyevans
Contributor

If the entire point of this is just for Rails to keep compatibility (no = results in nil value instead of ''), it seems better to extract a method so you can get the old behavior by overriding:

diff --git a/lib/rack/query_parser.rb b/lib/rack/query_parser.rb
index 1592a01e..4e169788 100644
--- a/lib/rack/query_parser.rb
+++ b/lib/rack/query_parser.rb
@@ -128,7 +128,7 @@ module Rack
 
       return if k.empty?
 
-      v ||= String.new
+      v ||= missing_param_value_default
 
       if after == ''
         if k == '[]' && depth != 0
@@ -174,6 +174,10 @@ module Rack
 
     private
 
+    def missing_param_value_default
+      String.new # return nil in Rails
+    end
+
     def params_hash_type?(obj)
       obj.kind_of?(@params_class)
     end

@matthewd What are your thoughts on that approach? I'm not sure whether Rails uses a custom QueryParser subclass already or not, but that's an easy change to make if not.
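As a rough illustration of the override approach in the diff above, the following sketch uses a toy parser rather than `Rack::QueryParser`. `TinyQueryParser` and `LegacyNilQueryParser` are hypothetical names, and the parser does no URL decoding or nesting. Only the hook name `missing_param_value_default` comes from the proposed diff.

```ruby
# Toy stand-in for a query parser with an overridable hook. This is NOT
# Rack::QueryParser; it only mirrors the shape of the proposed diff.
class TinyQueryParser
  def parse(query)
    query.split('&').each_with_object({}) do |part, params|
      key, value = part.split('=', 2)
      next if key.nil? || key.empty?
      params[key] = value || missing_param_value_default
    end
  end

  private

  # Value used when a parameter has no "=" at all.
  def missing_param_value_default
    String.new
  end
end

# Hypothetical framework subclass restoring the older nil behaviour.
class LegacyNilQueryParser < TinyQueryParser
  private

  def missing_param_value_default
    nil
  end
end

p TinyQueryParser.new.parse('a&b=1')      # "a" maps to ""
p LegacyNilQueryParser.new.parse('a&b=1') # "a" maps to nil
```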

@ioquatix
Member

@jeremyevans The problem that @tenderlove had with making a custom query parser is that there is no "custom query builder" that matches it.

We already copied the Rack 2.x query parser to rails but that PR looks like it will be rejected.

Rails also has "deep munge" which modifies the hash generated by the query parser, so it's multi-layered.

The problem with modifying the behaviour of Rack is that it bifurcates the testing surface area for middleware: for example, query parameter handling would work one way for Rack and possibly a different way for Rails.

A simple way would just be to revert Rails query handling to the standard, but I'm also not sure that's acceptable.

@ioquatix
Member

I find GET and POST ugly as part of the method names.

I wonder if we can avoid replicating them into more method names and instead perhaps be more descriptive at the same time, e.g. flat_query_parameters and flat_input_parameters etc.

@jeremyevans
Contributor

@jeremyevans The problem that @tenderlove had with making a custom query parser is that there is no "custom query builder" that matches it.

If this is an issue, it suggests that we would want to add a QueryBuilder class to address it. That shouldn't affect how we implement the query parsing change.

We already copied the Rack 2.x query parser to rails but that PR looks like it will be rejected.

Not surprising, as that approach sounds awful to me.

Rails also has "deep munge" which modifies the hash generated by the query parser, so it's multi-layered.

Assuming it is operating on the hash generated, it doesn't matter how the hash is generated, so the much simpler fix I am proposing should still work with deep munge.

The problem with modifying the behaviour of Rack is that it bifurcates the testing surface area for middleware: for example, query parameter handling would work one way for Rack and possibly a different way for Rails.

This is going to be true if Rails and Rack behavior is different, regardless of how the difference is implemented.

A simple way would just be to revert Rails query handling to the standard, but I'm also not sure that's acceptable.

That's the approach I chose with Roda, and nobody has brought it up as an issue. Obviously Roda usage is a small fraction of Rails usage, but it's not like this issue is going to affect the majority of applications. How small a percentage, I'm not quite sure. I think there are a few gems designed around use with Rails that rely on the old behavior and would need to be updated if they haven't already been.

@ioquatix
Member

I don't have a strong opinion about any of this. I'm just trying to be a gap filler.

We will ultimately have to make a decision that lands somewhere on:

  • Do nothing, and Rack 2 / Rack 3 behaviours are different and thus affect Rails.
  • Expose something like this PR, which implements the tuple of string key/value pairs and allow applications to build on top of that as appropriate.
  • Expose a symmetric interface for parsing and building query strings.
  • Revert the Rack 3 behaviour (assume that the WhatWG spec is for browsers not web servers).

@jeremyevans
Contributor

I don't have a strong opinion about any of this. I'm just trying to be a gap filler.

We will ultimately have to make a decision that lands somewhere on:

  • Do nothing, and Rack 2 / Rack 3 behaviours are different and thus affect Rails.
  • Expose something like this PR, which implements the tuple of string key/value pairs and allow applications to build on top of that as appropriate.
  • Expose a symmetric interface for parsing and building query strings.
  • Revert the Rack 3 behaviour (assume that the WhatWG spec is for browsers not web servers).

@ioquatix Is it intentional that you are ignoring the approach I proposed in #2038 (comment), which allows Rails to easily get backwards compatibility with minimal changes to Rack itself?

In terms of building queries, note that the behavior didn't change between Rack 2 and Rack 3:

# In both Rack 2.2 and 3.0
require 'rack'
Rack::Utils.build_nested_query("a" => nil)
# => "a"
Rack::Utils.build_nested_query("a" => "")
# => "a="

There is no way to get symmetric behavior for query parsing and query building. There are multiple cases where distinct hashes passed to the query builder result in the same query string, and obviously you can't have the same query string parse to multiple hashes if you want deterministic behavior.
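One such collision can be shown with a simplified builder. `toy_build_nested_query` below is a hypothetical reimplementation written for this example in the style of `build_nested_query`; it is not Rack's code, and escaping is delegated to the standard library's `URI.encode_www_form_component`.

```ruby
require 'uri'

# Simplified builder in the style of build_nested_query; an illustration,
# not Rack's implementation.
def toy_build_nested_query(value, prefix = nil)
  case value
  when Array
    value.map { |v| toy_build_nested_query(v, "#{prefix}[]") }.join('&')
  when Hash
    value.map { |k, v|
      toy_build_nested_query(v, prefix ? "#{prefix}[#{k}]" : k.to_s)
    }.join('&')
  when nil
    URI.encode_www_form_component(prefix)
  else
    "#{URI.encode_www_form_component(prefix)}=#{URI.encode_www_form_component(value)}"
  end
end

nested = toy_build_nested_query({ 'a' => { 'b' => '1' } })
flat   = toy_build_nested_query({ 'a[b]' => '1' })

p nested
p flat
p nested == flat # => true: two different hashes, one query string
```

Since both inputs serialize to the same string, no parser can map that string back to both hashes, which is the asymmetry described above.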

@ioquatix
Member

ioquatix commented Feb 20, 2023

I'm fine with your suggested approach but it doesn't seem materially different from "We already copied the Rack 2.x query parser to rails but that PR looks like it will be rejected."

I agree the implementation would be significantly better if we implement your proposed approach, but it will still require having a custom query parser sub-class in Rails.

If @tenderlove and @matthewd are fine with that, then I have no problem with it. Otherwise, as it stands, that approach was already rejected. That being said, I don't think your implementation is a bad addition :)

@jeremyevans
Contributor

I'm fine with your suggested approach but it doesn't seem materially different from "We already copied the Rack 2.x query parser to rails but that PR looks like it will be rejected."

I think there is a huge difference between forking the entire parser and overriding a single one-line method.

I agree the implementation would be significantly better if we implement your proposed approach, but it will still require having a custom query parser sub-class in Rails.

True, but it's a simple subclass with one method override, plus an override of Rack::Request.query_parser to use that query parser.

If @tenderlove and @matthewd are fine with that, then I have no problem with it. Otherwise, as it stands, that approach was already rejected. That being said, I don't think your implementation is a bad addition :)

"that approach was already rejected" only applies if you consider my approach the same as forking the parser. As stated above, I think the approaches are completely different.

Anyway, let's see what @matthewd and @tenderlove think.

@matthewd
Contributor Author

IMO it is not acceptable for a non-Rails middleware (or even a co-mounted Sinatra or Roda application) to see a different parse result depending on whether Rails is loaded -- and right now, that would be necessary with the proposed change: the parser consumes the (non-rewindable) input.

My intention here is to separate the "smart" part of the parsing, which is a nonstandard but convenient application-level behaviour and applies regardless of how the k-v pairs were encoded, from the two low-level mechanical and protocol-specific parts.

This proposal is about making it possible to have more than one QueryParser read the input pairs... after making that possible, I do think the change you describe could make for a much nicer way for Rails to implement a custom QueryParser rather than a full copy.


At the moment, I argue that Rack implies that you might e.g. have two different Request options, with different QueryParser behaviours, examine the same env at different points (e.g. through configuring a "default query parser" rather than "the query parser"), but the implementation doesn't support that arrangement... the first one to touch env will destructively consume it.
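The "first one to touch env consumes it" problem, and the idea of sharing a parsed pair list instead, can be sketched as follows. `OneShotInput`, `form_pairs`, and the env key `example.form_pairs` are all hypothetical names chosen for this example; the splitting is deliberately naive (no decoding, no multipart).

```ruby
require 'stringio'

# A read-once body: like a socket, it cannot be rewound.
class OneShotInput
  def initialize(data)
    @io = StringIO.new(data)
  end

  def read
    @io.read
  end
end

# Hypothetical env key for this sketch; not a key defined by Rack.
PAIRS_KEY = 'example.form_pairs'

# Parse the body once and memoize the lossless pairs in env, so that later
# readers never need to touch the already-consumed stream.
def form_pairs(env)
  env[PAIRS_KEY] ||= env['rack.input'].read.split('&').map { |part|
    key, value = part.split('=', 2)
    [key, value]
  }
end

env = { 'rack.input' => OneShotInput.new('a&b=1') }
first  = form_pairs(env)
second = form_pairs(env)

p first                  # pairs parsed from the body
p second.equal?(first)   # => true: the second reader reuses the cached pairs
p env['rack.input'].read # => "": the stream itself is exhausted
```

Two readers with different ideas about how to interpret the pairs can each start from the cached list, whereas a second attempt to parse the stream itself would see nothing.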

Beyond the technicality of naming, I would like us and other future libraries to retain the ability to implement additional parsing behaviour downstream, without affecting independent receivers of the same env.

@jeremyevans
Contributor

IMO, the approach in this PR adds a lot of unnecessary complexity, just because Rails does not want to deal with a very minimal behavior change to conform with the standard.

IMO it is not acceptable for a non-Rails middleware (or even a co-mounted Sinatra or Roda application) to see a different parse result depending on whether Rails is loaded -- and right now, that would be necessary with the proposed change: the parser consumes the (non-rewindable) input.

Rails generally controls its own middleware stack (https://guides.rubyonrails.org/rails_on_rack.html#configuring-middleware-stack), and it could add a middleware for Rails-specific parsing at the top of the stack, using the much simpler approach I propose. Are you worried about users adding middleware to config.ru instead of through the way that Rails recommends? I would think that would be considered going "off the Rails", and Rails wouldn't try to guarantee behavior in this case.

My intention here is to separate the "smart" part of the parsing, which is a nonstandard but convenient application-level behaviour and applies regardless of how the k-v pairs were encoded, from the two low-level mechanical and protocol-specific parts.

The only reason this PR does what you want is you are implicitly encoding the old behavior (nil) instead of the standard behavior ('') in split_query. If split_query also implemented the standard behavior, then it wouldn't work for what you want to use it for. If we were going to consider such an approach in Rack, I don't see why we would have split_query use the non-standard behavior but have POST use the standard behavior.

This proposal is about making it possible to have more than one QueryParser read the input pairs... after making that possible, I do think the change you describe could make for a much nicer way for Rails to implement a custom QueryParser rather than a full copy.

This results in N+1 additional array allocations, with N being the number of elements being parsed. That's a significant cost for something with no benefit for the vast majority of users, and that makes this approach a poor tradeoff in my opinion.

At the moment, I argue that Rack implies that you might e.g. have two different Request options, with different QueryParser behaviours, examine the same env at different points (e.g. through configuring a "default query parser" rather than "the query parser"), but the implementation doesn't support that arrangement... the first one to touch env will destructively consume it.

That's still a problem with your approach. What happens if a middleware sets RACK_REQUEST_FORM_PAIRS or RACK_REQUEST_QUERY_PAIRS in a way different than the default?

Beyond the technicality of naming, I would like us and other future libraries to retain the ability to implement additional parsing behaviour downstream, without affecting independent receivers of the same env.

For a query string, you can always reparse. With the change to non-rewindable bodies, there is no longer a way to ensure that is the case for bodies. However, since Rails generally controls its middleware stack, I'm not sure why this is an issue for Rails.

Implementation-wise, I also don't think it's a good idea to change the return type of parse_query and parse_multipart. While they are private methods, parse_query has returned a hash since it was added in 2007, and parse_multipart since it was added in 2009. Many web frameworks subclass Rack::Request, and may be relying on parse_query and parse_multipart returning hashes.

@matthewd
Contributor Author

matthewd commented Feb 21, 2023

The only reason this PR does what you want is you are implicitly encoding the old behavior (nil) instead of the standard behavior ('') in split_query. If split_query also implemented the standard behavior, then it wouldn't work for what you want to use it for. If we were going to consider such an approach in Rack, I don't see why we would have split_query use the non-standard behavior but have POST use the standard behavior.

Fair. My theory is that this API would aim not to present the most externally-compliant interface, but the least internally-lossy one, allowing the consumer to read as much or as little into that distinction as they choose. nil still felt like the best way of encoding the "this was mentioned, but no value was suggested" no-= state, without complicating the API by making the "pair" entries variable length. I'm not attached to it, and could equally see it encoded using a unique frozen and constant-accessible empty string instance, say.
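The "unique frozen and constant-accessible empty string" alternative mentioned above could look roughly like this. `NO_VALUE` and `split_with_sentinel` are hypothetical names for this sketch, and the splitter skips URL decoding for brevity.

```ruby
# Hypothetical sentinel: one frozen empty string meaning "key present, no =".
NO_VALUE = String.new.freeze

# Simplified splitter for this sketch (no URL decoding).
def split_with_sentinel(query)
  query.split('&').reject(&:empty?).map do |part|
    key, value = part.split('=', 2)
    [key, value.nil? ? NO_VALUE : value]
  end
end

pairs = split_with_sentinel('a&b=')
pairs.each do |key, value|
  # Both values are empty strings, but only the no-"=" case is the
  # sentinel object itself, which an identity check can detect.
  puts "#{key}: empty=#{value.empty?} no_equals=#{value.equal?(NO_VALUE)}"
end
```

Consumers that do not care about the distinction see an ordinary empty string; those that do can compare by identity with `equal?`. The trade-off is that this one value is frozen while every other value is mutable.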

This results in N+1 additional array allocations, with N being the number of elements being parsed. That's a significant cost for something with no benefit for the vast majority of users, and that makes this approach a poor tradeoff in my opinion.

It adds one array allocation for query parsing (for each of GET and POST, when applicable). It does add N+1 for multiparts, but if we care I think that's potentially recoverable -- that parser does a complicated job, so it's totally understandable, but it does not seem especially allocation-conservative at the moment.

[Conflicting Request subclasses is] still a problem with your approach. What happens if a middleware sets RACK_REQUEST_FORM_PAIRS or RACK_REQUEST_QUERY_PAIRS in a way different than the default?

That would be an API violation: the point of separating them is to have them available as a common reference point for any subclass to adapt into its unique parameter presentation. It's a new rack-namespaced key, so should be in no more compatibility danger than someone setting any other key to their own mangled value (read: it's not prevented, but it's also not necessary, and thus their fault when it very obviously breaks). On a related note, though, while subclasses would be free to make their own interpretation of params etc, they must not write it to RACK_REQUEST_FORM_HASH & friends: when present, those rack-namespaced keys must always contain Rack's blessed opinion of what the raw pairs meant. (I'll need to fix that here if we move forward.)

Rails generally controls its own middleware stack [/] However, since Rails generally controls its middleware stack, I'm not sure why this is an issue for Rails.

If Rails, or more generally an ActionPack-based controller, is no longer able to be mounted into an existing Rack application [and interpret the user's input as it otherwise would], or is no longer able to itself mount another Rack application [without affecting that application's understanding of its inputs], then as far as I can see, Rails is no longer Rack compatible.

Globally reconfiguring and/or overriding Rack behaviour can obviously fix any problem Rails has that isn't "my mounted Rack application behaves differently when Rails is present". It's an option, but one I see as the very last resort, as we'd then be running in an environment functionally equivalent to a fork of Rack.

just because Rails does not want to deal with a very minimal behavior change to conform with the standard.

There is no standard that says ?a[b][c] should parse to { "a" => { "b" => { "c" => "" } } }. I'm not going to re-litigate whether Rack should have changed its interpretation, but I do reject the implication that Rack is being right and Rails is actively working to remain wrong. Rails defined a complex interpretation of sequences of encoded k-v pairs; that was later copied into Rack for wider accessibility within the ecosystem; Rack recently modified its definition in a way inspired by a recently codified standard for browser behaviour.

However this also goes deeper: the no-value case is the one we're currently looking at, because it's immediate, but it's revealing a more general issue that Rack has currently defined-away the ability for a library to invent a new parameter-interpretation methodology without imposing that upon the entire [in-process] Rack ecosystem. i.e., the change I'm actually seeking to work around is the removal of rewindability on inputs (which has reasonable technical justification, but does create this new situation where you only get one shot at parsing).

The a[b] format is ugly; it would be totally reasonable for a future framework to choose something like a.b instead. While Rails is obviously not going to change like that, it does have available keyspace that (while safely parsing without error in current/historical parsers) has no existing plausible usage (being never-generated alternate spellings for identical values), and thus might be given specific meaning in future.
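For illustration only, a framework that preferred the `a.b` spelling could expand the same flat pairs with a different rule. `expand_dotted` is a hypothetical helper; no existing framework is being described here, and the sketch ignores arrays and key conflicts.

```ruby
# Hypothetical alternative convention: "a.b.c" instead of "a[b][c]".
def expand_dotted(pairs)
  pairs.each_with_object({}) do |(key, value), params|
    *parents, leaf = key.split('.')
    next if leaf.nil?
    target = parents.inject(params) { |hash, name| hash[name] ||= {} }
    target[leaf] = value
  end
end

pairs = [['user.name', 'Ada'], ['user.address.city', 'London'], ['debug', nil]]
p expand_dotted(pairs)
# "user" becomes a nested hash; "debug" keeps its nil value
```

Because the input is the already-split pair list, this interpretation does not need to re-read the request and does not disturb anyone else's view of the same pairs.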

Moreover, after Rack has consumed the input stream (e.g. in a middleware), it would currently be impossible for a mounted application to produce the all-flat-strings structure dictated by the previously-referenced standard.


More pointedly, I believe this is supporting [the spirit of] the "subclass Request and override query_parser" option introduced in #820, in this new read-once world. Without some change of this general nature, I don't see how two libraries/frameworks/applications can co-exist while using their own Params class.

IMO, the approach in this PR adds a lot of unnecessary complexity

It's certainly code-additive, in large part because it's trying to slot into/over the existing APIs, which are all public -- this is especially obvious in the introduction of ParamList. I didn't touch QueryParser's parse_query and parse_nested_query, but they trivially shrink as split_query is just extracting their existing commonality. Removing Multipart::Parser's dependency upon QueryParser API, and reducing the former to maintaining the flat keyspace it already uses internally (to then be post-processed) would be a semantic simplification. The fact QueryParser is involved in processing multipart inputs really seems like an unfortunate remnant of the previous Utils entanglement to me.

If this is deviating from your vision of how these classes' interaction would evolve, I'm happy to do the work to explore alternative arrangements... I really don't think teasing the QueryParser away from the Multipart::Parser is exactly a simple code -> complex code transition, though.

With the benefit of a deprecation brush and a bit more rearrangement, even setting aside how it might help downstream and looking at Rack's implementation in isolation, I think the separate code pieces will be more focused and better contained with separation into three independent classes, for parsing of querystrings, parsing of multipart streams, and inflation of complex parameter names, respectively.

@jeremyevans
Contributor

Fair. My theory is that this API would aim not to present the most externally-compliant interface, but the least internally-lossy one, allowing the consumer to read as much or as little into that distinction as they choose. nil still felt like the best way of encoding the "this was mentioned, but no value was suggested" no-= state, without complicating the API by making the "pair" entries variable length. I'm not attached to it, and could equally see it encoded using a unique frozen and constant-accessible empty string instance, say.

If we accepted this, we would want the split_query behavior to match the current query parsing, so it would use '' instead of nil for params without =. Currently, a unique and frozen string is not used, as all parameter values are mutable strings. Changing the behavior to return a frozen string in some cases would make things inconsistent. However, we could consider that.

[Conflicting Request subclasses is] still a problem with your approach. What happens if a middleware sets RACK_REQUEST_FORM_PAIRS or RACK_REQUEST_QUERY_PAIRS in a way different than the default?

That would be an API violation: the point of separating them is to have them available as a common reference point for any subclass to adapt into its unique parameter presentation. It's a new rack-namespaced key, so should be in no more compatibility danger than someone setting any other key to their own mangled value (read: it's not prevented, but it's also not necessary, and thus their fault when it very obviously breaks). On a related note, though, while subclasses would be free to make their own interpretation of params etc, they must not write it to RACK_REQUEST_FORM_HASH & friends: when present, those rack-namespaced keys must always contain Rack's blessed opinion of what the raw pairs meant. (I'll need to fix that here if we move forward.)

If a Rack middleware creates a Rack::Request object with a custom query parser, then Rack::Request will write the resulting (non-blessed?) params to RACK_REQUEST_FORM_HASH. That's how it's been since Rack 2, I think.

just because Rails does not want to deal with a very minimal behavior change to conform with the standard.

There is no standard that says ?a[b][c] should parse to { "a" => { "b" => { "c" => "" } } }. I'm not going to re-litigate whether Rack should have changed its interpretation, but I do reject the implication that Rack is being right and Rails is actively working to remain wrong. Rails defined a complex interpretation of sequences of encoded k-v pairs; that was later copied into Rack for wider accessibility within the ecosystem; Rack recently modified its definition in a way inspired by a recently codified standard for browser behaviour.

The original reasoning given was that this PR would allow Rails to use nil instead of '' for params without =. I agree that if we want to support alternative nested parameter parsing, this PR to separate the process into multiple steps makes sense. However, I'm not sure we want to do that, and it wasn't previously stated that this was the goal.

...

With the benefit of a deprecation brush and a bit more rearrangement, even setting aside how it might help downstream and looking at Rack's implementation in isolation, I think the separate code pieces will be more focused and better contained with separation into three independent classes, for parsing of querystrings, parsing of multipart streams, and inflation of complex parameter names, respectively.

I agree with most of what you wrote. I think this comes back to the question, what problem are you trying to solve? If you are trying to solve the nil vs. '' issue, as you indicated in #2038 (comment), I think the approach in this PR is overkill, and we should choose one of the following approaches:

  • Allow overriding a method to return nil instead of ''
  • Use a "unique frozen and constant-accessible empty string instance"
  • Add a global setting that users can toggle if they want nil instead of ''.

If you are trying to add the ability for applications/middleware/frameworks using Rack to more easily support arbitrary parameter interpretations (e.g. a.b instead of a[b]), the general idea in this PR of separating the steps makes sense. However, I'm not convinced there is a need for that, and this adds enough complexity that we should only do it if there is an actual (not theoretical) need for it. If we make split_query use '' for parameters without = (as we should for consistency with the current query parsing), we would still need one of the above approaches to solve the nil vs '' case.

@tenderlove
Member

I'm not really seeing the complexity here. The proposed PR seems like a straightforward "extract method" refactor such that subclasses can do their own thing.

The way I see this is we're keeping an abstract representation of the query (similar to storing a parse tree). Then downstream consumers can choose to interpret the AST as they see fit (Rack::Request providing the WhatWG one, and Rails doing whatever Rails wants to do).

I think it's a pretty elegant approach. Since we cannot provide rewindable input bodies, I think we need to provide something downstream that doesn't lose information.
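The "one abstract representation, several interpretations" idea can be sketched with a flat pair list and two readers. The method names below are hypothetical, and nesting is left out to keep the focus on the nil versus '' choice.

```ruby
# The shared, lossless representation: nil marks a key with no "=".
PAIRS = [['a', nil], ['b', ''], ['c', '1']].freeze

# Interpretation 1: treat a missing "=" as an empty string.
def empty_string_view(pairs)
  pairs.to_h { |key, value| [key, value || ''] }
end

# Interpretation 2: keep nil for a missing "=".
def nil_view(pairs)
  pairs.to_h
end

p empty_string_view(PAIRS) # "a" maps to ""
p nil_view(PAIRS)          # "a" maps to nil
```

Neither reader modifies `PAIRS`, so both views can coexist in one process.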

If we accepted this, we would want the split_query behavior to match the current query parsing, so it would use '' instead of nil for params without =. Currently, a unique and frozen string is not used, as all parameter values are mutable strings. Changing the behavior to return a frozen string in some cases would make things inconsistent. However, we could consider that.

This is an abstract representation, and split_query is a new method, so I don't think that's necessary. Its callers are private, and the parser's public API remains the same.

The names (flat_GET / flat_POST) are terrible, and definitely need better alternatives for this to be seriously considered.

Maybe param_structure_for(:get), param_structure_for(:post). Or if we want two methods param_structure_for_(get|post). Seems a little wordy, but I like method names that do what they say. 😅

@ioquatix
Member

ioquatix commented Feb 23, 2023

I personally think that using get and post as part of the method names is semantically wrong, even if historically it was popular.

There is nothing about get that can't also be part of post, and there is nothing about post that can't be part of patch, delete, etc. So I'd suggest we avoid using the HTTP method/verb as part of the naming convention.

@jeremyevans
Contributor

query_string_param_list/body_param_list would be my recommendation for method names.

@ioquatix
Member

ioquatix commented Feb 23, 2023

That seems reasonable to me, I'd also suggest avoiding abbreviations, i.e. query_parameter_list and body_parameter_list. (we already know it's a string, I don't think we need to mention that??)

@jeremyevans
Contributor

That seems reasonable to me, I'd also suggest avoiding abbreviations, i.e. query_parameter_list and body_parameter_list. (we already know it's a string, I don't think we need to mention that??)

The current API has #params/#update_param/#delete_param/#media_type_params (nothing with parameter) and #query_string (not #query). I think it's good to be consistent with that.

@ioquatix
Member

QUERY_STRING is a very old reference to the CGI spec: https://www.rfc-editor.org/rfc/rfc3875#section-4.1.7

However, it's not referred to as "query string" in any modern RFC that I'm aware of. Usually just referred to as the "query" or "query part of the URI", e.g. https://www.rfc-editor.org/rfc/rfc3986#section-3.4

And if you go looking for modern wording, it's often referred to as "query parameters", e.g. https://www.rfc-editor.org/rfc/rfc8820#section-2.4

I always thought the usage of params was inconsistent with the RFCs and never liked the abbreviation, so perhaps we can address that too (in a separate PR) with aliases.

@matthewd
Contributor Author

matthewd commented Feb 23, 2023

However, it's not referred to as "query string" in any modern RFC

Interesting! RFC 1738 uses "query string", but by 2396 it's "query" and "<query> component". Technically Rack does already use "query parser" / parse_query, so one could argue that the current query_string method is a specific reference to the CGI header as distinct from the general concept (with a nod to the fact that referer is the method name, and referrer is an alias).

I always thought the usage of params was inconsistent with the RFCs and never liked the abbreviation, so perhaps we can address that too

If anything I'd personally lean the other way, suggesting that the formality of parameters matches the lower level mechanical interpretation we're implementing here, while the casual and easily-typed params is consistent with the blended, deep-hash-inflated, convenient API it currently fronts.

I'll tidy this up with something that fits, then leave final naming up to y'all. 👍

@matthewd
Contributor Author

matthewd commented Mar 7, 2023

Apologies for the delay on this!

As it's (currently? [1]) intended as a semi-internal API, I've gone light on the testing, and just added an assertion into the existing Request parsing spec.

I've retained the private Request#parse_query and Request#parse_multipart as-is, purely for the benefit of downstream subclasses. I've also taken a maximally-conservative approach to compatibility on the new QueryParser#split_query call, first trying Request#query_parser, then Utils.default_query_parser, and if necessary falling back to creating a new instance of the built-in class directly.
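The fallback order described above might be sketched like this. Everything here is a stand-in: `SketchParser`, `LegacyParser`, `SketchRequest`, and `resolve_splitter` are hypothetical, and the use of `respond_to?(:split_query)` as the compatibility test is an assumption of this example rather than a statement about the merged code.

```ruby
# Stand-in parser for this sketch; not Rack::QueryParser.
class SketchParser
  attr_reader :label

  def initialize(label)
    @label = label
  end

  def split_query(query)
    query.split('&').map do |part|
      key, value = part.split('=', 2)
      [key, value]
    end
  end
end

# An older custom parser that predates split_query.
LegacyParser = Class.new
SketchRequest = Struct.new(:query_parser)

# Hypothetical resolver: prefer the request's parser, then the default
# parser, then a freshly created built-in one.
def resolve_splitter(request, default_parser)
  candidates = []
  candidates << request.query_parser if request.respond_to?(:query_parser)
  candidates << default_parser
  candidates.find { |parser| parser.respond_to?(:split_query) } ||
    SketchParser.new('built-in')
end

modern = SketchRequest.new(SketchParser.new('request'))
legacy = SketchRequest.new(LegacyParser.new)

p resolve_splitter(modern, SketchParser.new('default')).label # => "request"
p resolve_splitter(legacy, SketchParser.new('default')).label # => "default"
p resolve_splitter(legacy, LegacyParser.new).label            # => "built-in"
```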

As a pretty pure extract-method refactoring, this should be safe for backporting to allow consistent compatibility. I do think this change creates, and in some places exacerbates, some mismatches in the relationships between the mostly-internal classes (especially around QueryParser and its callers); I suspect there's value in some follow-up deprecations and external-API narrowing, but I'm deferring proper exploration of that because it would be clearly ineligible for backporting to the 3.0 series, and smoothing that transition is my current priority.

For the method names, I've currently gone with query_param_list / body_param_list. The existing use of "param" is compelling. From what I can see, while query_string is the most prominent entry in the public API, it's also 1) the exact name of the corresponding CGI field, and 2) literally a string -- elsewhere throughout internals, from parse_query on down, Rack seems to prefer plain "query".

Footnotes

  1. I'm certainly not suggesting this should ever be the primary user-facing API, for numerous reasons, but I imagine it will eventually be The Correct Way for a web-framework-level downstream to do self-controlled parameter parsing.

@ioquatix
Member

ioquatix commented Mar 9, 2023

Rebased on main.

@matthewd matthewd marked this pull request as ready for review March 9, 2023 17:06
Member

@ioquatix ioquatix left a comment

This mostly looks fine to me.

First we parse the raw input into a stream of [key, value] pairs, and
only after that do we expand that into the deep params hash.

This allows a user to operate directly on the pair stream if they need
to apply different semantics, without needing to rewind the input, and
without creating a conflict with anything else (like a middleware) that
wants to use Rack's standard GET / POST hash format.
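
The two steps described in this commit message can be illustrated with a small standalone sketch. It assumes nothing about Rack's internals: `split_pairs` and `expand_pairs` are hypothetical helper names, and the expansion handles only the simple `a[b]` and `a[]` key forms, not the full nesting rules.

```ruby
require "uri"

# Step 1: split the raw input into a flat list of [key, value] pairs.
def split_pairs(query_string)
  query_string.split("&").reject(&:empty?).map do |part|
    key, value = part.split("=", 2)
    [URI.decode_www_form_component(key), value && URI.decode_www_form_component(value)]
  end
end

# Step 2: expand the flat pairs into a nested hash.
# Simplified: supports only "name", "name[]" and "name[key]".
def expand_pairs(pairs)
  pairs.each_with_object({}) do |(key, value), params|
    if (m = key.match(/\A([^\[]+)\[\]\z/))
      # "tags[]" collects values into an array
      (params[m[1]] ||= []) << value
    elsif (m = key.match(/\A([^\[]+)\[([^\]]+)\]\z/))
      # "user[name]" builds a one-level hash
      (params[m[1]] ||= {})[m[2]] = value
    else
      params[key] = value
    end
  end
end

pairs  = split_pairs("user[name]=Ann&tags[]=a&tags[]=b&q=x+y")
params = expand_pairs(pairs)
```

A caller that needs different semantics (for example, keeping duplicate keys in order) can work from `pairs` directly, while anything expecting the usual deep hash uses `params`; the input is read only once.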
@ioquatix ioquatix merged commit 9f059d1 into rack:main Mar 11, 2023
14 checks passed
ioquatix pushed a commit that referenced this pull request Mar 12, 2023
* Split form/query parsing into two steps

jeremyevans added a commit to jeremyevans/rack that referenced this pull request Mar 16, 2023
ioquatix pushed a commit that referenced this pull request Mar 16, 2023
* Revert "Prefer to use `query_parser` itself as the cache key. (#2058)"

This reverts commit 5f90c33.

* Revert "Fix handling of cached values in `Rack::Request`. (#2054)"

This reverts commit d25fedd.

* Revert "Add `QueryParser#missing_value` for handling missing values + tests. (#2052)"

This reverts commit 59d9ba9.

* Revert "Split form/query parsing into two steps (#2038)"

This reverts commit 9f059d1.

* Make query parameters without = have nil values

This was Rack's historical behavior.  While it doesn't match
URL spec section 5.1.3.3, keeping the historical behavior avoids
all of the complexity required to support the URL spec standard
by default, but also support frameworks that want to be backwards
compatible.

This keeps as many of the specs added by the recently reverted
commits as make sense.
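
The difference this commit message refers to can be shown with a short standalone sketch. `split_with_missing` is a hypothetical helper, not a Rack method; it only illustrates the two possible values for a key that appears without `=`.

```ruby
# Hypothetical helper: split "k=v&k2" into [key, value] pairs, using
# missing_value for any key that appears without "=".
def split_with_missing(query_string, missing_value)
  query_string.split("&").map do |part|
    key, separator, value = part.partition("=")
    [key, separator.empty? ? missing_value : value]
  end
end

# Historical Rack behavior as described above: no "=" means nil.
historical = split_with_missing("a&b=&c=1", nil)

# URL spec style: no "=" is treated like an empty value.
url_spec = split_with_missing("a&b=&c=1", "")
```

With `nil`, `a` and `b=` remain distinguishable; with the empty string they collapse to the same value.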
ioquatix added a commit to ioquatix/rack that referenced this pull request Mar 16, 2023
# Conflicts:
#	lib/rack/multipart.rb
#	lib/rack/request.rb
#	test/spec_request.rb
ioquatix added a commit that referenced this pull request Mar 16, 2023
# Conflicts:
#	lib/rack/multipart.rb
#	lib/rack/request.rb
#	test/spec_request.rb
@dentarg dentarg mentioned this pull request Mar 16, 2023
7 tasks
dentarg added a commit to dentarg/sinatra that referenced this pull request May 15, 2023
dentarg added a commit to dentarg/sinatra that referenced this pull request May 15, 2023
dentarg added a commit to dentarg/sinatra that referenced this pull request May 16, 2023
dentarg added a commit to dentarg/sinatra that referenced this pull request Aug 7, 2023
geoffharcourt pushed a commit to commonlit/sinatra that referenced this pull request Nov 6, 2023
dentarg added a commit to dentarg/sinatra that referenced this pull request Dec 23, 2023
dentarg added a commit to dentarg/sinatra that referenced this pull request Jan 2, 2024
dentarg added a commit to dentarg/sinatra that referenced this pull request Jan 5, 2024

4 participants