Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Copy Rack 2.2 query parser to retain compatibility. #47080

Closed
wants to merge 3 commits into from

Conversation

ioquatix
Copy link
Contributor

@ioquatix ioquatix commented Jan 20, 2023

The WhatWG URL spec standardises the behaviour of query strings without = symbols, e.g. a&b as having empty string values, not nil. Rack 3 follows this behaviour.

Pull in a copy of the Rack 2.2 query parser to retain compatibility with existing handling of query string keys without = symbols.

@rafaelfranca
Copy link
Member

@tenderlove does this mean we don't need perform_deep_munge anymore in Rack 3?

@rafaelfranca rafaelfranca added this to the 7.1.0 milestone Jan 20, 2023
@ioquatix
Copy link
Contributor Author

I think for Rails 8, we should aim to drop Rack 2.x - and maybe this helps align on a timeline for dropping features that only exist for Rack 2.x's sake.

@rafaelfranca
Copy link
Member

Rails doesn't follow semver as you are thinking, so it is totally fine to introduce breaking changes in 7.1. Of course, 7.1 should support Rack 2 was well, so what I was thinking is not executing deep_munge when rack 3 is used with Rails, but keep the code around with a version check.

@ioquatix
Copy link
Contributor Author

I don't have a strong opinion on the matter, and if you want to make a follow-up PR (or add commits to this one) that does what you propose, it seems reasonable to me.

old_perform_deep_munge = ActionDispatch::Request::Utils.perform_deep_munge
ActionDispatch::Request::Utils.perform_deep_munge = false
begin
if Rack.release < "3"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it make sense to add the opposite test when Rack.release >= 3?

Copy link
Contributor Author

@ioquatix ioquatix Jan 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong opinion, but this is consistent with the other checks I added. I slightly prefer if Rack.release < "3" rather than unless Rack.release >= "3". If you want to change this, I would suggest a follow up PR changing it in all the places consistently.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't suggesting to use unless but to have an else clause with a rack 3 version of the test. Because here you are basically removing tests.

Anyway, Matthew's concerns would need to be addressed first.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there is any point repeating tests that already exist in Rack. What do you think?

@matthewd
Copy link
Member

This is a breaking change in our API caused by an upstream change in our dependency, which I think is an internal implementation detail of AD::Request.

ISTM our default response to this test failing should be to fix our implementation to maintain compatibility. Clearly, we had our own tests that explicitly guaranteed this historical behaviour.

We might consider other options, but I think that needs a more specific conversation than merely updating our tests to reflect the changed behaviour now observed.

@ioquatix
Copy link
Contributor Author

ioquatix commented Jan 21, 2023

@matthewd do you recall specifically what downstream system requires this behaviour? Because nothing else (appears to) breaks with this change, just these tests.

My suggestion would be, for people who want the current behaviour, to stay on Rack 2. People who want the standard, conforming behaviour can use Rack 3.

I did research this in depth and attempted to adjust the standards, but there was push back: whatwg/url#732

The standard is described here: https://url.spec.whatwg.org/#urlencoded-parsing - AFAIK, most browsers follow this standard, including JavaScript interfaces for URL encoded parameters. So, the current Rails behaviour (sans this PR) is deviating from the specification used by (all?) other systems doing URL encoding.

As another data point, grape decided to go all in on the standardised encoding and drop the old bespoke implementation/behaviour: whatwg/url#732 (comment)

cc @tenderlove whom had a strong opinion on this.

@matthewd
Copy link
Member

These tests demonstrate behaviour that we have exposed to applications. The downstream I'm concerned about is the numerous Rails applications that exist in the wild -- so many that some will depend on any established behaviour. This was deliberately implemented, and tested, to work in this way -- it's reasonable for people's applications to have depended on that, and IMO it's not reasonable for us to change it from under them without 1) a specific reason, and 2) a safe deprecation-based transition.

That being said, I think the thread you linked supports my opinion that the current behaviour is, in fact, either 1) conforming, or 2) not interested in ever conforming: (emphasis added)

The key is that the standard only has a single model: a string-to-string(s) multimap. Each entry in this multimap had a string key, and its value is one or more strings.

This model does not support:

The value being null
The value being a list containing null
The value being a zero-sized list
The value being a string, instead of a list of strings
Any special treatment of keys that end in "[]"
Or anything else, like the value being a number, or boolean, or whatever.

By that definition (which is a true re-statement of the spec as written), our integer conversion makes us permanently non-conforming, and we have no interest in changing that.

Alternatively, we might reasonably claim that integer conversion is safe because while it's a non-string value, you still get the standard-consistent value from a=1 with params[:a].to_s -- we use a [judged by us to be] contextually more convenient type by default, but it also safely stringifies to the standard's defined string value.

If we accept that it's reasonable to return an integer [which will to_s to the standard-imposed value], then it follows that we can (should!) apply an equal level of convenience-guessing and sometimes produce a nil [which will to_s to the standard-imposed value].

@tenderlove
Copy link
Member

I'm reticent to change the parameter parsing behavior in Rails. Rails has specific requirements of parameter parsing in order to make it work with with ActiveRecord. We definitely expect nulls to come in via parameters because null can be represented in JSON and we expect that controller actions don't need to change behavior whether the request was made with JSON or via QPs.

69ab91a#diff-c0f0426bb88e0814669f4262e038f0c9a99a72e1c6b4808ffb3c69aceda0065eR947

IMO we should fork the code from Rack and pull it inside Rails. I think it's really fine for Rack to have a WHATWG conforming (or whatever spec we want) QP parser, but in the context of a Rails application, Rails should define how to parse the query string.

In Rails applications it's very common to do Post.where(field: param[:field]) in a controller. Apps can (and probably do) rely on the fact that it's NULL. Changing to an empty string seems dangerous.

it's reasonable for people's applications to have depended on that, and IMO it's not reasonable for us to change it from under them without 1) a specific reason, and 2) a safe deprecation-based transition.

💯

I'd be quite happy to change this behavior, but we must at the very least have an escape hatch to allow apps to use the old behavior (and I don't think "please downgrade rack" is the right escape hatch).

(Also I'm really paranoid about changing the way parameters are parsed in Rails because we've had so many security issues surrounding parameters + ActiveRecord that I don't really want to touch what we have now)

@ioquatix
Copy link
Contributor Author

ioquatix commented Jan 21, 2023

Everything that's been said here seems reasonable to me.

However, there are also some challenges e.g. there is no way to specify NULL values to https://developer.mozilla.org/en-US/docs/Web/API/URLSearchParams and other similar interfaces because it's not in the spec, thus not supported (unfortunately). So, even if we are comfortable with Rails interpretation, it's not strictly compatible with other web technologies. If you try to specify non-supported values (e.g. null, you end up with stringified values, e.g. "null" IIUC).

I actually agree that the standard not being able to specify NULL is pretty irritating. This is why I opened an issue on the project which specifies what appears to be the defacto standard. But I don't expect to make any real progress - it would be up to browsers, but IMHO, over a long enough time (and I attached some photos to the discussion), "Real World Null Strings" show up and have real world consequences for HTML Forms / Web Applications. So, I'm actually personally in support of being able to represent nulls correctly. As it stands, the best thing we can do, to indicate that a field has a null value, is simply not to specify it.

Anyway, that aside, we can solve this problem by pulling in the Rack 2 default query parser in Rails. Maybe someone can suggest the best way to integrate with this at the configuration level so users are aware of what they are getting and can deprecate it if not needed. However, for Rails 7.1 this behaviour makes sense.

Regarding parsing parameters and extracting structure, I would personally explore having a type checking/coercion layer in the controller/business logic layer. This probably avoids a lot of the automatic coercion problems/security issues. and can probably be even a little more strict than the current interpretation of fields, e.g. integers, etc.

@ioquatix
Copy link
Contributor Author

Okay, this now uses the old query parser, and all the tests are passing without a version guard.

@ioquatix ioquatix force-pushed the rack-3-query-parser-not-nil branch 2 times, most recently from 9695190 to 0434cf5 Compare January 21, 2023 21:26
@ioquatix ioquatix changed the title Don't assume query string parsing will result in nil values. Copy Rack 2.2 query parser to retain compatibility. Jan 21, 2023
@zzak
Copy link
Member

zzak commented Jan 24, 2023

Out of curiosity, is this something that will be maintained in the future, or will have to be deprecated and removed? Should it be a gem instead?

@ioquatix
Copy link
Contributor Author

While I prefer the behaviour that preserves null values, I think it would make sense to follow the standard, therefore, I think we should try to change the standard to support preserving null values.

Personally, I think it's okay for this code to live in Rails, but merely including Rails can change the query parser behaviour now. I think this proposed PR is fine for Rails 7.1 (compatibility) but longer term (e.g. Rails 8) we should make it configurable and possibly not the default.

@ioquatix
Copy link
Contributor Author

@tenderlove is it possible to get your feedback on the updated PR?


# Replace the Rack query parser with the non-standard Rack 2.2 implementation.
# See <https://github.com/rails/rails/pull/47080> for more details.
Rack::Utils.default_query_parser = ActionDispatch::QueryParser.make_default
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copying the class in here looks good, but this feels risky.

Instead of changing Rack's default, can we arrange to use our parser when we're doing the parsing (and/or most importantly, in the production of our params value), but leave Rack::Request to do whatever Rack wants if e.g. a third-party middleware is calling it directly?

Copy link
Contributor Author

@ioquatix ioquatix Jan 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No that’s not possible because the input body can be consumed and cached by Rack::Request. The parameters only get parsed once and it will become non-deterministic whether rack or rails parsers get to it first in your proposed change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, and we can't separately cache our parsed interpretation, because the input isn't rewindable.

By even calling it "default", then, it seems that Rack's API is implying a level of flexibility it can not support (any more?).

An effective global monkey-patch of Rack behaviour as a result of loading our library is not an acceptable outcome... at best that will create a per-project non-determinism that will confound any Rack-downstream library, making them behave differently when loaded in a Rails app.

I think my current best suggestion is that we install a middleware to make the input rewindable [only] if it has a relevant content-type -- after which we should be able to do our own parameter parsing independently of rack, caching separately.

Copy link
Contributor Author

@ioquatix ioquatix Jan 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

default_query_parser is designed for exactly this scenario and there is no problem changing it, but as I said we should probably make this user configurable.

Rack has lots of globally controllable behaviour and that's just the nature of it.

Practically, this change is a strict superset of existing behaviour, so I don't think anything will be broken by it. Can you find any examples?

I think my current best suggestion is that we install a middleware to make the input rewindable [only] if it has a relevant content-type -- after which we should be able to do our own parameter parsing independently of rack, caching separately.

I don't think this is a practical solution, and carries a high level of complexity and technical debt.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd really like to find a solution to this. It'll be very confusing if parsing is different depending on what files are loaded. I can imagine a scenario where someone is debugging QP parsing in a Rails app (I have had to do this) and they do it via the Rack APIs, but it's not reflective of reality since this file will be loaded in the app but not necessarily your IRB session.

No that’s not possible because the input body can be consumed and cached by Rack::Request.

If this is the case, then no Rack::Request object should be allowed to be allocated before the request object in Rails, no? Otherwise our request object in Rails would always see empty QPs on POST.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is the case, then no Rack::Request object should be allowed to be allocated before the request object in Rails, no?

How do you propose to prevent this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you propose to prevent this?

I don't think we can. But if users add a middleware that uses Rack::Request, surely they'll notice their QPs are gone and not use that middleware?

I guess this raises a larger issues: We basically require anyone using a Rack::Request object to cache any calculations as subsequent allocations won't work since the body isn't rewindable.

Since allocating a Rack::Request isn't idempotent (it reads the body, mutating the input stream), it seems like we should probably make the request object a singleton, and the type configurable (Rails wants to have it's own request object).

Side note: I am starting to feel great regret for changing the way QPs are parsed in Rack!

@ghiculescu
Copy link
Member

ghiculescu commented Jan 31, 2023

FYI if you rebase this PR against main then it'll run the tests against rack 2 and 3, so we can confirm that this PR fixes the failing query_string_parsing_test tests here, on main. (I think this PR hasn't had a build run since rails/buildkite-config#34 was merged.)

@ioquatix
Copy link
Contributor Author

Thanks for your feedback, I've just rebased it.

@ghiculescu
Copy link
Member

👍 it doesn't show in the CI status, but there's some red tests that seem related: https://buildkite.com/rails/rails/builds/93278#01860a17-b242-4dbc-a67f-cbc0e27a2f9b

@ioquatix
Copy link
Contributor Author

ioquatix commented Feb 1, 2023

Ahh thanks, I'll take a look!

@ioquatix
Copy link
Contributor Author

ioquatix commented Feb 2, 2023

@ghiculescu thanks for your report, I've fixed the issue.

@ioquatix
Copy link
Contributor Author

ioquatix commented Feb 2, 2023

I've merged these changes into the main rack-3 branch and all the tests are passing, so the issue should be fixed.

@ioquatix
Copy link
Contributor Author

Given everything that's been said here, I still think we should consider the option where Rack 2 is the default for existing applications and Rack 3 is the default for new applications isn't a bad approach. There doesn't appear to be any perfect solution that meets everyone's requirements.

The WhatWG URL spec standardises the behaviour of query strings without `=`
symbols, e.g. `a&b` as having empty string values, not nil. Rack 3 follows
this behaviour.

Pull in a copy of the Rack 2.2 query parser to retain compatibility with
existing handling of query string keys without `=` symbols.
@ioquatix
Copy link
Contributor Author

According to rack/rack#2038, we will try a different approach to fix this. Therefore, I will close this PR. Thanks everyone for your feedback.

@ioquatix ioquatix closed this Feb 24, 2023
@ioquatix ioquatix deleted the rack-3-query-parser-not-nil branch February 24, 2023 03:54
@ioquatix
Copy link
Contributor Author

We have essentially decided to go with Matthew's implementation, and Rack 3.0.5 was released with his proposed changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants