Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correct Rack::MockRequest/Rack::Lint::Wrapper::InputWrapper re #set_encoding #2115

Closed

Conversation

doriantaylor
Copy link
Contributor

Rack::MockRequest.env_for would crash if the input didn't respond to #set_encoding. Rack::Lint::Wrapper::InputWrapper likewise would not convey #set_encoding to its @input member. This patch adds a condition to the former and a pass-through method to the latter.


## * +set_encoding+ just passes through to the underlying object.
def set_encoding(*args)
@input.set_encoding(*args) if @input.respond_to?(:set_encoding)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this necessary? Surely this is on the mock request to deal with - i.e. we shouldn't change the interface requirements of the input.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I encountered this situation when appropriating Rack::MockRequest.env_for to generate the elements for (what are effectively) subrequests within an app, what happened was rack.input was an instance of Rack::Lint::Wrapper::InputWrapper wrapping an instance of Puma::NullIO and neither respond to #set_encoding, so it would crash.

I genuinely believe it's more clement behaviour for Rack::MockRequest.env_for to test if the input object responds to #set_encoding before trying to call it (just like it does for #size directly below).

I'm not married to that pass-through method in Rack::Lint::Wrapper::InputWrapper but the motivation there was Postel's law. In case the underlying input does respond to #set_encoding, perform the expected behaviour.

Copy link
Member

@ioquatix ioquatix Aug 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What you are asking, is for every implementation of the input body to optionally adopt this interface. But the expectation is, the input should always be binary. So, I'm not sure whether this is a good idea. If the input is not binary, that sounds like a bug in whatever is generating the body.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am doing nothing more than ask for things not to crash when they don't need to. Prior to this patch, the implementation of Rack::MockRequest.env_for does exactly that: crash because it already assumes whatever handed to it responds to #set_encoding. Do you have an issue with that too, or just the pass-through method in Rack::Lint::Wrapper::InputWrapper?

@jeremyevans
Copy link
Contributor

From SPEC: The input stream is an IO-like object. IO objects support set_encoding, therefore, the input stream should respond to it. Input streams that do not respond to set_encoding are outside SPEC, and we should not make changes to rack to support things outside of SPEC.

@MSP-Greg
Copy link
Contributor

MSP-Greg commented Aug 27, 2023

@jeremyevans

From: SPEC: "The input stream must respond to gets, each, and read."

The phrase "The input stream is an IO-like object" is not definitive, especially given the above sentence...

Happy to add it to Puma's Puma::NullIO. Maybe set_encoding should be added to the "must respond to" list. PR?

@jeremyevans
Copy link
Contributor

From: SPEC: "The input stream must respond to gets, each, and read."

The phrase "The input stream is an IO-like object" is not definitive, especially given the above phrase...

Happy to add it to Puma's Puma::NullIO. Maybe set_encoding should be added to the "must respond to" list. PR?

That's a fair point. I think we could add set_encoding to that list, or better, reword the sentence in SPEC so it doesn't list individual methods, as listing individual methods leads to the slippery slope. To me, IO-like means responds to methods that IO object would respond to.

I would recommend adding set_encoding to Puma::NullIO. I think it would be even better to iterate over IO instance methods and make sure Puma::NullIO responds to all of them.

@MSP-Greg
Copy link
Contributor

IO has a lot of methods. Puma::NullIO is only used when a request has no body/content. At some point, why is code querying it if there's no header cl/te info?

So, I'd prefer that the SPEC stayed as is, with an addition of set_encoding.

MSP-Greg added a commit to MSP-Greg/puma that referenced this pull request Aug 27, 2023
@doriantaylor
Copy link
Contributor Author

doriantaylor commented Aug 27, 2023

For the record the code that tripped the linter error looks like this:

  # Using the current request as a basis, fake up a new request with
  # the given URI.
  #
  # @param req [Rack::Request] the current request.
  # @param uri [URI, RDF::URI] the new URI
  # @param method [Symbol, #to_sym] the request method (GET)
  # @param headers [Hash, #to_h] overriding request headers
  # @param body [#each, #call] a new body
  #
  # @return [Rack::Request] a new request
  #
  def dup_request req, uri, method: nil, headers: {}, body: nil
    # coerce the URI just so we can flatten it again so we can parse again
    uri = Intertwingler::Resolver.coerce_resource uri, as: :uri

    # override the method (maybe)
    method ||= req.request_method
    # same deal with with the body
    body ||= req.env['rack.input']

    # fake up an environment
    env = req.env.merge Rack::MockRequest.env_for uri.to_s, input: body,
      method: method.to_s.strip.upcase, script_name: req.script_name

    # correct (non-standard??) REQUEST_URI which will be wrong now if it exists
    env['REQUEST_URI'] = uri.request_uri.b if env.key? 'REQUEST_URI'

    # supplant rack.errors which will also be wrong
    env['rack.errors'] = req.env['rack.errors']

    # now overwrite the headers
    headers.each do |hdr, val|
      hdr = hdr.to_s.strip.upcase.tr_s(?-, ?_)
      hdr = "HTTP_#{hdr}" unless hdr.start_with? 'HTTP_'
      val = val.join ', ' if val.is_a? Array
      env[hdr] = val
    end

    # et voilà
    Rack::Request.new env
  end

I used Rack::MockRequest.env_for because it conveniently handles a bunch of fiddling I would otherwise have to do manually (not to mention redundantly). When the environment is set to development, as you know, Rack::Lint gets put in the pipeline. When the server is Puma (the default if it's installed, as you know), the input (at least for a GET request) is a Puma::NullIO that gets wrapped with Rack::Lint::Wrapper::InputWrapper.

  • Rack::MockRequest.env_for assumes it can call #set_encoding on whatever it gets passed as input, which it probably shouldn't assume.
  • Separately, Rack::Lint::Wrapper::InputWrapper arguably should transparently proxy any method call its @input member is capable of handling (aside from the ones it wants to lint), though #set_encoding is probably a decent start.

@jeremyevans
Copy link
Contributor

  • Rack::MockRequest.env_for assumes it can call #set_encoding on whatever it gets passed as input, which it probably shouldn't assume.

I disagree. Input streams are required by SPEC to be IO-like, and IO-like objects should support set_encoding.

  • Separately, Rack::Lint::Wrapper::InputWrapper arguably should transparently proxy any method call its @input member is capable of handling (aside from the ones it wants to lint), though #set_encoding is probably a decent start.

We don't want this because then it won't catch issues where non-IO methods are called on the input (calling such methods should raise, as it would be a SPEC violation). I think it would be reasonable to add method_missing and delegate, but only for IO instance methods. Any other methods should raise. However, at the minimum it should support set_encoding.

@doriantaylor
Copy link
Contributor Author

doriantaylor commented Aug 27, 2023

I disagree. Input streams are required by SPEC to be IO-like, and IO-like objects should support set_encoding.

Ah, the sense I got from reading the spec was that the input object was duck-typed with a minimal footprint of required methods/signatures/behaviours; if #set_encoding ought to be required then the spec should probably be updated to reflect it. Note again however I have no opinion on this one way or another; I am motivated exclusively by having code that doesn't crash.

(I will add that the spec appears fine as-is with regard to the input object: whether it is an IO object is immaterial—StringIO, for one, is infamously not—and the stipulation that gets, each, and read behave like they do in IO—and that they produce binary strings—is clear. Likewise methods like size/length and such are counting bytes rather than e.g. UTF-8 characters. Whether the input object responds to other methods appears likewise immaterial to Rack's ability to make use of it, so it is of questionable sense to either require or forbid them.)

We don't want this because then it won't catch issues where non-IO methods are called on the input (calling such methods should raise, as it would be a SPEC violation). I think it would be reasonable to add method_missing and delegate, but only for IO instance methods. Any other methods should raise. However, at the minimum it should support set_encoding.

I was imagining something like having it extend Forwardable and then have the constructor look at @input and def_delegators to the set of instance methods in @input minus those defined directly on Rack::Lint::Wrapper::InputWrapper, thus avoiding the tar pit that is #method_missing.

@MSP-Greg
Copy link
Contributor

Re Puma, the SPEC states "The input stream is an IO-like object which contains the raw HTTP POST data"

Ok, so why doesn't the SPEC define what happens when there is no "HTTP POST data"? Maybe in Rack 4 rack.input can be nil?

@doriantaylor
Copy link
Contributor Author

doriantaylor commented Aug 27, 2023

TBH I would change the SPEC on that; there way more methods than POST that afford input (and POST itself of course affords much more than just the two HTML form types). Even request bodies in GET are merely a SHOULD NOT in RFC9110 rather than a MUST NOT. Pruning out GET request content is thus an implementation decision, not a writ from the standard:

Although request message framing is independent of the method used, content received in a GET request has no generally defined semantics, cannot alter the meaning or target of the request, and might lead some implementations to reject the request and close the connection because of its potential as a request smuggling attack (Section 11.2 of [HTTP/1.1]). A client SHOULD NOT generate content in a GET request unless it is made directly to an origin server that has previously indicated, in or out of band, that such a request has a purpose and will be adequately supported. An origin server SHOULD NOT rely on private agreements to receive content, since participants in HTTP communication are often unaware of intermediaries along the request chain.

I interpret that paragraph as "content (ie a body) in a GET request is a terrible idea but we're not going to tell you that you can't do it." That said, I'm not suggesting the implementation decision to prune GET request content is wrong.

@MSP-Greg
Copy link
Contributor

I think that, rather than 'HTTP POST data', 'request body' or 'request content' is probably a better term for the SPEC. Regardless, requiring an "IO like object" when there is no content seems odd.

@doriantaylor
Copy link
Contributor Author

doriantaylor commented Aug 27, 2023

Yeah I agree; it should be able to be nil if there isn't anything there (even POST can have a null request body).

The argument for always having something there is so you don't have to test for it first but ehh ¯\_(ツ)_/¯ I'd probably test for it anyway.

@ioquatix
Copy link
Member

ioquatix commented Aug 27, 2023

From the SPEC:

The Input Stream
The input stream is an IO-like object which contains the raw HTTP POST data. When applicable, its external encoding must be “ASCII-8BIT” and it must be opened in binary mode

The body must be in binary mode, and not being in binary mode should be considered a bug, at least in the current interpretation.

If we add set_encoding, a rework of this section is required, and the following questions need to be answered:

  • Does this mean we can set the encoding to something other than than binary?

  • Should the encoding be set automatically according to the request headers?

  • Who is responsible for encoding errors/error handling?

Maybe in Rack 4 rack.input can be nil?

rack.input was already made optional in #2018.

@ioquatix
Copy link
Member

ioquatix commented Aug 27, 2023

The key issue is this line of code:

rack_input.set_encoding(Encoding::BINARY)

I'd be fine with introducing:

if rack_input.respond_to?(:set_encoding)
  rack_input.set_encoding(Encoding::BINARY)
end

However, at this point, I'm not convinced we should be modifying the SPEC for this. In other words, if you supply a InputWrapper instance to MockRequest.env_for, it should already be a valid binary-encoded body. If we wanted to be even more strict about it, we could write this:

      if String === opts[:input]
        rack_input = StringIO.new(opts[:input], encoding: Encoding::BINARY)
      else
        rack_input = opts[:input]
      end

      if rack_input
        # Delete set_encoding

@MSP-Greg
Copy link
Contributor

rack.input was already made optional in #2018.

Sorry, just noticed the following in the SPEC env section (bold added)

"The environment is required to include these variables (adopted from / PEP 333), except when they’d be empty"

The code you showed above LGTM. Puma uses Puma::NullIO for empty bodies. I wonder what might break if we remove that and env['rack.input']...

@ioquatix
Copy link
Member

ioquatix commented Aug 28, 2023

rack.input is not "effectively" optional in any released version of rack, but it will be in the next minor release (3.1). In Rack 3.1+ you should stop using Puma::NullIO. I don't have a strong opinion about previous releases of Rack, but at least, IIRC, Rack::Lint won't accept nil rack.input even if the SPEC is ambiguous at best in this case. Personally, the sooner we can drop things like Puma::NullIO and the equivalent in Falcon, the better!

@ioquatix
Copy link
Member

There is one more point worth making here, and it's that MockRequest probably should only be used for tests. I'm not sure it's a good idea for using within an actual server virtual/internal redirect/request logic.

@MSP-Greg
Copy link
Contributor

Ok, somewhat off topic...

Given the following:

  • Changes in Ruby 3.x
  • Many things in Rack have existed since a time when bandwidth, CPU power, memory, etc were all much slower/smaller than today.

So, re requests with content, I suspect the vast majority don't have very large content, so the content could effectively be passed as a frozen string. Larger content would probably (?) be better left as an IO. App servers could have an option as to size threshold for that.

Any thoughts? Or wait until Rack 4 is on the horizon?

@ioquatix
Copy link
Member

@MSP-Greg if you want to discuss such a thing, please use https://github.com/rack/rack/discussions or make a separate issue, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants