Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Delegate Rack::Utils.escape_html to CGI.escapeHTML #2099

Merged
merged 1 commit into from Jul 30, 2023

Conversation

JunichiIto
Copy link
Contributor

@JunichiIto JunichiIto commented Jul 22, 2023

This PR improves the performance of Rack::Utils.escape_html.

Background

#2097 (comment)

Benchmark

I defined two methods:

    def escape_html(string)
      CGI.escapeHTML(string)
    end

    def escape_html_old(string)
      string.to_s.gsub(ESCAPE_HTML_PATTERN, ESCAPE_HTML)
    end

I wrote benchmark script:

require 'benchmark'
require 'rack/utils'

# Generated by ChatGPT
HTML_EXAMPLES = <<HTML.lines(chomp: true)
<p>It's a beautiful day.</p>
<a href="https://example.com/page?id=123&category=5">Link</a>
<button onclick="alert('Hello, world!')">Click me</button>
<input type="text" value="It's my text">
<div class='container'>Content goes here</div>
<span>5 &times; 10 = 50</span>
<img src='image.jpg' alt="Nature's beauty">
<p style='color: red;'>This text is red.</p>
<a href='javascript:alert("XSS attack!");'>Click here</a>
<input type='text' placeholder='Enter your name & email'>
HTML
targets = (HTML_EXAMPLES * 10).shuffle

num_iteration = 10_000

Benchmark.bm 5 do |r|
  r.report "new" do
    num_iteration.times do
      targets.each do |string|
        Rack::Utils.escape_html(string)
      end
    end
  end

  r.report "old" do
    num_iteration.times do
      targets.each do |string|
        Rack::Utils.escape_html_old(string)
      end
    end
  end
end

Here is the result on my local machine:

$ bundle exec ruby benchmark.rb
            user     system      total        real
new     0.257449   0.001930   0.259379 (  0.259876)
old     1.706600   0.004664   1.711264 (  1.711273)

The new version is 6.6 times faster than old one.

Requirements

Requires Ruby 2.3.0 or higher since it depends on 'cgi/escape' library.

CGI.escapeHTML is optimized with C extention. ruby/ruby#1164

https://github.com/ruby/ruby/blob/v2_3_0/NEWS

I'm not sure if rack should support Ruby 2.2.x or older.

Backward compatibility

' is now escaped to #39; instead of #x27; (decimal vs hexadecimal)

@ioquatix
Copy link
Member

define_method(:escape_html, CGI.method(:escape_html))

What about this? It should avoid 1 layer of indirection per call :)

@JunichiIto
Copy link
Contributor Author

@ioquatix Thank you for your suggestion. I applied it.

@JunichiIto
Copy link
Contributor Author

@ioquatix truffleruby test failed. Can you guess why??
https://github.com/rack/rack/actions/runs/5630066289/job/15255702711

  1) Error:
Rack::Utils#test_0028_escape html entities [&><'"/]:
NoMethodError: undefined method `escape_html' for Rack::Utils:Module
Did you mean?  escape
    /home/runner/work/rack/rack/test/spec_utils.rb:474:in `test_0028_escape html entities [&><'"/]'

@ioquatix
Copy link
Member

cc @eregon do you have any opinion on the best way to proxy this method and why my suggestion failed?

@eregon
Copy link
Contributor

eregon commented Jul 22, 2023

I'll take a look.
I thought maybe it's module_function + define_method but that seems to work fine:

$ ruby -ve 'module M; module_function; define_method(:foo) { 42 }; end; p M.foo'
truffleruby 23.1.0-dev-b64cd1d4, like ruby 3.1.3, GraalVM CE Native [x86_64-linux]
42

@eregon
Copy link
Contributor

eregon commented Jul 22, 2023

$ ruby -ve 'def Enumerable.bar; end; module M; module_function; define_method(:foo, Enumerable.method(:bar)); end; p M.foo'
ruby 3.1.3p185 (2022-11-24 revision 1a6b16756e) [x86_64-linux]
-e:1:in `define_method': can't bind singleton method to a different class (TypeError)
	from -e:1:in `<module:M>'
	from -e:1:in `<main>'

So it shouldn't be allowed even on CRuby it seems?

irb(main):001:0> CGI.method(:escapeHTML)
=> #<Method: #<Class:CGI>(CGI::Escape)#escapeHTML(_)>

Ah so because it's defined on CGI::Escape then it's allowed.

Probably the difference is related to oracle/truffleruby#3134
Will try to repro next week.

Copy link
Contributor

@jeremyevans jeremyevans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You cannot be sure that cgi/escape exists. You need to guard the require, and if it raises LoadError, fallback to the current implementation. You should also change the current implementation so it uses the same escape for '.

@JunichiIto
Copy link
Contributor Author

@eregon
Thank you for your support.

@jeremyevans
I updated the code per your request, but I don't know how to test the fallback case. Do you have any idea?

You cannot be sure that cgi/escape exists.

I think cgi/escape always exists because cgi is a default gem. Could you tell me when it happens?

$ ruby -v
ruby 3.2.2 (2023-03-30 revision e51014f9c0) [arm64-darwin22]
$ gem list cgi

*** LOCAL GEMS ***

cgi (default: 0.3.6)
$ gem uninstall cgi
Gem cgi-0.3.6 cannot be uninstalled because it is a default gem

@jeremyevans
Copy link
Contributor

@eregon Thank you for your support.

@jeremyevans I updated the code per your request, but I don't know how to test the fallback case. Do you have any idea?

You cannot be sure that cgi/escape exists.

I think cgi/escape always exists because cgi is a default gem. Could you tell me when it happens?

$ ruby -v
ruby 3.2.2 (2023-03-30 revision e51014f9c0) [arm64-darwin22]
$ gem list cgi

*** LOCAL GEMS ***

cgi (default: 0.3.6)
$ gem uninstall cgi
Gem cgi-0.3.6 cannot be uninstalled because it is a default gem

Apologies, cgi/escape is supported in stdlib cgi in all versions of Ruby that Rack supports (it was added in Ruby 2.3). I thought it was added more recently, but I was mistaken. So your original version was fine. I'm sorry I didn't check that before requesting the changes and caused you to do unnecessary work.

@JunichiIto
Copy link
Contributor Author

@jeremyevans No problem! I reverted the code😊

@JunichiIto
Copy link
Contributor Author

FYI
I ran benchmark for define_method and def:

define_method(:escape_html, CGI.method(:escapeHTML))

def escape_html_by_def(string)
  CGI.escapeHTML(string)
end
require 'benchmark'
require 'rack/utils'

HTML_EXAMPLES = <<HTML.lines(chomp: true)
<p>It's a beautiful day.</p>
<a href="https://example.com/page?id=123&category=5">Link</a>
<button onclick="alert('Hello, world!')">Click me</button>
<input type="text" value="It's my text">
<div class='container'>Content goes here</div>
<span>5 &times; 10 = 50</span>
<img src='image.jpg' alt="Nature's beauty">
<p style='color: red;'>This text is red.</p>
<a href='javascript:alert("XSS attack!");'>Click here</a>
<input type='text' placeholder='Enter your name & email'>
HTML
targets = (HTML_EXAMPLES * 10).shuffle

num_iteration = 10_000

Benchmark.bm 15 do |r|
  r.report "by define_method" do
    num_iteration.times do
      targets.each do |string|
        Rack::Utils.escape_html(string)
      end
    end
  end

  r.report "by def" do
    num_iteration.times do
      targets.each do |string|
        Rack::Utils.escape_html_by_def(string)
      end
    end
  end
end

Here is the result:

$ ruby -v
ruby 3.2.2 (2023-03-30 revision e51014f9c0) [arm64-darwin22]

$ bundle exec ruby benchmark.rb
                      user     system      total        real
by define_method  0.122637   0.000460   0.123097 (  0.123102)
by def            0.133945   0.000311   0.134256 (  0.134256)

Certainly, using define_method is faster, but after executing it 1 million times (10 examples * 10 repeats * 10,000 iteration), the difference was only 11 milliseconds.

@ioquatix
Copy link
Member

escape_html could easily be called 100s or 1000s of times per request. Obviously not ideal, but it's the central mechanism by which fields from a database or user are made safe for including in HTML. So, performance can matter (which is why CGI.escape_html is a native implementation at best).

@JunichiIto
Copy link
Contributor Author

@ioquatix I see. Thank you for your explanation!

@eregon
Copy link
Contributor

eregon commented Jul 25, 2023

OK I found the bug: oracle/truffleruby#3181
An easy workaround is to use:

    define_method(:escape_html, CGI.method(:escapeHTML))
    module_function :escape_html

Or you can wait the truffleruby fix, up to you.

BTW there is also a module_function :forwarded_values in that file which is not needed.

@eregon
Copy link
Contributor

eregon commented Jul 25, 2023

FWIW a disadvantage of define_method(:escape_html, CGI.method(:escapeHTML)) is it's pretty hard to find the definition of it:

$ ruby -v -Ilib -e 'require "rack/utils"; p Rack::Utils.escape_html Object.new'
ruby 3.1.3p185 (2022-11-24 revision 1a6b16756e) [x86_64-linux]
-e:1:in `escapeHTML': no implicit conversion of Object into String (TypeError)
	from -e:1:in `<main>'

i.e. that backtrace does not include anything about rack/utils.rb.
And worse does not even mention escape_html

vs if defined in the usual way:

ruby -v -Ilib -e 'require "rack/utils"; p Rack::Utils.escape_html Object.new'
ruby 3.1.3p185 (2022-11-24 revision 1a6b16756e) [x86_64-linux]
/home/eregon/code/rack/lib/rack/utils.rb:183:in `escapeHTML': no implicit conversion of Object into String (TypeError)
	from /home/eregon/code/rack/lib/rack/utils.rb:183:in `escape_html'
	from -e:1:in `<main>'

@ioquatix
Copy link
Member

ioquatix commented Jul 25, 2023

I'm personally fine with either option. I don't think I've actually ever seen code using Rack method, most are directly using CGI.

@eregon
Copy link
Contributor

eregon commented Jul 28, 2023

FWIW that bug (oracle/truffleruby#3181) is fixed in truffleruby-head now.

I would recommend to keep it simple, i.e., no define_method but explicit and clear delegation.

@JunichiIto
Copy link
Contributor Author

At this point, we have three implementation patterns. Which one should we choose? I like pattern 3 because it's very straightforward. Additionally, I think we should avoid unkind backtrace because it can easily confuse developers.

pattern 1

define_method(:escape_html, CGI.method(:escapeHTML))

pattern 2

define_method(:escape_html, CGI.method(:escapeHTML))
module_function :escape_html
  • pros: fast, executable on current TruffleRuby
  • cons: unkind backtrace

pattern 3

def escape_html(string)
  CGI.escapeHTML(string)
end
  • pros: kind backtrace, executable on current TruffleRuby
  • cons: a little bit slower

@jeremyevans
Copy link
Contributor

I vote for option 1 (i.e. commit as it is).

@ioquatix
Copy link
Member

ioquatix commented Jul 30, 2023

@JunichiIto thanks so much for presenting all the options so comprehensively. I really appreciate your effort.

I'm fine with either option 1 or 3.

Since Jeremy also agrees option 1, let's merge it.

If you are at RubyKaigi next year, please come and meet me so I can buy you a drink.

@ioquatix ioquatix merged commit 1939a54 into rack:main Jul 30, 2023
12 of 14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants