Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Single regexp for strip_html #1032

Merged
merged 3 commits into from
Feb 22, 2019
Merged

Single regexp for strip_html #1032

merged 3 commits into from
Feb 22, 2019

Conversation

printercu
Copy link
Contributor

Hi!

Benchmark shows almost 2x improvement.

Benchmark source
STRIP_HTML = Regexp.union(
  /<script.*?<\/script>/m,
  /<!--.*?-->/m,
  /<style.*?<\/style>/m,
  /<.*?>/m,
)

def strip_html_old(input)
  empty = ''.freeze
  input.to_s.gsub(/<script.*?<\/script>/m, empty).gsub(/<!--.*?-->/m, empty).gsub(/<style.*?<\/style>/m, empty).gsub(/<.*?>/m, empty)
end

def strip_html_new(input)
  input.to_s.gsub(STRIP_HTML, ''.freeze)
end

require 'benchmark/ips'        

string = <<~HTML
  <div>test</div>
  <div id='test'>test</div>
  <script type='text/javascript'>document.write('some stuff');</script>
  <style type='text/css'>foo bar</style>
  <div\nclass='multiline'>test</div>
  <!-- foo bar \n test -->test
HTML

Benchmark.ips do |x|
  x.report('old') { strip_html_old(string) }
  x.report('new') { strip_html_new(string) }
  x.compare!
end

@ghost ghost added the cla-needed label Feb 22, 2019
@pushrax
Copy link
Contributor

pushrax commented Feb 22, 2019

I modified this to avoid changing behaviour while keeping a lot of the speedup.

@pushrax pushrax merged commit 23d669f into Shopify:master Feb 22, 2019
@shopify-shipit shopify-shipit bot temporarily deployed to rubygems-4-0-stable January 11, 2023 15:52 Inactive
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants