Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[bug] Crash while acquiring the memsize of nodes #2923

Closed
peterzhu2118 opened this issue Jul 6, 2023 · 2 comments · Fixed by #2924
Closed

[bug] Crash while acquiring the memsize of nodes #2923

peterzhu2118 opened this issue Jul 6, 2023 · 2 comments · Fixed by #2924
Labels
state/needs-triage Inbox for non-installation-related bug reports or help requests

Comments

@peterzhu2118
Copy link
Contributor

Please describe the bug

When acquiring the memsize of nodes, nokogiri crashes with the following stack trace:

/opt/rubies/ruby-3.2.2/lib/ruby/3.2.0/objspace.rb:102: [BUG] Segmentation fault at 0x0000000000000041
ruby 3.2.2 (2023-03-30 revision e51014f9c0) [x86_64-linux]

-- Control frame information -----------------------------------------------
c:0004 p:---- s:0025 e:000024 CFUNC  :_dump_all
c:0003 p:0124 s:0017 e:000016 METHOD /opt/rubies/ruby-3.2.2/lib/ruby/3.2.0/objspace.rb:102
c:0002 p:0017 s:0006 e:000005 BLOCK  /home/spin/src/github.com/Shopify/nokogiri/test/helper.rb:21 [FINISH]
c:0001 p:0000 s:0003 E:000250 DUMMY  [FINISH]

-- Ruby level backtrace information ----------------------------------------
/home/spin/src/github.com/Shopify/nokogiri/test/helper.rb:21:in `block in <top (required)>'
/opt/rubies/ruby-3.2.2/lib/ruby/3.2.0/objspace.rb:102:in `dump_all'
/opt/rubies/ruby-3.2.2/lib/ruby/3.2.0/objspace.rb:102:in `_dump_all'

-- Machine register context ------------------------------------------------
 RIP: 0x00007fa029564bc7 RBP: 0x0000000000000031 RSP: 0x00007fff3cf76a10
 RAX: 0x0000000000000001 RBX: 0x0000000000000031 RCX: 0x000000000000000d
 RDX: 0x00007fa02efd2150 RDI: 0x0000000000000031 RSI: 0x000000000000000c
  R8: 0x0000562d2d49a780  R9: 0x0000000000000015 R10: 0x0000000000000000
 R11: 0x000000000000000c R12: 0x0000000000000001 R13: 0x00007fa022d8e978
 R14: 0x0000562d24ec0228 R15: 0x00007fff3cf76da0 EFL: 0x0000000000010202

-- C level backtrace information -------------------------------------------
/opt/rubies/ruby-3.2.2/lib/libruby.so.3.2(rb_print_backtrace+0xd) [0x7fa02eea4c3f] /tmp/ruby-3.2.2/vm_dump.c:785
/opt/rubies/ruby-3.2.2/lib/libruby.so.3.2(rb_vm_bugreport) /tmp/ruby-3.2.2/vm_dump.c:1080
/opt/rubies/ruby-3.2.2/lib/libruby.so.3.2(rb_bug_for_fatal_signal+0xf4) [0x7fa02ec9bee4] /tmp/ruby-3.2.2/error.c:813
/opt/rubies/ruby-3.2.2/lib/libruby.so.3.2(sigsegv+0x4d) [0x7fa02edf41dd] /tmp/ruby-3.2.2/signal.c:964
/lib/x86_64-linux-gnu/libc.so.6(0x7fa02e831520) [0x7fa02e831520]
/home/spin/src/github.com/Shopify/nokogiri/lib/nokogiri/nokogiri.so(memsize_node+0x7) [0x7fa029564bc7] ../../../../ext/nokogiri/xml_document.c:100
/home/spin/src/github.com/Shopify/nokogiri/lib/nokogiri/nokogiri.so(memsize_node+0x24) [0x7fa029564be4] ../../../../ext/nokogiri/xml_document.c:107
/home/spin/src/github.com/Shopify/nokogiri/lib/nokogiri/nokogiri.so(memsize_node+0x24) [0x7fa029564be4] ../../../../ext/nokogiri/xml_document.c:107
/home/spin/src/github.com/Shopify/nokogiri/lib/nokogiri/nokogiri.so(memsize_node+0x49) [0x7fa029564c09] ../../../../ext/nokogiri/xml_document.c:113
/home/spin/src/github.com/Shopify/nokogiri/lib/nokogiri/nokogiri.so(memsize+0xd) [0x7fa029564c3d] ../../../../ext/nokogiri/xml_document.c:124
/opt/rubies/ruby-3.2.2/lib/libruby.so.3.2(obj_memsize_of+0x201) [0x7fa02ecbb681] /tmp/ruby-3.2.2/gc.c:4950
/opt/rubies/ruby-3.2.2/lib/ruby/3.2.0/x86_64-linux/objspace.so(dump_object+0x26f) [0x7fa0286e9eef] /tmp/ruby-3.2.2/ext/objspace/objspace_dump.c:593
/opt/rubies/ruby-3.2.2/lib/ruby/3.2.0/x86_64-linux/objspace.so(heap_i+0x3b) [0x7fa0286ead4b] /tmp/ruby-3.2.2/ext/objspace/objspace_dump.c:623
/opt/rubies/ruby-3.2.2/lib/libruby.so.3.2(objspace_each_objects_try+0x112) [0x7fa02ecbf8e2] /tmp/ruby-3.2.2/gc.c:3931
/opt/rubies/ruby-3.2.2/lib/libruby.so.3.2(rb_ensure+0x126) [0x7fa02eca5906] /tmp/ruby-3.2.2/eval.c:1007
/opt/rubies/ruby-3.2.2/lib/libruby.so.3.2(rb_objspace_each_objects+0xc5) [0x7fa02ecc9ef5] /tmp/ruby-3.2.2/gc.c:4006
/opt/rubies/ruby-3.2.2/lib/ruby/3.2.0/x86_64-linux/objspace.so(dump_flush+0x0) [0x7fa0286e8b98] /tmp/ruby-3.2.2/ext/objspace/objspace_dump.c:801
/opt/rubies/ruby-3.2.2/lib/ruby/3.2.0/x86_64-linux/objspace.so(dump_result) /tmp/ruby-3.2.2/ext/objspace/objspace_dump.c:688
/opt/rubies/ruby-3.2.2/lib/ruby/3.2.0/x86_64-linux/objspace.so(objspace_dump_all) /tmp/ruby-3.2.2/ext/objspace/objspace_dump.c:803
/opt/rubies/ruby-3.2.2/lib/libruby.so.3.2(vm_call_cfunc_with_frame+0x127) [0x7fa02ee79357] /tmp/ruby-3.2.2/vm_insnhelper.c:3268
/opt/rubies/ruby-3.2.2/lib/libruby.so.3.2(vm_sendish+0x97) [0x7fa02ee89884] /tmp/ruby-3.2.2/vm_insnhelper.c:5080
/opt/rubies/ruby-3.2.2/lib/libruby.so.3.2(vm_exec_core) /tmp/ruby-3.2.2/insns.def:820
/opt/rubies/ruby-3.2.2/lib/libruby.so.3.2(rb_vm_exec+0xd3) [0x7fa02ee8f1b3] /tmp/ruby-3.2.2/vm.c:2374
/opt/rubies/ruby-3.2.2/lib/libruby.so.3.2(rb_vm_invoke_proc+0x5f) [0x7fa02ee950af] /tmp/ruby-3.2.2/vm.c:1603
/opt/rubies/ruby-3.2.2/lib/libruby.so.3.2(rb_proc_call_kw+0x9c) [0x7fa02ed8b42c] /tmp/ruby-3.2.2/proc.c:995
/opt/rubies/ruby-3.2.2/lib/libruby.so.3.2(exec_end_procs_chain+0x44) [0x7fa02eca6dfd] /tmp/ruby-3.2.2/eval_jump.c:105
/opt/rubies/ruby-3.2.2/lib/libruby.so.3.2(rb_ec_exec_end_proc) /tmp/ruby-3.2.2/eval_jump.c:120
/opt/rubies/ruby-3.2.2/lib/libruby.so.3.2(rb_ec_teardown+0xc7) [0x7fa02eca6ff7] /tmp/ruby-3.2.2/eval.c:159
/opt/rubies/ruby-3.2.2/lib/libruby.so.3.2(rb_ec_cleanup+0x14a) [0x7fa02eca724a] /tmp/ruby-3.2.2/eval.c:212
/opt/rubies/ruby-3.2.2/lib/libruby.so.3.2(ruby_run_node+0x9d) [0x7fa02eca769d] /tmp/ruby-3.2.2/eval.c:330
/opt/rubies/ruby-3.2.2/bin/ruby(rb_main+0x21) [0x562d22f57187] ./main.c:38
/opt/rubies/ruby-3.2.2/bin/ruby(main) ./main.c:57
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_call_main+0x80) [0x7fa02e818d90] ../sysdeps/nptl/libc_start_call_main.h:58
/lib/x86_64-linux-gnu/libc.so.6(call_init+0x0) [0x7fa02e818e40] ../csu/libc-start.c:392
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main_impl) ../csu/libc-start.c:379
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main) (null):0
[0x562d22f571d5]

Help us reproduce what you're seeing

The issue can be reproduced by applying the following patch:

diff --git a/test/helper.rb b/test/helper.rb
index ed3133c6..43100694 100644
--- a/test/helper.rb
+++ b/test/helper.rb
@@ -13,6 +13,14 @@
 # - NOKOGIRI_GC: read more in test/test_memory_leak.rb
 #
 
+at_exit {
+  require "objspace"
+
+  GC.start
+
+  ObjectSpace.dump_all(output: :stdout)
+}
+
 require "simplecov"
 SimpleCov.start do
   add_filter "/test/"

Expected behavior

There are no crashes.

Environment

Ruby 3.2.2 on x86 Linux.
Nokogiri on commit 2edbbef.

Additional context

N/A.

@peterzhu2118 peterzhu2118 added the state/needs-triage Inbox for non-installation-related bug reports or help requests label Jul 6, 2023
@stevecheckoway
Copy link
Contributor

On x86-64 linux with this patch, I get a different crash. I get one in ObjectSpace.dump_all:

/nokogiri/test/helper.rb:170: NOKOGIRI_TEST_GC_LEVEL: normal
/usr/local/lib/ruby/3.2.0/objspace.rb:102:in `_dump_all': stack level too deep (SystemStackError)
	from /usr/local/lib/ruby/3.2.0/objspace.rb:102:in `dump_all'
	from /nokogiri/test/helper.rb:21:in `block in <top (required)>'
Full stderr output
/usr/local/bin/ruby -w -I"lib:test" /nokogiri/vendor/bundle/ruby/3.2.0/gems/rake-13.0.6/lib/rake/rake_test_loader.rb "test/css/test_css.rb" "test/css/test_css_integration.rb" "test/css/test_parser.rb" "test/css/test_tokenizer.rb" "test/css/test_xpath_visitor.rb" "test/decorators/test_slop.rb" "test/html4/sax/test_parser.rb" "test/html4/sax/test_parser_context.rb" "test/html4/sax/test_parser_text.rb" "test/html4/sax/test_push_parser.rb" "test/html4/test_attributes.rb" "test/html4/test_attributes_properly_escaped.rb" "test/html4/test_builder.rb" "test/html4/test_comments.rb" "test/html4/test_document.rb" "test/html4/test_document_encoding.rb" "test/html4/test_document_fragment.rb" "test/html4/test_element_description.rb" "test/html4/test_named_characters.rb" "test/html4/test_node.rb" "test/html4/test_node_encoding.rb" "test/html5/test_api.rb" "test/html5/test_encoding.rb" "test/html5/test_monkey_patch.rb" "test/html5/test_nokogumbo.rb" "test/html5/test_null.rb" "test/html5/test_quirks_mode.rb" "test/html5/test_serialize.rb" "test/html5/test_tree_construction.rb" "test/namespaces/test_additional_namespaces_in_builder_doc.rb" "test/namespaces/test_namespace_definitions.rb" "test/namespaces/test_namespaces_aliased_default.rb" "test/namespaces/test_namespaces_in_builder_doc.rb" "test/namespaces/test_namespaces_in_cloned_doc.rb" "test/namespaces/test_namespaces_in_created_doc.rb" "test/namespaces/test_namespaces_in_parsed_doc.rb" "test/namespaces/test_namespaces_preservation.rb" "test/test_compaction.rb" "test/test_css_cache.rb" "test/test_encoding_handler.rb" "test/test_gem_platform.rb" "test/test_html.rb" "test/test_iso.rb" "test/test_memory_leak.rb" "test/test_nokogiri.rb" "test/test_nokogumbo_contract.rb" "test/test_pattern_matching.rb" "test/test_serialization_encoding.rb" "test/test_soap4r_sax.rb" "test/test_version.rb" "test/test_xslt_transforms.rb" "test/xml/node/test_attribute_methods.rb" "test/xml/node/test_save_options.rb" "test/xml/node/test_subclass.rb" "test/xml/sax/test_document_error.rb" "test/xml/sax/test_parser.rb" "test/xml/sax/test_parser_context.rb" "test/xml/sax/test_parser_text.rb" "test/xml/sax/test_push_parser.rb" "test/xml/test_attr.rb" "test/xml/test_attribute_decl.rb" "test/xml/test_builder.rb" "test/xml/test_c14n.rb" "test/xml/test_cdata.rb" "test/xml/test_comment.rb" "test/xml/test_document.rb" "test/xml/test_document_encoding.rb" "test/xml/test_document_fragment.rb" "test/xml/test_dtd.rb" "test/xml/test_dtd_encoding.rb" "test/xml/test_element_content.rb" "test/xml/test_element_decl.rb" "test/xml/test_entity_decl.rb" "test/xml/test_entity_reference.rb" "test/xml/test_namespace.rb" "test/xml/test_node.rb" "test/xml/test_node_attributes.rb" "test/xml/test_node_encoding.rb" "test/xml/test_node_inheritance.rb" "test/xml/test_node_reparenting.rb" "test/xml/test_node_set.rb" "test/xml/test_parse_options.rb" "test/xml/test_processing_instruction.rb" "test/xml/test_reader.rb" "test/xml/test_reader_encoding.rb" "test/xml/test_relax_ng.rb" "test/xml/test_schema.rb" "test/xml/test_syntax_error.rb" "test/xml/test_text.rb" "test/xml/test_unparented_node.rb" "test/xml/test_xinclude.rb" "test/xml/test_xpath.rb" "test/xslt/test_custom_functions.rb" "test/xslt/test_exception_handling.rb"
/nokogiri/test/helper.rb:62: version info:
---
warnings: []
nokogiri:
  version: 1.16.0.dev
  cppflags:
  - "-I/nokogiri/ext/nokogiri"
  - "-I/nokogiri/ext/nokogiri/include"
  - "-I/nokogiri/ext/nokogiri/include/libxml2"
  ldflags: []
ruby:
  version: 3.2.2
  platform: x86_64-linux
  gem_platform: x86_64-linux
  description: ruby 3.2.2 (2023-03-30 revision e51014f9c0) [x86_64-linux]
  engine: ruby
libxml:
  source: packaged
  precompiled: false
  patches:
  - 0001-Remove-script-macro-support.patch
  - 0002-Update-entities-to-remove-handling-of-ssi.patch
  - 0003-libxml2.la-is-in-top_builddir.patch
  - '0009-allow-wildcard-namespaces.patch'
  - 0010-update-config.guess-and-config.sub-for-libxml2.patch
  - 0011-rip-out-libxml2-s-libc_single_threaded-support.patch
  libxml2_path: "/nokogiri/ext/nokogiri"
  memory_management: ruby
  iconv_enabled: true
  compiled: 2.11.4
  loaded: 2.11.4
libxslt:
  source: packaged
  precompiled: false
  patches:
  - 0001-update-config.guess-and-config.sub-for-libxslt.patch
  datetime_enabled: true
  compiled: 1.1.38
  loaded: 1.1.38
other_libraries:
  libgumbo: 1.0.0-nokogiri
/nokogiri/test/helper.rb:170: NOKOGIRI_TEST_GC_LEVEL: normal
/usr/local/lib/ruby/3.2.0/objspace.rb:102:in `_dump_all': stack level too deep (SystemStackError)
	from /usr/local/lib/ruby/3.2.0/objspace.rb:102:in `dump_all'
	from /nokogiri/test/helper.rb:21:in `block in <top (required)>'
rake aborted!
Command failed with status (1): [ruby -w -I"lib:test" /nokogiri/vendor/bundle/ruby/3.2.0/gems/rake-13.0.6/lib/rake/rake_test_loader.rb "test/css/test_css.rb" "test/css/test_css_integration.rb" "test/css/test_parser.rb" "test/css/test_tokenizer.rb" "test/css/test_xpath_visitor.rb" "test/decorators/test_slop.rb" "test/html4/sax/test_parser.rb" "test/html4/sax/test_parser_context.rb" "test/html4/sax/test_parser_text.rb" "test/html4/sax/test_push_parser.rb" "test/html4/test_attributes.rb" "test/html4/test_attributes_properly_escaped.rb" "test/html4/test_builder.rb" "test/html4/test_comments.rb" "test/html4/test_document.rb" "test/html4/test_document_encoding.rb" "test/html4/test_document_fragment.rb" "test/html4/test_element_description.rb" "test/html4/test_named_characters.rb" "test/html4/test_node.rb" "test/html4/test_node_encoding.rb" "test/html5/test_api.rb" "test/html5/test_encoding.rb" "test/html5/test_monkey_patch.rb" "test/html5/test_nokogumbo.rb" "test/html5/test_null.rb" "test/html5/test_quirks_mode.rb" "test/html5/test_serialize.rb" "test/html5/test_tree_construction.rb" "test/namespaces/test_additional_namespaces_in_builder_doc.rb" "test/namespaces/test_namespace_definitions.rb" "test/namespaces/test_namespaces_aliased_default.rb" "test/namespaces/test_namespaces_in_builder_doc.rb" "test/namespaces/test_namespaces_in_cloned_doc.rb" "test/namespaces/test_namespaces_in_created_doc.rb" "test/namespaces/test_namespaces_in_parsed_doc.rb" "test/namespaces/test_namespaces_preservation.rb" "test/test_compaction.rb" "test/test_css_cache.rb" "test/test_encoding_handler.rb" "test/test_gem_platform.rb" "test/test_html.rb" "test/test_iso.rb" "test/test_memory_leak.rb" "test/test_nokogiri.rb" "test/test_nokogumbo_contract.rb" "test/test_pattern_matching.rb" "test/test_serialization_encoding.rb" "test/test_soap4r_sax.rb" "test/test_version.rb" "test/test_xslt_transforms.rb" "test/xml/node/test_attribute_methods.rb" "test/xml/node/test_save_options.rb" "test/xml/node/test_subclass.rb" "test/xml/sax/test_document_error.rb" "test/xml/sax/test_parser.rb" "test/xml/sax/test_parser_context.rb" "test/xml/sax/test_parser_text.rb" "test/xml/sax/test_push_parser.rb" "test/xml/test_attr.rb" "test/xml/test_attribute_decl.rb" "test/xml/test_builder.rb" "test/xml/test_c14n.rb" "test/xml/test_cdata.rb" "test/xml/test_comment.rb" "test/xml/test_document.rb" "test/xml/test_document_encoding.rb" "test/xml/test_document_fragment.rb" "test/xml/test_dtd.rb" "test/xml/test_dtd_encoding.rb" "test/xml/test_element_content.rb" "test/xml/test_element_decl.rb" "test/xml/test_entity_decl.rb" "test/xml/test_entity_reference.rb" "test/xml/test_namespace.rb" "test/xml/test_node.rb" "test/xml/test_node_attributes.rb" "test/xml/test_node_encoding.rb" "test/xml/test_node_inheritance.rb" "test/xml/test_node_reparenting.rb" "test/xml/test_node_set.rb" "test/xml/test_parse_options.rb" "test/xml/test_processing_instruction.rb" "test/xml/test_reader.rb" "test/xml/test_reader_encoding.rb" "test/xml/test_relax_ng.rb" "test/xml/test_schema.rb" "test/xml/test_syntax_error.rb" "test/xml/test_text.rb" "test/xml/test_unparented_node.rb" "test/xml/test_xinclude.rb" "test/xml/test_xpath.rb" "test/xslt/test_custom_functions.rb" "test/xslt/test_exception_handling.rb" ]
/nokogiri/vendor/bundle/ruby/3.2.0/gems/rake-13.0.6/exe/rake:27:in `<top (required)>'
/usr/local/bin/bundle:25:in `load'
/usr/local/bin/bundle:25:in `<main>'
Tasks: TOP => test
(See full trace by running task with --trace)

I looked at the code in question and the only thing it's doing is walking the tree and calling xmlStrlen on a bunch of things and all that does is call strlen. The one thing that stood out to me is that this sort of recursive walk could exceed the size of the stack. (This concern is why, for example, the gumbo -> libxml2 conversion does not use recursion and instead uses a convoluted tree walking approach.)

I suspect that's what is happening here. I wasn't able to confirm that with this test case, however

require 'nokogiri'

at_exit do
  require 'objspace'
  GC.start
  ObjectSpace.dump_all(output: :stdout)
end

depth = 100000
html = '<!DOCTYPE html>' + ('<span>' * depth) + ('</span>' * depth)
doc = Nokogiri::HTML5.parse(html, max_tree_depth: -1)

I get a similar result with Nokogiri::HTML4.parse(html).

@stevecheckoway
Copy link
Contributor

No, my analysis is wrong. I believe I've isolated the problem and putting together a PR now.

stevecheckoway added a commit to stevecheckoway/nokogiri that referenced this issue Jul 6, 2023
The `properties` field of an `xmlNode` element points to an `xmlAttr`.
The first few fields of `xmlAttr` are in common with `xmlNode`, but not
the `properties` field which doesn't exist in an `xmlAttr`.

The `memsize_node` function was passing an `xmlAttr` to a
recursive call and then trying to do the same with the properties of
that.

This led to type confusion and subsequent crashes.

Fixes: sparklemotion#2923
flavorjones added a commit that referenced this issue Jul 8, 2023
**What problem is this PR intended to solve?**

The `properties` field of an `xmlNode` element points to an `xmlAttr`.
The first few fields of `xmlAttr` are in common with `xmlNode`, but not
the `properties` field which doesn't exist in an `xmlAttr`.

The `memsize_node` function was passing an `xmlAttr` to a recursive call
and then trying to do the same with the properties of that.

This led to type confusion and subsequent crashes.

Fixes: #2923


**Have you included adequate test coverage?**

Yes.


**Does this change affect the behavior of either the C or the Java
implementations?**

It fixes a crash in the C implementation of an ObjectSpace method that is not implemented in JRuby.
flavorjones pushed a commit that referenced this issue Aug 11, 2023
The `properties` field of an `xmlNode` element points to an `xmlAttr`.
The first few fields of `xmlAttr` are in common with `xmlNode`, but not
the `properties` field which doesn't exist in an `xmlAttr`.

The `memsize_node` function was passing an `xmlAttr` to a
recursive call and then trying to do the same with the properties of
that.

This led to type confusion and subsequent crashes.

Fixes: #2923
(cherry picked from commit 81762fa)
flavorjones pushed a commit that referenced this issue Aug 11, 2023
The `properties` field of an `xmlNode` element points to an `xmlAttr`.
The first few fields of `xmlAttr` are in common with `xmlNode`, but not
the `properties` field which doesn't exist in an `xmlAttr`.

The `memsize_node` function was passing an `xmlAttr` to a
recursive call and then trying to do the same with the properties of
that.

This led to type confusion and subsequent crashes.

Fixes: #2923
(cherry picked from commit 81762fa)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
state/needs-triage Inbox for non-installation-related bug reports or help requests
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants