Skip to content

Commit

Permalink
Delay breaking out of the parse loop when max_tree_depth is hit (#3100)
Browse files Browse the repository at this point in the history
When a token causes a node to be added to the DOM which increases the
depth of the DOM to exceed the `max_tree_depth` _and_ the token needs to
be reprocessed, memory is leaked. By delaying breaking out of the loop
until after the token has been completely handled, this appears to fix
the leak.

Fixes #3098
  • Loading branch information
stevecheckoway committed Jan 17, 2024
1 parent e4416e6 commit 598f99f
Show file tree
Hide file tree
Showing 2 changed files with 18 additions and 5 deletions.
13 changes: 8 additions & 5 deletions gumbo-parser/src/parser.c
Expand Up @@ -4826,14 +4826,17 @@ GumboOutput* gumbo_parse_with_options (
// to a token.
if (token.type == GUMBO_TOKEN_END_TAG &&
token.v.end_tag.tag == GUMBO_TAG_UNKNOWN)
{
gumbo_free(token.v.end_tag.name);
token.v.end_tag.name = NULL;
}
if (unlikely(state->_open_elements.length > max_tree_depth)) {
parser._output->status = GUMBO_STATUS_TREE_TOO_DEEP;
gumbo_debug("Tree depth limit exceeded.\n");
break;
}
}

if (unlikely(state->_open_elements.length > max_tree_depth)) {
parser._output->status = GUMBO_STATUS_TREE_TOO_DEEP;
gumbo_debug("Tree depth limit exceeded.\n");
break;
}

++loop_count;
assert(loop_count < 1000000000UL);
Expand Down
10 changes: 10 additions & 0 deletions test/test_memory_usage.rb
Expand Up @@ -301,5 +301,15 @@ def start_element(name, attrs = [])
Nokogiri::HTML5::Document.parse(html)
end
end

it "libgumbo max depth exceeded" do
html = "<html><body>"

memwatch(__method__) do
Nokogiri::HTML5.parse(html, max_tree_depth: 1)
rescue ArgumentError
# Expected error. This comment makes rubocop happy.
end
end
end if ENV["NOKOGIRI_MEMORY_SUITE"] && Nokogiri.uses_libxml?
end

0 comments on commit 598f99f

Please sign in to comment.