Treat reaching the maximum parse depth as EOF (#3121)
The parser is fragile, and hitting the maximum tree depth has exposed
several bugs. Treating this condition as if it were an EOF means that
we're back in the normal parsing condition.

This does expose a weakness in Gumbo's design: if a spec bug results
in skipped steps or a failure to reprocess a token (or if we fail to
implement the spec in such a way), the result can be a crash. Not
ideal.

Fixes: oss-fuzz-66107


**What problem is this PR intended to solve?**
oss-fuzz-66107

**Have you included adequate test coverage?**

Nope! I'm sorry to punt on testing again, but I didn't take the time to
understand why this bug occurred, so I didn't produce a minimal test
case. Treating the max-depth condition as EOF just seemed like the best
way to fix this whole class of bugs, by turning the exceptional (> max
depth) parsing case into the normal parsing case.
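
For illustration only, here is a minimal sketch of the idea on a toy parser. It is not Gumbo's code: `toy_parse`, `toy_lex`, `ParseResult`, and the `TOK_*`/`STATUS_*` names are all hypothetical, and the "tree" is reduced to a count of open elements. The point it tries to show is that when the depth limit is exceeded, the loop fabricates an EOF token rather than breaking out, so the ordinary end-of-input cleanup still runs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy types; none of these names come from Gumbo. */
typedef enum { TOK_OPEN, TOK_CLOSE, TOK_EOF } TokenType;
typedef enum { STATUS_OK, STATUS_TOO_DEEP } Status;

typedef struct {
  Status status;
  size_t open_depth; /* elements still open when parsing stopped */
  size_t consumed;   /* input characters actually lexed */
} ParseResult;

/* Toy lexer: '(' opens, ')' closes, anything else (including NUL) is EOF. */
static TokenType toy_lex(const char *input, size_t *pos) {
  char c = input[*pos];
  if (c == '(') { (*pos)++; return TOK_OPEN; }
  if (c == ')') { (*pos)++; return TOK_CLOSE; }
  return TOK_EOF;
}

ParseResult toy_parse(const char *input, size_t max_depth) {
  ParseResult r = { STATUS_OK, 0, 0 };
  size_t pos = 0;

  for (;;) {
    TokenType tok;
    if (r.open_depth > max_depth) {
      /* Limit exceeded: record the status and pretend we hit EOF
       * instead of lexing another token or breaking out of the loop. */
      r.status = STATUS_TOO_DEEP;
      tok = TOK_EOF;
    } else {
      tok = toy_lex(input, &pos);
    }

    if (tok == TOK_OPEN) {
      r.open_depth++;
    } else if (tok == TOK_CLOSE) {
      if (r.open_depth > 0) r.open_depth--;
    } else {
      /* Normal EOF handling: close every open element, then stop. */
      r.open_depth = 0;
      break;
    }
  }

  r.consumed = pos;
  /* Both the normal and the too-deep path exit through the EOF branch. */
  assert(r.open_depth == 0);
  return r;
}
```

With this sketch, `toy_parse("((((", 2)` stops lexing after the third `(`, reports `STATUS_TOO_DEEP`, and still leaves no elements open, because it went through the same EOF branch as a well-formed input would.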


**Does this change affect the behavior of either the C or the Java
implementations?**

C

flavorjones committed Feb 4, 2024
2 parents 91daaa7 + 52afa90 commit 633c7e9
Showing 1 changed file with 12 additions and 6 deletions.
18 changes: 12 additions & 6 deletions gumbo-parser/src/parser.c
@@ -4762,7 +4762,18 @@ GumboOutput* gumbo_parse_with_options (
         adjusted_current_node &&
           adjusted_current_node->v.element.tag_namespace != GUMBO_NAMESPACE_HTML
       );
-      gumbo_lex(&parser, &token);
+      // If the maximum tree depth has been exceeded, proceed as if EOF has been reached.
+      //
+      // The parser is pretty fragile. Breaking out of the parsing loop in the middle of
+      // the parse can leave the document in an inconsistent state.
+      if (unlikely(state->_open_elements.length > max_tree_depth)) {
+        parser._output->status = GUMBO_STATUS_TREE_TOO_DEEP;
+        gumbo_debug("Tree depth limit exceeded.\n");
+        token.type = GUMBO_TOKEN_EOF;
+      } else {
+        gumbo_lex(&parser, &token);
+      }
+
     }
 
     const char* token_type = "text";
@@ -4830,11 +4841,6 @@ GumboOutput* gumbo_parse_with_options (
       gumbo_free(token.v.end_tag.name);
       token.v.end_tag.name = NULL;
     }
-    if (unlikely(state->_open_elements.length > max_tree_depth)) {
-      parser._output->status = GUMBO_STATUS_TREE_TOO_DEEP;
-      gumbo_debug("Tree depth limit exceeded.\n");
-      break;
-    }
   }

