Comparing changes
base repository: rust-lang/regex
base: regex-syntax-0.8.2
head repository: rust-lang/regex
compare: regex-syntax-0.8.3
Commits on Oct 14, 2023
Commit ee01ec2
Commit 488604d
Commit d242ede
Commit 5dff4bd
lite: fix stack overflow in NFA compiler
This commit fixes a bug where the parser could produce a very deeply nested Hir value beyond the configured nest limit. This was caused by the fact that the Hir can have some of its nested structures added to it without a corresponding recursive call in the parser, for example, repetition operators. This means that even if we don't blow the nest limit in the parser, the Hir itself can still become nested beyond the limit. This in turn will make it possible to unintentionally overflow the stack in subsequent recursion over the Hir value, such as in the Thompson NFA compiler. We fix this by checking the nesting limit both on every recursive parse call and also on the depth of the final Hir value once parsing is finished but before it has returned to the caller. Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=60608
Commit 466e42c
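A minimal sketch of the post-parse depth check. It uses a hypothetical, heavily simplified `Node` enum in place of the real `Hir` type, and `depth` and `check_nest_limit` are illustrative names, not regex-syntax APIs. The depth is computed with an explicit worklist, so the check cannot itself overflow the stack on the very values it is meant to reject.

```rust
/// Hypothetical, simplified stand-in for a regex HIR (not the real `Hir`).
#[allow(dead_code)]
enum Node {
    Literal(char),
    Repetition(Box<Node>),
    Concat(Vec<Node>),
}

/// Nesting depth of `root`, where a node with no children has depth 0.
/// Uses an explicit worklist instead of recursion, so measuring a deep
/// value cannot itself overflow the call stack.
fn depth(root: &Node) -> u32 {
    let mut max = 0u32;
    let mut stack = vec![(root, 0u32)];
    while let Some((node, d)) = stack.pop() {
        max = max.max(d);
        match node {
            Node::Literal(_) => {}
            Node::Repetition(sub) => stack.push((&**sub, d + 1)),
            Node::Concat(subs) => stack.extend(subs.iter().map(|sub| (sub, d + 1))),
        }
    }
    max
}

/// Hypothetical final check run after parsing, before returning to the caller.
fn check_nest_limit(root: &Node, nest_limit: u32) -> Result<(), String> {
    let d = depth(root);
    if d > nest_limit {
        Err(format!("nesting depth {d} exceeds the limit of {nest_limit}"))
    } else {
        Ok(())
    }
}
```

How depth is counted (for example, whether a lone literal is depth 0 or 1) is a choice made for this sketch. The real parser may count differently.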
Commit cd79881
Commits on Oct 15, 2023
Commit 4ae1472
It turns out that we missed another case where the stack could overflow: dropping a deeply nested Hir. Namely, since we permit deeply nested Hirs to be constructed and only reject them after determining they are too deeply nested, they still then need to be dropped. We fix this by implementing a custom Drop impl that uses the heap to traverse the Hir and drop things without using unbounded stack space. An alternative way to fix this would be to adjust the parser somehow to avoid building deeply nested Hir values in the first place. But that seems trickier, so we just stick with this for now.
Commit 0086dec
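The heap-based `Drop` technique for deeply nested Hirs can be sketched on a hypothetical toy tree. `Node`, `take_children` and the `Empty` placeholder are assumptions for illustration, not the real `Hir` internals. Each node's children are moved into a `Vec` worklist before the node itself is dropped, so no single drop call recurses more than a few frames deep.

```rust
use std::mem;

/// Hypothetical, simplified stand-in for a regex HIR (not the real `Hir`).
#[allow(dead_code)]
enum Node {
    Empty,
    Literal(char),
    Repetition(Box<Node>),
    Concat(Vec<Node>),
}

impl Node {
    /// Moves this node's direct children into `out`, leaving `self` shallow.
    fn take_children(&mut self, out: &mut Vec<Node>) {
        match self {
            Node::Empty | Node::Literal(_) => {}
            // Swap the boxed child for a cheap placeholder.
            Node::Repetition(sub) => out.push(mem::replace(&mut **sub, Node::Empty)),
            // `append` moves every element of `subs` into `out`, emptying `subs`.
            Node::Concat(subs) => out.append(subs),
        }
    }
}

impl Drop for Node {
    fn drop(&mut self) {
        // A heap-allocated worklist takes the place of the call stack.
        let mut stack = Vec::new();
        self.take_children(&mut stack);
        while let Some(mut node) = stack.pop() {
            // Detach grandchildren first so that dropping `node` is shallow.
            node.take_children(&mut stack);
            // `node` goes out of scope here and no longer owns nested nodes.
        }
    }
}
```

Without the custom impl, the compiler-generated drop glue would recurse once per nesting level.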
Commit e7bd19d
Commits on Oct 16, 2023
automata/meta: revert broadening of reverse suffix optimization
This reverts commit 8a8d599 and includes a regression test, as well as a tweak to a log message. Essentially, the broadening was improper. We have to be careful when dealing with suffixes as opposed to prefixes. Namely, my logic previously was that the broadening was okay because we were already doing it for the reverse inner optimization. But the reverse inner optimization works with prefixes, not suffixes. So the comparison wasn't quite correct. This goes back to only applying the reverse suffix optimization when there is a non-empty single common suffix. Fixes #1110 Ref astral-sh/ruff#7980
Commit eb950f6
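One way to picture the restored condition for the reverse suffix optimization is a toy model in which the candidate suffixes are plain byte strings. In this model, the optimization is attempted only when they share a single, non-empty common suffix. `longest_common_suffix` and `reverse_suffix_literal` are hypothetical helpers that only illustrate the rule. The meta engine presumably derives its suffixes from the HIR through its own literal extraction.

```rust
/// Hypothetical helper: the longest suffix shared by every literal in `lits`.
fn longest_common_suffix<'a>(lits: &[&'a [u8]]) -> &'a [u8] {
    let Some((&first, rest)) = lits.split_first() else {
        return &[];
    };
    let mut len = first.len();
    for lit in rest {
        // Count matching bytes, walking backwards from the end of both literals.
        let common = first
            .iter()
            .rev()
            .zip(lit.iter().rev())
            .take_while(|(a, b)| a == b)
            .count();
        len = len.min(common);
    }
    &first[first.len() - len..]
}

/// Hypothetical decision rule: use a reverse-suffix strategy only when all
/// candidate suffixes agree on one non-empty common suffix.
fn reverse_suffix_literal<'a>(suffixes: &[&'a [u8]]) -> Option<&'a [u8]> {
    let lcs = longest_common_suffix(suffixes);
    if lcs.is_empty() {
        None
    } else {
        Some(lcs)
    }
}
```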
Commit 50fe7d1
Commit 61242b1
Commit 1a54a82
Commit 5f1f1c8
Commits on Oct 20, 2023
automata: fix panic in dense DFA deserialization
This fixes a hole in the validation logic that accidentally permitted a dense DFA to contain a match state with zero pattern IDs. Since search code is permitted to assume that every match state has at least one corresponding pattern ID, this led to a panic. Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=63391
Commit 20b5317
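A minimal sketch of the kind of check the dense DFA deserialization fix adds. It assumes a hypothetical representation where `match_state_pids[i]` lists the pattern IDs of the `i`-th match state; the real dense DFA stores this in a serialized table with a different layout. Rejecting empty lists at deserialization time is what allows search code to assume that every match state has at least one pattern ID.

```rust
/// Hypothetical validation pass over deserialized match states.
/// `match_state_pids[i]` holds the pattern IDs reported by match state `i`.
fn validate_match_states(match_state_pids: &[Vec<u32>]) -> Result<(), String> {
    for (i, pids) in match_state_pids.iter().enumerate() {
        // Search code may read the first pattern ID unconditionally,
        // so an empty list must be rejected here rather than panic later.
        if pids.is_empty() {
            return Err(format!("match state {i} has zero pattern IDs"));
        }
    }
    Ok(())
}
```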
Commits on Oct 25, 2023
syntax: add Hir::literal example for `char`
The example shows a succinct way of creating an HIR literal from a `char` value by first encoding it to UTF-8. Closes #1114
Commit 6b72eec
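The encoding step the `Hir::literal` example relies on needs only the standard library. `char::encode_utf8` writes into a buffer of at most four bytes and returns the encoded `&mut str`. With regex-syntax as a dependency, those bytes can then be handed to `Hir::literal`, which accepts anything convertible into `Box<[u8]>`. The helper name `utf8_bytes` is an assumption for illustration.

```rust
/// Hypothetical helper: the UTF-8 bytes of `c`, suitable for a
/// byte-oriented literal constructor such as `Hir::literal`.
fn utf8_bytes(c: char) -> Vec<u8> {
    // Four bytes is the maximum length of a UTF-8 encoded `char`.
    let mut buf = [0u8; 4];
    c.encode_utf8(&mut buf).as_bytes().to_vec()
}
```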
Commits on Nov 1, 2023
cli: change --no-captures to --captures (all|implicit|none)
When we added the WhichCaptures type, we didn't update the CLI to expose the full functionality. This change does that.
Commit 662a8b9
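A sketch of how the three values accepted by `--captures` might be parsed. It assumes a local `CaptureMode` enum mirroring the three modes named in the flag; the enum and its `FromStr` impl are hypothetical, not the CLI's actual code. Unknown values are rejected with an error rather than silently falling back to a default.

```rust
use std::str::FromStr;

/// Hypothetical mirror of the three capture modes named by `--captures`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum CaptureMode {
    All,
    Implicit,
    None,
}

impl FromStr for CaptureMode {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "all" => Ok(CaptureMode::All),
            "implicit" => Ok(CaptureMode::Implicit),
            "none" => Ok(CaptureMode::None),
            other => Err(format!(
                "unrecognized capture mode '{other}' (expected all, implicit or none)"
            )),
        }
    }
}
```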
Commit 837fd85
Commits on Dec 5, 2023
doc: tweak `Captures` documentation
This was suggested [on Discord](https://discord.com/channels/273534239310479360/1120175689124036669/1181401471720370237).
Commit 4f5992f
Commits on Dec 20, 2023
doc: fix link in Index<&str> impl docs
This referenced `Captures::get`, but it should reference `Captures::name`. This was likely a transcription error from the docs for the `Index<usize>` impl.
Commit a3d5975
Commits on Dec 29, 2023
The regex 1.10 release bumped the MSRV to Rust 1.65, so we no longer need to pin to an older memchr release. We also bump to `actions/checkout@v4`.
Commit dc0a9d2
Commits on Jan 10, 2024
cargo: set 'default-features = false' for memchr and aho-corasick
I'm not sure how this one slipped by. Without this, I'd suppose that no-std support doesn't actually work? Or at least, one would have to disable the use of both memchr and aho-corasick entirely, since they depend on std by default. Not quite sure how to test this. Fixes #1147
Commit 027eebd
Commits on Jan 21, 2024
safety: guard in Input::new against incorrect AsRef implementations
Before this commit, `Input::new` called `haystack.as_ref()` twice: once to get the actual haystack slice and a second time to get its length. It assumed that the second call would return the same slice, but a malicious implementation of `AsRef` can return different slices, and thus different lengths. This matters because there is unsafe code relying on the `Input`'s span being in bounds with respect to the haystack, which is not true if the second call to `.as_ref()` returns a bigger slice.

For example, this snippet causes Miri to report UB on an unchecked slice access in `find_fwd_imp`. It will also panic at some later point when run normally, but by then the UB has already happened:

```rust
use regex_automata::{Input, meta::{Builder, Config}};
use std::cell::Cell;

struct Bad(Cell<bool>);

impl AsRef<[u8]> for Bad {
    fn as_ref(&self) -> &[u8] {
        if self.0.replace(false) {
            &[]
        } else {
            &[0; 1000]
        }
    }
}

fn main() {
    let bad = Bad(Cell::new(true));
    let input = Input::new(&bad);
    let regex = Builder::new()
        // Not setting this causes some checked access to occur before
        // the unchecked ones, avoiding the UB
        .configure(Config::new().auto_prefilter(false))
        .build("a+")
        .unwrap();
    regex.find(input);
}
```

This commit fixes the problem by calling `.as_ref()` only once and using the length of the returned slice as the span's end value. A regression test has also been added. Closes #1154
Commit fbd2537
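The shape of the `Input::new` fix can be sketched with a hypothetical `SketchInput` type standing in for `regex_automata::Input`, modeling only the haystack and the span. The span is derived from the same slice that is stored, so even the misbehaving `Bad` type from the commit message yields a span that is in bounds.

```rust
use std::ops::Range;

/// Hypothetical, pared-down stand-in for `regex_automata::Input`:
/// only the haystack and the span to search are modeled.
struct SketchInput<'h> {
    haystack: &'h [u8],
    span: Range<usize>,
}

impl<'h> SketchInput<'h> {
    fn new<H: ?Sized + AsRef<[u8]>>(haystack: &'h H) -> SketchInput<'h> {
        // Call `as_ref()` exactly once and derive the span from that same
        // slice, so an `AsRef` impl that returns different slices on each
        // call cannot make the span disagree with the stored haystack.
        let haystack: &'h [u8] = haystack.as_ref();
        SketchInput { haystack, span: 0..haystack.len() }
    }
}
```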
Commit 1bc667d
Commit e7b5401
Commit 653bb59
Commit 0c09903
Commits on Jan 25, 2024
automata: make additional prefilter metadata public
This commit exposes `is_fast` and also adds `max_needle_len` to a prefilter. This is useful for engines implemented outside of `regex-automata`. PR #1156
Commit 07ef7f1
Commit d7f9347
Commits on Feb 26, 2024
style: clean up some recent lint violations
It looks like `dead_code` got a little smarter, and more pervasively, some new lint that detects superfluous imports found a bunch of them.
Commit 10fe722
Commits on Mar 4, 2024
automata: fix bug where reverse NFA lacked an unanchored prefix
Previously, when compiling a Thompson NFA, we were omitting an unanchored prefix when the HIR contained a `^` in its prefix. We did this because an unanchored prefix in that case would never match because of the requirement imposed by `^`. The problem with that is it's incorrect when compiling a reverse automaton. For example, in the case of building a reverse NFA for `^Qu`, we should still include an unanchored prefix because the `^` in that case has no conflict with it. It would be like if we omitted an unanchored prefix for `Qu$` in a forward NFA, which is obviously wrong. The fix here is pretty simple: in the reverse case, check for `$` in the suffix of the HIR rather than a `^` in the prefix. Fixes #1169
Commit 9cf4a42
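The corrected rule for the unanchored prefix can be summarized as a small decision function. This is a toy model: `Anchors` and `needs_unanchored_prefix` are hypothetical names, and the two booleans stand in for whatever anchor information the real compiler reads from the HIR.

```rust
/// Hypothetical summary of the anchors at the edges of a pattern.
struct Anchors {
    /// The pattern begins with `^` (e.g. `^Qu`).
    starts_with_start_anchor: bool,
    /// The pattern ends with `$` (e.g. `Qu$`).
    ends_with_end_anchor: bool,
}

/// Whether an NFA compiled in the given direction needs an unanchored prefix.
fn needs_unanchored_prefix(a: &Anchors, reverse: bool) -> bool {
    if reverse {
        // A reverse search runs from the end of the haystack, so only a
        // trailing `$` makes the unanchored prefix useless.
        !a.ends_with_end_anchor
    } else {
        // A forward search runs from the start, so only a leading `^` does.
        !a.starts_with_start_anchor
    }
}
```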
Commit a5ae351
Commits on Mar 23, 2024
api: add Cow guarantee to replace API
This adds a guarantee to the API of the `replace`, `replace_all` and `replacen` routines that, when `Cow::Borrowed` is returned, it is guaranteed to be equivalent to the `haystack` given. The implementation has always matched this behavior, but this elevates the implementation behavior to an API guarantee.

There do exist implementations where this guarantee might not be upheld in every case. For example, if the final result were the empty string, we could return a `Cow::Borrowed`. Similarly, if the final result were a substring of `haystack`, then `Cow::Borrowed` could be returned in that case too. In practice, these sorts of optimizations are tricky to do, and seem like niche corner cases that aren't important to optimize.

Nevertheless, having this guarantee is useful because it can be used as a signal that the original input remains unchanged. This came up in discussions with @quicknir on Discord. Namely, in cases where one is doing a sequence of replacements and in most cases nothing is replaced, using a `Cow` is nice to be able to avoid copying the haystack over and over again. But to get this to work right, you have to know whether a `Cow::Borrowed` matches the input or not. If it doesn't, then you'd need to transform it into an owned string. For example, this code tries to do replacements on each of a sequence of `Cow<str>` values, where the common case is no replacement:

```rust
use std::borrow::Cow;

use regex::Regex;

fn trim_strs(strs: &mut Vec<Cow<str>>) {
    strs
        .iter_mut()
        .for_each(|s| moo(s, &regex_replace));
}

fn moo<F: FnOnce(&str) -> Cow<str>>(c: &mut Cow<str>, f: F) {
    let result = f(&c);
    match result {
        Cow::Owned(s) => *c = Cow::Owned(s),
        Cow::Borrowed(s) => {
            *c = Cow::Borrowed(s);
        }
    }
}

fn regex_replace(s: &str) -> Cow<str> {
    Regex::new(r"does-not-matter").unwrap().replace_all(s, "whatever")
}
```

But this doesn't pass `borrowck`. Instead, you could write `moo` like this:

```rust
fn moo<F: FnOnce(&str) -> Cow<str>>(c: &mut Cow<str>, f: F) {
    let result = f(&c);
    match result {
        Cow::Owned(s) => *c = Cow::Owned(s),
        Cow::Borrowed(s) => {
            if !std::ptr::eq(s, &**c) {
                *c = Cow::Owned(s.to_owned())
            }
        }
    }
}
```

But the `std::ptr::eq` call here is a bit strange. Instead, after this PR and the new guarantee, one can write it like this:

```rust
fn moo<F: FnOnce(&str) -> Cow<str>>(c: &mut Cow<str>, f: F) {
    if let Cow::Owned(s) = f(&c) {
        *c = Cow::Owned(s);
    }
}
```
Commit 088d7f3
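A self-contained way to try the pattern the `Cow` guarantee enables, without depending on the regex crate. It uses a hypothetical `mask_digits` function as a stand-in for `Regex::replace_all`, which upholds the same contract: `Cow::Borrowed` only when nothing changed. The hypothetical `update` helper extracts the owned result before writing to `c`, so the borrow of `c` handed to `f` has ended by the time `c` is reassigned.

```rust
use std::borrow::Cow;

/// Hypothetical stand-in for `Regex::replace_all`: replaces ASCII digits
/// with `#` and returns `Cow::Borrowed` only when nothing changed.
fn mask_digits(haystack: &str) -> Cow<'_, str> {
    if !haystack.bytes().any(|b| b.is_ascii_digit()) {
        return Cow::Borrowed(haystack);
    }
    let masked: String = haystack
        .chars()
        .map(|c| if c.is_ascii_digit() { '#' } else { c })
        .collect();
    Cow::Owned(masked)
}

/// Hypothetical helper: overwrite `c` only when `f` produced an owned result.
fn update<F>(c: &mut Cow<'_, str>, f: F)
where
    F: FnOnce(&str) -> Cow<str>,
{
    // Pull out the owned `String` first; the borrow of `c` passed to `f`
    // ends with this statement.
    let replaced: Option<String> = match f(&**c) {
        Cow::Owned(s) => Some(s),
        // Relying on the guarantee: a borrowed result equals the input.
        Cow::Borrowed(_) => None,
    };
    if let Some(s) = replaced {
        *c = Cow::Owned(s);
    }
}
```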
Commit aa2d8bd
Commits on Mar 26, 2024
syntax: accept `{,n}` as an equivalent to `{0,n}`
Most regular expression engines don't accept the `{,n}` syntax, but some others do (namely Python's `re` library). This introduces a new parser configuration option that enables the `{,n}` syntax. PR #1086
Commit f5d0b69
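A toy parser for the body of a counted repetition. It uses a hypothetical `allow_empty_min` switch in the role of the new parser option; the function and flag names are illustrative, not regex-syntax's. With the switch off, `{,n}` is rejected as before. With it on, an empty minimum is read as `0`.

```rust
/// Hypothetical parser for the text between `{` and `}` of a counted
/// repetition. Returns `(min, max)`, where `max == None` means "no upper
/// bound" (`{2,}`). With `allow_empty_min`, `{,5}` is read as `{0,5}`.
fn parse_counted(body: &str, allow_empty_min: bool) -> Option<(u32, Option<u32>)> {
    let Some((min, max)) = body.split_once(',') else {
        // `{n}`: exactly n repetitions.
        let n: u32 = body.parse().ok()?;
        return Some((n, Some(n)));
    };
    if min.is_empty() && max.is_empty() {
        // `{,}` is rejected in this sketch.
        return None;
    }
    let min: u32 = if min.is_empty() {
        if !allow_empty_min {
            return None;
        }
        0
    } else {
        min.parse().ok()?
    };
    let max: Option<u32> = if max.is_empty() { None } else { Some(max.parse().ok()?) };
    match max {
        Some(max) if max < min => None,
        _ => Some((min, max)),
    }
}
```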
Commit d895bd9
You can try running this command locally to see the comparison on your machine:
git diff regex-syntax-0.8.2...regex-syntax-0.8.3