Comparing changes

base repository: rust-lang/regex
base: regex-syntax-0.8.2
...
head repository: rust-lang/regex
compare: regex-syntax-0.8.3

Commits on Oct 14, 2023

  1. ee01ec2
  2. regex-automata-0.4.2

    BurntSushi committed Oct 14, 2023
    488604d
  3. d242ede
  4. 1.10.1

    BurntSushi committed Oct 14, 2023
    5dff4bd
  5. lite: fix stack overflow in NFA compiler

    This commit fixes a bug where the parser could produce a Hir value
    nested more deeply than the configured nest limit. The cause is that
    some nested structures, such as repetition operators, can be added to
    the Hir without a corresponding recursive call in the parser. This means
    that even if we don't blow the nest limit in the parser, the Hir itself
    can still become nested beyond the limit. This in turn makes it possible
    to unintentionally overflow the stack in subsequent recursion over the
    Hir value, such as in the Thompson NFA compiler.
    
    We fix this by checking the nesting limit both on every recursive parse
    call and also on the depth of the final Hir value once parsing is
    finished but before it has returned to the caller.
    
    Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=60608
    BurntSushi committed Oct 14, 2023
    466e42c
  6. regex-lite-0.1.4

    BurntSushi committed Oct 14, 2023
    cd79881
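
The depth check described in the `lite: fix stack overflow in NFA compiler` message can be sketched roughly as follows. This is a minimal illustration and not the regex-lite code: the `Hir` enum, `depth` and `check_nest_limit` are hypothetical names, and the real Hir has more variants. The sketch measures depth with an explicit stack so that the check itself does not recurse.

```rust
/// Toy stand-in for a parsed expression (hypothetical, not regex-lite's Hir).
#[allow(dead_code)]
#[derive(Debug)]
enum Hir {
    Char(char),
    /// A repetition wraps its sub-expression, adding one nesting level.
    Repeat(Box<Hir>),
    Concat(Vec<Hir>),
}

/// Returns the nesting depth of `hir`, where a leaf has depth 1.
/// Uses a heap-allocated work list instead of recursion.
fn depth(hir: &Hir) -> usize {
    let mut max_depth = 0usize;
    let mut stack = vec![(hir, 1usize)];
    while let Some((node, d)) = stack.pop() {
        max_depth = max_depth.max(d);
        match node {
            Hir::Char(_) => {}
            Hir::Repeat(sub) => stack.push((&**sub, d + 1)),
            Hir::Concat(subs) => stack.extend(subs.iter().map(|s| (s, d + 1))),
        }
    }
    max_depth
}

/// Assumed post-parse check: reject a finished Hir that nests beyond `limit`.
fn check_nest_limit(hir: &Hir, limit: usize) -> Result<(), String> {
    let d = depth(hir);
    if d > limit {
        Err(format!("nesting depth {} exceeds limit {}", d, limit))
    } else {
        Ok(())
    }
}
```

A parser would run something like `check_nest_limit` once on the final value, in addition to counting its own recursive calls.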

Commits on Oct 15, 2023

  1. 4ae1472
  2. lite: fix stack overflow test

    It turns out that we missed another case where the stack could overflow:
    dropping a deeply nested Hir. Namely, since we permit deeply nested Hirs
    to be constructed and only reject them after determining they are too
    deeply nested, they still then need to be dropped. We fix this by
    implementing a custom Drop impl that uses the heap to traverse the Hir
    and drop things without using unbounded stack space.
    
    An alternative way to fix this would be to adjust the parser somehow to
    avoid building deeply nested Hir values in the first place. But that
    seems trickier, so we just stick with this for now.
    BurntSushi committed Oct 15, 2023
    0086dec
  3. regex-lite-0.1.5

    BurntSushi committed Oct 15, 2023
    e7bd19d
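
A heap-based `Drop` impl of the kind the `lite: fix stack overflow test` message describes might look like the sketch below. It is an illustration under assumptions, not the regex-lite implementation: the `Hir` enum and `is_leaf` helper are hypothetical. The idea is to move children onto a `Vec` before their parent is dropped, so the compiler-generated recursive drop only ever sees shallow values.

```rust
use std::mem;

/// Toy stand-in for a nested expression type (hypothetical).
#[allow(dead_code)]
enum Hir {
    Empty,
    Char(char),
    Repeat(Box<Hir>),
    Concat(Vec<Hir>),
}

impl Hir {
    fn is_leaf(&self) -> bool {
        matches!(self, Hir::Empty | Hir::Char(_))
    }
}

impl Drop for Hir {
    fn drop(&mut self) {
        // Fast path: values with no nested children (or only leaf children)
        // are safe to hand to the default field-by-field drop. This also
        // terminates the drops triggered inside the loop below.
        match &*self {
            Hir::Empty | Hir::Char(_) => return,
            Hir::Repeat(sub) if sub.is_leaf() => return,
            Hir::Concat(subs) if subs.is_empty() => return,
            _ => {}
        }
        // Slow path: take ownership of the tree and dismantle it using a
        // heap-allocated stack instead of the call stack.
        let mut stack = vec![mem::replace(self, Hir::Empty)];
        while let Some(mut hir) = stack.pop() {
            match &mut hir {
                Hir::Empty | Hir::Char(_) => {}
                Hir::Repeat(sub) => {
                    stack.push(mem::replace(&mut **sub, Hir::Empty));
                }
                Hir::Concat(subs) => stack.extend(subs.drain(..)),
            }
            // `hir` is dropped here with its children already detached, so
            // its own `drop` call takes the fast path above.
        }
    }
}
```

The cost is one temporary `Vec` per deep drop; the call-stack depth stays bounded no matter how deeply the value is nested.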

Commits on Oct 16, 2023

  1. automata/meta: revert broadening of reverse suffix optimization

    This reverts commit 8a8d599 and
    includes a regression test, as well as a tweak to a log message.
    
    Essentially, the broadening was improper. We have to be careful when
    dealing with suffixes as opposed to prefixes. Namely, my logic
    previously was that the broadening was okay because we were already
    doing it for the reverse inner optimization. But the reverse inner
    optimization works with prefixes, not suffixes. So the comparison wasn't
    quite correct.
    
    This goes back to only applying the reverse suffix optimization when
    there is a non-empty single common suffix.
    
    Fixes #1110
    Ref astral-sh/ruff#7980
    BurntSushi committed Oct 16, 2023
    eb950f6
  2. changelog: 1.10.2

    BurntSushi committed Oct 16, 2023
    50fe7d1
  3. regex-automata-0.4.3

    BurntSushi committed Oct 16, 2023
    61242b1
  4. 1a54a82
  5. 1.10.2

    BurntSushi committed Oct 16, 2023
    5f1f1c8

Commits on Oct 20, 2023

  1. automata: fix panic in dense DFA deserialization

    This fixes a hole in the validation logic that accidentally permitted a
    dense DFA to contain a match state with zero pattern IDs. Since search
    code is permitted to assume that every match state has at least one
    corresponding pattern ID, this led to a panic.
    
    Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=63391
    BurntSushi committed Oct 20, 2023
    20b5317

Commits on Oct 25, 2023

  1. syntax: add Hir::literal example for char

    The example shows a succinct way of creating an HIR literal from a
    `char` value by first encoding it to UTF-8.
    
    Closes #1114
    BurntSushi committed Oct 25, 2023
    6b72eec
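
The encoding step mentioned in the `syntax: add Hir::literal example for char` message can be shown with the standard library alone. In the sketch below, `char_literal_bytes` is a hypothetical helper used only for illustration; with regex-syntax available, the same `encode_utf8(..).as_bytes()` expression should be usable directly as the argument to `Hir::literal`, which accepts anything convertible into `Box<[u8]>`.

```rust
/// Encodes `ch` as UTF-8 and returns the bytes a literal would hold.
/// Hypothetical helper; it only demonstrates the char-to-bytes step.
fn char_literal_bytes(ch: char) -> Box<[u8]> {
    // A `char` needs at most 4 bytes in UTF-8, so a stack buffer suffices.
    let mut buf = [0u8; 4];
    ch.encode_utf8(&mut buf).as_bytes().into()
}
```

Using a fixed four-byte buffer avoids allocating a `String` just to obtain the encoded bytes.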

Commits on Nov 1, 2023

  1. cli: change --no-captures to --captures (all|implicit|none)

    When we added the WhichCaptures type, we didn't update the CLI to expose
    the full functionality. This change does that.
    BurntSushi committed Nov 1, 2023
    662a8b9
  2. regex-cli-0.2.0

    BurntSushi committed Nov 1, 2023
    837fd85
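
For the `--captures (all|implicit|none)` flag, a value parser could be sketched as below. This is not the regex-cli source: the enum here is a local stand-in that mirrors the three modes named in the commit title, and the error message text is made up for illustration.

```rust
use std::str::FromStr;

/// Local stand-in for the three capture modes (assumed to correspond to
/// the variants of the `WhichCaptures` type mentioned in the commit).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum WhichCaptures {
    /// Track every capture group.
    All,
    /// Track only the implicit group for the overall match.
    Implicit,
    /// Track no capture groups at all.
    None,
}

impl FromStr for WhichCaptures {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "all" => Ok(WhichCaptures::All),
            "implicit" => Ok(WhichCaptures::Implicit),
            "none" => Ok(WhichCaptures::None),
            other => Err(format!(
                "unrecognized --captures value {:?}, expected all, implicit or none",
                other
            )),
        }
    }
}
```

A three-valued flag replaces the old boolean `--no-captures` because a boolean cannot express the middle `implicit` mode.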

Commits on Dec 5, 2023

  1. 4f5992f

Commits on Dec 20, 2023

  1. doc: fix link in Index<&str> impl docs

    This referenced `Captures::get`, but it should reference
    `Captures::name`. This was likely a transcription error
    from the docs for the `Index<usize>` impl.
    kloune authored Dec 20, 2023
    a3d5975

Commits on Dec 29, 2023

  1. ci: small clean-ups

    The regex 1.10 release bumped the MSRV to Rust 1.65, so we no longer
    need to pin to an older memchr release.
    
    We also bump to `actions/checkout@v4`.
    BurntSushi committed Dec 29, 2023
    dc0a9d2

Commits on Jan 10, 2024

  1. cargo: set 'default-features = false' for memchr and aho-corasick

    I'm not sure how this one slipped by. Without this, I'd suppose that
    no-std support doesn't actually work? Or at least, one would have to
    disable the use of both memchr and aho-corasick entirely, since they
    depend on std by default. Not quite sure how to test this.
    
    Fixes #1147
    BurntSushi committed Jan 10, 2024
    027eebd

Commits on Jan 21, 2024

  1. safety: guard in Input::new against incorrect AsRef implementations

    Before this commit, Input::new called haystack.as_ref() twice, once to
    get the actual haystack slice and a second time to get its length. It
    assumed that the second call would return the same slice, but malicious
    implementations of AsRef can return different slices and thus different
    lengths. This is important because there's unsafe code relying on the
    Input's span being in bounds with respect to the haystack, but if the
    second call to .as_ref() returns a bigger slice, this won't be true.
    
    For example, this snippet causes Miri to report UB on an unchecked
    slice access in find_fwd_imp (though it will also panic sometime later
    when run normally, but at that point the UB already happened):
    
        use regex_automata::{Input, meta::{Builder, Config}};
        use std::cell::Cell;
    
        struct Bad(Cell<bool>);
    
        impl AsRef<[u8]> for Bad {
            fn as_ref(&self) -> &[u8] {
                if self.0.replace(false) {
                    &[]
                } else {
                    &[0; 1000]
                }
            }
        }
    
        let bad = Bad(Cell::new(true));
        let input = Input::new(&bad);
        let regex = Builder::new()
            // Not setting this causes some checked access to occur before
            // the unchecked ones, avoiding the UB
            .configure(Config::new().auto_prefilter(false))
            .build("a+")
            .unwrap();
        regex.find(input);
    
    This commit fixes the problem by calling .as_ref() just once and using
    the length of the returned slice as the span's end value. A regression
    test has also been added.
    
    Closes #1154
    SkiFire13 authored and BurntSushi committed Jan 21, 2024
    fbd2537
  2. changelog: 1.10.3

    BurntSushi committed Jan 21, 2024
    1bc667d
  3. regex-automata-0.4.4

    BurntSushi committed Jan 21, 2024
    e7b5401
  4. 653bb59
  5. 1.10.3

    BurntSushi committed Jan 21, 2024
    0c09903
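
The shape of the `Input::new` fix from the `safety:` commit can be illustrated with a toy type. The `Input` struct below is a hypothetical stand-in with only two fields, not the regex-automata type; `Bad` is the adversarial `AsRef` implementation from the commit message. The point is that the span end is derived from the one slice that is actually stored.

```rust
use std::cell::Cell;

/// Toy stand-in for a search input: a haystack plus the end of its span.
struct Input<'h> {
    haystack: &'h [u8],
    end: usize,
}

impl<'h> Input<'h> {
    fn new<H: ?Sized + AsRef<[u8]>>(haystack: &'h H) -> Input<'h> {
        // Call `as_ref()` exactly once, then take the length from the
        // returned slice so the two values cannot disagree.
        let haystack = haystack.as_ref();
        Input { haystack, end: haystack.len() }
    }
}

/// An `AsRef` impl that returns a different slice on its second call.
struct Bad(Cell<bool>);

impl AsRef<[u8]> for Bad {
    fn as_ref(&self) -> &[u8] {
        if self.0.replace(false) {
            &[]
        } else {
            &[0; 1000]
        }
    }
}
```

With this construction, `end <= haystack.len()` holds by definition, whatever the `AsRef` implementation does on later calls.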

Commits on Jan 25, 2024

  1. automata: make additional prefilter metadata public

    This commit exposes `is_fast` and also adds `max_needle_len`
    to a prefilter. This is useful for engines implemented outside
    of `regex-automata`.
    
    PR #1156
    pascalkuthe authored Jan 25, 2024
    07ef7f1
  2. regex-automata-0.4.5

    BurntSushi committed Jan 25, 2024
    d7f9347

Commits on Feb 26, 2024

  1. style: clean up some recent lint violations

    It looks like `dead_code` got a little smarter, and more pervasively,
    some new lint that detects superfluous imports found a bunch of them.
    BurntSushi committed Feb 26, 2024
    10fe722

Commits on Mar 4, 2024

  1. automata: fix bug where reverse NFA lacked an unanchored prefix

    Previously, when compiling a Thompson NFA, we were omitting an
    unanchored prefix when the HIR contained a `^` in its prefix. We did
    this because an unanchored prefix in that case would never match because
    of the requirement imposed by `^`.
    
    The problem with that is it's incorrect when compiling a reverse
    automaton. For example, in the case of building a reverse NFA for `^Qu`,
    we should still include an unanchored prefix because the `^` in that
    case has no conflict with it. It would be like if we omitted an
    unanchored prefix for `Qu$` in a forward NFA, which is obviously wrong.
    
    The fix here is pretty simple: in the reverse case, check for `$` in the
    suffix of the HIR rather than a `^` in the prefix.
    
    Fixes #1169
    BurntSushi committed Mar 4, 2024
    9cf4a42
  2. regex-automata-0.4.6

    BurntSushi committed Mar 4, 2024
    a5ae351
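
The decision rule described in the `reverse NFA lacked an unanchored prefix` message can be reduced to a small predicate. This is only a sketch of the rule as stated in the commit message: the `Anchors` struct and `omit_unanchored_prefix` function are hypothetical, and the real compiler likely consults more state than these two flags.

```rust
/// Which anchors a pattern is known to begin or end with (hypothetical).
#[derive(Clone, Copy)]
struct Anchors {
    /// The pattern starts with `^`, as in `^Qu`.
    starts_with_start_anchor: bool,
    /// The pattern ends with `$`, as in `Qu$`.
    ends_with_end_anchor: bool,
}

/// Returns true when an unanchored prefix could never match and can be
/// left out. A reverse automaton starts matching at the end of the
/// pattern, so there the relevant anchor is `$` in the suffix.
fn omit_unanchored_prefix(reverse: bool, anchors: Anchors) -> bool {
    if reverse {
        anchors.ends_with_end_anchor
    } else {
        anchors.starts_with_start_anchor
    }
}
```

Under this rule, a reverse NFA for `^Qu` keeps its unanchored prefix, while a reverse NFA for `Qu$` may omit it.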

Commits on Mar 23, 2024

  1. api: add Cow guarantee to replace API

    This adds a guarantee to the API of the `replace`, `replace_all` and
    `replacen` routines that, when `Cow::Borrowed` is returned, it is
    guaranteed that it is equivalent to the `haystack` given.
    
    The implementation has always matched this behavior, but this elevates
    the implementation behavior to an API guarantee.
    
    There do exist implementations where this guarantee might not be upheld
    in every case. For example, if the final result were the empty string,
    we could return a `Cow::Borrowed`. Similarly, if the final result were a
    substring of `haystack`, then `Cow::Borrowed` could be returned in that
    case too. In practice, these sorts of optimizations are tricky to do
    and seem like niche corner cases that aren't important to optimize.
    
    Nevertheless, having this guarantee is useful because it can be used as
    a signal that the original input remains unchanged. This came up in
    discussions with @quicknir on Discord. Namely, in cases where one is
    doing a sequence of replacements and in most cases nothing is replaced,
    using a `Cow` is nice to be able to avoid copying the haystack over and
    over again. But to get this to work right, you have to know whether a
    `Cow::Borrowed` matches the input or not. If it doesn't, then you'd need
    to transform it into an owned string. For example, this code tries to do
    replacements on each of a sequence of `Cow<str>` values, where the
    common case is no replacement:
    
    ```rust
    use std::borrow::Cow;
    
    use regex::Regex;
    
    fn trim_strs(strs: &mut Vec<Cow<str>>) {
        strs.iter_mut().for_each(|s| moo(s, &regex_replace));
    }
    
    fn moo<F: FnOnce(&str) -> Cow<str>>(c: &mut Cow<str>, f: F) {
        let result = f(&c);
        match result {
            Cow::Owned(s) => *c = Cow::Owned(s),
            Cow::Borrowed(s) => {
                *c = Cow::Borrowed(s);
            }
        }
    }
    
    fn regex_replace(s: &str) -> Cow<str> {
        Regex::new(r"does-not-matter").unwrap().replace_all(s, "whatever")
    }
    ```
    
    But this doesn't pass `borrowck`. Instead, you could write `moo` like
    this:
    
    ```rust
    fn moo<F: FnOnce(&str) -> Cow<str>>(c: &mut Cow<str>, f: F) {
        let result = f(&c);
        match result {
            Cow::Owned(s) => *c = Cow::Owned(s),
            Cow::Borrowed(s) => {
                if !std::ptr::eq(s, &**c) {
                    *c = Cow::Owned(s.to_owned())
                }
            }
        }
    }
    ```
    
    But the `std::ptr::eq` call here is a bit strange. Instead, after this PR
    and the new guarantee, one can write it like this:
    
    ```rust
    fn moo<F: FnOnce(&str) -> Cow<str>>(c: &mut Cow<str>, f: F) {
        if let Cow::Owned(s) = f(&c) {
            *c = Cow::Owned(s);
        }
    }
    ```
    BurntSushi committed Mar 23, 2024
    088d7f3
  2. 1.10.4

    BurntSushi committed Mar 23, 2024
    aa2d8bd

Commits on Mar 26, 2024

  1. syntax: accept {,n} as an equivalent to {0,n}

    Most regular expression engines don't accept the `{,n}` syntax, but
    some do (notably Python's `re` library). This introduces a new
    parser configuration option that enables the `{,n}` syntax.
    
    PR #1086
    plusvic authored Mar 26, 2024
    f5d0b69
  2. regex-syntax-0.8.3

    BurntSushi committed Mar 26, 2024
    d895bd9
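
To make the `{,n}` behavior concrete, here is a toy parser for counted repetitions with an opt-in switch. It is a sketch under assumptions and not the regex-syntax parser: `parse_counted`, `number` and the `allow_empty_min` parameter are hypothetical names, and whitespace handling is ignored.

```rust
/// Parses a non-empty run of ASCII digits.
fn number(s: &str) -> Option<u32> {
    if s.is_empty() || !s.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    s.parse().ok()
}

/// Parses `{n}`, `{n,}`, `{n,m}` and, only when `allow_empty_min` is set,
/// `{,m}` as an equivalent of `{0,m}`. Returns `(min, max)`, where a
/// `max` of `None` means unbounded.
fn parse_counted(s: &str, allow_empty_min: bool) -> Option<(u32, Option<u32>)> {
    let inner = s.strip_prefix('{')?.strip_suffix('}')?;
    let (min, max) = match inner.split_once(',') {
        None => {
            let n = number(inner)?;
            (n, Some(n))
        }
        // The opt-in case: an empty minimum is read as zero.
        Some(("", max)) if allow_empty_min => (0, Some(number(max)?)),
        Some((min, "")) => (number(min)?, None),
        Some((min, max)) => (number(min)?, Some(number(max)?)),
    };
    match max {
        Some(m) if min > m => None,
        _ => Some((min, max)),
    }
}
```

Keeping the behavior behind a switch means patterns that were previously rejected stay rejected unless the caller asks for the more permissive syntax.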