Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Permit use of (?-u) in byte-regex strategies (#336)
It is desirable to be able to generate, from a regex, byte sequences that are not necessarily valid UTF-8. For example, suppose you have a parser that accepts any string generated by the regex ``` [0-9]+(\.[0-9]*)? ``` Then, in your test suite, you might want to generate strings from the complementary regular language, which you could do with the regex ``` (?s:|[^0-9].*|[0-9]+[^0-9.].*|[0-9]+\.[0-9]*[^0-9].*) ``` However, this will still only generate valid UTF-8 strings. Maybe you are parsing directly from byte sequences read from disk, in which case you want to test the parser’s ability to reject invalid UTF-8 _as well as_ valid UTF-8 but not within the accepted language. Then you want this slight variation: ``` (?s-u:|[^0-9].*|[0-9]+[^0-9.].*|[0-9]+\.[0-9]*[^0-9].*) ``` But this regex will be rejected by `bytes_regex`, because by default `regex_syntax::Parser` errors out on any regex that potentially matches invalid UTF-8. The application — i.e. proptest — must opt into use of such regexes. This patch makes proptest do just that, for `bytes_regex` only. There should be no change to the behavior of any existing test suite, because opting to allow use of `(?-u)` does not change the semantics of any regex that _doesn’t_ contain `(?-u)`, and any existing regex that _does_ contain `(?-u)` must be incapable of generating invalid UTF-8 for other reasons, or `regex_syntax::Parser` would be rejecting it. (For example, `(?-u:[a-z])` cannot generate invalid UTF-8.) This patch also adds a bunch of tests for `bytes_regex`, which AFAICT was not being tested at all. Some of these use the new functionality and others don’t. There is quite a bit of code duplication in the test helper functions — `do_test` and `do_test_bytes` are almost identical, as are `generate_values_matching_regex` and `generate_byte_values_matching_regex`. I am not good enough at generic metaprogramming in Rust to factor out the duplication.
- Loading branch information
Showing
1 changed file
with
103 additions
and
8 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters