Proofread some code/docs
* `escapeable` should be replaced by `escapable`, but it is part of a pub fn
nyurik committed Oct 16, 2023
1 parent e7bd19d commit 213e00b
Showing 58 changed files with 147 additions and 147 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
@@ -11,7 +11,7 @@ on:
# `schedule` event. By specifying any permission explicitly all others are set
# to none. By using the principle of least privilege the damage a compromised
# workflow can do (because of an injection or compromised third party tool or
-# action) is restricted. Currently the worklow doesn't need any additional
+# action) is restricted. Currently, the workflow doesn't need any additional
# permission except for pulling the code. Adding labels to issues, commenting
# on pull-requests, etc. may need additional permissions:
#
28 changes: 14 additions & 14 deletions CHANGELOG.md
@@ -25,7 +25,7 @@ The new word boundary assertions are:
* `\<` or `\b{start}`: a Unicode start-of-word boundary (`\W|\A` on the left,
`\w` on the right).
* `\>` or `\b{end}`: a Unicode end-of-word boundary (`\w` on the left, `\W|\z`
-  on the right)).
+  on the right).
* `\b{start-half}`: half of a Unicode start-of-word boundary (`\W|\A` on the
left).
* `\b{end-half}`: half of a Unicode end-of-word boundary (`\W|\z` on the
@@ -139,7 +139,7 @@ Bug fixes:

* [BUG #934](https://github.com/rust-lang/regex/issues/934):
Fix a performance bug where high contention on a single regex led to massive
-slow downs.
+slow-downs.


1.9.4 (2023-08-26)
@@ -382,14 +382,14 @@ New features:
Permit many more characters to be escaped, even if they have no significance.
More specifically, any ASCII character except for `[0-9A-Za-z<>]` can now be
escaped. Also, a new routine, `is_escapeable_character`, has been added to
-`regex-syntax` to query whether a character is escapeable or not.
+`regex-syntax` to query whether a character is escapable or not.
* [FEATURE #547](https://github.com/rust-lang/regex/issues/547):
Add `Regex::captures_at`. This fills a hole in the API, but doesn't otherwise
introduce any new expressive power.
* [FEATURE #595](https://github.com/rust-lang/regex/issues/595):
Capture group names are now Unicode-aware. They can now begin with either a `_`
or any "alphabetic" codepoint. After the first codepoint, subsequent codepoints
-can be any sequence of alpha-numeric codepoints, along with `_`, `.`, `[` and
+can be any sequence of alphanumeric codepoints, along with `_`, `.`, `[` and
`]`. Note that replacement syntax has not changed.
* [FEATURE #810](https://github.com/rust-lang/regex/issues/810):
Add `Match::is_empty` and `Match::len` APIs.
@@ -433,7 +433,7 @@ Fix a number of issues with printing `Hir` values as regex patterns.
* [BUG #610](https://github.com/rust-lang/regex/issues/610):
Add explicit example of `foo|bar` in the regex syntax docs.
* [BUG #625](https://github.com/rust-lang/regex/issues/625):
-Clarify that `SetMatches::len` does not (regretably) refer to the number of
+Clarify that `SetMatches::len` does not (regrettably) refer to the number of
matches in the set.
* [BUG #660](https://github.com/rust-lang/regex/issues/660):
Clarify "verbose mode" in regex syntax documentation.
@@ -820,7 +820,7 @@ Bug fixes:

1.3.1 (2019-09-04)
==================
-This is a maintenance release with no changes in order to try to work-around
+This is a maintenance release with no changes in order to try to work around
a [docs.rs/Cargo issue](https://github.com/rust-lang/docs.rs/issues/400).


@@ -855,15 +855,15 @@ This release does a bit of house cleaning. Namely:
Rust project.
* Teddy has been removed from the `regex` crate, and is now part of the
`aho-corasick` crate.
-[See `aho-corasick`'s new `packed` sub-module for details](https://docs.rs/aho-corasick/0.7.6/aho_corasick/packed/index.html).
+[See `aho-corasick`'s new `packed` submodule for details](https://docs.rs/aho-corasick/0.7.6/aho_corasick/packed/index.html).
* The `utf8-ranges` crate has been deprecated, with its functionality moving
into the
[`utf8` sub-module of `regex-syntax`](https://docs.rs/regex-syntax/0.6.11/regex_syntax/utf8/index.html).
* The `ucd-util` dependency has been dropped, in favor of implementing what
little we need inside of `regex-syntax` itself.

In general, this is part of an ongoing (long term) effort to make optimizations
-in the regex engine easier to reason about. The current code is too convoluted
+in the regex engine easier to reason about. The current code is too convoluted,
and thus it is very easy to introduce new bugs. This simplification effort is
the primary motivation behind re-working the `aho-corasick` crate to not only
bundle algorithms like Teddy, but to also provide regex-like match semantics
@@ -1065,7 +1065,7 @@ need or want to use these APIs.
New features:

* [FEATURE #493](https://github.com/rust-lang/regex/pull/493):
-Add a few lower level APIs for amortizing allocation and more fine grained
+Add a few lower level APIs for amortizing allocation and more fine-grained
searching.

Bug fixes:
@@ -1111,7 +1111,7 @@ of the regex library should be able to migrate to 1.0 by simply bumping the
version number. The important changes are as follows:

* We adopt Rust 1.20 as the new minimum supported version of Rust for regex.
-We also tentativley adopt a policy that permits bumping the minimum supported
+We also tentatively adopt a policy that permits bumping the minimum supported
version of Rust in minor version releases of regex, but no patch releases.
That is, with respect to semver, we do not strictly consider bumping the
minimum version of Rust to be a breaking change, but adopt a conservative
@@ -1198,7 +1198,7 @@ Bug fixes:

0.2.8 (2018-03-12)
==================
-Bug gixes:
+Bug fixes:

* [BUG #454](https://github.com/rust-lang/regex/pull/454):
Fix a bug in the nest limit checker being too aggressive.
@@ -1219,7 +1219,7 @@ New features:
* Full support for intersection, difference and symmetric difference of
character classes. These can be used via the `&&`, `--` and `~~` binary
operators within classes.
-* A Unicode Level 1 conformat implementation of `\p{..}` character classes.
+* A Unicode Level 1 conformant implementation of `\p{..}` character classes.
Things like `\p{scx:Hira}`, `\p{age:3.2}` or `\p{Changes_When_Casefolded}`
now work. All property name and value aliases are supported, and properties
are selected via loose matching. e.g., `\p{Greek}` is the same as
@@ -1342,7 +1342,7 @@ Bug fixes:
0.2.1
=====
One major bug with `replace_all` has been fixed along with a couple of other
-touchups.
+touch-ups.

* [BUG #312](https://github.com/rust-lang/regex/issues/312):
Fix documentation for `NoExpand` to reference correct lifetime parameter.
@@ -1491,7 +1491,7 @@ A number of bugs have been fixed:
* Fix bug #277.
* [PR #270](https://github.com/rust-lang/regex/pull/270):
Fixes bugs #264, #268 and an unreported where the DFA cache size could be
-drastically under estimated in some cases (leading to high unexpected memory
+drastically underestimated in some cases (leading to high unexpected memory
usage).

0.1.73
6 changes: 3 additions & 3 deletions README.md
@@ -171,7 +171,7 @@ assert!(matches.matched(6));
### Usage: regex internals as a library

The [`regex-automata` directory](./regex-automata/) contains a crate that
-exposes all of the internal matching engines used by the `regex` crate. The
+exposes all the internal matching engines used by the `regex` crate. The
idea is that the `regex` crate exposes a simple API for 99% of use cases, but
`regex-automata` exposes oodles of customizable behaviors.

@@ -192,7 +192,7 @@ recommended for general use.

### Crate features

-This crate comes with several features that permit tweaking the trade off
+This crate comes with several features that permit tweaking the trade-off
between binary size, compilation time and runtime performance. Users of this
crate can selectively disable Unicode tables, or choose from a variety of
optimizations performed by this crate to disable.
@@ -230,7 +230,7 @@ searches are "fast" in practice.

While the first interpretation is pretty unambiguous, the second one remains
nebulous. While nebulous, it guides this crate's architecture and the sorts of
-the trade offs it makes. For example, here are some general architectural
+the trade-offs it makes. For example, here are some general architectural
statements that follow as a result of the goal to be "fast":

* When given the choice between faster regex searches and faster _Rust compile
10 changes: 5 additions & 5 deletions UNICODE.md
@@ -207,21 +207,21 @@ Finally, Unicode word boundaries can be disabled, which will cause ASCII word
boundaries to be used instead. That is, `\b` is a Unicode word boundary while
`(?-u)\b` is an ASCII-only word boundary. This can occasionally be beneficial
if performance is important, since the implementation of Unicode word
-boundaries is currently sub-optimal on non-ASCII text.
+boundaries is currently suboptimal on non-ASCII text.


## RL1.5 Simple Loose Matches

[UTS#18 RL1.5](https://unicode.org/reports/tr18/#Simple_Loose_Matches)

-The regex crate provides full support for case insensitive matching in
+The regex crate provides full support for case-insensitive matching in
accordance with RL1.5. That is, it uses the "simple" case folding mapping. The
"simple" mapping was chosen because of a key convenient property: every
"simple" mapping is a mapping from exactly one code point to exactly one other
-code point. This makes case insensitive matching of character classes, for
+code point. This makes case-insensitive matching of character classes, for
example, straight-forward to implement.

-When case insensitive mode is enabled (e.g., `(?i)[a]` is equivalent to `a|A`),
+When case-insensitive mode is enabled (e.g., `(?i)[a]` is equivalent to `a|A`),
then all characters classes are case folded as well.


@@ -248,7 +248,7 @@ Given Rust's strong ties to UTF-8, the following guarantees are also provided:
* All matches are reported on valid UTF-8 code unit boundaries. That is, any
match range returned by the public regex API is guaranteed to successfully
slice the string that was searched.
-* By consequence of the above, it is impossible to match surrogode code points.
+* By consequence of the above, it is impossible to match surrogate code points.
No support for UTF-16 is provided, so this is never necessary.

Note that when Unicode mode is disabled, the fundamental atom of matching is
2 changes: 1 addition & 1 deletion record/compile-test/README.md
@@ -1,5 +1,5 @@
This directory contains the results of compilation tests. Specifically,
-the results are from testing both the from scratch compilation time and
+the results are from testing both the from-scratch compilation time and
relative binary size increases of various features for both the `regex` and
`regex-automata` crates.

4 changes: 2 additions & 2 deletions regex-automata/README.md
@@ -66,7 +66,7 @@ Below is an outline of how `unsafe` is used in this crate.

* `util::pool::Pool` makes use of `unsafe` to implement a fast path for
accessing an element of the pool. The fast path applies to the first thread
-that uses the pool. In effect, the fast path is fast because it avoid a mutex
+that uses the pool. In effect, the fast path is fast because it avoids a mutex
lock. `unsafe` is also used in the no-std version of `Pool` to implement a spin
lock for synchronization.
* `util::lazy::Lazy` uses `unsafe` to implement a variant of
@@ -112,6 +112,6 @@ In the end, I do still somewhat consider this crate an experiment. It is
unclear whether the strong boundaries between components will be an impediment
to ongoing development or not. De-coupling tends to lead to slower development
in my experience, and when you mix in the added cost of not introducing
-breaking changes all of the time, things can get quite complicated. But, I
+breaking changes all the time, things can get quite complicated. But, I
don't think anyone has ever release the internals of a regex engine as a
library before. So it will be interesting to see how it plays out!
2 changes: 1 addition & 1 deletion regex-automata/src/dfa/automaton.rs
@@ -2202,7 +2202,7 @@ where
///
/// Specifically, this tries to succinctly distinguish the different types of
/// states: dead states, quit states, accelerated states, start states and
-/// match states. It even accounts for the possible overlappings of different
+/// match states. It even accounts for the possible overlapping of different
/// state types.
pub(crate) fn fmt_state_indicator<A: Automaton>(
f: &mut core::fmt::Formatter<'_>,
8 changes: 4 additions & 4 deletions regex-automata/src/dfa/dense.rs
@@ -2810,7 +2810,7 @@ impl OwnedDFA {
}

// Collect all our non-DEAD start states into a convenient set and
-// confirm there is no overlap with match states. In the classicl DFA
+// confirm there is no overlap with match states. In the classical DFA
// construction, start states can be match states. But because of
// look-around, we delay all matches by a byte, which prevents start
// states from being match states.
@@ -3461,7 +3461,7 @@ impl TransitionTable<Vec<u32>> {
// Normally, to get a fresh state identifier, we would just
// take the index of the next state added to the transition
// table. However, we actually perform an optimization here
-// that premultiplies state IDs by the stride, such that they
+// that pre-multiplies state IDs by the stride, such that they
// point immediately at the beginning of their transitions in
// the transition table. This avoids an extra multiplication
// instruction for state lookup at search time.
@@ -4509,7 +4509,7 @@ impl<T: AsRef<[u32]>> MatchStates<T> {
+ (self.pattern_ids().len() * PatternID::SIZE)
}

-/// Valides that the match state info is itself internally consistent and
+/// Validates that the match state info is itself internally consistent and
/// consistent with the recorded match state region in the given DFA.
fn validate(&self, dfa: &DFA<T>) -> Result<(), DeserializeError> {
if self.len() != dfa.special.match_len(dfa.stride()) {
@@ -4767,7 +4767,7 @@ impl<'a, T: AsRef<[u32]>> Iterator for StateIter<'a, T> {

/// An immutable representation of a single DFA state.
///
-/// `'a` correspondings to the lifetime of a DFA's transition table.
+/// `'a` corresponding to the lifetime of a DFA's transition table.
pub(crate) struct State<'a> {
id: StateID,
stride2: usize,
2 changes: 1 addition & 1 deletion regex-automata/src/dfa/determinize.rs
@@ -466,7 +466,7 @@ impl<'a> Runner<'a> {
) -> Result<(StateID, bool), BuildError> {
// Compute the look-behind assertions that are true in this starting
// configuration, and the determine the epsilon closure. While
-// computing the epsilon closure, we only follow condiional epsilon
+// computing the epsilon closure, we only follow conditional epsilon
// transitions that satisfy the look-behind assertions in 'look_have'.
let mut builder_matches = self.get_state_builder().into_matches();
util::determinize::set_lookbehind_from_start(
2 changes: 1 addition & 1 deletion regex-automata/src/dfa/mod.rs
@@ -271,7 +271,7 @@ memory.) Conversely, compiling the same regex without Unicode support, e.g.,
`(?-u)\w{50}`, takes under 1 millisecond and about 15KB of memory. For this
reason, you should only use Unicode character classes if you absolutely need
them! (They are enabled by default though.)
-* This module does not support Unicode word boundaries. ASCII word bondaries
+* This module does not support Unicode word boundaries. ASCII word boundaries
may be used though by disabling Unicode or selectively doing so in the syntax,
e.g., `(?-u:\b)`. There is also an option to
[heuristically enable Unicode word boundaries](crate::dfa::dense::Config::unicode_word_boundary),
2 changes: 1 addition & 1 deletion regex-automata/src/dfa/onepass.rs
@@ -926,7 +926,7 @@ impl<'a> InternalBuilder<'a> {
///
/// A one-pass DFA can be built from an NFA that is one-pass. An NFA is
/// one-pass when there is never any ambiguity about how to continue a search.
-/// For example, `a*a` is not one-pass becuase during a search, it's not
+/// For example, `a*a` is not one-pass because during a search, it's not
/// possible to know whether to continue matching the `a*` or to move on to
/// the single `a`. However, `a*b` is one-pass, because for every byte in the
/// input, it's always clear when to move on from `a*` to `b`.
4 changes: 2 additions & 2 deletions regex-automata/src/dfa/special.rs
@@ -43,7 +43,7 @@ macro_rules! err {
// some other match state, even when searching an empty string.)
//
// These are not mutually exclusive categories. Namely, the following
-// overlappings can occur:
+// overlapping can occur:
//
// * {dead, start} - If a DFA can never lead to a match and it is minimized,
// then it will typically compile to something where all starting IDs point
@@ -62,7 +62,7 @@ macro_rules! err {
// though from the perspective of the DFA, they are equivalent. (Indeed,
// minimization special cases them to ensure they don't get merged.) The
// purpose of keeping them distinct is to use the quit state as a sentinel to
-// distguish between whether a search finished successfully without finding
+// distinguish between whether a search finished successfully without finding
// anything or whether it gave up before finishing.
//
// So the main problem we want to solve here is the *fast* detection of whether
6 changes: 3 additions & 3 deletions regex-automata/src/hybrid/dfa.rs
@@ -1247,7 +1247,7 @@ impl DFA {
/// the unknown transition. Otherwise, trying to use the "unknown" state
/// ID will just result in transitioning back to itself, and thus never
/// terminating. (This is technically a special exemption to the state ID
-/// validity rules, but is permissible since this routine is guarateed to
+/// validity rules, but is permissible since this routine is guaranteed to
/// never mutate the given `cache`, and thus the identifier is guaranteed
/// to remain valid.)
///
@@ -1371,7 +1371,7 @@ impl DFA {
/// the unknown transition. Otherwise, trying to use the "unknown" state
/// ID will just result in transitioning back to itself, and thus never
/// terminating. (This is technically a special exemption to the state ID
-/// validity rules, but is permissible since this routine is guarateed to
+/// validity rules, but is permissible since this routine is guaranteed to
/// never mutate the given `cache`, and thus the identifier is guaranteed
/// to remain valid.)
///
@@ -1857,7 +1857,7 @@ pub struct Cache {
bytes_searched: usize,
/// The progress of the current search.
///
-/// This is only non-`None` when callers utlize the `Cache::search_start`,
+/// This is only non-`None` when callers utilize the `Cache::search_start`,
/// `Cache::search_update` and `Cache::search_finish` APIs.
///
/// The purpose of recording search progress is to be able to make a
2 changes: 1 addition & 1 deletion regex-automata/src/hybrid/id.rs
@@ -30,7 +30,7 @@
/// setting for start states to be tagged. The reason for this is
/// that a DFA search loop is usually written to execute a prefilter once it
/// enters a start state. But if there is no prefilter, this handling can be
-/// quite diastrous as the DFA may ping-pong between the special handling code
+/// quite disastrous as the DFA may ping-pong between the special handling code
/// and a possible optimized hot path for handling untagged states. When start
/// states aren't specialized, then they are untagged and remain in the hot
/// path.
4 changes: 2 additions & 2 deletions regex-automata/src/meta/wrappers.rs
@@ -383,7 +383,7 @@ impl OnePassEngine {
// that we either have at least one explicit capturing group or
// there's a Unicode word boundary somewhere. If we don't have
// either of these things, then the lazy DFA will almost certainly
-// be useable and be much faster. The only case where it might
+// be usable and be much faster. The only case where it might
// not is if the lazy DFA isn't utilizing its cache effectively,
// but in those cases, the underlying regex is almost certainly
// not one-pass or is too big to fit within the current one-pass
@@ -886,7 +886,7 @@ impl DFAEngine {
// Enabling this is necessary for ensuring we can service any
// kind of 'Input' search without error. For the full DFA, this
// can be quite costly. But since we have such a small bound
-// on the size of the DFA, in practice, any multl-regexes are
+// on the size of the DFA, in practice, any multi-regexes are
// probably going to blow the limit anyway.
.starts_for_each_pattern(true)
.byte_classes(info.config().get_byte_classes())
