Proofread some code/docs
* `escapeable` should be replaced by `escapable`, but it is part of a pub fn
nyurik committed Oct 16, 2023
1 parent e7bd19d commit 213e00b
Showing 58 changed files with 147 additions and 147 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
@@ -11,7 +11,7 @@ on:
# `schedule` event. By specifying any permission explicitly all others are set
# to none. By using the principle of least privilege the damage a compromised
# workflow can do (because of an injection or compromised third party tool or
-# action) is restricted. Currently the worklow doesn't need any additional
+# action) is restricted. Currently, the workflow doesn't need any additional
# permission except for pulling the code. Adding labels to issues, commenting
# on pull-requests, etc. may need additional permissions:
#
28 changes: 14 additions & 14 deletions CHANGELOG.md
@@ -25,7 +25,7 @@ The new word boundary assertions are:
* `\<` or `\b{start}`: a Unicode start-of-word boundary (`\W|\A` on the left,
`\w` on the right).
* `\>` or `\b{end}`: a Unicode end-of-word boundary (`\w` on the left, `\W|\z`
-  on the right)).
+  on the right).
* `\b{start-half}`: half of a Unicode start-of-word boundary (`\W|\A` on the
left).
* `\b{end-half}`: half of a Unicode end-of-word boundary (`\W|\z` on the
@@ -139,7 +139,7 @@ Bug fixes:

* [BUG #934](https://github.com/rust-lang/regex/issues/934):
Fix a performance bug where high contention on a single regex led to massive
-slow downs.
+slow-downs.


1.9.4 (2023-08-26)
@@ -382,14 +382,14 @@ New features:
Permit many more characters to be escaped, even if they have no significance.
More specifically, any ASCII character except for `[0-9A-Za-z<>]` can now be
escaped. Also, a new routine, `is_escapeable_character`, has been added to
-`regex-syntax` to query whether a character is escapeable or not.
+`regex-syntax` to query whether a character is escapable or not.
* [FEATURE #547](https://github.com/rust-lang/regex/issues/547):
Add `Regex::captures_at`. This fills a hole in the API, but doesn't otherwise
introduce any new expressive power.
* [FEATURE #595](https://github.com/rust-lang/regex/issues/595):
Capture group names are now Unicode-aware. They can now begin with either a `_`
or any "alphabetic" codepoint. After the first codepoint, subsequent codepoints
-can be any sequence of alpha-numeric codepoints, along with `_`, `.`, `[` and
+can be any sequence of alphanumeric codepoints, along with `_`, `.`, `[` and
`]`. Note that replacement syntax has not changed.
* [FEATURE #810](https://github.com/rust-lang/regex/issues/810):
Add `Match::is_empty` and `Match::len` APIs.
@@ -433,7 +433,7 @@ Fix a number of issues with printing `Hir` values as regex patterns.
* [BUG #610](https://github.com/rust-lang/regex/issues/610):
Add explicit example of `foo|bar` in the regex syntax docs.
* [BUG #625](https://github.com/rust-lang/regex/issues/625):
-Clarify that `SetMatches::len` does not (regretably) refer to the number of
+Clarify that `SetMatches::len` does not (regrettably) refer to the number of
matches in the set.
* [BUG #660](https://github.com/rust-lang/regex/issues/660):
Clarify "verbose mode" in regex syntax documentation.
@@ -820,7 +820,7 @@ Bug fixes:

1.3.1 (2019-09-04)
==================
-This is a maintenance release with no changes in order to try to work-around
+This is a maintenance release with no changes in order to try to work around
a [docs.rs/Cargo issue](https://github.com/rust-lang/docs.rs/issues/400).


@@ -855,15 +855,15 @@ This release does a bit of house cleaning. Namely:
Rust project.
* Teddy has been removed from the `regex` crate, and is now part of the
`aho-corasick` crate.
-[See `aho-corasick`'s new `packed` sub-module for details](https://docs.rs/aho-corasick/0.7.6/aho_corasick/packed/index.html).
+[See `aho-corasick`'s new `packed` submodule for details](https://docs.rs/aho-corasick/0.7.6/aho_corasick/packed/index.html).
* The `utf8-ranges` crate has been deprecated, with its functionality moving
into the
[`utf8` sub-module of `regex-syntax`](https://docs.rs/regex-syntax/0.6.11/regex_syntax/utf8/index.html).
* The `ucd-util` dependency has been dropped, in favor of implementing what
little we need inside of `regex-syntax` itself.

In general, this is part of an ongoing (long term) effort to make optimizations
-in the regex engine easier to reason about. The current code is too convoluted
+in the regex engine easier to reason about. The current code is too convoluted,
and thus it is very easy to introduce new bugs. This simplification effort is
the primary motivation behind re-working the `aho-corasick` crate to not only
bundle algorithms like Teddy, but to also provide regex-like match semantics
@@ -1065,7 +1065,7 @@ need or want to use these APIs.
New features:

* [FEATURE #493](https://github.com/rust-lang/regex/pull/493):
-Add a few lower level APIs for amortizing allocation and more fine grained
+Add a few lower level APIs for amortizing allocation and more fine-grained
searching.

Bug fixes:
@@ -1111,7 +1111,7 @@ of the regex library should be able to migrate to 1.0 by simply bumping the
version number. The important changes are as follows:

* We adopt Rust 1.20 as the new minimum supported version of Rust for regex.
-We also tentativley adopt a policy that permits bumping the minimum supported
+We also tentatively adopt a policy that permits bumping the minimum supported
version of Rust in minor version releases of regex, but no patch releases.
That is, with respect to semver, we do not strictly consider bumping the
minimum version of Rust to be a breaking change, but adopt a conservative
@@ -1198,7 +1198,7 @@ Bug fixes:

0.2.8 (2018-03-12)
==================
-Bug gixes:
+Bug fixes:

* [BUG #454](https://github.com/rust-lang/regex/pull/454):
Fix a bug in the nest limit checker being too aggressive.
@@ -1219,7 +1219,7 @@ New features:
* Full support for intersection, difference and symmetric difference of
character classes. These can be used via the `&&`, `--` and `~~` binary
operators within classes.
-* A Unicode Level 1 conformat implementation of `\p{..}` character classes.
+* A Unicode Level 1 conformant implementation of `\p{..}` character classes.
Things like `\p{scx:Hira}`, `\p{age:3.2}` or `\p{Changes_When_Casefolded}`
now work. All property name and value aliases are supported, and properties
are selected via loose matching. e.g., `\p{Greek}` is the same as
@@ -1342,7 +1342,7 @@ Bug fixes:
0.2.1
=====
One major bug with `replace_all` has been fixed along with a couple of other
-touchups.
+touch-ups.

* [BUG #312](https://github.com/rust-lang/regex/issues/312):
Fix documentation for `NoExpand` to reference correct lifetime parameter.
@@ -1491,7 +1491,7 @@ A number of bugs have been fixed:
* Fix bug #277.
* [PR #270](https://github.com/rust-lang/regex/pull/270):
Fixes bugs #264, #268 and an unreported where the DFA cache size could be
-drastically under estimated in some cases (leading to high unexpected memory
+drastically underestimated in some cases (leading to high unexpected memory
usage).

0.1.73
6 changes: 3 additions & 3 deletions README.md
@@ -171,7 +171,7 @@ assert!(matches.matched(6));
### Usage: regex internals as a library

The [`regex-automata` directory](./regex-automata/) contains a crate that
-exposes all of the internal matching engines used by the `regex` crate. The
+exposes all the internal matching engines used by the `regex` crate. The
idea is that the `regex` crate exposes a simple API for 99% of use cases, but
`regex-automata` exposes oodles of customizable behaviors.

@@ -192,7 +192,7 @@ recommended for general use.

### Crate features

-This crate comes with several features that permit tweaking the trade off
+This crate comes with several features that permit tweaking the trade-off
between binary size, compilation time and runtime performance. Users of this
crate can selectively disable Unicode tables, or choose from a variety of
optimizations performed by this crate to disable.
@@ -230,7 +230,7 @@ searches are "fast" in practice.

While the first interpretation is pretty unambiguous, the second one remains
nebulous. While nebulous, it guides this crate's architecture and the sorts of
-the trade offs it makes. For example, here are some general architectural
+the trade-offs it makes. For example, here are some general architectural
statements that follow as a result of the goal to be "fast":

* When given the choice between faster regex searches and faster _Rust compile
10 changes: 5 additions & 5 deletions UNICODE.md
@@ -207,21 +207,21 @@ Finally, Unicode word boundaries can be disabled, which will cause ASCII word
boundaries to be used instead. That is, `\b` is a Unicode word boundary while
`(?-u)\b` is an ASCII-only word boundary. This can occasionally be beneficial
if performance is important, since the implementation of Unicode word
-boundaries is currently sub-optimal on non-ASCII text.
+boundaries is currently suboptimal on non-ASCII text.


## RL1.5 Simple Loose Matches

[UTS#18 RL1.5](https://unicode.org/reports/tr18/#Simple_Loose_Matches)

-The regex crate provides full support for case insensitive matching in
+The regex crate provides full support for case-insensitive matching in
accordance with RL1.5. That is, it uses the "simple" case folding mapping. The
"simple" mapping was chosen because of a key convenient property: every
"simple" mapping is a mapping from exactly one code point to exactly one other
-code point. This makes case insensitive matching of character classes, for
+code point. This makes case-insensitive matching of character classes, for
example, straight-forward to implement.

-When case insensitive mode is enabled (e.g., `(?i)[a]` is equivalent to `a|A`),
+When case-insensitive mode is enabled (e.g., `(?i)[a]` is equivalent to `a|A`),
then all characters classes are case folded as well.


@@ -248,7 +248,7 @@ Given Rust's strong ties to UTF-8, the following guarantees are also provided:
* All matches are reported on valid UTF-8 code unit boundaries. That is, any
match range returned by the public regex API is guaranteed to successfully
slice the string that was searched.
-* By consequence of the above, it is impossible to match surrogode code points.
+* By consequence of the above, it is impossible to match surrogate code points.
No support for UTF-16 is provided, so this is never necessary.

Note that when Unicode mode is disabled, the fundamental atom of matching is
2 changes: 1 addition & 1 deletion record/compile-test/README.md
@@ -1,5 +1,5 @@
This directory contains the results of compilation tests. Specifically,
-the results are from testing both the from scratch compilation time and
+the results are from testing both the from-scratch compilation time and
relative binary size increases of various features for both the `regex` and
`regex-automata` crates.

4 changes: 2 additions & 2 deletions regex-automata/README.md
@@ -66,7 +66,7 @@ Below is an outline of how `unsafe` is used in this crate.

* `util::pool::Pool` makes use of `unsafe` to implement a fast path for
accessing an element of the pool. The fast path applies to the first thread
-that uses the pool. In effect, the fast path is fast because it avoid a mutex
+that uses the pool. In effect, the fast path is fast because it avoids a mutex
lock. `unsafe` is also used in the no-std version of `Pool` to implement a spin
lock for synchronization.
* `util::lazy::Lazy` uses `unsafe` to implement a variant of
@@ -112,6 +112,6 @@ In the end, I do still somewhat consider this crate an experiment. It is
unclear whether the strong boundaries between components will be an impediment
to ongoing development or not. De-coupling tends to lead to slower development
in my experience, and when you mix in the added cost of not introducing
-breaking changes all of the time, things can get quite complicated. But, I
+breaking changes all the time, things can get quite complicated. But, I
don't think anyone has ever release the internals of a regex engine as a
library before. So it will be interesting to see how it plays out!
2 changes: 1 addition & 1 deletion regex-automata/src/dfa/automaton.rs
@@ -2202,7 +2202,7 @@ where
///
/// Specifically, this tries to succinctly distinguish the different types of
/// states: dead states, quit states, accelerated states, start states and
-/// match states. It even accounts for the possible overlappings of different
+/// match states. It even accounts for the possible overlapping of different
/// state types.
pub(crate) fn fmt_state_indicator<A: Automaton>(
f: &mut core::fmt::Formatter<'_>,
8 changes: 4 additions & 4 deletions regex-automata/src/dfa/dense.rs
@@ -2810,7 +2810,7 @@ impl OwnedDFA {
}

// Collect all our non-DEAD start states into a convenient set and
-// confirm there is no overlap with match states. In the classicl DFA
+// confirm there is no overlap with match states. In the classical DFA
// construction, start states can be match states. But because of
// look-around, we delay all matches by a byte, which prevents start
// states from being match states.
@@ -3461,7 +3461,7 @@ impl TransitionTable<Vec<u32>> {
// Normally, to get a fresh state identifier, we would just
// take the index of the next state added to the transition
// table. However, we actually perform an optimization here
-// that premultiplies state IDs by the stride, such that they
+// that pre-multiplies state IDs by the stride, such that they
// point immediately at the beginning of their transitions in
// the transition table. This avoids an extra multiplication
// instruction for state lookup at search time.
@@ -4509,7 +4509,7 @@ impl<T: AsRef<[u32]>> MatchStates<T> {
+ (self.pattern_ids().len() * PatternID::SIZE)
}

-/// Valides that the match state info is itself internally consistent and
+/// Validates that the match state info is itself internally consistent and
/// consistent with the recorded match state region in the given DFA.
fn validate(&self, dfa: &DFA<T>) -> Result<(), DeserializeError> {
if self.len() != dfa.special.match_len(dfa.stride()) {
@@ -4767,7 +4767,7 @@ impl<'a, T: AsRef<[u32]>> Iterator for StateIter<'a, T> {

/// An immutable representation of a single DFA state.
///
-/// `'a` correspondings to the lifetime of a DFA's transition table.
+/// `'a` corresponding to the lifetime of a DFA's transition table.
pub(crate) struct State<'a> {
id: StateID,
stride2: usize,
2 changes: 1 addition & 1 deletion regex-automata/src/dfa/determinize.rs
@@ -466,7 +466,7 @@ impl<'a> Runner<'a> {
) -> Result<(StateID, bool), BuildError> {
// Compute the look-behind assertions that are true in this starting
// configuration, and the determine the epsilon closure. While
-// computing the epsilon closure, we only follow condiional epsilon
+// computing the epsilon closure, we only follow conditional epsilon
// transitions that satisfy the look-behind assertions in 'look_have'.
let mut builder_matches = self.get_state_builder().into_matches();
util::determinize::set_lookbehind_from_start(
2 changes: 1 addition & 1 deletion regex-automata/src/dfa/mod.rs
@@ -271,7 +271,7 @@ memory.) Conversely, compiling the same regex without Unicode support, e.g.,
`(?-u)\w{50}`, takes under 1 millisecond and about 15KB of memory. For this
reason, you should only use Unicode character classes if you absolutely need
them! (They are enabled by default though.)
-* This module does not support Unicode word boundaries. ASCII word bondaries
+* This module does not support Unicode word boundaries. ASCII word boundaries
may be used though by disabling Unicode or selectively doing so in the syntax,
e.g., `(?-u:\b)`. There is also an option to
[heuristically enable Unicode word boundaries](crate::dfa::dense::Config::unicode_word_boundary),
2 changes: 1 addition & 1 deletion regex-automata/src/dfa/onepass.rs
@@ -926,7 +926,7 @@ impl<'a> InternalBuilder<'a> {
///
/// A one-pass DFA can be built from an NFA that is one-pass. An NFA is
/// one-pass when there is never any ambiguity about how to continue a search.
-/// For example, `a*a` is not one-pass becuase during a search, it's not
+/// For example, `a*a` is not one-pass because during a search, it's not
/// possible to know whether to continue matching the `a*` or to move on to
/// the single `a`. However, `a*b` is one-pass, because for every byte in the
/// input, it's always clear when to move on from `a*` to `b`.
4 changes: 2 additions & 2 deletions regex-automata/src/dfa/special.rs
@@ -43,7 +43,7 @@ macro_rules! err {
// some other match state, even when searching an empty string.)
//
// These are not mutually exclusive categories. Namely, the following
-// overlappings can occur:
+// overlapping can occur:
//
// * {dead, start} - If a DFA can never lead to a match and it is minimized,
// then it will typically compile to something where all starting IDs point
@@ -62,7 +62,7 @@ macro_rules! err {
// though from the perspective of the DFA, they are equivalent. (Indeed,
// minimization special cases them to ensure they don't get merged.) The
// purpose of keeping them distinct is to use the quit state as a sentinel to
-// distguish between whether a search finished successfully without finding
+// distinguish between whether a search finished successfully without finding
// anything or whether it gave up before finishing.
//
// So the main problem we want to solve here is the *fast* detection of whether
6 changes: 3 additions & 3 deletions regex-automata/src/hybrid/dfa.rs
@@ -1247,7 +1247,7 @@ impl DFA {
/// the unknown transition. Otherwise, trying to use the "unknown" state
/// ID will just result in transitioning back to itself, and thus never
/// terminating. (This is technically a special exemption to the state ID
-/// validity rules, but is permissible since this routine is guarateed to
+/// validity rules, but is permissible since this routine is guaranteed to
/// never mutate the given `cache`, and thus the identifier is guaranteed
/// to remain valid.)
///
@@ -1371,7 +1371,7 @@ impl DFA {
/// the unknown transition. Otherwise, trying to use the "unknown" state
/// ID will just result in transitioning back to itself, and thus never
/// terminating. (This is technically a special exemption to the state ID
-/// validity rules, but is permissible since this routine is guarateed to
+/// validity rules, but is permissible since this routine is guaranteed to
/// never mutate the given `cache`, and thus the identifier is guaranteed
/// to remain valid.)
///
@@ -1857,7 +1857,7 @@ pub struct Cache {
bytes_searched: usize,
/// The progress of the current search.
///
-/// This is only non-`None` when callers utlize the `Cache::search_start`,
+/// This is only non-`None` when callers utilize the `Cache::search_start`,
/// `Cache::search_update` and `Cache::search_finish` APIs.
///
/// The purpose of recording search progress is to be able to make a
2 changes: 1 addition & 1 deletion regex-automata/src/hybrid/id.rs
@@ -30,7 +30,7 @@
/// setting for start states to be tagged. The reason for this is
/// that a DFA search loop is usually written to execute a prefilter once it
/// enters a start state. But if there is no prefilter, this handling can be
-/// quite diastrous as the DFA may ping-pong between the special handling code
+/// quite disastrous as the DFA may ping-pong between the special handling code
/// and a possible optimized hot path for handling untagged states. When start
/// states aren't specialized, then they are untagged and remain in the hot
/// path.
4 changes: 2 additions & 2 deletions regex-automata/src/meta/wrappers.rs
@@ -383,7 +383,7 @@ impl OnePassEngine {
// that we either have at least one explicit capturing group or
// there's a Unicode word boundary somewhere. If we don't have
// either of these things, then the lazy DFA will almost certainly
-// be useable and be much faster. The only case where it might
+// be usable and be much faster. The only case where it might
// not is if the lazy DFA isn't utilizing its cache effectively,
// but in those cases, the underlying regex is almost certainly
// not one-pass or is too big to fit within the current one-pass
@@ -886,7 +886,7 @@ impl DFAEngine {
// Enabling this is necessary for ensuring we can service any
// kind of 'Input' search without error. For the full DFA, this
// can be quite costly. But since we have such a small bound
-// on the size of the DFA, in practice, any multl-regexes are
+// on the size of the DFA, in practice, any multi-regexes are
// probably going to blow the limit anyway.
.starts_for_each_pattern(true)
.byte_classes(info.config().get_byte_classes())
