perf: parse and walk globs in parallel #7244

Merged: 2 commits, merged Feb 14, 2024
1 change: 1 addition & 0 deletions Cargo.lock

1 change: 1 addition & 0 deletions crates/turborepo-globwalk/Cargo.toml
@@ -14,6 +14,7 @@ camino = { workspace = true }
 itertools.workspace = true
 path-clean = "1.0.1"
 path-slash = "0.2.1"
+rayon = "1"
 regex.workspace = true
 thiserror.workspace = true
 tracing = "0.1.37"
32 changes: 17 additions & 15 deletions crates/turborepo-globwalk/src/lib.rs
@@ -13,6 +13,7 @@ use camino::Utf8PathBuf;
 use itertools::Itertools;
 use path_clean::PathClean;
 use path_slash::PathExt;
+use rayon::prelude::*;
 use regex::Regex;
 use turbopath::{AbsoluteSystemPath, AbsoluteSystemPathBuf, PathError};
 use wax::{walk::FileIterator, BuildError, Glob};
@@ -308,37 +309,38 @@ pub fn globwalk_internal(
     let (base_path_new, include_paths, exclude_paths) =
         preprocess_paths_and_globs(base_path, include, exclude)?;
 
-    let ex_patterns = exclude_paths
+    let ex_patterns: Vec<_> = exclude_paths
         .into_iter()
         .map(glob_with_contextual_error)
         .collect::<Result<_, _>>()?;
 
-    include_paths
-        .into_iter()
+    let include_patterns = include_paths
+        .into_par_iter()
         .map(glob_with_contextual_error)
-        .map_ok(|glob| walk_glob(walk_type, &base_path_new, &ex_patterns, glob))
-        // flat map to bring the results in the vec to the same level as the potential outer err
-        // this is the same as a flat_map_ok
-        .flat_map(|s| s.unwrap_or_else(|e| vec![Err(e)]))
+        .collect::<Result<Vec<_>, _>>()?;
+
+    include_patterns
+        .into_par_iter()
+        // Use flat_map_iter as we only want parallelism for walking the globs and not iterating
+        // over the results.
+        // See https://docs.rs/rayon/latest/rayon/iter/trait.ParallelIterator.html#method.flat_map_iter
+        .flat_map_iter(|glob| walk_glob(walk_type, &base_path_new, ex_patterns.clone(), glob))
         .collect()
 }
 
 #[tracing::instrument(fields(glob=glob.to_string().as_str()))]
 fn walk_glob(
     walk_type: WalkType,
-    base_path_new: &PathBuf,
-    ex_patterns: &Vec<Glob>,
+    base_path_new: &Path,
+    ex_patterns: Vec<Glob>,
     glob: Glob,
 ) -> Vec<Result<AbsoluteSystemPathBuf, WalkError>> {
-    glob.walk(&base_path_new)
-        .not(ex_patterns.clone())
+    glob.walk(base_path_new)
+        .not(ex_patterns)
         .unwrap_or_else(|e| {
             // Per docs, only fails if exclusion list is too large, since we're using
             // pre-compiled globs
-            panic!(
-                "Failed to compile exclusion globs: {:?}: {}",
-                ex_patterns, e,
-            )
Comment on lines -338 to -341 (Contributor Author):

I'm removing the exclusion patterns from the panic message, since this panic should only happen if the exclusion list is too large. Because `ex_patterns` is now moved into `.not()`, keeping it in the message would require an additional clone, and I don't think that information is useful enough to warrant one.
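The comment turns on ownership: in the new code `walk_glob` takes `ex_patterns: Vec<Glob>` by value and hands it straight to `.not(...)`, so the list is gone by the time the panic closure runs. The std-only sketch below illustrates that pattern under stated assumptions: `Glob`, `Walker`, `MAX_EXCLUDES`, and the literal-equality "matching" are hypothetical stand-ins, not wax's real types or signatures. It also shows the switch from `&PathBuf` to `&Path`, which callers holding a `PathBuf` satisfy through deref coercion.

```rust
use std::path::{Path, PathBuf};

// Hypothetical limit standing in for "exclusion list too large".
const MAX_EXCLUDES: usize = 64;

// Hypothetical stand-in for a compiled glob (not wax's `Glob`).
#[derive(Clone, Debug)]
struct Glob(String);

// Hypothetical walker produced by `Glob::walk`.
struct Walker {
    root: PathBuf,
    include: Glob,
}

impl Glob {
    fn walk(self, root: &Path) -> Walker {
        Walker { root: root.to_path_buf(), include: self }
    }
}

impl Walker {
    // Takes the exclusion list by value, like `.not(ex_patterns)` in the diff.
    fn not(self, excludes: Vec<Glob>) -> Result<Vec<PathBuf>, String> {
        if excludes.len() > MAX_EXCLUDES {
            return Err(format!("{} exclusion globs exceed the limit", excludes.len()));
        }
        // Toy matching: the literal include survives unless an exclusion is identical.
        if excludes.iter().any(|ex| ex.0 == self.include.0) {
            Ok(Vec::new())
        } else {
            Ok(vec![self.root.join(&self.include.0)])
        }
    }
}

// Mirrors the new signature: `&Path` instead of `&PathBuf`, owned `Vec<Glob>`.
fn walk_glob(base: &Path, ex_patterns: Vec<Glob>, glob: Glob) -> Vec<PathBuf> {
    glob.walk(base)
        .not(ex_patterns) // `ex_patterns` is moved here
        .unwrap_or_else(|e| {
            // `ex_patterns` cannot appear in this message: it was moved into
            // `not` above, so including it would need a clone beforehand.
            panic!("Failed to compile exclusion globs: {}", e)
        })
}

fn main() {
    let base = PathBuf::from("/repo");
    let ex_patterns = vec![Glob("dist".to_string())];
    let globs = vec![Glob("src".to_string()), Glob("dist".to_string())];

    let results: Vec<PathBuf> = globs
        .into_iter()
        // One clone per glob at the call site, as with `ex_patterns.clone()` in the diff;
        // `&base` coerces from `&PathBuf` to `&Path`.
        .flat_map(|glob| walk_glob(&base, ex_patterns.clone(), glob))
        .collect();

    println!("{:?}", results);
}
```

Cloning once per glob at the call site is what lets each parallel task own its exclusion list; the trade-off the comment describes is that a second clone, made only to enrich a panic message for a case that should not happen, is not worth paying for.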

+            panic!("Failed to compile exclusion globs: {}", e,)
         })
         .filter_map(|entry| visit_file(walk_type, entry))
         .collect::<Vec<_>>()
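The new `globwalk_internal` splits the work into two phases: parse every include pattern first (failing fast through `collect::<Result<Vec<_>, _>>()?`), then walk each parsed glob in parallel and flatten the per-glob results sequentially, which is the role `flat_map_iter` plays. The sketch below reproduces that shape with `std::thread::scope` instead of rayon so it runs without dependencies. `Pattern`, `ParseError`, `parse`, `walk`, and `globwalk` are hypothetical, and `walk` is a toy prefix matcher over an in-memory file list rather than a filesystem walk.

```rust
use std::thread;

// Hypothetical compiled pattern (stands in for wax's `Glob`).
#[derive(Clone, Debug, PartialEq)]
struct Pattern(String);

#[derive(Debug, PartialEq)]
struct ParseError(String);

// Hypothetical parser: rejects empty patterns.
fn parse(raw: &str) -> Result<Pattern, ParseError> {
    if raw.is_empty() {
        Err(ParseError("empty glob".to_string()))
    } else {
        Ok(Pattern(raw.to_string()))
    }
}

// Toy "walk": prefix-match an in-memory file list, minus exclusions.
fn walk(pattern: &Pattern, files: &[&str], excludes: &[Pattern]) -> Vec<String> {
    files
        .iter()
        .filter(|f| f.starts_with(pattern.0.as_str()))
        .filter(|f| !excludes.iter().any(|ex| f.starts_with(ex.0.as_str())))
        .map(|f| f.to_string())
        .collect()
}

fn globwalk(files: &[&str], include: &[&str], exclude: &[&str]) -> Result<Vec<String>, ParseError> {
    // Phase 1: parse everything up front. Collecting into `Result<Vec<_>, _>`
    // stops at the first error, so nothing is walked if any glob is invalid.
    let ex_patterns: Vec<Pattern> = exclude.iter().map(|raw| parse(raw)).collect::<Result<_, _>>()?;
    let include_patterns: Vec<Pattern> = include.iter().map(|raw| parse(raw)).collect::<Result<_, _>>()?;

    // Phase 2: one scoped thread per include pattern (the PR uses rayon's
    // `into_par_iter` here); each thread's results are then flattened in order.
    let per_glob: Vec<Vec<String>> = thread::scope(|s| {
        let handles: Vec<_> = include_patterns
            .iter()
            .map(|p| {
                let ex = &ex_patterns;
                s.spawn(move || walk(p, files, ex))
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });
    Ok(per_glob.into_iter().flatten().collect())
}

fn main() {
    let files = ["src/lib.rs", "src/main.rs", "dist/out.js", "docs/a.md"];
    println!("{:?}", globwalk(&files, &["src", "dist"], &["src/main"]));
    println!("{:?}", globwalk(&files, &["src", ""], &[]));
}
```

Spawning one OS thread per glob is only reasonable for a handful of patterns; rayon's work-stealing pool, which the PR adopts, avoids that per-task cost. Parsing before walking also replaces the old `map_ok` plus `flat_map(|s| s.unwrap_or_else(|e| vec![Err(e)]))` trick: a bad glob now returns early instead of being flattened into the result stream.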