Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use AhoCorasick to speed up quote match #9773

Merged
merged 1 commit into from
Feb 2, 2024
Merged

Use AhoCorasick to speed up quote match #9773

merged 1 commit into from
Feb 2, 2024

Conversation

charliermarsh
Copy link
Member

Summary

When I was looking at the v0.2.0 release, this method showed up in a CodSpeed regression (we were calling it more), so I decided to quickly look at speeding it up. @BurntSushi suggested using Aho-Corasick, and it looks like it's about 7 or 8x faster:

Parser/AhoCorasick      time:   [8.5646 ns 8.5914 ns 8.6191 ns]
Parser/Iterator         time:   [64.992 ns 65.124 ns 65.271 ns]

Test Plan

cargo test

@charliermarsh charliermarsh added the performance Potential performance improvement label Feb 2, 2024
Some("\"")
} else {
None
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also a bit faster than the iterator approach though both are quite fast (1ns vs. 800ps).

@charliermarsh
Copy link
Member Author

I bet we could speed this up even more but this feels like good bang for buck: https://lemire.me/blog/2023/07/14/recognizing-string-prefixes-with-simd-instructions/

Copy link
Contributor

github-actions bot commented Feb 2, 2024

ruff-ecosystem results

Linter (stable)

ℹ️ ecosystem check encountered linter errors. (no lint changes; 1 project error)

sphinx-doc/sphinx (error)

ruff failed
  Cause: Selection of unstable rules without the `--preview` flag is not allowed. Enable preview or remove selection of:
	- FURB113
	- FURB131
	- FURB132

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

ℹ️ ecosystem check encountered format errors. (no format changes; 2 project errors)

sphinx-doc/sphinx (error)

ruff format --no-preview --exclude tests/roots/test-pycode/cp_1251_coded.py

ruff failed
  Cause: Selection of unstable rules without the `--preview` flag is not allowed. Enable preview or remove selection of:
	- FURB113
	- FURB131
	- FURB132

openai/openai-cookbook (error)

warning: Detected debug build without --no-cache.
error: Failed to parse examples/dalle/Image_generations_edits_and_variations_with_DALL-E.ipynb:3:7:8: Unexpected token 'prompt'

Formatter (preview)

ℹ️ ecosystem check encountered format errors. (no format changes; 1 project error)

openai/openai-cookbook (error)

ruff format --preview

warning: Detected debug build without --no-cache.
error: Failed to parse examples/dalle/Image_generations_edits_and_variations_with_DALL-E.ipynb:3:7:8: Unexpected token 'prompt'

Copy link
Member

@BurntSushi BurntSushi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sweet.

.chain(SINGLE_QUOTE_STR_PREFIXES)
.chain(SINGLE_QUOTE_BYTE_PREFIXES),
)
.unwrap()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect. I'm glad you found StartKind::Anchored.

@charliermarsh charliermarsh merged commit ea1c089 into main Feb 2, 2024
17 checks passed
@charliermarsh charliermarsh deleted the charlie/str branch February 2, 2024 14:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Potential performance improvement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants