Tweak sequence tokenization #10904

Merged
merged 2 commits into from Feb 8, 2024

Collaborator

@crusaderky crusaderky commented Feb 6, 2024

Allow calling normalize_token() outside of tokenize(). This is useful for debugging purposes.
Additionally, polish some tests in test_tokenize.
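
As a rough illustration of the idea (not the code in this PR), the fallback can be sketched with the standard library alone. The names `normalize_seq`, `tokenize`, and the `"__seen"` marker below are hypothetical stand-ins, and the sketch assumes the per-call state lives in a `ContextVar` that is only set while `tokenize()` is running:

```python
import hashlib
from contextvars import ContextVar

# Hypothetical stand-in for the state that tokenize() sets up for one call.
_seen = ContextVar("_seen")


def normalize_seq(seq):
    """Normalize a sequence, replacing repeated containers with back-references."""
    try:
        seen = _seen.get()
    except LookupError:
        # Called outside tokenize(): use a throwaway dict so that the function
        # still works when invoked directly, for example while debugging.
        # Nested calls each get their own dict in this mode.
        seen = {}
    out = []
    for item in seq:
        if isinstance(item, (list, tuple)):
            key = id(item)
            if key in seen:
                # Already visited: emit a back-reference instead of recursing.
                out.append(("__seen", seen[key]))
                continue
            seen[key] = len(seen)
            out.append(normalize_seq(item))
        else:
            out.append(item)
    return tuple(out)


def tokenize(obj):
    """Hash the normalized form, with the shared state set for this call only."""
    reset_token = _seen.set({})
    try:
        return hashlib.md5(repr(normalize_seq(obj)).encode()).hexdigest()
    finally:
        _seen.reset(reset_token)


shared = [1, 2]
normalized = normalize_seq([shared, shared])  # works without tokenize()
digest = tokenize([shared, shared])
```

Without the `except LookupError` branch, the direct call to `normalize_seq` on the second-to-last line would raise, because the context variable has no value outside `tokenize()`.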

@crusaderky crusaderky self-assigned this Feb 6, 2024
@crusaderky crusaderky marked this pull request as ready for review February 6, 2024 19:42
Contributor

github-actions bot commented Feb 6, 2024

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

     15 files  ± 0       15 suites  ±0   3h 13m 3s ⏱️ - 6m 13s
 12 988 tests + 1   12 059 ✅ + 1     929 💤 ±0  0 ❌ ±0 
160 507 runs  +15  144 002 ✅ +16  16 505 💤  - 1  0 ❌ ±0 

Results for commit 097a48a. ± Comparison against base commit f51fa77.

This pull request removes 1 and adds 2 tests. Note that renamed tests count towards both.
dask.tests.test_tokenize ‑ test_normalize_function
dask.tests.test_tokenize ‑ test_tokenize_composite_functions
dask.tests.test_tokenize ‑ test_tokenize_numpy_array

seen = _seen.get()
tok = None
except LookupError:
# This is for debug only, for when normalize_token is called outside of
Collaborator

This seems odd, do we really want to keep this?

Collaborator Author

It is extremely useful to figure out why tokens have changed when anything goes wrong. I'm already using it in dask/distributed#8185
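
One way this kind of debugging might look, sketched with a hypothetical helper that is not part of dask: once the normalized form of an object can be obtained directly, two such forms can be walked side by side to find the first position where they diverge. The example assumes the normalized forms are nested tuples of plain values.

```python
def first_difference(a, b, path=()):
    """Return the index path to the first place a and b differ, or None if equal."""
    if isinstance(a, tuple) and isinstance(b, tuple):
        if len(a) != len(b):
            return path
        for i, (x, y) in enumerate(zip(a, b)):
            found = first_difference(x, y, path + (i,))
            if found is not None:
                return found
        return None
    return None if a == b else path


# Hypothetical normalized forms captured from two runs.
before = ("func", (1, 2), ("opt", True))
after = ("func", (1, 3), ("opt", True))
where = first_difference(before, after)
```

The returned path points at the element to inspect, which is usually quicker than comparing two opaque hashes.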

Collaborator

k

@phofl phofl merged commit 1cee596 into dask:main Feb 8, 2024
27 of 28 checks passed
Collaborator

phofl commented Feb 8, 2024

thx
