Handle wordnet synsets that were lost in mapping #2985
Test failure on Ubuntu only, seems unrelated to this PR:
All tests passed after changing the key of the third-party cache in ci.yaml. This change should probably be undone, but I don't know if that would restore the bad cache.
Apologies for the mess with the CI tests... One of these days we'll manage to get a test suite working with some consistency. Truthfully, I'm not sure what went wrong with those failing tests. I haven't had the time to help you out with the tests or look at this work, sadly, so I'll leave this for Steven, or for later, but I wanted to mention that I appreciate the work you're putting into NLTK's WordNet.
Thank you @ekaf. Sorry about our CI consistency issues!
Thanks @tomaarsen and @stevenbird! According to actions/cache#2 (comment), other projects also see persistent corruption of the CI cache, so this is not an NLTK-specific issue. But the actions/cache docs may include ideas that could be applied on the NLTK side. For example, the keys used by NLTK in .github/workflows/ci.yaml include the variable ${{ secrets.CACHE_VERSION }}. The value of this variable is interpreted as empty, because secrets are not passed publicly (which makes sense, since they would not be secrets otherwise), so the NLTK keys might as well have nothing in place of this variable. It could then be advantageous to use some other variable instead, for example the current year/month/day; this would only be a slight improvement though, limiting any cache corruption to one day.
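To make the point about the key concrete, here is a minimal Python sketch of how such a key expands. The key layout and the function names `static_key` and `dated_key` are made up for illustration and are not copied from NLTK's ci.yaml; the sketch only assumes that an unavailable secret expands to an empty string.

```python
from datetime import date


def static_key(os_name: str, cache_version: str = "") -> str:
    """Hypothetical key layout: when the secret is unavailable, cache_version
    is empty, so the key is the same string on every run."""
    return f"{os_name}-third-party-{cache_version}"


def dated_key(os_name: str, day: date) -> str:
    """Same hypothetical layout, with the current date in place of the secret,
    so a new cache entry is created once per day."""
    return f"{os_name}-third-party-{day.isoformat()}"


# With an empty secret the key never changes, so a corrupted cache keeps being restored.
print(static_key("Linux"))
# With a date component, a corrupted cache is no longer matched after the day ends.
print(dated_key("Linux", date(2022, 4, 1)))
print(dated_key("Linux", date(2022, 4, 2)))
```

The trade-off is the one mentioned above: a daily key bounds how long a bad cache can live, but the cache is rebuilt at least once a day.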
I added the current This problem of not being able to easily update the cache on a PR is one that I've looked at before. I've tried an approach where if the last commit message contains These are fixes of a consequence, while your solution could be the fix to the cause. Ensuring there is a new cache every e.g. week could prevent the cache from ever corrupting. That is just speculation, though.
@tomaarsen, according to the actions/cache/README.md:
It would be nice not to have to wait that long, though, and the same page includes a recipe for automatically refreshing the cache each day. In NLTK's case, this would incur a cost of approximately 2.5 GB * 30 days.
This PR fixes #2984 so that, when a word sense has been lost in a WordNet mapping, the remaining senses of the word can still be retrieved. For example, instead of raising a fatal error, the following retrieves the French senses of 'perceptible' in WordNet 3.1:
[Synset('detectable.s.02'), Synset('None'), Synset('perceptible.a.01')]
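The general idea can be sketched without NLTK. The toy data and the names `LEMMA_SENSES`, `VERSION_MAP`, `strict_lookup` and `tolerant_lookup` below are hypothetical and are not the code in this PR; the sketch only assumes that a mapping between WordNet versions may lack an entry for some sense.

```python
from typing import Dict, List, Optional

# Hypothetical toy data: sense identifiers of a lemma in an older WordNet version.
LEMMA_SENSES: Dict[str, List[str]] = {
    "perceptible": ["old-1", "old-2", "old-3"],
}

# Hypothetical mapping from old identifiers to synset names in the newer version.
# "old-2" is absent on purpose: that sense was lost in the mapping.
VERSION_MAP: Dict[str, str] = {
    "old-1": "detectable.s.02",
    "old-3": "perceptible.a.01",
}


def strict_lookup(lemma: str) -> List[str]:
    """Fails with KeyError as soon as one sense has no mapping."""
    return [VERSION_MAP[key] for key in LEMMA_SENSES[lemma]]


def tolerant_lookup(lemma: str) -> List[Optional[str]]:
    """Keeps a None placeholder for each lost sense and returns the rest."""
    return [VERSION_MAP.get(key) for key in LEMMA_SENSES.get(lemma, [])]


print(tolerant_lookup("perceptible"))
```

Keeping the placeholder preserves the position of each sense in the list; a caller that only wants the surviving senses can filter out the `None` entries.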