Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add get_hash, make DAGs deterministic #457

Draft
wants to merge 7 commits into
base: main
Choose a base branch
from
Draft

Conversation

matthiasdiener
Copy link
Collaborator

No description provided.

@inducer
Copy link
Owner

inducer commented Sep 18, 2023

How does this relate to pytools.persistent_dict? (Which is applicable to pytato DAGs btw.)

@matthiasdiener
Copy link
Collaborator Author

How does this relate to pytools.persistent_dict? (Which is applicable to pytato DAGs btw.)

Hmm, I'm not sure - I could only imagine using a KeyBuilder here.

@matthiasdiener matthiasdiener self-assigned this Sep 18, 2023
@matthiasdiener matthiasdiener changed the title add get_hash add get_hash, make DAGs deterministic Sep 19, 2023
@matthiasdiener
Copy link
Collaborator Author

matthiasdiener commented Sep 19, 2023

Some (discouraging) performance results regarding other dict implementations:

import immutables
import frozendict
import immutabledict
import timeit

d_init = {}

print("init_dict", timeit.timeit("for i in range(10000000): d_init[i] = i", globals=globals(), number=3))

d = dict(d_init)
print("dict", timeit.timeit(
    "for k, v in d.items(): s += 1", globals=globals(), number=3, setup="s=0"))

d = immutabledict.immutabledict(d_init)
print("immutabledict", timeit.timeit(
    "for k, v in d.items(): s += 1", globals=globals(), number=3, setup="s=0"))

d = frozendict.frozendict(d_init)
print("frozendict", timeit.timeit(
    "for k, v in d.items(): s += 1", globals=globals(), number=3, setup="s=0"))

d = immutables.Map(d_init)
print("immutables", timeit.timeit(
    "for k, v in d.items(): s += 1", globals=globals(), number=3, setup="s=0"))
init_dict 1.2013676660135388
dict 0.6082258749520406
immutabledict 2.560783916967921
frozendict 0.6182653750292957
immutables 2.350354666938074
Package Performance License Iteration order Frozen
dict
immutables.Map
frozendict
immutabledict
pyrsistent.map

@inducer
Copy link
Owner

inducer commented Sep 19, 2023

This is surprising. immutabledict is a really simple animal. Given the implementation of iteration, it should perform no different than iteration over a dict:

https://github.com/corenting/immutabledict/blob/9684df33bdcd68c4a1678be349592dd8b0c6d940/immutabledict/__init__.py#L41-L42

Also, I'm not sure that iteration is the most salient thing to benchmark, I would imagine creation is likely more important.

Also, since most of our dicts are going to be quite small: If we model cost as $T(n)=\alpha n+\beta$, $\beta$ it might be interesting to find both $\alpha$ and $\beta$.

And: a cheap win might be had by simply feeding the single-file implementation of immutabledict to Cython.

@matthiasdiener
Copy link
Collaborator Author

This is surprising. immutabledict is a really simple animal. Given the implementation of iteration, it should perform no different than iteration over a dict:

https://github.com/corenting/immutabledict/blob/9684df33bdcd68c4a1678be349592dd8b0c6d940/immutabledict/__init__.py#L41-L42

Adding an explicit implementation for items() (instead of relying on super().items() I guess?) makes the package run as fast as dict. Same for keys() and values().

@inducer
Copy link
Owner

inducer commented Sep 19, 2023

Oh cool. So we may have a winner?

@inducer
Copy link
Owner

inducer commented Sep 19, 2023

(After a PR to that package, at least?)

@matthiasdiener
Copy link
Collaborator Author

matthiasdiener commented Sep 19, 2023

Oh cool. So we may have a winner?

Yep, I'm working on a PR (corenting/immutabledict#265).

@matthiasdiener
Copy link
Collaborator Author

matthiasdiener commented Sep 19, 2023

We didn't finish the discussion in the morning - is there still value in this PR?

From what I can see, there are 3 pieces here:

  • get_hash() (probably not worth it)
  • new Datawrapper.__hash__
  • test with PYTHONHASHSEED (probably not worth it)

@inducer
Copy link
Owner

inducer commented Sep 20, 2023

Could you explain what get_hash would do, compared to hash or the persistent-dict hash procedure?

If the answer is "make the hash consistent" run-over-run, I'm not sure I like that idea.

@matthiasdiener
Copy link
Collaborator Author

Could you explain what get_hash would do, compared to hash or the persistent-dict hash procedure?

If the answer is "make the hash consistent" run-over-run, I'm not sure I like that idea.

Right, in that case there wouldn't really a difference between get_hash and hash, I think.

@matthiasdiener
Copy link
Collaborator Author

matthiasdiener commented Sep 21, 2023

1bfc5a2 adds a PytatoKeyBuilder, based off LoopyKeyBuilder (and makes get_hash use it, just as a test). Is this going in the direction you had in mind @inducer ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants