Avoid relying on a hashtable iteration order. #3960
Merged
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
The toolchain was iterating a map of package name to import list in a couple of places to do things beyond counting or other order-invariant operations. The result was that the specific hashtable iteration order influenced the order of SemIR generated (and other behaviors like diagnostic emission I suspect, but the source location sorting probably hid this). Switching to a hashtable implementation with any order seeding that changes run-to-run immediately shows the SemIR order fluctuating without this.
This PR fixes that by instead accumulating the package imports data in a vector and using a map to vector indices. This is also slightly more efficient, although that seems unlikely to be an important factor here.
There was only one insertion point so I've just hand coded the management of the indices and map, but happy to take a different approach or use an abstraction here if desired.
One awkward aspect of this is that some of the loops need access to the package name as well. I've just added storage for that as these seem unlikely to be huge arrays of 10s of 1000s of imported packages, so the double storage of the identifier ID seems likely OK. But again, happy to take a different approach here if desired. It also seems like it might be possible to work out the identifier from the node, but I kept the patch more direct for simplicity.
I've also not used the LLVM
MapVector
abstraction of this pattern. This was mostly to avoid adding another layer of abstractions to our data structures, and because this is the first time we've hit this really. My experience is also that it is reasonably often that there is a more efficient way to orient the vector and map than what is automatically provided. But that may just be my experience.