Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: check current session's pending-write queue when recalling snapshots (e.g. diffing) #927

Merged
merged 8 commits into from
Jan 13, 2025

Conversation

huonw
Copy link
Contributor

@huonw huonw commented Dec 23, 2024

Description

This PR fixes assert ... == snapshot(diff=...) mode to work better in two scenarios:

  • Adding two snapshots at the same time (i.e. not already "on disk"), where the second snapshot is a diff from the first
  • Changing the base data of a diff'd snapshot

In particular, it seems like the nice performance optimisation of #606, meant the "recalling" used for diffing wasn't finding newly created or updated data, and instead just reading whatever snapshot is available on disk.

This PR resolves the problem by having the recalling managed via the SnapshotSession, which can thus check its write queue before reading disk.

Related Issues

Checklist

  • This PR has sufficient documentation.
  • This PR has sufficient test coverage.
  • This PR title satisfies semantic convention.

Additional Comments

No additional comments.

Sorry, something went wrong.

…hots (e.g. diffing)
Copy link

codecov bot commented Dec 23, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #927      +/-   ##
==========================================
+ Coverage   97.75%   97.76%   +0.01%     
==========================================
  Files          21       21              
  Lines        1604     1613       +9     
==========================================
+ Hits         1568     1577       +9     
  Misses         36       36              

Copy link
Contributor Author

@huonw huonw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As always, thank you for Syrupy!

Comment on lines 115 to 118
# find the last (i.e. most recent) write to this index/location in the queue:
for queue_data, queue_test_location, queue_index in reversed(queue):
if queue_index == index and queue_test_location == test_location:
return queue_data
Copy link
Contributor Author

@huonw huonw Dec 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This loops is a potential performance issue, but I suspect it is okay. In particular: it scales like O(length of queue), i.e. number of pending writes.

This function is called twice in SnapshotAssertion._assert:

If every assertion in a ambr file needs an update, this leads to quadratic O(number of assertions in that file^2) behaviour:

  • the first assert ... == snapshot has an empty queue, and adds an item
  • the second checks one element of the queue (doesn't match), and adds an item
  • the third checks two elements of the queue (don't match) and adds an item
  • ...
  • the Nth checks N - 1 elements of the queue (all don't match) and adds an item

i.e. N assertions ends up iterating over 0 + 1 + 2 + . . . + ( N 1 ) = ( N 1 ) ( N 2 ) 2 = O ( N 2 ) elements of queue.

I think this is probably okay:

  • this scales per snapshot location, not across the full testing session, since each queue is just for one key, which incorporates the snapshot location (i.e. if one has 1000 assertions, spread evenly across 100 test file (i.e. 10 assertions in each file), this is scaling with N = 10, not N = 1000).
  • how often does one have to completely re-write a whole file?
  • the work here is individually cheap (i.e. the constant factor is low)

Copy link
Collaborator

@noahnu noahnu Dec 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There appears to be a slowdown: #927 (comment)

I ran inv benchmark locally as well. Results for this branch:

------------------------------------------------------------------------------ benchmark: 3 tests ------------------------------------------------------------------------------
Name (time in s)         Min               Max              Mean            StdDev            Median               IQR            Outliers     OPS            Rounds  Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_1000x_reads      1.0756 (1.0)      1.2278 (1.0)      1.1648 (1.0)      0.0630 (4.10)     1.1974 (1.0)      0.0948 (3.49)          2;0  0.8585 (1.0)           5           1
test_standard         1.1904 (1.11)     1.3294 (1.08)     1.2822 (1.10)     0.0534 (3.48)     1.2960 (1.08)     0.0439 (1.62)          1;1  0.7799 (0.91)          5           1
test_1000x_writes     1.3974 (1.30)     1.4278 (1.16)     1.4089 (1.21)     0.0153 (1.0)      1.3986 (1.17)     0.0272 (1.0)           1;0  0.7098 (0.83)          5           1
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

and "main":

------------------------------------------------------------------------------ benchmark: 3 tests ------------------------------------------------------------------------------
Name (time in s)         Min               Max              Mean            StdDev            Median               IQR            Outliers     OPS            Rounds  Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_1000x_reads      1.0264 (1.0)      1.1148 (1.0)      1.0855 (1.0)      0.0362 (1.0)      1.1006 (1.0)      0.0474 (1.0)           1;0  0.9212 (1.0)           5           1
test_standard         1.0904 (1.06)     1.1974 (1.07)     1.1461 (1.06)     0.0477 (1.32)     1.1414 (1.04)     0.0876 (1.85)          2;0  0.8725 (0.95)          5           1
test_1000x_writes     1.0937 (1.07)     1.2011 (1.08)     1.1406 (1.05)     0.0405 (1.12)     1.1313 (1.03)     0.0534 (1.13)          2;0  0.8767 (0.95)          5           1
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

The 1000x writes test performs 0.8767 operations per second on main but only 0.7098 operations per second with this change. Roughly 20% slowdown.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this scales per snapshot location,

This comes up when users are leveraging test parameterization over data sets. i.e. the benchmark case of parameterizing over 1000 test cases isn't too unrealistic from how I've seen syrupy used in the past.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, the benchmarking is broken in CI for contributors (something to do with permissions in the github workflow I need to fix), however you should be able to run it locally with inv benchmark.

Copy link
Contributor Author

@huonw huonw Jan 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. See #927 (comment)

Thanks for proving my "I think" wasn't accurate 😄

self.extension.read_snapshot(
test_location=self.test_location,
index=index,
session_id=str(id(self.session)),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I note that #606 added this session_id parameter, which I imagine was meant to help with circumstances like this, but it seems like it ends up ignored/unused in practice?

And, in practice, I think it may be quite hard to use, since the relevant cached data is held in memory in the session itself (i.e. the extension _read_snapshot_data_from_location method calls don't really have access to it at all).

Copy link
Collaborator

@noahnu noahnu Dec 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The session_id is used as the cache_key argument in Amber's __cacheable_read_snapshot method. Although it's not used in the function, it's used by the lru_cache decorator (which caches based on the kwargs I believe). So it essentially invalidates the cache

@noahnu
Copy link
Collaborator

noahnu commented Dec 26, 2024

Benchmark

Benchmark suite Current: dbf395a Previous: d283e87 Ratio
benchmarks/test_1000x.py::test_1000x_reads 0.7313520234418077 iter/sec (stddev: 0.06327692389499665) 0.8206510272360953 iter/sec (stddev: 0.06414211339510058) 1.12
benchmarks/test_1000x.py::test_1000x_writes 0.5937400226479332 iter/sec (stddev: 0.16220190601651813) 0.7296660677418227 iter/sec (stddev: 0.160353318333188) 1.23
benchmarks/test_standard.py::test_standard 0.6769499288981089 iter/sec (stddev: 0.05068615977797293) 0.758693423573993 iter/sec (stddev: 0.06688161104621605) 1.12

This comment was automatically generated by workflow using github-action-benchmark.

@huonw
Copy link
Contributor Author

huonw commented Dec 27, 2024

Thanks for the review. I'll get back to it when I'm back at work after this holiday period.

huonw added 7 commits January 13, 2025 11:00

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
----------------------------------------------------------------------------------- benchmark: 3 tests -----------------------------------------------------------------------------------
Name (time in ms)          Min                 Max                Mean             StdDev              Median                IQR            Outliers     OPS            Rounds  Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_1000x_reads      666.9710 (1.0)      748.6652 (1.0)      705.2418 (1.0)      37.2862 (1.0)      703.0552 (1.0)      70.1912 (1.07)          2;0  1.4180 (1.0)           5           1
test_standard         669.7840 (1.00)     843.3747 (1.13)     733.8905 (1.04)     68.2257 (1.83)     705.8282 (1.00)     85.6269 (1.30)          1;0  1.3626 (0.96)          5           1
test_1000x_writes     793.8229 (1.19)     937.1953 (1.25)     850.9716 (1.21)     54.4067 (1.46)     847.3260 (1.21)     65.9041 (1.0)           2;0  1.1751 (0.83)          5           1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Name (time in ms)          Min                   Max                Mean              StdDev              Median                 IQR            Outliers     OPS            Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_1000x_reads      625.5781 (1.0)        887.4346 (1.0)      694.6221 (1.0)      109.0048 (1.0)      658.3128 (1.0)       87.7517 (1.0)           1;1  1.4396 (1.0)           5           1
test_1000x_writes     637.3099 (1.02)     1,021.0924 (1.15)     812.9789 (1.17)     150.2342 (1.38)     757.7635 (1.15)     215.9572 (2.46)          2;0  1.2300 (0.85)          5           1
test_standard         694.1814 (1.11)     1,037.9224 (1.17)     845.1463 (1.22)     136.2068 (1.25)     785.6973 (1.19)     194.9636 (2.22)          2;0  1.1832 (0.82)          5           1
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
@huonw
Copy link
Contributor Author

huonw commented Jan 13, 2025

I've updated this to improve performance and address that concern.

This required making PyTestLocation hashable to be able to use it in a dictionary. This gives O(1) access to the relevant snapshot no matter how many snapshot writes are queued, rather than having to do a linear scan of a whole file's queue. The most reliable way to do this is setting @dataclass(frozen=True), but this requires some "hacks" using object.__setattr__ for initialising the no-init fields.

I've done this work across many independent commits to hopefully make review easier.

Benchmarks on my machine, reporting the mean time ± std-dev:

benchmark before (main) initial version of this PR current version
test_1000x_reads 637.6 ± 24.8 664.9 ± 20.2 656.9 ± 30.1
test_1000x_writes 655.2 ± 33.4 810.1 ± 37.6 ‼️ 675.5 ± 53.5
test_standard 691.8 ± 32.5 714.9 ± 32.5 695.0 ± 20.4

In summary, I think:

  • I can reproduce the large slow-down that we were concerned about in test_1000x_writes in the initial version of this PR ✅
  • the new version seems to get the numbers back to main ✅ (at least, seems likely to be within normal variation, based on the standard deviation there, and also when I run the benchmarks again locally)
Expand for the full details

Before (main):

----------------------------------------------------------------------------------- benchmark: 3 tests -----------------------------------------------------------------------------------
Name (time in ms)          Min                 Max                Mean             StdDev              Median                IQR            Outliers     OPS            Rounds  Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_1000x_reads      609.2453 (1.0)      665.1862 (1.0)      637.6388 (1.0)      24.7847 (1.0)      642.8187 (1.0)      45.0147 (1.37)          2;0  1.5683 (1.0)           5           1
test_1000x_writes     624.3305 (1.02)     705.0878 (1.06)     655.2347 (1.03)     33.3592 (1.35)     647.2840 (1.01)     51.7828 (1.57)          1;0  1.5262 (0.97)          5           1
test_standard         644.7847 (1.06)     735.3719 (1.11)     691.7773 (1.08)     32.4726 (1.31)     695.5663 (1.08)     32.9661 (1.0)           2;0  1.4456 (0.92)          5           1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Initial version of this PR:

----------------------------------------------------------------------------------- benchmark: 3 tests -----------------------------------------------------------------------------------
Name (time in ms)          Min                 Max                Mean             StdDev              Median                IQR            Outliers     OPS            Rounds  Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_1000x_reads      641.2010 (1.0)      688.5666 (1.0)      664.8896 (1.0)      20.2277 (1.0)      657.1012 (1.0)      33.6769 (1.11)          2;0  1.5040 (1.0)           5           1
test_1000x_writes     775.0379 (1.21)     871.4801 (1.27)     810.0917 (1.22)     37.6060 (1.86)     797.2370 (1.21)     45.4147 (1.50)          1;0  1.2344 (0.82)          5           1
test_standard         669.9822 (1.04)     734.9884 (1.07)     714.8819 (1.08)     26.4767 (1.31)     721.9564 (1.10)     30.3537 (1.0)           1;0  1.3988 (0.93)          5           1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Current version of this PR:

----------------------------------------------------------------------------------- benchmark: 3 tests -----------------------------------------------------------------------------------
Name (time in ms)          Min                 Max                Mean             StdDev              Median                IQR            Outliers     OPS            Rounds  Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_1000x_reads      620.9546 (1.0)      694.4665 (1.0)      656.9467 (1.0)      30.0991 (1.47)     660.8149 (1.0)      50.0408 (1.57)          2;0  1.5222 (1.0)           5           1
test_1000x_writes     624.0394 (1.00)     762.7209 (1.10)     675.4554 (1.03)     53.5257 (2.62)     666.0885 (1.01)     63.9635 (2.01)          1;0  1.4805 (0.97)          5           1
test_standard         668.8180 (1.08)     720.2504 (1.04)     695.0192 (1.06)     20.4068 (1.0)      699.0625 (1.06)     31.8778 (1.0)           2;0  1.4388 (0.95)          5           1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

"modulename",
".".join([*test_relpath.parent.parts, test_relpath.stem]),
)
object.__setattr__(self, "methodname", None)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting this attribute is newly added: it previously wasn't being set on this codepath, and attempting to hash/compare the location values (to put into the queue dictionary) was blowing up when accessing it.

It seems like it was previously not read at all for doc tests?

@noahnu noahnu merged commit 0f6bb55 into syrupy-project:main Jan 13, 2025
15 of 17 checks passed
@noahnu
Copy link
Collaborator

noahnu commented Jan 13, 2025

🎉 This PR is included in version 4.8.1 🎉

The release is available on:

Your semantic-release bot 📦🚀

@huonw
Copy link
Contributor Author

huonw commented Jan 13, 2025

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Diff snapshot requires two test runs for new tests and two to 'stabilise' after changes to base snapshot
2 participants