BUG: wrap median_filter stability #22402

tylerjereddy · 2025-01-26T03:35:31Z

Fixes BUG: median_filter on 1D array crashes with memory corruption using SciPy 1.15.0 #22250
Fixes BUG: ndimage.median_filter: Heap corruption with scipy 1.15 #22325
Fixes BUG: signal.medfilt and ndimage.median_filter not returning correct output #22333
The shims in this branch ~~seem to substantially improve the probability of the new~~ allow test_gh_22250 to pass under high repeat counts with pytest-repeat on Linux. ~~There still seems to be a memory instability that leads to stochastically incorrect results between test incantations though.~~
I wonder if any of the memory issues Warren mentions at BUG: ndimage: C++ memory management issues in _rank_filter_1d.cpp #22365 might be related to the instability (Warren suggests the crashes are not related, but I'm no longer seeing crashes on Linux on this branch). (kind of, Warren mentioned unchecked memory for data, but it was specifically the lack of initialization/zeroing out that was causing algorithmic problems on Linux, separate from the index segfault).

~~[ci skip] [skip cirrus]~~

tylerjereddy · 2025-01-26T03:40:43Z

scipy/ndimage/src/_rank_filter_1d.cpp

+    }
+    else {
+        offset = 0;
+    }


this block is based on an analysis of the indices visited by median_filter in the default reflect mode because of the patterning similarities between the two approaches from the docs:

‘reflect’ d c b a | a b c d | d c b a
‘wrap’ a b c d | a b c d | a b c d

kind of a "flipped" order to boundary handling.

This is a bit of a fishing expedition though.

One point of caution--with the memory initialization issue repaired, I no longer need this patch, though the patch is also not harmful with the current suite of tests that we have.

Objectively that probably means that we should remove this part of the shim. That said, it may also be sensible to expand the testing to include non-zero origin values vs. 1.14.1 and n-dims vs. 1.14.1 because the lack of sensitivity on the indexing here makes me a bit suspicious.

tylerjereddy · 2025-01-26T15:28:54Z

On ARM Mac, python dev.py test -t scipy/ndimage/tests/test_filters.py::test_gh_22250 -- --count=1000 passes on this branch, while on Linux it variably produced 382 failures (all for a filter size of 20 that matches the image itself) and 397 failures (mostly size 20 failures, but 1 filter size 15 failure) over two incantations.

Reverting the patch on MacOS ARM results in segfaults.

That might be a decent clue that memory shenanigans and/or something different compilers can optimize differently might be to blame on this branch rather than a fundamental algorithmic misunderstanding (though I probably should add even more 1.14.1-based test cases given that I'm not operating from a place of deep expertise).

tylerjereddy · 2025-01-26T15:41:09Z

Switching to clang compiler toolchain (FC=gfortran-11 CC=clang-15 CXX=clang++-15) for a clean build on Linux doesn't resolve the test failures. Hmm, so fundamental memory management issue rather than compiler-specific behavior perhaps?

tylerjereddy · 2025-01-26T17:53:58Z

If I link against address sanitizer on Linux and LD_PRELOAD libasan.so, the tests pass reliably on Linux over thousands of repeat incantations. The actual output from the sanitizer seems completely unhelpful though, and it also detects leaks in other completely unrelated test modules.

Nonetheless, IIUC, the preload intercepts and overrides malloc, so that is probably evidence that memory allocation is indeed the remaining issue.

ev-br · 2025-01-26T18:00:38Z

it also detects leaks in other completely unrelated test modules.

Quoting from Tyler in an unrelated thread: Open separate issues for separate issues? :-)

tylerjereddy · 2025-01-26T22:26:15Z

scipy/ndimage/src/_rank_filter_1d.cpp

  Mediator *m = MediatorNew(win_len, rank);
  T *data = new T[win_len];
+  for (int i = 0; i < win_len; ++i) {
+    data[i] = 0;
+  }


After many more hours of debugging, and static analyzers proving useless, this block of code seems to do the trick. data is uninitialized memory, and the treatment of such memory apparently differs on many Linux platforms vs. MacOS, explaining why pytest-repeat was stochastically exhibiting algorithmic failures.

With this additional shim, test_gh_22250 passes with thousands of repeat incantations on my Linux box vs. failing before.

I'll open this PR for review now and see if the CI agrees with me (there are two unrelated test_batch.py failures I see locally on main as well).

There are many open issues about this machinery, so if there is interest in this patch, I'd appreciate suggestions for which additional test(s) you'd like me to add in (from specific issues, multi-dimensional cases as well? etc.).

ev-br

Wow, that was a hard catch!

Confirmed that locally this PR avoids the segfault from gh-22250. The CI seems to agree, modulo an unrelated failure which is safe to ignore.

FWIW, I don't think ndim > 1 tests are essential here, given that the problematic addition of scipy 1.5.1 is specifically a 1D shortcut.

Looking through related issues:

BUG: ndimage.median_filter: Heap corruption with scipy 1.15 #22325 does not have a reproducible example, and is quite likely to be the same, so IMO it's OK to add closes gh-22325 on this PR unless there's evidence to the contrary
Warren's MWE at BUG: signal.medfilt and ndimage.median_filter not returning correct output #22333 (comment) seems to be fixed by this PR
BUG: ndimage: C++ memory management issues in _rank_filter_1d.cpp #22365 highlights several issues which, as Warren notes, are unlikely to be related; I'd agree addressing these in this PR is optional.

All in all, I'd be +1 for landing this as is for 1.5.2, and addressing #22365 in a less hectic pace in the 1.6.0 timeframe.

j-bowhay · 2025-01-31T03:49:14Z

Also possibly closes #22368 and #22333

tylerjereddy · 2025-02-05T21:14:20Z

@WarrenWeckesser opinion?

* Related to scipygh-22250 * The shims in this branch seem to substantially improve the probability of the new `test_gh_22250` to pass under high repeat counts with `pytest-repeat`. There still seems to be a memory instability that leads to stochastically incorrect results between test incantations though. [ci skip] [skip cirrus]

* An uninitialized data structure in `_rank_filter()` C++ code was causing stochastic algorithm failures on Linux but not MacOS, so force a zero-based initialization. This allows `test_gh_22250` to pass over thousands of `pytest-repeat` incantations.

* Add a regression test for scipygh-22333, which reliably passes on this branch with `pytest-repeat` (and fails on `main` with sufficient repeats).

tylerjereddy · 2025-02-13T23:38:38Z

Also possibly closes #22368 and #22333

For gh-22368, I found no evidence that my patch is helpful per analysis at #22368 (comment)

For gh-22333, the patch here indeed resolves the problem on x86_64 Linux per analysis at #22333 (comment). I've adjusted the original comment above accordingly and added the additional regression test for that issue.

I'm somewhat-tempted to consider self-merging this for the 1.15.2 release if this remains free of objections. The biggest objection is my own at #22402 (comment), I hate having code changes that are poorly understood and not even changing the test results like that. If we backed that out, would need to be careful to pytest-repeat-based probe the new regression tests to be sure it wasn't helping.

WarrenWeckesser · 2025-02-14T03:35:08Z

@tylerjereddy, I probably won't be able to get back to this until sometime next week, so don't wait for me if you need to wrap this up sooner.

dschmitz89 · 2025-02-14T10:25:47Z

I can't review but this bug is hitting a lot of people and this seems to resolve it based on the tests. I am +1 for back porting into 1.15.2.

tylerjereddy · 2025-02-16T00:29:48Z

The CI failures are unrelated and covered by the issue Matt opened recently at gh-22534. Given the swath of issues this fixes, the two core developer "approvals," and seriousness of the bugs fixed (segfaults possible on Linux), I think I'll go ahead and squash merge for backporting despite this being a sub-optimal self-merge activity.

I suspect what is going to happen is that the lack of understanding around #22402 (comment) will come back to bite us someday, but I think this at least has support as an incremental improvement we could get more feedback on while preventing some bad results/segfaults.

* An uninitialized data structure in `_rank_filter()` C++ code was causing stochastic algorithm failures on Linux but not MacOS, so force a zero-based initialization. This allows `test_gh_22250` to pass over thousands of `pytest-repeat` incantations. * Add a regression test for scipygh-22333, which reliably passes on this branch with `pytest-repeat` (and fails on `main` with sufficient repeats).

* BUG: immortalize uarray global strings in order to prevent negative refcounts (#22038) #### Reference issue Fixes #21214 Closes #21218 #### What does this implement/fix? This PR addresses the segfaults caused by the occurrence of negative refcounts when uarray static strings were being released when the interpreter didn't exist. --------- Co-authored-by: peterbell10 <peterbell10@live.co.uk> * BUG: special: Fix unchecked memory allocations in `specfun.h` (#22080) * WIP: Fix unchecked memory allocation in aswfa() Summary of the changes: * Add another SF error code, "SF_ERROR_NOMEM", that is used by ufuncs that require internally allocated memory. Just like the other possible errors, how this error is handled can be queried and set using `geterr()` and `seterr()`. Unlike the other error conditions, the default behavior is to raise an exception if a memory allocation fails. * Modify the function `aswfa()` in `specfun.h` to return a status code that indicates when a mempory allocation failed. * Modify the functions `prolate_aswfa_nocv()`, `oblate_aswfa_nocv()`, `prolate_aswfa()` and `oblate_aswfa()` in `sphd_wave.h` to check the return value of `aswfa()`, and set the error state and return values appropriately if `aswfa()` returns `NoMemory`. * MAINT: Rename 'nomem' -> 'memory'. * Fix all uses of unchecked allocations in specfun.h * Fix aswfa() in specfunc.h * Fix refguide check. * Add comment about the return value when nonconvergence is detected in rmn2l. * Change (c/m)alloc calls to use 'new' wrapped in a unique_ptr. * Fix a few more names passed to set_error() * Set outputs to nan in mtu12 when alloc fails Also update docstring-like comments of some C++ functions. * BUG: cluster: `cophenet` intercept invalid linkage matrix count (#22187) * BUG: catch invalid linkage count * Fixes gh-22183. * The input data from gh-22183 causes an out of bounds memory access in a 1-D `memoryview` in the Cython `cophenetic_distances` function. Prevent this by enforcing a check for an allowable upper bound on the cluster membership (4th column) of the linkage matrix `Z` received by `is_valid_linkage`. Co-authored-by: Warren Weckesser <warren.weckesser@gmail.com> * TST: PR 22187 revisions * Adjust `test_gh_22183()` such that the linkage matrix is invalid only because of an excessive observation count, rather than also being invalid because the large observation count is not integral. * Adjust `test_gh_22183()` such that is properly flushes through array API backends. * DOC: backticks --------- Co-authored-by: Warren Weckesser <warren.weckesser@gmail.com> Co-authored-by: Jake Bowhay <60778417+j-bowhay@users.noreply.github.com> * MAINT: stats.Mixture: make return type consistent * MAINT: stats.Mixture: fix inverse functions when mean is undefined (#22337) * MAINT: stats.Mixture: fix inverse when mean is undefined * Apply suggestions from code review * Update scipy/stats/_distribution_infrastructure.py * BUG: special: Fix unchecked malloc in stirling2.h (#22339) Use std::unique_ptr and new (std::nothrow) instead of malloc. Closes gh-22336. * BUG: sparse: fix selecting wrong dtype for coo coords (#22353) * Fixes bug with selecting wrong dtype for coo coords * add test for reshape having wrong dtype * test that smaller dtype is maintained even if intermediate values are big --------- Co-authored-by: Dan Schult <dschult@colgate.edu> * TST: interpolate: small tolerance bump to TestAAA.test_basic_functions * MAINT: Stop installing libhighs Co-authored-by: drew-parsons <drew-parsons@users.noreply.github.com> Closes #22349 * BUG: sparse.linalg.norm: fix return type (#22372) cupy reports cupy/cupy#8866 that sparse.linalg.norm returns a python number instead of a 0-dim numpy array * revert NotImplemented return values in dot & others (#22373) * DOC: sparse.linalg: add two recent functions to namespace and fix doctests (#22374) * fix doctests and make two recent functions visible * update refguide [docs only] * fix capitalization [docs only] Co-authored-by: Daniel Schmitz <40656107+dschmitz89@users.noreply.github.com> --------- Co-authored-by: Daniel Schmitz <40656107+dschmitz89@users.noreply.github.com> * MAINT: stats.zmap: restore support for complex data (#22405) * MAINT: stats.zmap: restore support for complex data * MAINT: stats.zmap: fix masked array support * Apply suggestions from code review * BUG: scipy.spatial: Fix inaccurate orthonormalization in `Rotation.from_matrix()` (#22418) * BUG: fix inaccurate orthonormalization in Rotation.from_matrix() * Code review updates --------- Co-authored-by: Scott Shambaugh <scottshambaugh@users.noreply.github.com> * BUG: special: Fix a memory leak in the AMOS function besy(). (#22423) Use a unique_ptr to manage the heap-allocated array. Closes gh-22314. * CI: build Linux aarch64 wheels on GitHub Actions instead of Cirrus CI This unifies the wheels builds, which is a lot easier on maintenance. In the process, did the following: - update to setup-miniconda 3.1.1. for GHA Linux arm64 runner support - add `conda-remove-defaults` to `setup-miniconda` * CI: migrate Linux aarch64 job to GitHub Actions * CI: remove Cirrus CI configuration [wheel build] * BUG: io.loadmat: throw error on file containing all zeros (#22447) [ci skip] * BUG: PR 22471 revisions * Fixes issues related to backport manual merge conflict resolution during the preparation of SciPy `1.15.2`. * Restore an accidentally-removed size-0 array check in `_contains_nan()` function--this allows SciPy `stats` tests to pass again on the maintenance branch. * Remove some accidentally-added tests in `test_mio.py` [skip circle] * BUG: sparse: fix spmatrix indexing with None and implicit padding to match matrix behavior (#22472) * fix index using None when index is padded with : * align None indexing for spmatrix with np.matrix behavior. And tests. * rewrite spmatrix handling to ease reading * MAINT: pearsonr SIMD-related shim * Fixes #22479. * This small patch allows `test_stats.py::TestRegression::test_regressZEROX` to pass on x86_64 Linux instead of failing due to an extra division by zero warning over a fairly narrow range of NumPy versions near `1.25.2` where SIMD implementation details appear to have been slightly different. [skip circle] * DOC: PR 22471 revisions * Update the SciPy `1.15.2` release notes following backport activity. * BUG: wrap median_filter stability (#22402) * An uninitialized data structure in `_rank_filter()` C++ code was causing stochastic algorithm failures on Linux but not MacOS, so force a zero-based initialization. This allows `test_gh_22250` to pass over thousands of `pytest-repeat` incantations. * Add a regression test for gh-22333, which reliably passes on this branch with `pytest-repeat` (and fails on `main` with sufficient repeats). * MAINT: integrate.cumulative_simpson: bump test tolerance * do not check dtype in test_compare_with_GCVSPL * CI: PR 22471 revisions * temporarily disable testing against `meson` `master` branch in CI because of gh-22534. * DOC: PR 22471 revisions * Update the SciPy `1.15.2` release notes following additional backport activity. * CI: PR 22471 wheel builds [wheel build] * This is an empty commit to test the wheel build matrix in PR 22471. [wheel build] --------- Co-authored-by: Edgar Andrés Margffoy Tuay <andfoy@gmail.com> Co-authored-by: peterbell10 <peterbell10@live.co.uk> Co-authored-by: Warren Weckesser <warren.weckesser@gmail.com> Co-authored-by: Jake Bowhay <60778417+j-bowhay@users.noreply.github.com> Co-authored-by: Matt Haberland <mhaberla@calpoly.edu> Co-authored-by: Parth Nobel <parthnobel@berkeley.edu> Co-authored-by: Dan Schult <dschult@colgate.edu> Co-authored-by: Rohit Goswami <rog32@hi.is> Co-authored-by: Daniel Schmitz <40656107+dschmitz89@users.noreply.github.com> Co-authored-by: Scott Shambaugh <14363975+scottshambaugh@users.noreply.github.com> Co-authored-by: Scott Shambaugh <scottshambaugh@users.noreply.github.com> Co-authored-by: Ralf Gommers <ralf.gommers@gmail.com> Co-authored-by: Matthew Brett <matthew.brett@gmail.com> Co-authored-by: Charles Bousseau <cbousseau@anaconda.com>

tylerjereddy added the needs-work label Jan 26, 2025

github-actions bot added scipy.ndimage C/C++ defect labels Jan 26, 2025

tylerjereddy commented Jan 26, 2025

View reviewed changes

tylerjereddy changed the title ~~WIP, BUG: wrap median_filter stability~~ BUG: wrap median_filter stability Jan 26, 2025

tylerjereddy added this to the 1.16.0 milestone Jan 26, 2025

tylerjereddy commented Jan 26, 2025

View reviewed changes

tylerjereddy mentioned this pull request Jan 26, 2025

DEV: asan experience with pytest a bit messy on Linux #22409

Open

ev-br approved these changes Jan 27, 2025

View reviewed changes

tylerjereddy removed the needs-work label Jan 29, 2025

tylerjereddy added the backport-candidate label Feb 5, 2025

tylerjereddy mentioned this pull request Feb 5, 2025

MAINT: backports for 1.15.2 #22471

Merged

7 tasks

tylerjereddy added 2 commits February 13, 2025 15:48

BUG: PR 22402 revisions

a432554

* An uninitialized data structure in `_rank_filter()` C++ code was causing stochastic algorithm failures on Linux but not MacOS, so force a zero-based initialization. This allows `test_gh_22250` to pass over thousands of `pytest-repeat` incantations.

tylerjereddy mentioned this pull request Feb 13, 2025

BUG: signal filter functions raises TypeError when np.int64 dtype inputs are given #22368

Open

TST: PR 22402 revisions

2ddcd59

* Add a regression test for scipygh-22333, which reliably passes on this branch with `pytest-repeat` (and fails on `main` with sufficient repeats).

tylerjereddy force-pushed the treddy_issue_22250 branch from 915c94f to 2ddcd59 Compare February 13, 2025 23:30

tylerjereddy merged commit 97bbf4e into scipy:main Feb 16, 2025
37 of 39 checks passed

tylerjereddy deleted the treddy_issue_22250 branch February 16, 2025 00:31

tylerjereddy modified the milestones: 1.16.0, 1.15.2 Feb 16, 2025

tylerjereddy removed the backport-candidate label Feb 16, 2025

tylerjereddy mentioned this pull request Feb 24, 2025

BUG: ndimage.median_filter: additional hard crashes #22586

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GitHub Sponsors

BUG: wrap median_filter stability #22402

BUG: wrap median_filter stability #22402

tylerjereddy commented Jan 26, 2025 •

edited

Loading

tylerjereddy Jan 26, 2025

tylerjereddy Jan 26, 2025

tylerjereddy commented Jan 26, 2025

tylerjereddy commented Jan 26, 2025

tylerjereddy commented Jan 26, 2025

ev-br commented Jan 26, 2025

tylerjereddy Jan 26, 2025 •

edited

Loading

ev-br left a comment

j-bowhay commented Jan 31, 2025

tylerjereddy commented Feb 5, 2025

tylerjereddy commented Feb 13, 2025 •

edited

Loading

WarrenWeckesser commented Feb 14, 2025

dschmitz89 commented Feb 14, 2025

tylerjereddy commented Feb 16, 2025

BUG: wrap median_filter stability #22402

BUG: wrap median_filter stability #22402

Conversation

tylerjereddy commented Jan 26, 2025 • edited Loading

tylerjereddy Jan 26, 2025

Choose a reason for hiding this comment

tylerjereddy Jan 26, 2025

Choose a reason for hiding this comment

tylerjereddy commented Jan 26, 2025

tylerjereddy commented Jan 26, 2025

tylerjereddy commented Jan 26, 2025

ev-br commented Jan 26, 2025

tylerjereddy Jan 26, 2025 • edited Loading

Choose a reason for hiding this comment

ev-br left a comment

Choose a reason for hiding this comment

j-bowhay commented Jan 31, 2025

tylerjereddy commented Feb 5, 2025

tylerjereddy commented Feb 13, 2025 • edited Loading

WarrenWeckesser commented Feb 14, 2025

dschmitz89 commented Feb 14, 2025

tylerjereddy commented Feb 16, 2025

tylerjereddy commented Jan 26, 2025 •

edited

Loading

tylerjereddy Jan 26, 2025 •

edited

Loading

tylerjereddy commented Feb 13, 2025 •

edited

Loading