Add coordinate-based coactivation-based parcellation class #533

Draft · tsalo wants to merge 27 commits into main
Conversation

@tsalo (Member) commented Jun 30, 2021

Closes #260. Tagging @DiveicaV in case she wants to look at this.

We are using Chase et al. (2020) as the basis for our general approach, especially the metrics we're using for kernel and order selection.

EDIT: A recommendation from @SBEickhoff is to look at Liu et al. (2020) and Plachti et al. (2019) as well.

To do:

  • Support lists of values for r and n parameters. These correspond to the "filter sizes" in Chase et al. (2020).
  • Determine clustering options
  • Filter size selection step
  • Metric: misclassified voxels
  • Metric: variation of information (sketched, along with the silhouette value, after this list)
  • Metric: silhouette value
  • Metric: percentage of voxels not related to the dominant parent cluster
  • Metric: change in inter- versus intra-cluster distance ratio
  • Refactor to easily support ImageCBP and MAMP with limited code duplication
  • Tests
  • Documentation
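
Two of these metrics are straightforward to prototype. Below is a minimal sketch using hypothetical inputs: `X` is a (n_voxels, n_features) coactivation matrix and `labels_a`/`labels_b` are integer cluster assignments from two solutions. None of these names come from the PR itself.

```python
# Minimal sketch of two candidate metrics; `X`, `labels_a`, and `labels_b`
# are hypothetical inputs, not names from this PR.
import numpy as np
from scipy.stats import entropy
from sklearn.metrics import mutual_info_score, silhouette_score

def variation_of_information(labels_a, labels_b):
    """VI(X, Y) = H(X) + H(Y) - 2 * I(X, Y), in nats; 0 means identical."""
    h_a = entropy(np.bincount(labels_a))  # entropy() normalizes the counts
    h_b = entropy(np.bincount(labels_b))
    return h_a + h_b - 2 * mutual_info_score(labels_a, labels_b)

# Mean silhouette value for one solution, on the same feature matrix
sil = silhouette_score(X, labels_a, metric="euclidean")
```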

Changes proposed in this pull request:

  • Add n option to Dataset.get_studies_by_coordinate() (usage sketch below).
  • Draft new parcellate module with CoordCBP class.
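
For illustration, the two selection modes might be used like this (the seed coordinate, radius, and count are placeholders):

```python
from nimare.dataset import Dataset

dset = Dataset("dataset_file.json")  # placeholder dataset file
seed = [[0, -52, 26]]                # placeholder MNI coordinate

ids_r = dset.get_studies_by_coordinate(seed, r=6)   # studies within 6 mm
ids_n = dset.get_studies_by_coordinate(seed, n=50)  # 50 closest studies (new)
```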

@tsalo (Member Author) commented Jun 30, 2021

@mriedel56 @62442katieb if possible, I'd love it if you could check out the new class (especially the _fit method, which does the actual CBP) and give your thoughts. So far, I just have the most basic elements of the algorithm implemented, so I still need input on (1) the clustering algorithm options, (2) the metrics to use, and (3) the outputs to save.

Ultimately, I want this class to be fairly basic (i.e., without too many tunable parameters), with documentation pointing users who need more control toward cbptools.
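
For orientation, here is a deliberately simplified sketch of the core loop I have in mind for `_fit`. The MACM step is reduced to a binary study-inclusion profile, and `roi_voxels` (a list of MNI coordinates) is hypothetical:

```python
# Simplified sketch of coordinate-based CBP, not the actual _fit implementation.
import numpy as np
from sklearn.cluster import KMeans

def fit_coord_cbp(dset, roi_voxels, r=6.0, n_clusters=3):
    """Cluster ROI voxels by the similarity of their coactivation profiles."""
    all_ids = np.asarray(dset.ids)
    profiles = np.zeros((len(roi_voxels), len(all_ids)))
    for i, xyz in enumerate(roi_voxels):
        # Studies reporting at least one coordinate within r mm of this voxel
        ids = dset.get_studies_by_coordinate([xyz], r=r)
        profiles[i, np.isin(all_ids, ids)] = 1.0
    # Cluster voxels on their binary voxel-by-study coactivation profiles
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    return km.fit_predict(profiles)
```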

Additional questions:

  • Should we run PCA before clustering (see the sketch after this list)? From the sklearn clustering user guide:

    in very high-dimensional spaces, Euclidean distances tend to become inflated (this is an instance of the so-called “curse of dimensionality”). Running a dimensionality reduction algorithm such as Principal component analysis (PCA) prior to k-means clustering can alleviate this problem and speed up the computations.

  • Do we want to leverage sample weights at all? E.g., by weighting by studies' sample sizes?
  • How do we want to structure our outputs? The label maps can go in a standard MetaResult, but we have additional information, like filter selection ranges and metrics, that we probably want to output as well.
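
On the PCA and weighting questions, a minimal sketch of what that preprocessing could look like, assuming `profiles` is the voxel-by-study matrix from the sketch above and `sample_sizes` is a hypothetical per-study vector; the 95% variance threshold and the square-root weighting are placeholders, not decisions:

```python
# Sketch only; whether/how to weight and reduce are open questions above.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Optionally weight each study (column) by, e.g., sqrt of its sample size
weighted = profiles * np.sqrt(sample_sizes)[np.newaxis, :]

# Keep the components explaining 95% of variance before clustering
reduced = PCA(n_components=0.95, svd_solver="full").fit_transform(weighted)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(reduced)
```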

Review thread on nimare/parcellate.py (outdated, resolved):

```python
images = {"labels": labels}
return images

def _filter_selection(self):
```

@tsalo (Member Author) commented:
Chase 2020:

We implemented a two-step procedure that involved a decision on those filter sizes to be included in the final analysis and subsequently a decision on the optimal cluster solution. In the first step, we examined the consistency of the cluster assignment for the individual voxels across the cluster solutions of the co-occurrence maps performed at different filter sizes. We selected a filter range with the lowest number of deviants, that is, number of voxels that were assigned differently compared with the solution from the majority of filters. In other words, we identified those filter sizes which produced solutions most similar to the consensus-solution across all filter sizes. For example, the proportion of deviants for the second parcellation is illustrated in Figure S1; this shows the borders of the filter range to be used for subsequent steps was based on the Z-scores of the number of deviants.

I interpret this to mean (a code sketch follows this list):

  1. Derive mode array of label assignments for each cluster count across filter sizes.
    • I assume this means mode of each voxel determined independently, rather than mode of full set of assignments.
    • What if label numbers don't match? E.g., label 1 in filter size 1 is most similar to label 2 in filter size 2.
    • I assume we should do some kind of synchronization, unless there's some inherent order to KMeans labels?
  2. Count number of voxels that don't match mode for each filter size.
  3. Calculate proportion of deviants in each cluster solution and filter size.
  4. Calculate weighted z-score for each filter size (across cluster solutions) somehow?
    • What is it weighted by?
  5. Select range of filter sizes with lowest z-scores.
    • How? Is there some kind of threshold? Figure S1 selects the range with z-scores < -0.5, but it's unclear whether that's a meaningful threshold in itself or something like 2 standard deviations below the average z-score.
    • What if there are multiple dips? Does amplitude (z-scores of filters below threshold) or width (number of filters below threshold) matter more?
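
Putting that interpretation into code, here is a sketch of steps 1-5. The label matching uses the Hungarian algorithm (one reasonable way to synchronize labels, not necessarily what Chase et al. did), the per-solution z-scores are averaged without weights as a placeholder for whatever weighting the paper used, and the -0.5 cutoff is just the value eyeballed from Figure S1:

```python
# Sketch of my reading of the filter-selection procedure, not settled code.
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.stats import mode, zscore
from sklearn.metrics.cluster import contingency_matrix

def align_labels(reference, target):
    """Relabel `target` to maximize overlap with `reference` (Hungarian)."""
    cont = contingency_matrix(reference, target)
    ref_idx, tgt_idx = linear_sum_assignment(-cont)  # maximize agreement
    mapping = dict(zip(tgt_idx, ref_idx))
    return np.array([mapping[lab] for lab in target])

def deviant_proportions(label_matrix):
    """Proportion of deviant voxels per filter size for one cluster count.

    label_matrix: (n_filters, n_voxels) int array of assignments.
    """
    aligned = np.vstack(
        [label_matrix[0]]
        + [align_labels(label_matrix[0], row) for row in label_matrix[1:]]
    )
    consensus = mode(aligned, axis=0, keepdims=False).mode  # voxel-wise mode
    return (aligned != consensus).mean(axis=1)

# labels[k]: hypothetical (n_filters, n_voxels) solutions for cluster count k
props = np.vstack([deviant_proportions(labels[k]) for k in sorted(labels)])
z = zscore(props, axis=1).mean(axis=0)  # unweighted mean across cluster counts
selected = np.where(z < -0.5)[0]        # threshold eyeballed from Figure S1
```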

@codecov (bot) commented Jul 12, 2021

Codecov Report

Base: 88.55% // Head: 84.29% // Decreases project coverage by 4.26% ⚠️

Coverage data is based on head (0c60dd5) compared to base (e269941).
Patch coverage: 7.89% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #533      +/-   ##
==========================================
- Coverage   88.55%   84.29%   -4.27%     
==========================================
  Files          38       36       -2     
  Lines        4370     4069     -301     
==========================================
- Hits         3870     3430     -440     
- Misses        500      639     +139     
Impacted Files        Coverage Δ
nimare/parcellate.py  0.00% <0.00%> (ø)
nimare/dataset.py     90.33% <100.00%> (+0.37%) ⬆️
nimare/utils.py
nimare/base.py
nimare/__init__.py


@tsalo tsalo added the enhancement New feature or request label Nov 6, 2021
@tsalo tsalo added the parcellate Issues/PRs related to the parcellate module. label Jan 5, 2022
@tsalo tsalo changed the title [ENH] Add coordinate-based coactivation-based parcellation class Add coordinate-based coactivation-based parcellation class Jan 29, 2022
@tsalo tsalo added the help wanted Extra attention is needed label Mar 27, 2022
@tsalo (Member Author) commented Apr 20, 2022

@62442katieb has some code from her naturalistic meta-analysis that may implement some of these metrics: https://github.com/62442katieb/meta-analytic-kmeans/blob/daf3904caad990aeadc89bc98769aaed32857e09/evaluating_clustering_solutions.ipynb

@JulioAPeraza JulioAPeraza self-assigned this Jun 5, 2023
Labels: enhancement (New feature or request), help wanted (Extra attention is needed), parcellate (Issues/PRs related to the parcellate module)
Projects: none yet

Development: successfully merging this pull request may close these issues:

  • Coordinate-based coactivation-based parcellation

3 participants