Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

medianIndex/quantileIndex doesn’t handle missing data #275

Closed
mbostock opened this issue May 30, 2023 · 3 comments · Fixed by #276
Closed

medianIndex/quantileIndex doesn’t handle missing data #275

mbostock opened this issue May 30, 2023 · 3 comments · Fixed by #276
Assignees
Labels
bug Something isn’t working

Comments

@mbostock
Copy link
Member

mbostock commented May 30, 2023

d3.medianIndex returns one less than the index of the median element. For example:

penguins[d3.medianIndex(penguins, (d) => d.body_mass_g)].body_mass_g // 3600
penguins[d3.medianIndex(penguins, (d) => d.body_mass_g) + 1].body_mass_g // 4050

I don’t think this is the intended behavior, even if we were trying to chose the “lower” element in the case where the array length is even. I expect it to return the index of the median value instead.

d3.median(penguins, (d) => d.body_mass_g) // 4050

Previously #140 #159.

@mbostock mbostock added the bug Something isn’t working label May 30, 2023
@mbostock
Copy link
Member Author

It also seems to returning the last index if there are multiple elements with the median value?

{
  const sorted = d3.sort(penguins, (d) => d.body_mass_g);
  const i = d3.medianIndex(sorted, (d) => d.body_mass_g);
  return sorted[i - 5].body_mass_g; // 4050
}

@mbostock
Copy link
Member Author

Ah ha. The problem is that we’re dropping undefined values in the initial filter:

values = Float64Array.from(numbers(values, valueof));

This means that the indexes are indexes into the filtered array, and hence are wrong. If we pre-filter the data the correct result is returned:

{
  const filtered = penguins.filter((d) => d.body_mass_g);
  const i = d3.medianIndex(filtered, (d) => d.body_mass_g);
  return filtered[i];
}

Although, we are still returning the last index of equivalent values (it appears), and I think it would be preferable to return the first. But, any of those would be correct as long as we document it.

@mbostock mbostock changed the title medianIndex/quantileIndex is off by one medianIndex/quantileIndex is wrong May 30, 2023
@mbostock mbostock self-assigned this May 30, 2023
@mbostock mbostock changed the title medianIndex/quantileIndex is wrong medianIndex/quantileIndex doesn’t handle missing data May 30, 2023
@Fil
Copy link
Member

Fil commented May 30, 2023

ouch!

  const a = [1, 1, 1, 2, 2, undefined, 2, 3];
  a[d3.medianIndex(a)]; // undefined

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn’t working
Development

Successfully merging a pull request may close this issue.

2 participants