BUG: value_counts returning incorrect dtype for string dtype #55627

Merged
merged 8 commits into from Oct 25, 2023

Conversation

phofl
Member

@phofl phofl commented Oct 22, 2023

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

@@ -2855,7 +2859,9 @@ def _value_counts(
 result_series.name = name
 result_series.index = index.set_names(range(len(columns)))
 result_frame = result_series.reset_index()
-result_frame.columns = columns + [name]
+result_frame.columns = Index(
+    columns + [name], dtype=self.grouper.groupings[0].obj.columns.dtype
Member Author

@rhshadrach is there a better way of getting the dtype of the original column index?

Member

self.obj.columns.dtype I think.

Member Author

That's a Series in SeriesGroupBy, unfortunately, so there is no columns attribute to read.
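The point of this thread can be illustrated with a small sketch. It is not the code from the PR: the example frame and the "count" label are assumptions made up for illustration. It shows that the `obj` attribute of a DataFrameGroupBy is a DataFrame (which has `columns`), while for a SeriesGroupBy it is a Series (which does not), and that passing an explicit `dtype` to `Index` keeps the original column dtype when extra labels are appended.

```python
import pandas as pd

# Hypothetical example data; not taken from the PR.
df = pd.DataFrame({"a": ["x", "x", "y"], "b": [1, 2, 3]})

frame_gb = df.groupby("a")        # DataFrameGroupBy
series_gb = df.groupby("a")["b"]  # SeriesGroupBy

# .obj is the object being grouped: a DataFrame in one case, a Series in the other.
frame_obj_has_columns = hasattr(frame_gb.obj, "columns")
series_obj_has_columns = hasattr(series_gb.obj, "columns")

# Rebuilding the column index with an explicit dtype keeps the dtype of the
# original columns instead of re-inferring it from a plain list of labels.
# "count" is a hypothetical label for the appended column.
new_columns = pd.Index(list(df.columns) + ["count"], dtype=df.columns.dtype)

print(frame_obj_has_columns, series_obj_has_columns)
print(new_columns.dtype == df.columns.dtype)
```

This is presumably why the diff reaches for the DataFrame through the groupings rather than through `self.obj`, which is not guaranteed to be a DataFrame.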

@mroeschke mroeschke added Groupby Strings String extension data type and string data labels Oct 23, 2023
Member

@rhshadrach rhshadrach left a comment

looks good - I think the size change could use a test.

Member Author

phofl commented Oct 24, 2023

added a test

Member

@rhshadrach rhshadrach left a comment

Thanks - just missing the GH#

    ],
)
def test_size_strings(dtype):
    df = DataFrame({"a": ["a", "a", "b"], "b": "a"}, dtype=dtype)
Member

nit: GH#

Member Author

added
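For readers who want to try the scenario from the test fragment above, here is a minimal sketch. It reuses the frame from `test_size_strings` but is not the test that was merged: the helper name `size_per_group` and the two dtypes chosen here are assumptions, and the sketch checks only the group sizes and labels, not the result dtype that the PR is concerned with.

```python
import pandas as pd


def size_per_group(dtype):
    """Build the frame from the test fragment and return the size of each group."""
    df = pd.DataFrame({"a": ["a", "a", "b"], "b": "a"}, dtype=dtype)
    return df.groupby("a")["b"].size()


# Assumed dtypes for illustration; the parametrization in the PR is not shown here.
dtypes = {"object": object, "string": "string"}
results = {label: size_per_group(dtype) for label, dtype in dtypes.items()}

for label, sizes in results.items():
    # Print the dtypes rather than asserting them, since they can vary
    # with the pandas version and the string storage in use.
    print(label, sizes.tolist(), sizes.index.dtype, sizes.dtype)
```

Printing the index and result dtypes makes it easy to compare behavior across pandas versions by eye.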

Member

@rhshadrach rhshadrach left a comment

lgtm

@rhshadrach rhshadrach added Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Bug Dtype Conversions Unexpected or buggy dtype conversions labels Oct 25, 2023
@rhshadrach rhshadrach added this to the 2.2 milestone Oct 25, 2023
@rhshadrach rhshadrach merged commit 4412800 into pandas-dev:main Oct 25, 2023
49 of 51 checks passed
@rhshadrach
Member

Thanks @phofl

@phofl phofl deleted the value_counts_groupby branch October 25, 2023 10:47
Member Author

phofl commented Oct 25, 2023

Sorry for the confusion; this should go into 2.1.2 (it's only really relevant if you use the new option).

Member Author

phofl commented Oct 25, 2023

@meeseeksdev backport 2.1.x

@phofl phofl modified the milestones: 2.2, 2.1.2 Oct 25, 2023
meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Oct 25, 2023
mroeschke pushed a commit that referenced this pull request Oct 25, 2023
…rect dtype for string dtype) (#55682)

Backport PR #55627: BUG: value_counts returning incorrect dtype for string dtype

Co-authored-by: Patrick Hoefler <61934744+phofl@users.noreply.github.com>
Labels
Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Bug Dtype Conversions Unexpected or buggy dtype conversions Groupby Strings String extension data type and string data

3 participants