Add pandas-2.x support in cudf #14916

Merged 246 commits on Jan 30, 2024
Changes from 226 commits
Commits
14f54ac
Update value_counts with new behavior (#12835)
galipremsagar Feb 23, 2023
fc33639
Merge remote-tracking branch 'upstream/branch-23.04' into pandas_2.0_…
galipremsagar Feb 24, 2023
395fa58
Merge remote-tracking branch 'upstream/branch-23.04' into pandas_2.0_…
galipremsagar Feb 24, 2023
7d62d4e
Drop inplace parameter in categorical methods (#12846)
galipremsagar Feb 24, 2023
d1b1ea8
[REVIEW] Raise error when `numeric_only=True` for non-numeric Series …
galipremsagar Feb 24, 2023
9f2b0c2
Merge remote-tracking branch 'upstream/branch-23.04' into pandas_2.0_…
galipremsagar Feb 25, 2023
9ef1b37
Merge remote-tracking branch 'upstream/branch-23.04' into pandas_2.0_…
galipremsagar Feb 27, 2023
ae00532
Merge remote-tracking branch 'upstream/branch-23.04' into pandas_2.0_…
galipremsagar Feb 27, 2023
6317733
Drop is_monotonic (#12853)
galipremsagar Feb 28, 2023
fd0c1dd
Merge remote-tracking branch 'upstream/branch-23.04' into pandas_2.0_…
galipremsagar Feb 28, 2023
10ea515
Merge remote-tracking branch 'upstream/branch-23.04' into pandas_2.0_…
galipremsagar Mar 6, 2023
ea2099c
Merge remote-tracking branch 'upstream/branch-23.04' into pandas_2.0_…
galipremsagar Mar 6, 2023
25b87f0
merge
galipremsagar Mar 7, 2023
5af0583
[REVIEW] Drop `datetime_is_numeric` parameter from `describe` (#12890)
galipremsagar Mar 8, 2023
531f52c
Drop `names`, `dtype` in `Index.copy` and `dtype`, `levels`, `codes` …
galipremsagar Mar 8, 2023
620e35f
Merge remote-tracking branch 'upstream/branch-23.04' into pandas_2.0_…
galipremsagar Mar 8, 2023
7ec76b7
Drop `kind` parameter from `Index.get_slice_bound` (#12856)
galipremsagar Mar 8, 2023
cecf651
Merge remote-tracking branch 'upstream/branch-23.04' into pandas_2.0_…
galipremsagar Mar 9, 2023
58b9acb
[REVIEW] Update `numeric_only` behavior in reduction APIs (#12847)
galipremsagar Mar 10, 2023
99a4148
merge
galipremsagar Mar 10, 2023
e115ba5
[REVIEW] Drop `DataFrame.append` and `Series.append` (#12839)
galipremsagar Mar 10, 2023
620483b
Merge remote-tracking branch 'upstream/branch-23.04' into pandas_2.0_…
galipremsagar Mar 10, 2023
dc1b813
Merge remote-tracking branch 'upstream/branch-23.04' into pandas_2.0_…
galipremsagar Mar 13, 2023
4a87cbd
Drop `na_sentinel` from `factorize` (#12924)
galipremsagar Mar 13, 2023
c0ab786
Merge remote-tracking branch 'upstream/pandas_2.0_feature_branch' int…
galipremsagar Mar 13, 2023
61843ed
Merge remote-tracking branch 'upstream/branch-23.04' into pandas_2.0_…
galipremsagar Mar 13, 2023
d1377a5
Add information about `Index.is_*` method deprecation (#12909)
galipremsagar Mar 13, 2023
55de5a4
Merge remote-tracking branch 'upstream/branch-23.04' into pandas_2.0_…
galipremsagar Mar 16, 2023
30f6f8e
Merge branch 'pandas_2.0_feature_branch' of https://github.com/rapids…
galipremsagar Mar 16, 2023
e42619b
Merge remote-tracking branch 'upstream/branch-23.06' into pandas_2.0_…
galipremsagar Mar 28, 2023
0f3172f
Merge remote-tracking branch 'upstream/branch-23.06' into pandas_2.0_…
galipremsagar Mar 29, 2023
48c1016
[REVIEW] Miscellaneous pytest fixes for pandas-2.0 (#12962)
galipremsagar Mar 31, 2023
ddf8996
Merge remote-tracking branch 'upstream/branch-23.06' into pandas_2.0_…
galipremsagar Apr 3, 2023
6bbcc23
Merge remote-tracking branch 'upstream/branch-23.06' into pandas_2.0_…
galipremsagar Apr 4, 2023
dd15a19
Add get_indexer
galipremsagar Apr 7, 2023
6dce4ef
Fix ufunc tests (#13083)
galipremsagar Apr 7, 2023
192e204
[REVIEW] datetime and timedelta improvements (#12934)
galipremsagar Apr 7, 2023
1621fcb
Merge remote-tracking branch 'upstream/branch-23.06' into pandas_2.0_…
galipremsagar Apr 7, 2023
60c257a
Fix MultiIndex construction in pandas 2.0 (#13092)
galipremsagar Apr 7, 2023
f472c0d
Merge remote-tracking branch 'upstream/branch-23.06' into pandas_2.0_…
galipremsagar Apr 11, 2023
be19968
[REVIEW] Enable `numeric_only` for row-wise ops (#13090)
galipremsagar Apr 14, 2023
095f17b
Merge remote-tracking branch 'upstream/branch-23.06' into pandas_2.0_…
galipremsagar Apr 14, 2023
8ff4861
[REVIEW] Fix `DataFrame.__getitem__` to work with `pandas-2.0` (#13139)
galipremsagar Apr 15, 2023
8a41a7f
Merge remote-tracking branch 'upstream/branch-23.06' into pandas_2.0_…
galipremsagar Apr 17, 2023
bd38d70
Drop backfill and pad in GroupBy (#13156)
galipremsagar Apr 18, 2023
81565cf
[REVIEW] Add `no_default` and adapt `Series.reset_index` to different…
galipremsagar Apr 18, 2023
199787d
Fix `is_string_dtype` to adapt to `pandas-2.0` changes (#13141)
galipremsagar Apr 18, 2023
27d2a75
Merge remote-tracking branch 'upstream/branch-23.06' into pandas_2.0_…
galipremsagar Apr 18, 2023
47492da
Handle pandas warnings for pad and backfill (#13168)
galipremsagar Apr 19, 2023
fbe1848
[REVIEW] Fix datetime pytests & raise errors for timezone un-aware ty…
galipremsagar Apr 19, 2023
352753a
Merge remote-tracking branch 'upstream/branch-23.06' into pandas_2.0_…
galipremsagar Apr 19, 2023
a31d62c
Merge remote-tracking branch 'upstream/pandas_2.0_feature_branch' int…
galipremsagar Apr 19, 2023
615828d
[REVIEW] Fix pytests where empty column indexes are compared (#13166)
galipremsagar Apr 19, 2023
8e8a1ea
[REVIEW] Raise error when there is a binary operation between certain…
galipremsagar Apr 19, 2023
69af242
Merge remote-tracking branch 'upstream/pandas_2.0_feature_branch' int…
galipremsagar Apr 19, 2023
901a971
Fix `datetime64` related inconsistencies in pytests (#13175)
galipremsagar Apr 20, 2023
e7ddc69
Merge remote-tracking branch 'upstream/branch-23.06' into pandas_2.0_…
galipremsagar Apr 20, 2023
8a13810
Merge remote-tracking branch 'upstream/branch-23.06' into pandas_2.0_…
galipremsagar Apr 20, 2023
31e08c9
Fix `DataFrame.describe` pytests (#13191)
galipremsagar Apr 20, 2023
b772017
Merge remote-tracking branch 'upstream/branch-23.06' into pandas_2.0_…
galipremsagar Apr 21, 2023
27e18c8
Change default `dtype` for `get_dummies` to `bool` (#13174)
galipremsagar Apr 22, 2023
6a86385
[REVIEW] Update parameter ordering in `DataFrame.pivot` (#13190)
galipremsagar Apr 22, 2023
00f61cd
Merge remote-tracking branch 'upstream/branch-23.06' into pandas_2.0_…
galipremsagar Apr 24, 2023
e4500a6
Merge remote-tracking branch 'upstream/branch-23.06' into pandas_2.0_…
galipremsagar Apr 25, 2023
4fe23bb
Merge branch 'pandas_2.0_feature_branch' into get_indexer_2.0
galipremsagar Apr 25, 2023
ea7d18c
Fix ceil, floor and round pytests (#13218)
galipremsagar Apr 26, 2023
e355ba4
More implementation for get_indexer
galipremsagar Apr 27, 2023
569b3e7
Fix `kurtosis` pytests to support `numeric_only` parameter (#13217)
galipremsagar Apr 27, 2023
bbc84f6
Fix parquet pytests errors with pandas-2.0 (#13216)
galipremsagar Apr 27, 2023
34aa2c5
merge
galipremsagar Apr 27, 2023
38fbdf5
Merge branch 'pandas_2.0_feature_branch' into get_indexer_2.0
galipremsagar Apr 27, 2023
71a5a88
Merge
galipremsagar May 15, 2023
3b679e5
merge
galipremsagar May 16, 2023
b057436
merge
galipremsagar May 17, 2023
97b1642
Merge
galipremsagar May 17, 2023
e47e5c0
Merge remote-tracking branch 'upstream/branch-23.06' into pandas_2.0_…
galipremsagar May 22, 2023
97c0eee
Merge remote-tracking branch 'upstream/branch-23.06' into pandas_2.0_…
galipremsagar May 23, 2023
3a85f64
Fix csv reader pytest & MultiIndex docstring (#13417)
galipremsagar May 23, 2023
6bfbfe3
Merge
galipremsagar May 23, 2023
1fd2c91
Merge branch 'branch-23.06' into pandas_2.0_feature_branch
galipremsagar May 24, 2023
7b13714
Merge
galipremsagar May 25, 2023
c1e78b9
Deprecate `Groupby.dtypes` (#13453)
galipremsagar May 26, 2023
2dafcfc
Enforce Groupby.__iter__ deprecation and miscellaneous pytest fixes (…
galipremsagar May 26, 2023
16c987e
Preserve Index and grouped columns in `Groupby.nth` (#13442)
galipremsagar May 30, 2023
258bf3d
`Index` class deprecation enforcements (#13204)
galipremsagar May 30, 2023
bb1c8d5
Merge branch 'pandas_2.0_feature_branch' into get_indexer_2.0
galipremsagar May 30, 2023
72a663e
Fix MultiIndex.get_indexer pytest
galipremsagar May 30, 2023
8791749
Complete get_indexer implementation
galipremsagar May 30, 2023
ac39341
Update docs
galipremsagar May 30, 2023
a92ad86
Fix parquet paritioning pytest failures (#13474)
galipremsagar May 31, 2023
0b81bd6
Merge remote-tracking branch 'upstream/branch-23.06' into pandas_2.0_…
galipremsagar May 31, 2023
f56ea26
Merge remote-tracking branch 'upstream/branch-23.06' into pandas_2.0_…
galipremsagar Jun 2, 2023
63b8fb1
Enforce merge validation deprecation (#13499)
galipremsagar Jun 3, 2023
139e32d
Enable `sort=True` for `Index.union`, `Index.difference` and `Index.i…
galipremsagar Jun 3, 2023
a6869e8
Fix a groupby pytest related to numeric_only (#13496)
galipremsagar Jun 3, 2023
6001bbf
Drop special handling of `min_periods` for `Rolling.count` (#13483)
galipremsagar Jun 3, 2023
4416a24
Fix JSON pytests (#13476)
galipremsagar Jun 3, 2023
d6324d1
Fixed strings
galipremsagar Jun 6, 2023
361e96e
Fix `DataFrame.mode` pytest (#13500)
galipremsagar Jun 6, 2023
8e4b448
Merge branch 'pandas_2.0_feature_branch' into get_indexer_2.0
galipremsagar Jun 6, 2023
8bf7b04
Address first round of reviews
galipremsagar Jun 6, 2023
0dc0a3d
annotate
galipremsagar Jun 7, 2023
261f594
Fix issues
galipremsagar Jun 7, 2023
dc08ef0
Switch to outer inner
galipremsagar Jun 7, 2023
fbd6d12
Merge remote-tracking branch 'upstream/branch-23.08' into pandas_2.0_…
galipremsagar Jun 7, 2023
9760d57
Merge branch 'pandas_2.0_feature_branch' into get_indexer_2.0
galipremsagar Jun 7, 2023
2f30179
Merge remote-tracking branch 'upstream/branch-23.08' into pandas_2.0_…
galipremsagar Jun 7, 2023
20ed443
Merge remote-tracking branch 'upstream/branch-23.08' into pandas_2.0_…
galipremsagar Jun 8, 2023
b47ddc5
Merge remote-tracking branch 'upstream/branch-23.08' into pandas_2.0_…
galipremsagar Jun 8, 2023
41d2c6c
Merge remote-tracking branch 'upstream/branch-23.08' into pandas_2.0_…
galipremsagar Jun 9, 2023
6d0c3a4
Merge remote-tracking branch 'upstream/pandas_2.0_feature_branch' int…
galipremsagar Jun 9, 2023
d3d1780
Merge remote-tracking branch 'upstream/branch-23.08' into pandas_2.0_…
galipremsagar Jun 12, 2023
4289ef4
Fix `dask_cudf` pytest failures for `pandas-2.0` upgrade (#13548)
galipremsagar Jun 13, 2023
e7eb1d3
simplify
galipremsagar Jun 13, 2023
fb99b0a
Enable writing column names with mixed dtype in parquet writer when `…
galipremsagar Jun 14, 2023
b69fb13
Merge remote-tracking branch 'upstream/pandas_2.0_feature_branch' int…
galipremsagar Jun 20, 2023
2488d91
address reviews
galipremsagar Jun 20, 2023
7f216cf
fix
galipremsagar Jun 20, 2023
13d62c5
simplify
galipremsagar Jun 20, 2023
a1f5581
Merge pull request #13234 from galipremsagar/get_indexer_2.0
wence- Jun 21, 2023
fdac177
merge
galipremsagar Jun 21, 2023
fa2f0da
Merge
galipremsagar Jul 11, 2023
85846be
Merge remote-tracking branch 'upstream/branch-23.08' into pandas_2.0_…
galipremsagar Jul 14, 2023
bc9b7b2
Merge
galipremsagar Jul 18, 2023
ca463a7
merge
galipremsagar Jul 31, 2023
273945b
Fix default behavior of index metaclass instance and subclass checks …
vyasr Jul 31, 2023
10e5459
merge
galipremsagar Aug 17, 2023
22db02f
Merge branch 'pandas_2.0_feature_branch' of https://github.com/rapids…
galipremsagar Aug 17, 2023
de76328
Merge
galipremsagar Oct 11, 2023
db92536
merge fix
galipremsagar Oct 11, 2023
fc6a30f
Handle PandasArray renaming
galipremsagar Oct 12, 2023
ad3ae65
Deprecate `is_categorical_dtype` (#14274)
galipremsagar Oct 12, 2023
7c6d8f2
Deprecate is_interval_dtype and is_datetime64tz_dtype (#14275)
galipremsagar Oct 12, 2023
2461315
Deprecate `method` in `fillna` API (#14278)
galipremsagar Oct 13, 2023
88e1978
Merge
galipremsagar Nov 9, 2023
90788f2
Deprecate `fill_method` and `limit` in `pct_change` APIs (#14277)
galipremsagar Nov 29, 2023
b15e438
Merge
galipremsagar Nov 29, 2023
c01a8db
Merge
galipremsagar Dec 2, 2023
c51444f
Replace PandasArray with NumpyExtensionArray (#14549)
galipremsagar Dec 5, 2023
e04b88b
Fix copy creation of a columnAccessor (#14551)
galipremsagar Dec 5, 2023
29b3ac8
Fix to_pandas calls (#14552)
galipremsagar Dec 5, 2023
19952eb
Add missing `is_categorical_dtype` to `cudf.api.types` namespace (#14…
galipremsagar Dec 5, 2023
ac07b3d
Fix name in Index.difference (#14556)
galipremsagar Dec 5, 2023
2bdd8b8
Filter deprecation warning in `ffill` and `bfill` APIs (#14554)
galipremsagar Dec 5, 2023
a068b10
Fix typo in value_counts (#14550)
galipremsagar Dec 5, 2023
ccfbe71
Enforce `Index.to_frame` deprecations (#14553)
galipremsagar Dec 5, 2023
9b478b0
Deprecate DataFrame.applymap and use map instead (#14579)
galipremsagar Dec 6, 2023
0e83e20
Deprecate first and last (#14583)
galipremsagar Dec 7, 2023
5f3ecd6
Fix CategoricalDtype docstring (#14622)
galipremsagar Dec 13, 2023
72221b3
Fix `DataFrame.sort_index` when a index is a `MultiIndex` (#14621)
galipremsagar Dec 13, 2023
d7dc16e
Deprecate reading literal string in cudf.read_json (#14619)
galipremsagar Dec 13, 2023
eea5f10
Preserve column ordering in DataFrame.stack (#14626)
galipremsagar Dec 16, 2023
bc5584b
Change `is_.._dtype` deprecations to `DeprecationWarning` instead of …
galipremsagar Dec 18, 2023
194e487
Version dataframe.mode pytest (#14650)
galipremsagar Dec 19, 2023
f736d72
Filter ufunc related warnings in pytests (#14652)
galipremsagar Dec 19, 2023
4539f4f
Deprecate positional access for label based indexes in Series.__getit…
galipremsagar Dec 19, 2023
c1411b6
Deprecate `method` in `interpolate` and calculation on `object` dtype…
galipremsagar Dec 27, 2023
2b9ab53
Add more validation to MultiIndex.to_frame (#14671)
galipremsagar Dec 27, 2023
46ef148
Deprecate ignoring empty objects in concat (#14672)
galipremsagar Dec 27, 2023
e218f5c
Deprecate setting of incompatible dtypes to an existing column (#14668)
galipremsagar Dec 27, 2023
fd1f986
Fix datetime related assertions and warnings in pytests (#14673)
galipremsagar Dec 27, 2023
cb09a39
Fix pytest condition to include more warning scenarios (#14680)
galipremsagar Dec 29, 2023
1c54354
Sort `Index.difference` & `union` results for early exit scenarios (#…
galipremsagar Dec 29, 2023
8a8b627
Fix column parameter handling in `read_orc` (#14666)
galipremsagar Dec 30, 2023
3344377
Handle missing warning assertions for concat pytests (#14682)
galipremsagar Dec 30, 2023
eabba98
Fix a typo error in where pytest (#14683)
galipremsagar Dec 30, 2023
bcdeb19
Change empty column dtype to `string` from `float64` (#14691)
galipremsagar Jan 8, 2024
50cf007
Fix merge
galipremsagar Jan 20, 2024
6bcaf44
Preserve empty index types in parquet reader (#14818)
galipremsagar Jan 23, 2024
bdbf0bc
Fix `Dataframe.agg` to not return incorrect dtypes (#14851)
galipremsagar Jan 24, 2024
28b1814
Catch warnings in reductions (#14852)
galipremsagar Jan 24, 2024
df5c78b
Catch groupby jit apply warnings (#14858)
galipremsagar Jan 24, 2024
8784551
Fix all reduction pytest failures (#14869)
galipremsagar Jan 24, 2024
d7f9688
Fix empty groupby return types (#14871)
shwina Jan 25, 2024
8a25f70
Support kurt/skew(axis=None) for multi columns/low row count (#14874)
mroeschke Jan 25, 2024
7bf4376
Fix miscellaneous failures in pytests (#14879)
galipremsagar Jan 25, 2024
d83f12e
Preserve columns dtype in dataframe constructor (#14878)
galipremsagar Jan 25, 2024
8db3b70
Disable style check
vyasr Jan 25, 2024
4b5b8af
Pin pandas
vyasr Jan 25, 2024
d2cc4db
Disable some more jobs
vyasr Jan 25, 2024
32e0982
Actually remove the jobs
vyasr Jan 25, 2024
302c876
Unpin numpy<1.25
mroeschke Jan 26, 2024
0b79d70
Merge pull request #14890 from vyasr/feat/enable_ci
vyasr Jan 26, 2024
2b07cd1
Merge remote-tracking branch 'upstream/branch-24.04' into pandas_2.0_…
vyasr Jan 26, 2024
80090d7
Merge remote-tracking branch 'upstream/pandas_2.0_feature_branch' int…
mroeschke Jan 26, 2024
481ea9c
Remove pandas shim and use result_type
mroeschke Jan 26, 2024
4444909
FIx more miscellaneous pytests failures (#14895)
galipremsagar Jan 26, 2024
23d189b
Fix some pytests (#14894)
mroeschke Jan 26, 2024
7df96e7
Align datetimeindex slicing behaviour with Pandas 2.x (#14887)
shwina Jan 26, 2024
6368c47
Merge branch 'chore/merge_2404' into pandas_2.0_feature_branch
galipremsagar Jan 26, 2024
87a4d12
Deprecations in replace
galipremsagar Jan 26, 2024
7d3e72a
Parquet Writer: Write `non-string` columns pandas-compatibility mode …
galipremsagar Jan 26, 2024
b662093
Merge remote-tracking branch 'upstream/pandas_2.0_feature_branch' int…
mroeschke Jan 26, 2024
b61b39d
Use sets for argument checking.
bdice Jan 26, 2024
1256825
Merge pull request #14900 from galipremsagar/replace_deprecations
vyasr Jan 26, 2024
78eff48
Fix usage
vyasr Jan 26, 2024
2ff132e
Merge pull request #14892 from mroeschke/deps/numpy/unpin
mroeschke Jan 26, 2024
cb1889b
Merge pull request #14903 from vyasr/fix/revert_incorrect_set_usage
vyasr Jan 26, 2024
5618d3d
Remove pandas Index subclasses in cudf pandas (#14902)
mroeschke Jan 27, 2024
d8df8e4
Allow `any` and `all` only for all-`NA` and empty string columns (#14…
galipremsagar Jan 28, 2024
9fa9dc5
Prevent converting strings to arrow strings in `dask_cudf` pytests (#…
galipremsagar Jan 29, 2024
f7b0bf6
Merge remote-tracking branch 'upstream/branch-24.04' into pandas_2.0_…
galipremsagar Jan 29, 2024
784fe95
Enable full CI
galipremsagar Jan 29, 2024
51e42c1
Fix spacings
galipremsagar Jan 29, 2024
eae873e
Update pr.yaml
galipremsagar Jan 29, 2024
dbf08cb
Fix style issues in 2.0 feature branch (#14918)
galipremsagar Jan 29, 2024
bf49a66
Merge remote-tracking branch 'upstream/branch-24.04' into pandas_2.0_…
galipremsagar Jan 29, 2024
e74fe0a
Remove gated xfails (#14905)
vyasr Jan 29, 2024
f69ae1d
Add `Groupby.indices` property and deprecate `obj` in `get_group` (#1…
galipremsagar Jan 29, 2024
fc790ab
Change pandas version range (#14919)
galipremsagar Jan 29, 2024
2a2c9c8
Merge remote-tracking branch 'upstream/branch-24.04' into pandas_2.0_…
galipremsagar Jan 29, 2024
5abe6b5
Fix custreamz pytests to test on float64 types (#14925)
galipremsagar Jan 29, 2024
eb957d9
Revert unnecessary copyright changes
vyasr Jan 29, 2024
7f7e237
Undo a few incorrect copyright fixes
vyasr Jan 30, 2024
c635335
Remove pandas 1.3, 1.4 checks (#14927)
mroeschke Jan 30, 2024
adcd7e9
Apply suggestions from code review
galipremsagar Jan 30, 2024
86a4068
Allow hash_array to be findable in pandas 2.0; add workaround for tes…
mroeschke Jan 30, 2024
92b6472
Remove pandas 1.5 checks (#14928)
mroeschke Jan 30, 2024
601ce8f
Merge remote-tracking branch 'upstream/branch-24.04' into pandas_2.0_…
galipremsagar Jan 30, 2024
132978f
Address all remaining reviews
galipremsagar Jan 30, 2024
30f873d
Address all dask_cudf reviews
galipremsagar Jan 30, 2024
2b05b59
Fix custreamz pytests to test on float64 types (#14934)
galipremsagar Jan 30, 2024
2e30753
Remaining custreamz test fix
galipremsagar Jan 30, 2024
1937252
Remove missing docstrings
galipremsagar Jan 30, 2024
6d07cc2
Fix another custreamz test
galipremsagar Jan 30, 2024
71d87d5
Add back reftarget change for cudf.Index
mroeschke Jan 30, 2024
3438af0
Revert "Add back reftarget change for cudf.Index"
vyasr Jan 30, 2024
ffa473e
Move abs to IndexedFrame
vyasr Jan 30, 2024
abcd15d
Move head and tail to IndexedFrame
vyasr Jan 30, 2024
50d287f
Move isnull (alias) to IndexedFrame
vyasr Jan 30, 2024
0013faa
Move kurtosis and skew to IndexedFrame
vyasr Jan 30, 2024
11ab9e8
Move mask to IndexedFrame
vyasr Jan 30, 2024
2563b90
Move various reductions to IndexedFrame
vyasr Jan 30, 2024
9716f52
Move nans_to_nulls to IndexedFrame
vyasr Jan 30, 2024
fdf31e3
Move rolling to IndexedFrame
vyasr Jan 30, 2024
0bcdb2d
Move notnull (alias) to IndexedFrame
vyasr Jan 30, 2024
ea7ebfb
Move pipe to IndexedFrame
vyasr Jan 30, 2024
7b0bcde
Move conversion functions
vyasr Jan 30, 2024
28548f6
Add missing methods to the docs
vyasr Jan 30, 2024
59af57d
Add isnull and notnull to index docs
vyasr Jan 30, 2024
c6f5392
Revert "Move isnull (alias) to IndexedFrame"
vyasr Jan 30, 2024
6301538
Revert "Move notnull (alias) to IndexedFrame"
vyasr Jan 30, 2024
a95bc6a
Make sure str works even if to_string does not
vyasr Jan 30, 2024
4f0563d
Remove tests of now unsupported reductions
vyasr Jan 30, 2024
07e9872
Address feedback
vyasr Jan 30, 2024
f281b90
Merge pull request #14937 from vyasr/fix/doc_errors
vyasr Jan 30, 2024
6 changes: 6 additions & 0 deletions .pre-commit-config.yaml
@@ -91,6 +91,12 @@ repos:
 entry: '(category=|\s)DeprecationWarning[,)]'
 language: pygrep
 types_or: [python, cython]
+# We need to exclude just the following file because few APIs still need
+# DeprecationWarning: https://github.com/pandas-dev/pandas/issues/54970
+exclude: |
+(?x)^(
+^python/cudf/cudf/core/dtypes.py
+)
 - id: no-programmatic-xfail
 name: no-programmatic-xfail
 description: 'Enforce that pytest.xfail is not introduced (see dev docs for details)'
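To see which lines the `pygrep` pattern in this hook would flag, here is a small sketch using Python's `re` module. The pattern is copied verbatim from the hook's `entry` field; the sample source lines are invented for illustration, and pygrep's line-by-line matching is approximated with `re.search`.

```python
import re

# Pattern copied verbatim from the hook's `entry` field.
PATTERN = re.compile(r"(category=|\s)DeprecationWarning[,)]")

# Hypothetical source lines, invented for this illustration.
samples = [
    "warnings.warn(msg, DeprecationWarning)",           # flagged: whitespace before, ')' after
    "warnings.warn(msg, category=DeprecationWarning)",  # flagged: 'category=' prefix
    "warnings.warn(msg, FutureWarning)",                # not flagged: different warning class
    "with pytest.warns(DeprecationWarning):",           # not flagged: preceded by '('
    "class MyDeprecationWarning(Warning):",             # not flagged: followed by '('
]

flagged = [line for line in samples if PATTERN.search(line)]
for line in flagged:
    print(line)
```

Only the first two samples match. The `exclude` entry in the hunk exempts `python/cudf/cudf/core/dtypes.py` from this check.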
4 changes: 2 additions & 2 deletions conda/environments/all_cuda-118_arch-x86_64.yaml
@@ -59,13 +59,13 @@ dependencies:
 - ninja
 - notebook
 - numba>=0.57
-- numpy>=1.21,<1.25
+- numpy>=1.21
 - numpydoc
 - nvcc_linux-64=11.8
 - nvcomp==3.0.5
 - nvtx>=0.2.1
 - packaging
-- pandas>=1.3,<1.6.0dev0
+- pandas>=2.0,<2.1.5dev0
 - pandoc
 - pip
 - pre-commit
4 changes: 2 additions & 2 deletions conda/environments/all_cuda-120_arch-x86_64.yaml
@@ -58,12 +58,12 @@ dependencies:
 - ninja
 - notebook
 - numba>=0.57
-- numpy>=1.21,<1.25
+- numpy>=1.21
 - numpydoc
 - nvcomp==3.0.5
 - nvtx>=0.2.1
 - packaging
-- pandas>=1.3,<1.6.0dev0
+- pandas>=2.0,<2.1.5dev0
 - pandoc
 - pip
 - pre-commit
5 changes: 2 additions & 3 deletions conda/recipes/cudf/meta.yaml
@@ -76,12 +76,11 @@ requirements:
 - {{ pin_compatible('protobuf', min_pin='x.x', max_pin='x') }}
 - python
 - typing_extensions >=4.0.0
-- pandas >=1.3,<1.6.0dev0
+- pandas >=2.0,<2.1.5dev0
 - cupy >=12.0.0
 # TODO: Pin to numba<0.58 until #14160 is resolved
 - numba >=0.57,<0.58
-# TODO: Pin to numpy<1.25 until cudf requires pandas 2
-- numpy >=1.21,<1.25
+- numpy >=1.21
 - {{ pin_compatible('pyarrow', max_pin='x') }}
 - libcudf ={{ version }}
 - {{ pin_compatible('rmm', max_pin='x.x') }}
5 changes: 2 additions & 3 deletions dependencies.yaml
@@ -266,8 +266,7 @@ dependencies:
 - *cmake_ver
 - cython>=3.0.3
 - *ninja
-# TODO: Pin to numpy<1.25 until cudf requires pandas 2
-- &numpy numpy>=1.21,<1.25
+- &numpy numpy>=1.21
 # Hard pin the patch version used during the build. This must be kept
 # in sync with the version pinned in get_arrow.cmake.
 - pyarrow==14.0.1.*
@@ -502,7 +501,7 @@ dependencies:
 packages:
 - fsspec>=0.6.0
 - *numpy
-- pandas>=1.3,<1.6.0dev0
+- pandas>=2.0,<2.1.5dev0
 run_cudf:
 common:
 - output_types: [conda, requirements, pyproject]
4 changes: 2 additions & 2 deletions docs/cudf/source/conf.py
@@ -454,8 +454,8 @@ def on_missing_reference(app, env, node, contnode):
 _prefixed_domain_objects[f"{prefix}{name}"] = name

 reftarget = node.get("reftarget")
-if reftarget == "cudf.core.index.GenericIndex":
-# We don't exposed docs for `cudf.core.index.GenericIndex`
+if reftarget == "cudf.core.index.Index":
+# We don't exposed docs for `cudf.core.index.Index`
 # hence we would want the docstring & mypy references to
 # use `cudf.Index`
 node["reftarget"] = "cudf.Index"
25 changes: 10 additions & 15 deletions docs/cudf/source/developer_guide/library_design.md
@@ -22,7 +22,7 @@ Finally we tie these pieces together to provide a more holistic view of the proj
 % class IndexedFrame
 % class SingleColumnFrame
 % class BaseIndex
-% class GenericIndex
+% class Index
 % class MultiIndex
 % class RangeIndex
 % class DataFrame
@@ -42,8 +42,8 @@ Finally we tie these pieces together to provide a more holistic view of the proj
 % BaseIndex <|-- MultiIndex
 % Frame <|-- MultiIndex
 %
-% BaseIndex <|-- GenericIndex
-% SingleColumnFrame <|-- GenericIndex
+% BaseIndex <|-- Index
+% SingleColumnFrame <|-- Index
 %
 % @enduml

@@ -89,31 +89,26 @@ While we've highlighted some exceptional cases of Indexes before, let's start wi
 In practice, `BaseIndex` does have concrete implementations of a small set of methods.
 However, currently many of these implementations are not applicable to all subclasses and will be eventually be removed.

-Almost all indexes are subclasses of `GenericIndex`, a single-columned index with the class hierarchy:
+Almost all indexes are subclasses of `Index`, a single-columned index with the class hierarchy:
 ```python
-class GenericIndex(SingleColumnFrame, BaseIndex)
+class Index(SingleColumnFrame, BaseIndex)
 ```
 Integer, float, or string indexes are all composed of a single column of data.
-Most `GenericIndex` methods are inherited from `Frame`, saving us the trouble of rewriting them.
+Most `Index` methods are inherited from `Frame`, saving us the trouble of rewriting them.

 We now consider the three main exceptions to this model:

 - A `RangeIndex` is not backed by a column of data, so it inherits directly from `BaseIndex` alone.
 Wherever possible, its methods have special implementations designed to avoid materializing columns.
-Where such an implementation is infeasible, we fall back to converting it to an `Int64Index` first instead.
+Where such an implementation is infeasible, we fall back to converting it to an `Index` of `int64`
+dtype first instead.
 - A `MultiIndex` is backed by _multiple_ columns of data.
 Therefore, its inheritance hierarchy looks like `class MultiIndex(Frame, BaseIndex)`.
 Some of its more `Frame`-like methods may be inherited,
 but many others must be reimplemented since in many cases a `MultiIndex` is not expected to behave like a `Frame`.
-- Just like in pandas, `Index` itself can never be instantiated.
-`pandas.Index` is the parent class for indexes,
-but its constructor returns an appropriate subclass depending on the input data type and shape.
-Unfortunately, mimicking this behavior requires overriding `__new__`,
-which in turn makes shared initialization across inheritance trees much more cumbersome to manage.
-To enable sharing constructor logic across different index classes,
-we instead define `BaseIndex` as the parent class of all indexes.
+- To enable sharing constructor logic across different index classes,
+we define `BaseIndex` as the parent class of all indexes.
 `Index` inherits from `BaseIndex`, but it masquerades as a `BaseIndex` to match pandas.
-This class should contain no implementations since it is simply a factory for other indexes.


 ## The Column layer
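The "masquerades as a `BaseIndex`" remark in the last bullet can be illustrated with a metaclass that overrides `__instancecheck__` and `__subclasscheck__`. This is a toy sketch under stated assumptions, not cuDF's actual implementation: every class name below (`ToyBaseIndex`, `ToyIndexMeta`, `ToyIndex`, `ToyRangeIndex`) is hypothetical, and the real metaclass may be structured differently.

```python
class ToyBaseIndex:
    """Hypothetical stand-in for the parent class of every index type."""


class ToyIndexMeta(type):
    """Hypothetical metaclass: ToyIndex answers type checks as ToyBaseIndex."""

    def __instancecheck__(cls, instance):
        # Only ToyIndex itself masquerades; any subclass of ToyIndex
        # keeps the default isinstance behaviour.
        if cls is ToyIndex:
            return type.__instancecheck__(ToyBaseIndex, instance)
        return super().__instancecheck__(instance)

    def __subclasscheck__(cls, subclass):
        if cls is ToyIndex:
            return type.__subclasscheck__(ToyBaseIndex, subclass)
        return super().__subclasscheck__(subclass)


class ToyIndex(ToyBaseIndex, metaclass=ToyIndexMeta):
    """Single-column index."""


class ToyRangeIndex(ToyBaseIndex):
    """Inherits from the base class only, like RangeIndex in the text."""


r = ToyRangeIndex()
print(isinstance(r, ToyIndex))
print(issubclass(ToyRangeIndex, ToyIndex))
print(ToyIndex in type(r).__mro__)
```

With this arrangement a range index passes `isinstance(..., ToyIndex)` even though `ToyIndex` is not in its method resolution order, which is the pandas-like behaviour the bullet describes.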
6 changes: 3 additions & 3 deletions docs/cudf/source/user_guide/api_docs/dataframe.rst
@@ -105,13 +105,14 @@ Function application, GroupBy & window
 .. autosummary::
 :toctree: api/

+DataFrame.agg
 DataFrame.apply
 DataFrame.applymap
 DataFrame.apply_chunks
 DataFrame.apply_rows
-DataFrame.pipe
-DataFrame.agg
 DataFrame.groupby
+DataFrame.map
+DataFrame.pipe
 DataFrame.rolling

 .. _api.dataframe.stats:
@@ -232,7 +233,6 @@ Combining / comparing / joining / merging
 .. autosummary::
 :toctree: api/

-DataFrame.append
 DataFrame.assign
 DataFrame.join
 DataFrame.merge
4 changes: 0 additions & 4 deletions docs/cudf/source/user_guide/api_docs/groupby.rst
@@ -42,7 +42,6 @@ Computations / descriptive stats
 :toctree: api/

 GroupBy.bfill
-GroupBy.backfill
 GroupBy.count
 GroupBy.cumcount
 GroupBy.cummax
@@ -63,7 +62,6 @@ Computations / descriptive stats
 GroupBy.ngroup
 GroupBy.nth
 GroupBy.nunique
-GroupBy.pad
 GroupBy.prod
 GroupBy.shift
 GroupBy.size
@@ -82,7 +80,6 @@ application to columns of a specific data type.
 .. autosummary::
 :toctree: api/

-DataFrameGroupBy.backfill
 DataFrameGroupBy.bfill
 DataFrameGroupBy.count
 DataFrameGroupBy.cumcount
@@ -96,7 +93,6 @@ application to columns of a specific data type.
 DataFrameGroupBy.idxmax
 DataFrameGroupBy.idxmin
 DataFrameGroupBy.nunique
-DataFrameGroupBy.pad
 DataFrameGroupBy.quantile
 DataFrameGroupBy.shift
 DataFrameGroupBy.size
7 changes: 3 additions & 4 deletions docs/cudf/source/user_guide/api_docs/index_objects.rst
@@ -25,7 +25,6 @@ Properties
 Index.empty
 Index.has_duplicates
 Index.hasnans
-Index.is_monotonic
 Index.is_monotonic_increasing
 Index.is_monotonic_decreasing
 Index.is_unique
@@ -143,6 +142,7 @@ Selecting
 .. autosummary::
 :toctree: api/

+Index.get_indexer
 Index.get_level_values
 Index.get_loc
 Index.get_slice_bound
@@ -168,9 +168,6 @@ Numeric Index
 RangeIndex.step
 RangeIndex.to_numpy
 RangeIndex.to_arrow
-Int64Index
-UInt64Index
-Float64Index

 .. _api.categoricalindex:

@@ -212,6 +209,7 @@ IntervalIndex components

 IntervalIndex.from_breaks
 IntervalIndex.values
+IntervalIndex.get_indexer
 IntervalIndex.get_loc

 .. _api.multiindex:
@@ -258,6 +256,7 @@ MultiIndex selecting
 .. autosummary::
 :toctree: api/

+MultiIndex.get_indexer
 MultiIndex.get_loc
 MultiIndex.get_level_values

2 changes: 0 additions & 2 deletions docs/cudf/source/user_guide/api_docs/series.rst
@@ -158,7 +158,6 @@ Computations / descriptive stats
 Series.unique
 Series.nunique
 Series.is_unique
-Series.is_monotonic
 Series.is_monotonic_increasing
 Series.is_monotonic_decreasing
 Series.value_counts
@@ -226,7 +225,6 @@ Combining / comparing / joining / merging
 .. autosummary::
 :toctree: api/

-Series.append
 Series.update

 Time Series-related
21 changes: 21 additions & 0 deletions docs/cudf/source/user_guide/pandas-comparison.md
@@ -158,6 +158,27 @@ module, which allow you to compare values up to a desired precision.
 Unlike Pandas, cuDF does not support duplicate column names.
 It is best to use unique strings for column names.

+## Writing a DataFrame to Parquet with non-string column names
+
+When there is a DataFrame with non-string column names, pandas casts each
+column name to `str` before writing to a Parquet file. `cudf` raises an
+error by default if this is attempted. However, to achieve similar behavior
+as pandas you can enable the `mode.pandas_compatible` option, which will
+enable `cudf` to cast the column names to `str` just like pandas.
+
+```python
+>>> import cudf
+>>> df = cudf.DataFrame({1: [1, 2, 3], "1": ["a", "b", "c"]})
+>>> df.to_parquet("df.parquet")
+
+Traceback (most recent call last):
+ValueError: Writing a Parquet file requires string column names
+>>> cudf.set_option("mode.pandas_compatible", True)
+>>> df.to_parquet("df.parquet")
+
+UserWarning: The DataFrame has column names of non-string type. They will be converted to strings on write.
+```
+
 ## No true `"object"` data type

 In Pandas and NumPy, the `"object"` data type is used for
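The behaviour described in the new Parquet section can be mimicked in plain Python without a GPU. The helper below is hypothetical: `prepare_column_names` and its `pandas_compatible` argument are names invented for this sketch and are not cuDF API. It reuses only the two messages shown in the doc example.

```python
import warnings


def prepare_column_names(names, pandas_compatible=False):
    """Hypothetical helper: return column names suitable for a Parquet write."""
    if all(isinstance(name, str) for name in names):
        return list(names)
    if not pandas_compatible:
        # Default mode: refuse non-string names, as the doc example shows.
        raise ValueError("Writing a Parquet file requires string column names")
    # Compatibility mode: warn, then cast every name to str.
    warnings.warn(
        "The DataFrame has column names of non-string type. "
        "They will be converted to strings on write.",
        UserWarning,
    )
    return [str(name) for name in names]


try:
    prepare_column_names([1, "b"])
    raised = False
except ValueError as err:
    raised = True
    print(f"ValueError: {err}")

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    converted = prepare_column_names([1, "b"], pandas_compatible=True)

print(converted)
print(caught[0].category.__name__)
```

In this sketch, names such as `1` and `"1"` both map to `"1"` after casting, so a caller relying on this kind of conversion may want to check for duplicates first.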
4 changes: 2 additions & 2 deletions python/cudf/benchmarks/conftest.py
@@ -40,8 +40,8 @@
 In addition to the above fixtures, we also provide the following more
 specialized fixtures:
 - rangeindex: Since RangeIndex always holds int64 data we cannot conflate
-it with index_dtype_int64 (a true Int64Index), and it cannot hold nulls.
-As a result, it is provided as a separate fixture.
+it with index_dtype_int64 (a true Index with int64 dtype), and it
+cannot hold nulls. As a result, it is provided as a separate fixture.
 """

 import os
26 changes: 1 addition & 25 deletions python/cudf/cudf/__init__.py
@@ -1,4 +1,4 @@
-# Copyright (c) 2018-2023, NVIDIA CORPORATION.
+# Copyright (c) 2018-2024, NVIDIA CORPORATION.

 # _setup_numba _must be called before numba.cuda is imported, because
 # it sets the numba config variable responsible for enabling
@@ -41,22 +41,10 @@
 BaseIndex,
 CategoricalIndex,
 DatetimeIndex,
-Float32Index,
-Float64Index,
-GenericIndex,
 Index,
-Int8Index,
-Int16Index,
-Int32Index,
-Int64Index,
 IntervalIndex,
 RangeIndex,
-StringIndex,
 TimedeltaIndex,
-UInt8Index,
-UInt16Index,
-UInt32Index,
-UInt64Index,
 interval_range,
 )
 from cudf.core.missing import NA, NaT
@@ -109,15 +97,8 @@
 "DatetimeIndex",
 "Decimal32Dtype",
 "Decimal64Dtype",
-"Float32Index",
-"Float64Index",
-"GenericIndex",
 "Grouper",
 "Index",
-"Int16Index",
-"Int32Index",
-"Int64Index",
-"Int8Index",
 "IntervalDtype",
 "IntervalIndex",
 "ListDtype",
@@ -127,13 +108,8 @@
 "RangeIndex",
 "Scalar",
 "Series",
-"StringIndex",
 "StructDtype",
 "TimedeltaIndex",
-"UInt16Index",
-"UInt32Index",
-"UInt64Index",
-"UInt8Index",
 "api",
 "concat",
 "crosstab",
4 changes: 2 additions & 2 deletions python/cudf/cudf/_fuzz_testing/csv.py
@@ -1,4 +1,4 @@
-# Copyright (c) 2020-2022, NVIDIA CORPORATION.
+# Copyright (c) 2020-2024, NVIDIA CORPORATION.

 import logging
 import random
@@ -99,7 +99,7 @@ def set_rand_params(self, params):
 if dtype_val is not None:
 dtype_val = {
 col_name: "category"
-if cudf.utils.dtypes.is_categorical_dtype(dtype)
+if cudf.utils.dtypes._is_categorical_dtype(dtype)
 else pandas_dtypes_to_np_dtypes[dtype]
 for col_name, dtype in dtype_val.items()
 }
4 changes: 2 additions & 2 deletions python/cudf/cudf/_fuzz_testing/json.py
@@ -1,4 +1,4 @@
-# Copyright (c) 2020-2022, NVIDIA CORPORATION.
+# Copyright (c) 2020-2024, NVIDIA CORPORATION.

 import logging
 import random
@@ -27,7 +27,7 @@ def _get_dtype_param_value(dtype_val):
 if dtype_val is not None and isinstance(dtype_val, abc.Mapping):
 processed_dtypes = {}
 for col_name, dtype in dtype_val.items():
-if cudf.utils.dtypes.is_categorical_dtype(dtype):
+if cudf.utils.dtypes._is_categorical_dtype(dtype):
 processed_dtypes[col_name] = "category"
 else:
 processed_dtypes[col_name] = str(