Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-33666: [R] Remove extraneous argument to semi_join #33693

Merged
merged 4 commits into from Jan 17, 2023
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
3 changes: 1 addition & 2 deletions r/R/dplyr-join.R
Expand Up @@ -116,9 +116,8 @@ semi_join.arrow_dplyr_query <- function(x,
y,
by = NULL,
copy = FALSE,
suffix = c(".x", ".y"),
...) {
do_join(x, y, by, copy, suffix, ..., join_type = "LEFT_SEMI")
do_join(x, y, by, copy, ..., join_type = "LEFT_SEMI")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are changes in this file necessary to make the tests pass? (Or do they have another purpose, like perhaps making this function more consistent with other arrow join implementations?)

Copy link
Member Author

@thisisnic thisisnic Jan 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes the API consistent with dplyr, but not necessary to make the tests pass.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ok with that change; however, I think it should be consistent for all of our join wrappers (anti, left, right, etc.). Currently they all define the suffix argument explicitly (and I think the default value of c(".x", ".y") is necessary and may result in an error if you try to semi_join overlapping columns?).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed this change as the anti_join() signature is also different to dplyr's. I don't quite understand what you mean; please can we discuss this in #33709?

}
semi_join.Dataset <- semi_join.ArrowTabular <- semi_join.RecordBatchReader <- semi_join.arrow_dplyr_query

Expand Down
11 changes: 2 additions & 9 deletions r/tests/testthat/test-dplyr-join.R
Expand Up @@ -82,7 +82,7 @@ test_that("left_join with join_by", {
left_join(
to_join %>%
rename(the_grouping = some_grouping),
join_by(some_grouping == the_grouping)
join_by(some_grouping == the_grouping)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change intentional?

Copy link
Member Author

@thisisnic thisisnic Jan 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the consequence of running the styler.

) %>%
collect(),
left
Expand Down Expand Up @@ -240,14 +240,7 @@ test_that("full_join", {
test_that("semi_join", {
compare_dplyr_binding(
.input %>%
semi_join(to_join, by = "some_grouping", keep = TRUE) %>%
collect(),
left
)

compare_dplyr_binding(
.input %>%
semi_join(to_join, by = "some_grouping", keep = FALSE) %>%
semi_join(to_join, by = "some_grouping") %>%
collect(),
left
)
Expand Down