Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SNOW-920892: v3.3.0 nanoarrow based connector #1750

Merged
merged 52 commits into from
Oct 11, 2023
Merged

Conversation

sfc-gh-aling
Copy link
Collaborator

@sfc-gh-aling sfc-gh-aling commented Oct 3, 2023

Please answer these questions before submitting your pull requests. Thanks!

  1. What GitHub issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

SNOW-920892: GA release nanoarow based connector 3.3.0

  • This PR refactored the arrow data conversion part of connector to use arrow-nanoarrow project.
  • In this PR/release, vendored arrow converted is kept so that users can switch back to the old approach is the new approach turns out not working properly
    • users can switch by client parameter, env var or session parameters
  • The change shall be transparent to users in all perspective:
    • all features remain the same
    • no api change
    • no performance regression

the recommended way to review the PR:

  • review the *.py files in dir src/snowflake/connector
  • review the pyx file src/snowflake/connector/nanoarrow_cpp/ArrowIterator/nanoarrow_arrow_iterator.pyx
  • review the test python code
  • review the setup/build related files, ci/cd related files
  • review the cpp converter code, best practice, open the vendored arrow converter code side by side and review
  • YOU DO NOT NEED TO (unless you want to) review the code of the following paths, they are vendored nanoarrow project:
    • src/snowflake/connector/nanoarrow_cpp/flatcc.cc
    • src/snowflake/connector/nanoarrow_cpp/flatcc
    • src/snowflake/connector/nanoarrow_cpp/nanoarrow*.*, except for src/snowflake/connector/nanoarrow_cpp/ArrowIterator/nanoarrow_arrow_iterator.pyx
  1. Fill out the following pre-review checklist:

    • I am adding a new automated test(s) to verify correctness of my new code
    • I am adding new logging messages
    • I am adding a new telemetry message
    • I am modifying authorization mechanisms
    • I am adding new credentials
    • I am modifying OCSP code
    • I am adding a new dependency
  2. Please describe how your code solves the related issue.

    Please write a short description of how your code change solves the related issue.

Sorry, something went wrong.

Copy link
Contributor

@sfc-gh-sfan sfc-gh-sfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test/**/*.py

test/helpers.py Outdated Show resolved Hide resolved
test/helpers.py Outdated Show resolved Hide resolved
test/helpers.py Outdated Show resolved Hide resolved
test/helpers.py Outdated Show resolved Hide resolved
Copy link
Contributor

@sfc-gh-sfan sfc-gh-sfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

setup looks good :)

setup.py Outdated Show resolved Hide resolved
setup.py Show resolved Hide resolved
setup.py Outdated Show resolved Hide resolved
@ofek
Copy link
Contributor

ofek commented Oct 5, 2023

Can we please add an environment variable that would prevent building the extension even if the required dependencies are available?

@sfc-gh-aling
Copy link
Collaborator Author

hey @ofek, can you elaborate on your requirements here, is this related to this specific PR or to the general connector build steps

@ofek
Copy link
Contributor

ofek commented Oct 5, 2023

This is a general request not specific to this PR but it would be easy to add since we are already modifying the file. We have a need to skip compiling under certain circumstances.

Copy link
Contributor

@sfc-gh-sfan sfc-gh-sfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

converters

@sfc-gh-aling
Copy link
Collaborator Author

@ofek thanks for the feedback, I think this is a good one. however, this PR is dedicated only for the nanoarrow changes.
to control the scope and for better maintainability, I'd rather not to include other changes in this PR just in case this PR has to be reverted in a future time due to critical issue we might found in the code.

can you open a separate github issue for the request? we'd like to address your issue separately

@ofek
Copy link
Contributor

ofek commented Oct 6, 2023

@sfc-gh-aling #1754

Copy link
Contributor

@sfc-gh-bwarsaw sfc-gh-bwarsaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check pointing my review.

setup.py Outdated Show resolved Hide resolved
src/snowflake/connector/connection.py Outdated Show resolved Hide resolved
@sfc-gh-aling
Copy link
Collaborator Author

@sfc-gh-bwarsaw , your understanding is correct. I think this is a good feedback, I will make the change.
one thing to call out is that those will not be part the public apis as they will be removed in the pure nanoarrow connector release

@sfc-gh-aling
Copy link
Collaborator Author

I put up a draft PR for feedback improvement: https://github.com/snowflakedb/snowflake-connector-python/pull/1758/files

Copy link
Contributor

@xhochy xhochy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a welcome addition and will make packaging simpler once you can drop the bundled arrow libs. I will give this a test run in conda-forge in the next days and report back whether I see any build issues.

setup.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@sfc-gh-mkeller sfc-gh-mkeller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed all .py and .pyx files

.github/workflows/build_test.yml Outdated Show resolved Hide resolved
MANIFEST.in Outdated Show resolved Hide resolved

ext.sources += [
os.path.join(
NANOARROW_ARROW_ITERATOR_SRC_DIR, "CArrowIterator.cpp"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like we could introduce a helper for all this repeating code, maybe in a follow-up PR

setup.py Outdated Show resolved Hide resolved
setup.py Outdated Show resolved Hide resolved
test/stress/e2e_iterator.py Outdated Show resolved Hide resolved
test/stress/e2e_iterator.py Outdated Show resolved Hide resolved
test/stress/local_iterator.py Show resolved Hide resolved
test/stress/local_iterator.py Outdated Show resolved Hide resolved
plt.title("per iteration execution time")
plt.show()
plt.plot(
[item[0] for item in memory_records], [item[1] for item in memory_records]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could use map and operator.itemgetter here instead

Copy link
Collaborator

@sfc-gh-mkeller sfc-gh-mkeller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚢 🚢

@sfc-gh-aling sfc-gh-aling enabled auto-merge (squash) October 11, 2023 01:27
@sfc-gh-aling sfc-gh-aling disabled auto-merge October 11, 2023 01:27
@sfc-gh-aling sfc-gh-aling merged commit de93ef4 into main Oct 11, 2023
78 checks passed
@sfc-gh-aling sfc-gh-aling deleted the dev/nanoarrow branch October 11, 2023 01:29
@github-actions github-actions bot locked and limited conversation to collaborators Oct 11, 2023
@sfc-gh-aling sfc-gh-aling restored the dev/nanoarrow branch October 11, 2023 01:33
@sfc-gh-mkeller sfc-gh-mkeller deleted the dev/nanoarrow branch November 9, 2023 23:49
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants