Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SNOW-1325701: Use scoped temp object in write pandas #2068

Merged
merged 12 commits into from
Oct 17, 2024
Merged

Conversation

sfc-gh-yuwang
Copy link
Contributor

Please answer these questions before submitting your pull requests. Thanks!

  1. What GitHub issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

    Fixes #NNNN

  2. Fill out the following pre-review checklist:

    • I am adding a new automated test(s) to verify correctness of my new code
    • I am adding new logging messages
    • I am adding a new telemetry message
    • I am modifying authorization mechanisms
    • I am adding new credentials
    • I am modifying OCSP code
    • I am adding a new dependency
  3. Please describe how your code solves the related issue.

    Please write a short description of how your code change solves the related issue.

  4. (Optional) PR for stored-proc connector:

Sorry, something went wrong.

@sfc-gh-yuwang sfc-gh-yuwang marked this pull request as ready for review October 11, 2024 18:25
DESCRIPTION.md Outdated
@@ -7,6 +7,8 @@ https://docs.snowflake.com/
Source code is also available at: https://github.com/snowflakedb/snowflake-connector-python

# Release Notes
- v3.13.0(TBD)
- Improved `write_pandas` and `BindUploadAgent.upload` to create scoped temporary object instead of temporary object.
Copy link
Collaborator

@sfc-gh-aling sfc-gh-aling Oct 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need this change in the public connector?
I thought "scoped object" only applied to the concept of stored proc. is this for aligning the change with the sproc connector?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here is some context: https://docs.google.com/document/d/1qfJevIMEZL7O12H5pQK5WZG-a-uwA2K2cmFDusHjdXM/edit?usp=sharing.
I think mostly they want temp object being created as job-scoped object, and they also want to align with sp connector

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is my understanding correct?: for regular / non-SP session, "job" is equivalent to the session itself. So job-scoped objects are de facto session-scoped ones. Therefore this change does not affect behavior of regular sessions.

Comment on lines 30 to 34
self,
cursor: SnowflakeCursor,
rows: list[bytes],
stream_buffer_size: int = 1024 * 1024 * 10,
use_scoped_temp_object: bool = True,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need this as part of the public api/do we want users to control the behavior?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will change this to private

@@ -181,6 +209,7 @@ def write_pandas(
overwrite: bool = False,
table_type: Literal["", "temp", "temporary", "transient"] = "",
use_logical_type: bool | None = None,
use_scoped_temp_object: bool = True,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same question, do we need this to be part of public api?

@sfc-gh-yuwang sfc-gh-yuwang added the NO-CHANGELOG-UPDATES This pull request does not need to update CHANGELOG.md label Oct 14, 2024
Copy link
Collaborator

@sfc-gh-aling sfc-gh-aling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

connector is not only used by snowpark, but also by non-snowpark users. I think we should make the job scoped change only for snowpark users.

Copy link
Contributor

@sfc-gh-zyao sfc-gh-zyao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change looks good to me. Only 2 nitpicking comments. Thanks for looking into this!

cursor.connection._session_parameters.get(
_PYTHON_SNOWPARK_USE_SCOPED_TEMP_OBJECTS_STRING, False
)
if cursor.connection._session_parameters
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(not a strong opinion) Will cursor.connection._session_parameters ever be None? iirc we always at least initialized to an empty dict, so we can even simplify do

self._use_scoped_temp_object = cursor.connection._session_parameters.get(
    _PYTHON_SNOWPARK_USE_SCOPED_TEMP_OBJECTS_STRING, False
)

But to err on the side of caution is also a good idea.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I followed the practice in snowpark here, and this approach handled all possible errors too

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We hit an issue with this change. The latest 3.12.3 broke our tests.

In our tests, we have a MockConnection class which mocks all the public functions of Connection. This change, however, started using a private class member (connection._session_parameters), which doesn't seem right.



TEMP_OBJECT_NAME_PREFIX = "SNOWPARK_TEMP_"
ALPHANUMERIC = string.digits + string.ascii_lowercase
Copy link
Contributor

@sfc-gh-zyao sfc-gh-zyao Oct 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Is there any special reason that we set the character set to be number + lowercase, but then later uppercase the entire name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above, I followed snowpark's practice, I guess because we need a special name format for scoped object

Comment on lines 50 to 54
self._stage_name = (
random_name_for_temp_object(TempObjectType.STAGE)
if self._use_scoped_temp_object
else "SYSTEMBIND"
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually I just realized that if _use_scoped_temp_object is True, then each time when we instantiate a BindUploadAgent (we do this when binding parameters), a new temp stage created.

is this the same behavior as the sproc connector change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a tricky problem, the original design is that the temp stage name in bind upload agent is a constant. So that there is only one temp stage created by it.
what i can think of is creating a constant stage name following the naming rule, what do you think?

@sfc-gh-yuwang sfc-gh-yuwang merged commit a959fc6 into main Oct 17, 2024
93 of 94 checks passed
@sfc-gh-yuwang sfc-gh-yuwang deleted the SNOW-1325701 branch October 17, 2024 16:48
@github-actions github-actions bot locked and limited conversation to collaborators Oct 17, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
NO-CHANGELOG-UPDATES This pull request does not need to update CHANGELOG.md
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants