-
Notifications
You must be signed in to change notification settings - Fork 488
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SNOW-1325701: Use scoped temp object in write pandas #2068
Conversation
DESCRIPTION.md
Outdated
@@ -7,6 +7,8 @@ https://docs.snowflake.com/ | |||
Source code is also available at: https://github.com/snowflakedb/snowflake-connector-python | |||
|
|||
# Release Notes | |||
- v3.13.0(TBD) | |||
- Improved `write_pandas` and `BindUploadAgent.upload` to create scoped temporary object instead of temporary object. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need this change in the public connector?
I thought "scoped object" only applied to the concept of stored proc. is this for aligning the change with the sproc connector?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here is some context: https://docs.google.com/document/d/1qfJevIMEZL7O12H5pQK5WZG-a-uwA2K2cmFDusHjdXM/edit?usp=sharing.
I think mostly they want temp object being created as job-scoped object, and they also want to align with sp connector
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is my understanding correct?: for regular / non-SP session, "job" is equivalent to the session itself. So job-scoped objects are de facto session-scoped ones. Therefore this change does not affect behavior of regular sessions.
self, | ||
cursor: SnowflakeCursor, | ||
rows: list[bytes], | ||
stream_buffer_size: int = 1024 * 1024 * 10, | ||
use_scoped_temp_object: bool = True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need this as part of the public api/do we want users to control the behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will change this to private
@@ -181,6 +209,7 @@ def write_pandas( | |||
overwrite: bool = False, | |||
table_type: Literal["", "temp", "temporary", "transient"] = "", | |||
use_logical_type: bool | None = None, | |||
use_scoped_temp_object: bool = True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same question, do we need this to be part of public api?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
connector is not only used by snowpark, but also by non-snowpark users. I think we should make the job scoped change only for snowpark users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change looks good to me. Only 2 nitpicking comments. Thanks for looking into this!
cursor.connection._session_parameters.get( | ||
_PYTHON_SNOWPARK_USE_SCOPED_TEMP_OBJECTS_STRING, False | ||
) | ||
if cursor.connection._session_parameters |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(not a strong opinion) Will cursor.connection._session_parameters
ever be None? iirc we always at least initialized to an empty dict, so we can even simplify do
self._use_scoped_temp_object = cursor.connection._session_parameters.get(
_PYTHON_SNOWPARK_USE_SCOPED_TEMP_OBJECTS_STRING, False
)
But to err on the side of caution is also a good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I followed the practice in snowpark here, and this approach handled all possible errors too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We hit an issue with this change. The latest 3.12.3
broke our tests.
In our tests, we have a MockConnection
class which mocks all the public functions of Connection. This change, however, started using a private class member (connection._session_parameters
), which doesn't seem right.
|
||
|
||
TEMP_OBJECT_NAME_PREFIX = "SNOWPARK_TEMP_" | ||
ALPHANUMERIC = string.digits + string.ascii_lowercase |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Is there any special reason that we set the character set to be number + lowercase, but then later uppercase the entire name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above, I followed snowpark's practice, I guess because we need a special name format for scoped object
self._stage_name = ( | ||
random_name_for_temp_object(TempObjectType.STAGE) | ||
if self._use_scoped_temp_object | ||
else "SYSTEMBIND" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually I just realized that if _use_scoped_temp_object
is True, then each time when we instantiate a BindUploadAgent (we do this when binding parameters), a new temp stage created.
is this the same behavior as the sproc connector change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a tricky problem, the original design is that the temp stage name in bind upload agent is a constant. So that there is only one temp stage created by it.
what i can think of is creating a constant stage name following the naming rule, what do you think?
Please answer these questions before submitting your pull requests. Thanks!
What GitHub issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
Fixes #NNNN
Fill out the following pre-review checklist:
Please describe how your code solves the related issue.
Please write a short description of how your code change solves the related issue.
(Optional) PR for stored-proc connector: