-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AWS: Glue catalog strip trailing slash on DB URI #8870
AWS: Glue catalog strip trailing slash on DB URI #8870
Conversation
cc @bryanck @okayhooni I just took another look at the issue and see that the DB URI In the end, I think we'll still need this fix because without it there's double slash possibilities but I would need to dig further to find the root cause for the issue's specific case |
@@ -281,6 +281,7 @@ protected String defaultWarehouseLocation(TableIdentifier tableIdentifier) { | |||
.build()); | |||
String dbLocationUri = response.database().locationUri(); | |||
if (dbLocationUri != null) { | |||
dbLocationUri = LocationUtil.stripTrailingSlash(dbLocationUri); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like same response was used in method loadNamespaceMetadata
(Line 518) as well, do we need to strip there as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, I think we'd need to do the same in loadNamespaceMetadata()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At first I was worried about the behavior change for callers of the API (now they cannot expect a trailing slash in case they were using the location in the response), but overall changing loadNamespaceMetadata
makes sense. The API is pretty much used when updating namespace properties in which case stripping the trailing slash can serve as a correction mechanism for any future tables in the namespace. Existing tables in the namespace wouldn't see a change in behavior. Updated
Just a wild guess, but could it be possible that we're missing to strip a trailing slash in
|
That's another place I spotted as well, unfortunately while I think we should have stripped the trailing slash there to begin with, I don't think we can change that. The reason being, is if we change that the metadataFileLocation path for new metadata files would be different compared to the original location which would have double slash. This difference in where the metadata file gets written may confuse users and also have unintended side effects for external systems. Also I need to step through the code a bit more, but not sure if we always guaranteed at build/write time of the location that the trailing slash would be stripped. In that case we would already be covered. |
ca9161a
to
9f3f552
Compare
Thanks @bryanck @dramaticlly @singhpk234 @nastra for reviews! @okayhooni feel free to comment on this PR if the db location in your case did not have a trailing slash. In that case there's still some more cases we'll need to look into. |
Sorry for the late response and thanks to handle this issue..! Yes, there is a trailing slash on the database location, as you said..! so, your commit will handle the issue..! thank you! |
Currently the Glue catalog does not strip the trailing slash on default DB location this can lead to path issues as seen in the below issue.
Related to: databricks/iceberg-kafka-connect#127