Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AWS: Glue catalog strip trailing slash on DB URI #8870

Merged

Conversation

amogh-jahagirdar
Copy link
Contributor

@amogh-jahagirdar amogh-jahagirdar commented Oct 18, 2023

Currently the Glue catalog does not strip the trailing slash on default DB location this can lead to path issues as seen in the below issue.

Related to: databricks/iceberg-kafka-connect#127

@github-actions github-actions bot added the AWS label Oct 18, 2023
@amogh-jahagirdar
Copy link
Contributor Author

amogh-jahagirdar commented Oct 18, 2023

cc @bryanck @okayhooni I just took another look at the issue and see that the DB URI s3://test_bucket/temp_beta does not have a trailing slash, so maybe my fix is insufficient. Can you confirm that in Glue your default DB location does not have a trailing slash?

In the end, I think we'll still need this fix because without it there's double slash possibilities but I would need to dig further to find the root cause for the issue's specific case

@@ -281,6 +281,7 @@ protected String defaultWarehouseLocation(TableIdentifier tableIdentifier) {
.build());
String dbLocationUri = response.database().locationUri();
if (dbLocationUri != null) {
dbLocationUri = LocationUtil.stripTrailingSlash(dbLocationUri);
Copy link
Contributor

@dramaticlly dramaticlly Oct 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like same response was used in method loadNamespaceMetadata (Line 518) as well, do we need to strip there as well?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, I think we'd need to do the same in loadNamespaceMetadata()

Copy link
Contributor Author

@amogh-jahagirdar amogh-jahagirdar Oct 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At first I was worried about the behavior change for callers of the API (now they cannot expect a trailing slash in case they were using the location in the response), but overall changing loadNamespaceMetadata makes sense. The API is pretty much used when updating namespace properties in which case stripping the trailing slash can serve as a correction mechanism for any future tables in the namespace. Existing tables in the namespace wouldn't see a change in behavior. Updated

@nastra
Copy link
Contributor

nastra commented Oct 19, 2023

Just a wild guess, but could it be possible that we're missing to strip a trailing slash in

return String.format("%s/%s/%s", metadata.location(), METADATA_FOLDER_NAME, filename);
?

@amogh-jahagirdar
Copy link
Contributor Author

Just a wild guess, but could it be possible that we're missing to strip a trailing slash in

return String.format("%s/%s/%s", metadata.location(), METADATA_FOLDER_NAME, filename);

?

That's another place I spotted as well, unfortunately while I think we should have stripped the trailing slash there to begin with, I don't think we can change that. The reason being, is if we change that the metadataFileLocation path for new metadata files would be different compared to the original location which would have double slash. This difference in where the metadata file gets written may confuse users and also have unintended side effects for external systems. Also I need to step through the code a bit more, but not sure if we always guaranteed at build/write time of the location that the trailing slash would be stripped. In that case we would already be covered.

@amogh-jahagirdar amogh-jahagirdar force-pushed the fix-glue-db-trailing-slash branch from ca9161a to 9f3f552 Compare October 19, 2023 12:45
@amogh-jahagirdar
Copy link
Contributor Author

Thanks @bryanck @dramaticlly @singhpk234 @nastra for reviews! @okayhooni feel free to comment on this PR if the db location in your case did not have a trailing slash. In that case there's still some more cases we'll need to look into.

@amogh-jahagirdar amogh-jahagirdar merged commit d92be9b into apache:main Oct 19, 2023
@okayhooni
Copy link

@amogh-jahagirdar

Sorry for the late response and thanks to handle this issue..!

Yes, there is a trailing slash on the database location, as you said..! so, your commit will handle the issue..! thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants