GCSFileLoader retrieve blob custom metadata and append to document metadata #11066
bharatl commented on Sep 26, 2023
- Description: GCSFileLoader retrieves the blob's custom metadata and appends it to each document's metadata.
- Issue: #9975 (GCSFileLoader needs to read the blob's metadata and populate it into the documents' metadata)
- Tag maintainer: @baskaryan please review
```diff
@@ -62,6 +62,8 @@ def load(self) -> List[Document]:
 bucket = storage_client.get_bucket(self.bucket)
 # Create a blob object from the filepath
 blob = bucket.blob(self.blob)
+# retrieve custom metadata associated with the blob
+metadata = bucket.get_blob(self.blob).metadata
```
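The hunk above only covers retrieval; the code that appends `metadata` to each loaded document lies outside it. Below is a minimal sketch of that merge step. It is not the merged implementation: it assumes a simplified, hypothetical `Document` dataclass standing in for LangChain's, and `append_blob_metadata` is a hypothetical helper name.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Document:
    """Hypothetical stand-in for LangChain's Document (only the fields used here)."""

    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def append_blob_metadata(
    docs: List[Document], blob_metadata: Optional[Dict[str, str]]
) -> List[Document]:
    """Merge a blob's custom metadata into every document's metadata, in place.

    A blob without custom metadata reports None, so that case is a no-op.
    On key collisions, the blob's custom metadata overwrites existing keys.
    """
    if not blob_metadata:
        return docs
    for doc in docs:
        doc.metadata.update(blob_metadata)
    return docs


# Usage with made-up values:
docs = [Document("hello", {"source": "gs://my-bucket/a.txt"})]
append_blob_metadata(docs, {"department": "finance"})
print(docs[0].metadata)  # {'source': 'gs://my-bucket/a.txt', 'department': 'finance'}
```

Here the blob's custom metadata wins on key collisions. Building `{**blob_metadata, **doc.metadata}` instead would let loader-generated keys such as `source` take precedence.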
Can we just do `metadata = blob.metadata`?
@baskaryan `blob()` does not actually send any request to GCS, whereas `get_blob()` fetches the blob's meta-information from GCS. So `metadata = blob.metadata` won't fetch any metadata.

Instead, we could replace `blob = bucket.blob(self.blob)` with `blob = bucket.get_blob(self.blob)` and then read the metadata from `blob.metadata`. Your thoughts, please?

Reference: link
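The approach proposed here (fetch the blob once with `get_blob()`, then read `.metadata`) can be sketched as a small helper. `fetch_custom_metadata` and the `BucketLike` protocol are hypothetical names; a `google.cloud.storage.Bucket` should satisfy the protocol, since `Bucket.get_blob()` returns the blob with its properties loaded, or `None` when the object does not exist.

```python
from typing import Any, Dict, Protocol


class BucketLike(Protocol):
    """Hypothetical structural type: the one Bucket method this helper needs."""

    def get_blob(self, blob_name: str) -> Any: ...


def fetch_custom_metadata(bucket: BucketLike, blob_name: str) -> Dict[str, str]:
    """Return the blob's custom metadata, or {} if it has none."""
    # get_blob() requests the object's properties from GCS; bucket.blob(name)
    # would only build a local reference whose .metadata is still None.
    blob = bucket.get_blob(blob_name)
    if blob is None:
        # get_blob() signals a missing object with None rather than raising.
        raise FileNotFoundError(f"blob {blob_name!r} not found")
    # .metadata is None when the object carries no custom metadata.
    return dict(blob.metadata or {})
```

The `None` check matters: the diff's one-liner `bucket.get_blob(self.blob).metadata` would raise `AttributeError` for a missing object. An alternative that keeps `bucket.blob()` is to call `blob.reload()` before reading `.metadata`, which performs the same metadata request.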
@baskaryan, any thoughts?
Gotcha, I missed the `blob` vs. `get_blob` part! Thanks @bharatl!
…tadata (langchain-ai#11066)

- **Description:** GCSFileLoader retrieve blob's custom metadata and append to document's metadata
- **Issue:** langchain-ai#9975,
- **Tag maintainer:** @baskaryan please review

Co-authored-by: b0l00ib <bharat.lal@walmart.com>