GCSFileLoader retrieve blob custom metadata and append to document metadata #11066

Merged
merged 1 commit into from
Oct 17, 2023


bharatl
Contributor

@bharatl bharatl commented Sep 26, 2023


@dosubot dosubot bot added Ɑ: doc loader Related to document loader module (not documentation) 🤖:improvement Medium size change to existing code to handle new use-cases labels Sep 26, 2023
@@ -62,6 +62,8 @@ def load(self) -> List[Document]:
         bucket = storage_client.get_bucket(self.bucket)
         # Create a blob object from the filepath
         blob = bucket.blob(self.blob)
+        # retrieve custom metadata associated with the blob
+        metadata = bucket.get_blob(self.blob).metadata
Collaborator

can we just do metadata = blob.metadata?

Contributor Author

@bharatl bharatl Oct 3, 2023

@baskaryan
blob() does not actually send any request to GCS, whereas get_blob() fetches the blob's meta-information from GCS. So metadata = blob.metadata won't return any custom metadata.

One option I can think of: instead of blob = bucket.blob(self.blob), we could use blob = bucket.get_blob(self.blob) and then read the metadata from blob.metadata. Your thoughts, please.

reference: link
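
A minimal sketch of the distinction described in this comment, not the code from the PR. The helper name `fetch_custom_metadata` and the `StubBlob` / `StubBucket` classes are hypothetical: the stubs only imitate the two behaviours under discussion (`blob()` builds a local handle without a request, `get_blob()` looks the object up and returns `None` when it is missing) so the example runs without GCS credentials. With a real `google.cloud.storage` bucket, the helper would be called the same way.

```python
from typing import Any, Dict, Optional


def fetch_custom_metadata(bucket: Any, blob_name: str) -> Dict[str, str]:
    """Return a blob's custom metadata, or an empty dict if none is set.

    `bucket` is assumed to behave like google.cloud.storage.Bucket.
    """
    # get_blob() performs the lookup; it returns None if the object
    # does not exist, so guard before touching .metadata.
    blob = bucket.get_blob(blob_name)
    if blob is None:
        raise FileNotFoundError(f"blob not found: {blob_name}")
    # .metadata is None when no custom metadata was set on the object.
    return dict(blob.metadata or {})


class StubBlob:
    """Hypothetical stand-in for a GCS blob (illustration only)."""

    def __init__(self, name: str, metadata: Optional[Dict[str, str]] = None):
        self.name = name
        self.metadata = metadata


class StubBucket:
    """Hypothetical stand-in for a GCS bucket (illustration only)."""

    def __init__(self, objects: Dict[str, Optional[Dict[str, str]]]):
        self._objects = objects

    def blob(self, name: str) -> StubBlob:
        # Local handle only: nothing is looked up, so metadata stays None.
        return StubBlob(name)

    def get_blob(self, name: str) -> Optional[StubBlob]:
        # Imitates the remote lookup: None for a missing object.
        if name not in self._objects:
            return None
        return StubBlob(name, self._objects[name])
```

One thing the sketch makes visible: chaining `bucket.get_blob(name).metadata` directly would raise `AttributeError` for a missing object, which is why the helper checks for `None` first.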

Contributor Author

@baskaryan any thoughts?

Contributor Author

@sbusso @jarib @zeke I'd appreciate your thoughts on the solution proposed above.

Collaborator

gotcha, missed the blob vs get_blob part!

@bharatl bharatl requested a review from baskaryan October 12, 2023 13:53
@baskaryan baskaryan requested a review from eyurtsev October 12, 2023 19:11
@baskaryan
Collaborator

thanks @bharatl!

@baskaryan baskaryan merged commit 6730056 into langchain-ai:master Oct 17, 2023
@bharatl bharatl deleted the gscMetadata branch October 18, 2023 08:22
hoanq1811 pushed a commit to hoanq1811/langchain that referenced this pull request Feb 2, 2024
…tadata (langchain-ai#11066)

- **Description:** GCSFileLoader retrieves the blob's custom metadata and
appends it to the document's metadata
- **Issue:** langchain-ai#9975
- **Tag maintainer:** @baskaryan please review

Co-authored-by: b0l00ib <bharat.lal@walmart.com>
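
As a rough illustration of the "append to document's metadata" step in the description above, the sketch below merges a blob's custom metadata into each loaded document. It is an assumption about the shape of the change, not the merged code: `Doc` is a hypothetical stand-in for a LangChain `Document`, and `append_blob_metadata` is an illustrative name.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""

    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


def append_blob_metadata(
    docs: List[Doc], blob_metadata: Optional[Dict[str, str]]
) -> List[Doc]:
    """Merge the blob's custom metadata into every document's metadata."""
    # A blob without custom metadata reports None; leave docs untouched.
    if not blob_metadata:
        return docs
    for doc in docs:
        # dict.update lets blob keys override same-named loader keys.
        # That precedence is an assumption made for this sketch.
        doc.metadata.update(blob_metadata)
    return docs
```

Because `update` overwrites on key collisions, a custom metadata key named `source` would replace the loader's own `source` value; a caller wanting the opposite precedence could merge in the other direction.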
2 participants