community[minor]: S3FileLoader to use expose mode and post_processors arguments of unstructured loader (langchain-ai#19270)

**Description:** Update s3_file.py so that S3FileLoader exposes the
**mode** and **post_processors** arguments of its base class
**UnstructuredBaseLoader**. This lets documents loaded from an S3 bucket
carry more metadata, such as *'page_number'* and *'languages'*.
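
As a rough illustration of what the new arguments are for, the sketch below defines a post-processor of the kind that could be passed through **post_processors**. It assumes post-processors are plain callables applied to the text of each extracted element, as with the cleaning functions shipped in `unstructured`. The helper name `collapse_whitespace`, the bucket name, and the key are hypothetical. The `S3FileLoader` call appears only in comments because it needs AWS credentials and the `unstructured` package.

```python
import re


def collapse_whitespace(text: str) -> str:
    """Collapse runs of whitespace into single spaces and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()


# The post-processor can be exercised on its own, without any loader.
print(collapse_whitespace("  Quarterly \n report   2024 "))  # Quarterly report 2024

# Hypothetical usage with the loader changed in this commit (not run here):
#
# from langchain_community.document_loaders import S3FileLoader
#
# loader = S3FileLoader(
#     "example-bucket",      # hypothetical bucket
#     "reports/q1.pdf",      # hypothetical key
#     mode="elements",       # one Document per extracted element
#     post_processors=[collapse_whitespace],
# )
# docs = loader.load()
```

With `mode="single"` (the default) the file is returned as one document, so per-element metadata such as `page_number` is most useful in the `paged` and `elements` modes.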

**Issue:** NA
**Dependencies:** None
**Twitter handle:** preak95

---------

Co-authored-by: ccurme <chester.curme@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
3 people authored and rahul-trip committed Mar 27, 2024
1 parent 56e5a17 commit 8133942
Showing 1 changed file with 8 additions and 2 deletions.
10 changes: 8 additions & 2 deletions libs/community/langchain_community/document_loaders/s3_file.py
@@ -2,7 +2,7 @@

 import os
 import tempfile
-from typing import TYPE_CHECKING, List, Optional, Union
+from typing import TYPE_CHECKING, Callable, List, Optional, Union

 from langchain_community.document_loaders.unstructured import UnstructuredBaseLoader

@@ -27,6 +27,8 @@ def __init__(
         aws_secret_access_key: Optional[str] = None,
         aws_session_token: Optional[str] = None,
         boto_config: Optional[botocore.client.Config] = None,
+        mode: str = "single",
+        post_processors: Optional[List[Callable]] = None,
     ):
         """Initialize with bucket and key name.
@@ -82,8 +84,12 @@ def __init__(
             object is set on the session, the config object used when creating
             the client will be the result of calling ``merge()`` on the
             default config with the config provided to this call.
+        :param mode: Mode in which to read the file. Valid options are: single,
+            paged and elements
+        :param post_processors: Post processing functions to be applied to
+            extracted elements
         """
-        super().__init__()
+        super().__init__(mode, post_processors)
         self.bucket = bucket
         self.key = key
         self.region_name = region_name
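
The core of the change is that the subclass constructor now accepts `mode` and `post_processors` and forwards them to the base class. A minimal sketch of that pattern follows, using stand-in classes rather than the real loaders: `BaseLoaderSketch` and `S3LoaderSketch` are hypothetical names, and the mode validation in the base class is an assumption about how `UnstructuredBaseLoader` behaves, not a copy of its code.

```python
from typing import Callable, List, Optional

VALID_MODES = {"single", "paged", "elements"}


class BaseLoaderSketch:
    """Hypothetical stand-in for UnstructuredBaseLoader (not the real class)."""

    def __init__(
        self,
        mode: str = "single",
        post_processors: Optional[List[Callable]] = None,
    ) -> None:
        # Assumption: the base class rejects modes outside the documented set.
        if mode not in VALID_MODES:
            raise ValueError(
                f"Got {mode!r} for mode, expected one of {sorted(VALID_MODES)}"
            )
        self.mode = mode
        self.post_processors = post_processors or []


class S3LoaderSketch(BaseLoaderSketch):
    """Hypothetical stand-in for S3FileLoader, reduced to the forwarding logic."""

    def __init__(
        self,
        bucket: str,
        key: str,
        *,
        mode: str = "single",
        post_processors: Optional[List[Callable]] = None,
    ) -> None:
        # Before the commit the subclass took neither argument and called
        # super().__init__() bare, so the base-class defaults always applied.
        super().__init__(mode, post_processors)
        self.bucket = bucket
        self.key = key


loader = S3LoaderSketch(
    "example-bucket",  # hypothetical bucket
    "docs/report.pdf",  # hypothetical key
    mode="elements",
    post_processors=[str.strip],
)
print(loader.mode, len(loader.post_processors))
```

Because both new parameters default to the values the base class already used, callers that do not pass them should see no change in behavior.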