Feature: added multiprocessing for creating hf embedddings #15260
Conversation
…feat/multiprocessing_hfembedddings
Thanks @shail2512-lm10 for the contribution. Left a minor comment in my review.
""" | ||
Args: | ||
parallel_process (bool): Default to False. If True it will start a multi-process pool to process the encoding | ||
with several independent processes. | ||
target_devices (List[str], optional): It will only taken into account if `parallel_process` = `True`. PyTorch | ||
target devices, e.g. ["cuda:0", "cuda:1", ...], ["npu:0", "npu:1", ...], or ["cpu", "cpu", "cpu", "cpu"]. | ||
If target_devices is None and CUDA/NPU is available, then all available CUDA/NPU devices will be used. | ||
If target_devices is None and CUDA/NPU is not available, then 4 CPU devices will be used. |
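The parameters above follow the `encode_multi_process` pattern from SentenceTransformers: one worker per target device, each encoding its own chunk of the corpus. The sketch below illustrates that mechanism with plain Python multiprocessing; `encode_multi_process` and `_encode_worker` are illustrative names, and the worker's length-based "embedding" is a dummy stand-in for a real model running on its assigned device.

```python
import multiprocessing as mp

def _encode_worker(device, texts, out_queue):
    # Hypothetical stand-in for the per-device encode step; a real
    # worker would load the model onto `device` and encode `texts` there.
    out_queue.put([float(len(t)) for t in texts])  # dummy "embeddings"

def encode_multi_process(texts, target_devices=None):
    # Mirrors the documented default: with no accelerators specified,
    # fall back to 4 CPU workers.
    if target_devices is None:
        target_devices = ["cpu"] * 4
    ctx = mp.get_context("fork")  # fork keeps this sketch self-contained on Linux
    out_queue = ctx.Queue()
    # Round-robin the corpus into one chunk per target device.
    chunks = [texts[i::len(target_devices)] for i in range(len(target_devices))]
    procs = [
        ctx.Process(target=_encode_worker, args=(dev, chunk, out_queue))
        for dev, chunk in zip(target_devices, chunks)
    ]
    for p in procs:  # the multi-process pool starts here...
        p.start()
    results = [out_queue.get() for _ in procs]
    for p in procs:  # ...and is torn down once the task is done
        p.join()
    return [emb for chunk in results for emb in chunk]

if __name__ == "__main__":
    print(sorted(encode_multi_process(["a", "bb", "ccc", "dddd"])))
```

Results are drained from the queue before `join()` so a worker never blocks on a full queue buffer; real implementations also tag each chunk with its index so embeddings can be restored to input order.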
thanks for the class docstring. We should probably expand this to all fields and methods?
@shail2512-lm10: We should also bump the version number in this package's pyproject.toml
Thank you @nerdai. Sure, will do all the modifications!
…feat/multiprocessing_hfembedddings
thanks @shail2512-lm10!
Description
It would be great if the HuggingFaceEmbedding class had a multiprocessing feature for creating embeddings for vast amounts of text, just like SentenceTransformers has. Hence, I added multiprocessing support for it. The HuggingFaceEmbedding class takes two additional arguments: parallel_process and target_devices. If parallel_process is True, the multi-process pool starts as per the methods available in SentenceTransformers, and when the task is done the pool stops.

Reference: SentenceTransformer implementation

PS: This is my first PR in LlamaIndex :)
New Package?
Did I fill in the tool.llamahub section in the pyproject.toml and provide a detailed README.md for my new integration or package?

Version Bump?
Did I bump the version in the pyproject.toml file of the package I am updating? (Except for the llama-index-core package)

Type of Change
Please delete options that are not relevant.
How Has This Been Tested?
Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration.
Suggested Checklist:
make format; make lint
to appease the lint gods