
Support for Async Embeddings via michaelfeil/infinity #596

Closed
michaelfeil opened this issue Feb 13, 2024 · 9 comments
Labels
enhancement New feature or request

Comments

@michaelfeil

Describe the Feature
I would like to integrate https://github.com/michaelfeil/infinity for embeddings inference. It automatically batches up concurrent requests, uses FlashAttention-2, and is compatible with CUDA, ROCm, Apple MPS, and CPU.
Depending on the usage, you might expect a 2.5x-22x throughput improvement / speedup over the default HF embeddings LangChain code.
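The core idea behind that speedup, transparently collecting concurrent single-text requests into one model call, can be sketched in plain asyncio. This is only a minimal illustration of the dynamic-batching technique, not infinity's actual implementation; `embed_batch` here is a hypothetical stand-in for any batched model call.

```python
import asyncio

class DynamicBatcher:
    """Sketch of dynamic request batching: concurrent embed() calls are
    queued and flushed to the model as one batch, amortizing per-call
    overhead. `embed_batch` is a hypothetical batched model call."""

    def __init__(self, embed_batch, max_batch=32, max_wait=0.01):
        self._embed_batch = embed_batch
        self._max_batch = max_batch
        self._max_wait = max_wait
        self._queue: asyncio.Queue = asyncio.Queue()
        self._worker = None

    async def embed(self, text: str):
        # Each caller gets a future resolved when its batch is processed.
        fut = asyncio.get_running_loop().create_future()
        await self._queue.put((text, fut))
        if self._worker is None:
            self._worker = asyncio.create_task(self._run())
        return await fut

    async def _run(self):
        while True:
            # Block for the first request, then greedily collect more
            # until the batch is full or the wait budget is spent.
            text, fut = await self._queue.get()
            texts, futs = [text], [fut]
            loop = asyncio.get_running_loop()
            deadline = loop.time() + self._max_wait
            while len(texts) < self._max_batch:
                timeout = deadline - loop.time()
                if timeout <= 0:
                    break
                try:
                    text, fut = await asyncio.wait_for(
                        self._queue.get(), timeout
                    )
                except asyncio.TimeoutError:
                    break
                texts.append(text)
                futs.append(fut)
            # One model call for the whole batch.
            vectors = self._embed_batch(texts)
            for fut, vec in zip(futs, vectors):
                fut.set_result(vec)
```

With eight concurrent `embed()` calls and `max_batch=8`, the backend sees a single batched call instead of eight single-text calls.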

@michaelfeil michaelfeil added the enhancement New feature or request label Feb 13, 2024
@shahules786
Member

Hey @michaelfeil, thanks for sharing this! We would love to have it. Are you interested in working on this?

@michaelfeil
Author

@shahules786 Perhaps I'll just push the pure python / async into langchain directly, then it should be reusable, right?

https://github.com/explodinggradients/ragas/blob/41c0c286ae3632a77db13ddd265b7699fe6a4adc/src/ragas/embeddings/base.py#L45C1-L78C61

@jjmachan
Member

Hey @michaelfeil, this would be awesome. Like you said, dropping something into LangChain will be the easiest for you in terms of time spent. What we would love to do is build an integration doc with Infinity and showcase how fast it is and how it improves things for people using Ragas, hopefully driving some traffic your way.

If you check this section, we embed a lot of chunks in sequence, which is limited by how your embeddings are served. Maybe we can do a comparison here? Would that be something you'd be interested in?

for i, n in enumerate(nodes):
    if n.embedding is None:
        nodes_to_embed.update({i: result_idx})
        executor.submit(
            self.embeddings.embed_text,
            n.page_content,
            name=f"embed_node_task[{i}]",
        )
        result_idx += 1
    if n.keyphrases == []:
        nodes_to_extract.update({i: result_idx})
        executor.submit(
            self.extractor.extract,
            n,
            name=f"keyphrase-extraction[{i}]",
        )
        result_idx += 1
results = executor.results()
if results == []:
    raise ExceptionInRunner()

We can do other comparisons too, but the LLM is the limiting factor for performance there, so there won't be much of a difference; the above use case would be solid for a comparison.

Let me know if it's something that interests you :)

@michaelfeil
Author

That would be interesting. FYI, I added the PR for LangChain here; it took me some hours over the weekend, and I hope it gets merged soon. langchain-ai/langchain#17671

I would not recommend submitting the nodes (assuming each node has 1 sentence) one by one with a ThreadPoolExecutor. At a minimum, batch the requests; this will help whatever backend you use, even APIs.

Also, is using `async def` an option for the function you linked above, @jjmachan?
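To illustrate the batching suggestion: instead of submitting one executor task per node text, group the texts into fixed-size batches and submit one call per batch. This is a hedged sketch, not Ragas or Infinity code; `embed_texts` is a hypothetical backend call that accepts a list of texts and returns one vector per text.

```python
from concurrent.futures import ThreadPoolExecutor

def embed_all(texts, embed_texts, batch_size=32, max_workers=4):
    """Embed texts with one backend call per batch instead of one per
    text. `embed_texts` (hypothetical) takes a list of texts and returns
    a list of vectors in the same order."""
    batches = [
        texts[i:i + batch_size] for i in range(0, len(texts), batch_size)
    ]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves batch order, so results line up with inputs.
        per_batch = list(pool.map(embed_texts, batches))
    # Flatten per-batch results back into one vector per input text.
    return [vec for batch in per_batch for vec in batch]
```

With 1,000 one-sentence nodes and `batch_size=32`, the backend receives 32 calls instead of 1,000, which helps any serving stack, local or API.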

@michaelfeil
Author

michaelfeil commented Feb 22, 2024

FYI all, this is now finally in langchain (community; see the PR mentioned above). Also, you might be interested in https://github.com/michaelfeil/infinity/blob/1fe3a34e295c95fc4a75297de842ec55c6761457/docs/benchmarks/benchmarking.md for benchmarking.

@michaelfeil
Author

@jjmachan It should now be in recent versions of langchain.

@michaelfeil
Author

Hey all, looking forward to contributing this.

@dosubot dosubot bot added the "stale" label (issue has not had recent activity or appears to be solved; stale issues will be automatically closed) May 19, 2024
@dosubot dosubot bot closed this as not planned (won't fix, can't repro, duplicate, stale) Jun 1, 2024
@dosubot dosubot bot removed the "stale" label Jun 1, 2024
@michaelfeil
Author

Nah, not stale!

@michaelfeil
Author

I am still waiting for a freaking PR review
