
Support for Meta LLaMA 3 with ORTModelForCausalLM for Faster Inference #1856

Open
saleshwaram opened this issue May 15, 2024 · 1 comment

@saleshwaram

Feature request

I would like to request support for running Meta LLaMA 3 with ORTModelForCausalLM for faster inference. This integration would leverage ONNX Runtime (ORT) to optimize and accelerate inference for Meta LLaMA 3 models.

Motivation

Currently, Hugging Face offers no direct support for using Meta LLaMA 3 with ORTModelForCausalLM. Without it, inference is slower, which can be a significant bottleneck in applications that require real-time or near-real-time responses. Supporting this integration would greatly improve the performance and usability of Meta LLaMA 3 models, particularly in production environments where inference speed is critical.

Your contribution

While I may not have the expertise to implement this feature myself, I am willing to assist with testing and providing feedback on the integration process. Additionally, I can help with documentation and usage examples once the feature is implemented.

@IlyasMoutawwakil
Member

Hi! Are you sure Llama 3 doesn't work? It's the same architecture/model_type as Llama 2, so it should work out of the box.
I'm running a script locally to export it and see if it works (the export is going smoothly with meta-llama/Meta-Llama-3-8B).
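
For anyone who wants to try this themselves, here is a minimal sketch of that export path using the standard optimum API (the exact script the maintainer ran isn't shown in this thread; note that the meta-llama repo is gated, so you need to accept the license and authenticate with `huggingface-cli login` first):

```python
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForCausalLM

model_id = "meta-llama/Meta-Llama-3-8B"  # gated repo: requires accepted license + HF authentication

tokenizer = AutoTokenizer.from_pretrained(model_id)

# export=True converts the PyTorch checkpoint to ONNX on the fly;
# Llama 3 reuses the "llama" model_type that the exporter already supports for Llama 2
model = ORTModelForCausalLM.from_pretrained(model_id, export=True)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Alternatively, `optimum-cli export onnx --model meta-llama/Meta-Llama-3-8B llama3_onnx/` writes the exported model to a directory, which you can then load with `ORTModelForCausalLM.from_pretrained("llama3_onnx/")` without re-exporting.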
