A high-throughput and memory-efficient inference and serving engine for LLMs
🤘 TT-NN operator library, and TT-Metalium low level kernel programming model.
🤖 The free, open-source OpenAI alternative. Self-hosted, community-driven, and local-first. A drop-in replacement for OpenAI that runs on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers, and many other model architectures. Generates text, audio, video, and images, with voice-cloning capabilities.
Private & local AI personal knowledge management app.
LSP-AI is an open-source language server that serves as a backend for AI-powered functionality, designed to assist and empower software engineers, not replace them.
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
🤖 A collection of practical AI repos, tools, websites, papers, and tutorials. A practical AI treasure chest 💎
Run any open-source LLM, such as Llama 2 or Mistral, as an OpenAI-compatible API endpoint in the cloud.
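Several of the projects above expose an OpenAI-compatible API, so existing OpenAI client code can point at a local or cloud deployment unchanged. A minimal sketch of building such a request with only the standard library — the base URL and model id are assumptions; substitute whatever your own deployment exposes:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # hypothetical local deployment
payload = {
    "model": "mistral",  # hypothetical model id served by the endpoint
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 64,
}

def build_request(base_url: str, body: dict) -> urllib.request.Request:
    """Build a POST request for the OpenAI-style /chat/completions route."""
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_request(BASE_URL, payload)
# urllib.request.urlopen(req) would return the completion JSON once a
# compatible server is actually running at BASE_URL.
```

Because the request shape matches OpenAI's chat-completions schema, swapping between a hosted model and a self-hosted one is a one-line change to the base URL.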
🤯 Lobe Chat - an open-source, modern-design ChatGPT/LLM UI and chat framework. Supports speech synthesis, multi-modal input, and an extensible plugin system. One-click FREE deployment of your private ChatGPT/Gemini/Ollama chat application.
Scripts for fine-tuning Meta Llama3 with composable FSDP & PEFT methods, covering single- and multi-node GPU setups. Supports default & custom datasets for applications such as summarization and Q&A. Supports a number of candidate inference solutions, such as HF TGI and vLLM, for local or cloud deployment. Includes demo apps showcasing Meta Llama3 for WhatsApp & Messenger.
Java version of LangChain
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.