Examples of serving LLMs on Modal.
Updated Jun 13, 2024 - Python
Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
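To make the "change a single line of code" claim concrete: Xinference exposes an OpenAI-compatible HTTP API, so a client only needs to point at a different base URL. A minimal stdlib-only sketch follows; the port and model name are assumptions for illustration, not taken from the repo.

```python
import json
import urllib.request

# Assumption: a local Xinference server exposing an OpenAI-style API here.
BASE_URL = "http://localhost:9997/v1"

payload = {
    "model": "llama-2-chat",  # assumption: a model launched in Xinference
    "messages": [{"role": "user", "content": "Hello!"}],
}

# Same request shape as the OpenAI chat-completions API; only BASE_URL differs.
req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
print(req.full_url)
# urllib.request.urlopen(req) would send it, assuming the server is running.
```

Swapping back to OpenAI (or to vLLM's OpenAI-compatible server) is then just a different `BASE_URL`.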
An easy-to-use, scalable, and high-performance RLHF framework (70B+ PPO full tuning & iterative DPO & LoRA & Mixtral).
🔒 Enterprise-grade API gateway that helps you monitor and impose cost or rate limits per API key. Get fine-grained access control and monitoring per user, application, or environment. Supports OpenAI, Azure OpenAI, Anthropic, vLLM, and open-source LLMs.
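Per-key rate limiting of the kind this gateway imposes is commonly built on a token bucket. The sketch below is a generic illustration of that technique, not the gateway's actual implementation; all names are invented.

```python
import time
from collections import defaultdict

class TokenBucket:
    """Minimal per-API-key token-bucket limiter (illustrative sketch)."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # maximum burst size
        self.tokens = defaultdict(lambda: capacity)   # per-key balances
        self.last = defaultdict(time.monotonic)       # per-key last-seen time

    def allow(self, api_key: str, cost: float = 1.0) -> bool:
        now = time.monotonic()
        elapsed = now - self.last[api_key]
        self.last[api_key] = now
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens[api_key] = min(self.capacity,
                                   self.tokens[api_key] + elapsed * self.rate)
        if self.tokens[api_key] >= cost:
            self.tokens[api_key] -= cost
            return True
        return False

limiter = TokenBucket(rate=1.0, capacity=2.0)
results = [limiter.allow("key-A") for _ in range(3)]
print(results)  # burst of 2 allowed, third call denied
```

Cost limits work the same way by charging a `cost` proportional to tokens consumed rather than a flat 1 per request.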
The RunPod worker template for serving our large language model endpoints. Powered by vLLM.
📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, continuous batching, FlashAttention, PagedAttention, etc.
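As a toy illustration of the PagedAttention idea from the list above — the KV cache split into fixed-size blocks allocated on demand, so sequences don't reserve contiguous worst-case memory — here is a minimal free-list sketch. The names and structure are invented for clarity; this is not vLLM's API.

```python
BLOCK_SIZE = 16  # tokens per KV-cache block (assumption; vLLM's default is also 16)

class BlockAllocator:
    """Toy free-list allocator mapping logical token positions to physical blocks."""

    def __init__(self, num_blocks: int):
        self.free = list(range(num_blocks))
        self.tables = {}  # sequence id -> list of physical block ids

    def append_token(self, seq_id: int, pos: int) -> int:
        """Return the physical block holding token `pos`, allocating on demand."""
        table = self.tables.setdefault(seq_id, [])
        if pos // BLOCK_SIZE >= len(table):  # crossed into a new logical block
            table.append(self.free.pop())
        return table[pos // BLOCK_SIZE]

    def free_seq(self, seq_id: int):
        """Return a finished sequence's blocks to the free list."""
        self.free.extend(self.tables.pop(seq_id, []))

alloc = BlockAllocator(num_blocks=4)
for pos in range(20):  # a 20-token sequence needs ceil(20/16) = 2 blocks
    alloc.append_token(0, pos)
print(len(alloc.tables[0]), len(alloc.free))  # 2 blocks used, 2 still free
```

Because blocks are allocated one at a time as sequences grow, freed blocks from finished requests can be handed to new ones immediately — the memory behavior that makes continuous batching effective.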
Scripts for fine-tuning Meta Llama 3 with composable FSDP & PEFT methods, covering single- and multi-node GPUs. Supports default & custom datasets for applications such as summarization and Q&A, plus a number of candidate inference solutions such as HF TGI and vLLM for local or cloud deployment. Demo apps showcase Meta Llama 3 for WhatsApp & Messenger.
A production-ready REST API for vLLM.
Evaluate your LLM's responses with Prometheus and GPT-4 💯
Set up and run a local LLM and chatbot using consumer-grade hardware.
Run code inference-only benchmarks quickly using vLLM
Chat with Lex! A RAG app using HyDE, with Milvus as the vector store, vLLM for LLM inference, and FastEmbed for embeddings!
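The HyDE (Hypothetical Document Embeddings) step mentioned above retrieves against the embedding of an LLM-drafted *hypothetical answer* rather than the raw query, since an answer tends to sit closer to relevant documents in embedding space. A self-contained toy sketch, with a bag-of-words stand-in for FastEmbed and a hard-coded stand-in for the LLM draft:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' standing in for a real embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = [
    "vllm serves large language models with paged attention",
    "milvus is a vector database for similarity search",
]

query = "how does vllm serve models"
# HyDE step: an LLM would draft a hypothetical answer here; hard-coded for the sketch.
hypothetical = "vllm serves large language models efficiently"

# Retrieve against the hypothetical answer's embedding, not the raw query's.
scores = [cosine(embed(hypothetical), embed(d)) for d in docs]
best = docs[scores.index(max(scores))]
print(best)
```

In the real app the vector search runs inside Milvus and the hypothetical answer comes from the vLLM-served model; only the control flow is shown here.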
Preserving entities through the integration of knowledge graphs, Llama 2, vLLM, and LangChain.
Genshin Impact character chat models, LoRA-tuned on an LLM.
Reasoning in Large Language Models: Papers and Resources, including Chain-of-Thought, Instruction-Tuning and Multimodality.
Carbon Limiting Auto Tuning for Kubernetes
Standardized spec and vendor-specific transforms for ChatML