Examples of serving LLMs on Modal.
Updated Jun 13, 2024 - Python
Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
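To make the "change a single line of code" claim concrete: Xinference exposes an OpenAI-compatible HTTP API, so a client only needs to point at a different base URL. A minimal stdlib-only sketch follows; the port and model name are assumptions for illustration, not taken from the repo.

```python
import json
import urllib.request

# Assumption: a local Xinference server exposing an OpenAI-style API here.
BASE_URL = "http://localhost:9997/v1"

payload = {
    "model": "llama-2-chat",  # assumption: a model launched in Xinference
    "messages": [{"role": "user", "content": "Hello!"}],
}

# Same request shape as the OpenAI chat-completions API; only BASE_URL differs.
req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
print(req.full_url)
# urllib.request.urlopen(req) would send it, assuming the server is running.
```

Swapping back to OpenAI (or to vLLM's OpenAI-compatible server) is then just a different `BASE_URL`.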
An easy-to-use, scalable, and high-performance RLHF framework (70B+ PPO full tuning & iterative DPO & LoRA & Mixtral).
🔒 Enterprise-grade API gateway that helps you monitor and impose cost or rate limits per API key. Get fine-grained access control and monitoring per user, application, or environment. Supports OpenAI, Azure OpenAI, Anthropic, vLLM, and open-source LLMs.
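Per-key rate limiting of the kind this gateway imposes is commonly built on a token bucket. The sketch below is a generic illustration of that technique, not the gateway's actual implementation; all names are invented.

```python
import time
from collections import defaultdict

class TokenBucket:
    """Minimal per-API-key token-bucket limiter (illustrative sketch)."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # maximum burst size
        self.tokens = defaultdict(lambda: capacity)   # per-key balances
        self.last = defaultdict(time.monotonic)       # per-key last-seen time

    def allow(self, api_key: str, cost: float = 1.0) -> bool:
        now = time.monotonic()
        elapsed = now - self.last[api_key]
        self.last[api_key] = now
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens[api_key] = min(self.capacity,
                                   self.tokens[api_key] + elapsed * self.rate)
        if self.tokens[api_key] >= cost:
            self.tokens[api_key] -= cost
            return True
        return False

limiter = TokenBucket(rate=1.0, capacity=2.0)
results = [limiter.allow("key-A") for _ in range(3)]
print(results)  # burst of 2 allowed, third call denied
```

Cost limits work the same way by charging a `cost` proportional to tokens consumed rather than a flat 1 per request.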
The RunPod worker template for serving our large language model endpoints. Powered by vLLM.
📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, continuous batching, FlashAttention, PagedAttention, etc.
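As a toy illustration of the PagedAttention idea from the list above — the KV cache split into fixed-size blocks allocated on demand, so sequences don't reserve contiguous worst-case memory — here is a minimal free-list sketch. The names and structure are invented for clarity; this is not vLLM's API.

```python
BLOCK_SIZE = 16  # tokens per KV-cache block (assumption; vLLM's default is also 16)

class BlockAllocator:
    """Toy free-list allocator mapping logical token positions to physical blocks."""

    def __init__(self, num_blocks: int):
        self.free = list(range(num_blocks))
        self.tables = {}  # sequence id -> list of physical block ids

    def append_token(self, seq_id: int, pos: int) -> int:
        """Return the physical block holding token `pos`, allocating on demand."""
        table = self.tables.setdefault(seq_id, [])
        if pos // BLOCK_SIZE >= len(table):  # crossed into a new logical block
            table.append(self.free.pop())
        return table[pos // BLOCK_SIZE]

    def free_seq(self, seq_id: int):
        """Return a finished sequence's blocks to the free list."""
        self.free.extend(self.tables.pop(seq_id, []))

alloc = BlockAllocator(num_blocks=4)
for pos in range(20):  # a 20-token sequence needs ceil(20/16) = 2 blocks
    alloc.append_token(0, pos)
print(len(alloc.tables[0]), len(alloc.free))  # 2 blocks used, 2 still free
```

Because blocks are allocated one at a time as sequences grow, freed blocks from finished requests can be handed to new ones immediately — the memory behavior that makes continuous batching effective.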
Scripts for fine-tuning Meta Llama 3 with composable FSDP & PEFT methods, covering single- and multi-node GPUs. Supports default & custom datasets for applications such as summarization and Q&A, plus a number of candidate inference solutions such as HF TGI and vLLM for local or cloud deployment. Demo apps showcase Meta Llama 3 for WhatsApp & Messenger.
A production-ready REST API for vLLM.
Evaluate your LLM's responses with Prometheus and GPT-4 💯
Set up and run a local LLM and chatbot using consumer-grade hardware.
Run code inference-only benchmarks quickly using vLLM
Chat with Lex! A RAG app using HyDE, with Milvus as the vector store, vLLM for LLM inference, and FastEmbed for embeddings!
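The HyDE (Hypothetical Document Embeddings) step mentioned above retrieves against the embedding of an LLM-drafted *hypothetical answer* rather than the raw query, since an answer tends to sit closer to relevant documents in embedding space. A self-contained toy sketch, with a bag-of-words stand-in for FastEmbed and a hard-coded stand-in for the LLM draft:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' standing in for a real embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = [
    "vllm serves large language models with paged attention",
    "milvus is a vector database for similarity search",
]

query = "how does vllm serve models"
# HyDE step: an LLM would draft a hypothetical answer here; hard-coded for the sketch.
hypothetical = "vllm serves large language models efficiently"

# Retrieve against the hypothetical answer's embedding, not the raw query's.
scores = [cosine(embed(hypothetical), embed(d)) for d in docs]
best = docs[scores.index(max(scores))]
print(best)
```

In the real app the vector search runs inside Milvus and the hypothetical answer comes from the vLLM-served model; only the control flow is shown here.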
Preserving entities through the integration of knowledge graphs, Llama 2, vLLM, and LangChain.
Genshin Impact character chat models, LoRA-tuned on an LLM.
Reasoning in Large Language Models: Papers and Resources, including Chain-of-Thought, Instruction-Tuning and Multimodality.
Carbon Limiting Auto Tuning for Kubernetes
Standardized spec and vendor-specific transforms for ChatML