A programming framework for agentic AI. Discord: https://aka.ms/autogen-dc. Roadmap: https://aka.ms/autogen-roadmap
Run generative AI models on the Sophgo BM1684X.
The easiest way to serve AI/ML models in production: build model inference services, LLM APIs, multi-model inference graphs/pipelines, LLM/RAG apps, and more!
OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
Design, conduct and analyze results of AI-powered surveys and experiments. Simulate social science and market research with large numbers of AI agents and LLMs.
Miscellaneous code and writings for MLOps.
Minimalist web-searching app with an AI assistant that runs directly from your browser. Uses Web-LLM, Ratchet-ML, Wllama and SearXNG. Demo: https://felladrin-minisearch.hf.space
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
⚡ Build your chatbot within minutes on your favorite device; offers SOTA compression techniques for LLMs; run LLMs efficiently on Intel platforms ⚡
Pretrain, finetune, deploy 20+ LLMs on your own data. Uses state-of-the-art techniques: flash attention, FSDP, 4-bit, LoRA, and more.
A high-performance inference system for large language models, designed for production environments.
The official evaluation suite and dynamic data release for MixEval.
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
Friendli: the fastest serving engine for generative AI
Semantic embedding-based system for question answering from PDFs with visual analysis tools.
📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, continuous batching, FlashAttention, PagedAttention, etc.
PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference
Popular Large Language Models from scratch - 2024
AICI: Prompts as (Wasm) Programs