A programming framework for agentic AI. Discord: https://aka.ms/autogen-dc. Roadmap: https://aka.ms/autogen-roadmap
Run generative AI models on the Sophgo BM1684X.
The easiest way to serve AI/ML models in production: build model inference services, LLM APIs, multi-model inference graphs/pipelines, LLM/RAG apps, and more!
OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
Design, conduct and analyze results of AI-powered surveys and experiments. Simulate social science and market research with large numbers of AI agents and LLMs.
Miscellaneous code and writings for MLOps.
Minimalist web-searching app with an AI assistant that runs directly from your browser. Uses Web-LLM, Ratchet-ML, Wllama and SearXNG. Demo: https://felladrin-minisearch.hf.space
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
⚡ Build your chatbot within minutes on your favorite device; offers SOTA compression techniques for LLMs; run LLMs efficiently on Intel platforms ⚡
Pretrain, finetune, deploy 20+ LLMs on your own data. Uses state-of-the-art techniques: flash attention, FSDP, 4-bit, LoRA, and more.
A high-performance inference system for large language models, designed for production environments.
The official evaluation suite and dynamic data release for MixEval.
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
Friendli: the fastest serving engine for generative AI
Semantic embedding-based system for question answering from PDFs with visual analysis tools.
📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, continuous batching, FlashAttention, PagedAttention, etc.
PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference
Popular Large Language Models from scratch - 2024
AICI: Prompts as (Wasm) Programs