⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms⚡
Updated Jun 12, 2024 - Python
[ICML'24] EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
Scalable and robust tree-based speculative decoding algorithm
TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
[NeurIPS'23] Speculative Decoding with Big Little Decoder
Code for our paper "Speculative Decoding: Exploiting Speculative Execution for Accelerating Seq2seq Generation" (EMNLP 2023 Findings)
Minimal C implementation of speculative decoding based on llama2.c
Implementation of the paper "Fast Inference from Transformers via Speculative Decoding" (Leviathan et al., 2023).
Dynasurge: Dynamic Tree Speculation for Prompt-Specific Decoding
PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation
Verifying the effect of speculative decoding on Japanese-language generation.
Some experiments aimed at increasing LLM throughput and efficiency via Speculative Decoding.
Reproducibility Project for [NeurIPS'23] Speculative Decoding with Big Little Decoder
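Several of the repositories above build on the accept/reject rule from "Fast Inference from Transformers via Speculative Decoding" (Leviathan et al., 2023). The following is a minimal sketch of one such step, assuming the target and draft distributions are given as plain Python lists over a tiny vocabulary. The function name speculative_step and the example distributions are hypothetical and are not taken from any listed repository.

```python
import random


def speculative_step(p, q, rng):
    """One accept/reject step of speculative sampling (illustrative sketch).

    p: target-model probabilities over the vocabulary (list of floats)
    q: draft-model probabilities over the same vocabulary
    rng: a random.Random instance
    Returns (token, accepted).
    """
    vocab = range(len(p))
    # The draft model proposes a token x ~ q.
    x = rng.choices(vocab, weights=q, k=1)[0]
    # Accept the proposal with probability min(1, p(x) / q(x)).
    if rng.random() < min(1.0, p[x] / q[x]):
        return x, True
    # On rejection, resample from the residual max(0, p - q).
    # random.choices normalizes the weights internally.
    residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    return rng.choices(vocab, weights=residual, k=1)[0], False


if __name__ == "__main__":
    rng = random.Random(0)
    p = [0.6, 0.3, 0.1]  # hypothetical target distribution
    q = [0.2, 0.5, 0.3]  # hypothetical draft distribution
    n = 20000
    counts = [0] * len(p)
    for _ in range(n):
        token, _accepted = speculative_step(p, q, rng)
        counts[token] += 1
    # Empirical frequencies should be close to p, not q.
    print([round(c / n, 3) for c in counts])
```

The point of the rule is that the returned token is distributed according to the target distribution p, even though the proposal came from the cheaper draft distribution q. Real implementations apply it to several drafted tokens at once and verify them with a single target-model forward pass.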