Train, Evaluate, Optimize, Deploy Computer Vision Models via OpenVINO™
Updated May 20, 2024 - Python
Faster Whisper transcription with CTranslate2
A Python package that extends the official PyTorch, making it easy to obtain performance gains on Intel platforms
Neural Networks with low bit weights on a CH32V003 RISC-V Microcontroller without multiplication
Unify Efficient Fine-Tuning of 100+ LLMs
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
Model Compression Toolkit (MCT) is an open-source project for optimizing neural network models for deployment on efficient, constrained hardware. It provides researchers, developers, and engineers advanced quantization and compression tools for deploying state-of-the-art neural networks.
A tutorial notebook on quantization in machine learning
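At its core, the quantization implemented by tools like these maps floating-point values onto a small integer grid via a scale factor. Below is a minimal sketch of symmetric per-tensor INT8 quantization; it is illustrative only and not taken from any of the libraries listed here, and the function names are made up.

```python
# Minimal sketch of symmetric per-tensor INT8 quantization.
# A single scale factor maps floats into the signed range [-127, 127];
# dequantization multiplies the codes back by that scale.

def quantize_int8(values):
    """Map floats to int8 codes in [-127, 127] with one shared scale."""
    scale = max(abs(v) for v in values) / 127.0 or 1.0  # avoid scale == 0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale

def dequantize_int8(codes, scale):
    """Recover approximate floats from int8 codes and the shared scale."""
    return [c * scale for c in codes]

weights = [0.4, -1.0, 0.25, 0.9]
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)
```

The round-trip error of each value is bounded by half the scale, which is why per-tensor INT8 works well when the value distribution has no extreme outliers; the lower-bit and sparsity schemes in the repositories above refine this basic recipe.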
A friendly neighborhood repository with diverse experiments and adventures in the world of LLMs
Open-source subtitling platform 💻 for transcribing and translating video/audio in Indic languages.
🤗 Optimum Intel: Accelerate inference with Intel optimization tools
Implementations of various ML tasks on the Kaggle platform with GPUs.
This is the official implementation of "LLM-QBench: A Benchmark Towards the Best Practice for Post-training Quantization of Large Language Models", and it is also an efficient LLM compression tool with various advanced compression methods, supporting multiple inference backends.
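Post-training quantization of LLM weights, as benchmarked above, typically uses group-wise low-bit schemes: each small group of weights gets its own scale and offset so that 4-bit codes can track local value ranges. The sketch below is a simplified illustration under my own naming, not code from the benchmark or any listed tool.

```python
# Sketch of group-wise asymmetric 4-bit weight quantization.
# Each group of `group_size` weights stores unsigned 4-bit codes (0..15)
# plus one (scale, min) pair, so codes adapt to the local value range.

def quantize_4bit_groups(weights, group_size=4):
    """Quantize a flat weight list to per-group 4-bit codes."""
    groups = []
    for i in range(0, len(weights), group_size):
        g = weights[i:i + group_size]
        lo, hi = min(g), max(g)
        scale = (hi - lo) / 15.0 or 1.0  # avoid scale == 0 for constant groups
        codes = [round((w - lo) / scale) for w in g]
        groups.append((codes, scale, lo))
    return groups

def dequantize_4bit_groups(groups):
    """Reconstruct approximate floats from per-group codes, scale, and min."""
    out = []
    for codes, scale, lo in groups:
        out.extend(c * scale + lo for c in codes)
    return out

weights = [0.0, 1.0, 2.0, 3.0, -2.0, -1.0, 0.0, 2.0]
restored = dequantize_4bit_groups(quantize_4bit_groups(weights))
```

Production methods such as GPTQ or AWQ add error-compensating rounding and activation-aware scaling on top of this layout, but the storage format, codes plus per-group scale and zero point, is the same idea.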
Self-Created Tools to convert ONNX files (NCHW) to TensorFlow/TFLite/Keras format (NHWC). The purpose of this tool is to solve the massive Transpose extrapolation problem in onnx-tensorflow (onnx-tf). I don't need a Star, but give me a pull request.
A Python script designed to streamline the process of quantizing models to exllamav2 format
Your go-to tool for easily creating quantized versions of Hugging Face models in the GGUF format.
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
Brevitas: neural network quantization in PyTorch
Implementation of MedQ: Lossless ultra-low-bit neural network quantization for medical image segmentation