Train, Evaluate, Optimize, Deploy Computer Vision Models via OpenVINO™
Updated May 20, 2024 - Python
Faster Whisper transcription with CTranslate2
A Python package that extends the official PyTorch, making it easy to obtain performance gains on Intel platforms
Neural Networks with low bit weights on a CH32V003 RISC-V Microcontroller without multiplication
Unify Efficient Fine-Tuning of 100+ LLMs
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
Model Compression Toolkit (MCT) is an open-source project for optimizing neural network models for deployment on efficient, constrained hardware. It provides researchers, developers, and engineers advanced quantization and compression tools for deploying state-of-the-art neural networks.
A tutorial notebook on quantization in machine learning
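At its core, the quantization implemented by tools like these maps floating-point values onto a small integer grid via a scale factor. Below is a minimal sketch of symmetric per-tensor INT8 quantization; it is illustrative only and not taken from any of the libraries listed here, and the function names are made up.

```python
# Minimal sketch of symmetric per-tensor INT8 quantization.
# A single scale factor maps floats into the signed range [-127, 127];
# dequantization multiplies the codes back by that scale.

def quantize_int8(values):
    """Map floats to int8 codes in [-127, 127] with one shared scale."""
    scale = max(abs(v) for v in values) / 127.0 or 1.0  # avoid scale == 0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale

def dequantize_int8(codes, scale):
    """Recover approximate floats from int8 codes and the shared scale."""
    return [c * scale for c in codes]

weights = [0.4, -1.0, 0.25, 0.9]
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)
```

The round-trip error of each value is bounded by half the scale, which is why per-tensor INT8 works well when the value distribution has no extreme outliers; the lower-bit and sparsity schemes in the repositories above refine this basic recipe.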
A friendly neighborhood repository with diverse experiments and adventures in the world of LLMs
Open-source subtitling platform 💻 for transcribing and translating video/audio in Indic languages.
🤗 Optimum Intel: Accelerate inference with Intel optimization tools
Implementations of various ML tasks on the Kaggle platform with GPUs.
This is the official implementation of "LLM-QBench: A Benchmark Towards the Best Practice for Post-training Quantization of Large Language Models", and it is also an efficient LLM compression tool with various advanced compression methods, supporting multiple inference backends.
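Post-training quantization of LLM weights, as benchmarked above, typically uses group-wise low-bit schemes: each small group of weights gets its own scale and offset so that 4-bit codes can track local value ranges. The sketch below is a simplified illustration under my own naming, not code from the benchmark or any listed tool.

```python
# Sketch of group-wise asymmetric 4-bit weight quantization.
# Each group of `group_size` weights stores unsigned 4-bit codes (0..15)
# plus one (scale, min) pair, so codes adapt to the local value range.

def quantize_4bit_groups(weights, group_size=4):
    """Quantize a flat weight list to per-group 4-bit codes."""
    groups = []
    for i in range(0, len(weights), group_size):
        g = weights[i:i + group_size]
        lo, hi = min(g), max(g)
        scale = (hi - lo) / 15.0 or 1.0  # avoid scale == 0 for constant groups
        codes = [round((w - lo) / scale) for w in g]
        groups.append((codes, scale, lo))
    return groups

def dequantize_4bit_groups(groups):
    """Reconstruct approximate floats from per-group codes, scale, and min."""
    out = []
    for codes, scale, lo in groups:
        out.extend(c * scale + lo for c in codes)
    return out

weights = [0.0, 1.0, 2.0, 3.0, -2.0, -1.0, 0.0, 2.0]
restored = dequantize_4bit_groups(quantize_4bit_groups(weights))
```

Production methods such as GPTQ or AWQ add error-compensating rounding and activation-aware scaling on top of this layout, but the storage format, codes plus per-group scale and zero point, is the same idea.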
Self-Created Tools to convert ONNX files (NCHW) to TensorFlow/TFLite/Keras format (NHWC). The purpose of this tool is to solve the massive Transpose extrapolation problem in onnx-tensorflow (onnx-tf). I don't need a Star, but give me a pull request.
A Python script designed to streamline the process of quantizing models to exllamav2 format
Your go-to tool for easily creating quantized versions of Hugging Face models in the GGUF format.
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
Brevitas: neural network quantization in PyTorch
Implementation of MedQ: Lossless ultra-low-bit neural network quantization for medical image segmentation