Web tool to count LLM tokens (GPT, Claude, Llama, ...)
Updated Jun 12, 2024 - TypeScript
A grammar describes the syntax of a programming language and may be defined in Backus-Naur form (BNF). A lexer performs lexical analysis, turning text into tokens. A parser takes tokens and builds a data structure such as an abstract syntax tree (AST); the parser is concerned with context: does the sequence of tokens fit the grammar? A compiler combines a lexer and parser (together with later stages such as code generation), built for a specific grammar.
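The lexer/parser split described above can be sketched in a few lines. This is a minimal illustration (all names are hypothetical): a lexer turns an arithmetic expression into tokens, and a recursive-descent parser builds a nested-tuple AST for the grammar `expr := term (('+'|'-') term)*`, `term := NUM (('*'|'/') NUM)*`.

```python
import re

TOKEN_RE = re.compile(r"\s*(?:(\d+)|(.))")

def lex(text):
    """Lexical analysis: text -> list of (kind, value) tokens."""
    tokens = []
    for num, op in TOKEN_RE.findall(text):
        tokens.append(("NUM", int(num)) if num else ("OP", op))
    return tokens

def parse(tokens):
    """Parsing: tokens -> AST of nested tuples like ('+', 1, ('*', 2, 3))."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else (None, None)

    def term():
        nonlocal pos
        _, value = tokens[pos]; pos += 1
        node = value
        while peek() in (("OP", "*"), ("OP", "/")):
            op = tokens[pos][1]; pos += 1
            _, rhs = tokens[pos]; pos += 1
            node = (op, node, rhs)
        return node

    def expr():
        nonlocal pos
        node = term()
        while peek() in (("OP", "+"), ("OP", "-")):
            op = tokens[pos][1]; pos += 1
            node = (op, node, term())
        return node

    return expr()

print(parse(lex("1 + 2 * 3")))  # ('+', 1, ('*', 2, 3))
```

Because `term` is called before the `+`/`-` loop, multiplication binds tighter than addition, which is how precedence falls out of the grammar rather than out of special-case code.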
Parser Building Toolkit for JavaScript
Chartokenizer is a Python package for basic character-level tokenization. It provides functionality to generate a character-to-index mapping for tokenizing strings at the character level. This can be useful in various natural language processing (NLP) tasks where text data needs to be preprocessed for analysis or modeling. 🚀
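Character-level tokenization with a character-to-index mapping, as described above, can be sketched as follows (illustrative only; Chartokenizer's actual API may differ):

```python
def build_char_index(text):
    """Build a character-to-index mapping from the characters seen in text."""
    return {ch: i for i, ch in enumerate(sorted(set(text)))}

def encode(text, index):
    """Turn a string into a list of integer token ids, one per character."""
    return [index[ch] for ch in text]

vocab = build_char_index("hello")
print(vocab)                   # {'e': 0, 'h': 1, 'l': 2, 'o': 3}
print(encode("hello", vocab))  # [1, 0, 2, 2, 3]
```

Such per-character vocabularies are tiny and never hit out-of-vocabulary words, at the cost of much longer token sequences than subword schemes like BPE.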
An Integrated Corpus Tool With Multilingual Support for the Study of Language, Literature, and Translation
Persian NLP Toolkit
⛄ Possibly the smallest Lua compiler ever
Implementation of LLM ✨from scratch✨
Tools and resources for the computational processing of Nheengatu (Modern Tupi)
DOM-aware tokenization for Hugging Face language models
An elegant Math Parser written in Lua, featuring support for adding custom operators and functions
Oxide is a hybrid database and streaming messaging system (think Kafka + MySQL), supporting data access via REST and SQL.
DadmaTools is a Persian NLP toolkit developed by Dadmatech Co.
A multilingual morphological analysis library.
OpenShield is a firewall designed for AI models.
A Python package to train your own BPE tokenizer for LLMs (supports regex patterns and special tokens)
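The core of the BPE algorithm that the package above refers to can be sketched briefly (illustrative only, not that package's actual API): start from raw bytes and repeatedly merge the most frequent adjacent pair of tokens into a new token id.

```python
from collections import Counter

def most_frequent_pair(ids):
    """Return the most common adjacent pair of token ids."""
    return Counter(zip(ids, ids[1:])).most_common(1)[0][0]

def merge(ids, pair, new_id):
    """Replace every occurrence of `pair` in `ids` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

def train_bpe(text, num_merges):
    """Learn `num_merges` BPE merge rules over the UTF-8 bytes of text."""
    ids = list(text.encode("utf-8"))   # byte values occupy ids 0..255
    merges = {}
    for step in range(num_merges):
        pair = most_frequent_pair(ids)
        new_id = 256 + step            # new tokens start above the byte range
        ids = merge(ids, pair, new_id)
        merges[pair] = new_id
    return ids, merges
```

For example, one merge over `"aaabdaaabac"` fuses the most frequent pair `(97, 97)` (the bytes for `"aa"`) into token 256. Production trainers add the regex pre-splitting and special-token handling mentioned in the description; this sketch omits both.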