This is a Java implementation of a GPT-3/GPT-4 tokenizer, loosely ported from Tiktoken with the help of ChatGPT. ...that all 3.5-turbo models released after 0613 now have tokenization counts for messages ...
JTokkit aims to be a fast, efficient tokenizer for natural language processing tasks that use OpenAI models. It provides an easy-to-use interface for tokenizing input text, for ...
I have implemented a parallel tokenizer (in Java) for my Polymorph Data Language (PDL) which can use all the CPU cores of my machine (14 cores, 20 threads). The PDL scripts are divided into blocks ...
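The general idea of block-wise parallel tokenization can be sketched in plain Java as below. This is only an illustration under stated assumptions, not the PDL tokenizer itself: the description above is cut off before it says how blocks are delimited, so the blank-line block rule, the whitespace token rule, and the names `ParallelTokenizerSketch`, `splitIntoBlocks`, `tokenizeBlock`, and `tokenizeParallel` are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelTokenizerSketch {

    // ASSUMPTION: blocks are separated by one or more blank lines.
    // The real PDL block rule is not given in the text above.
    static List<String> splitIntoBlocks(String script) {
        List<String> blocks = new ArrayList<>();
        for (String block : script.split("\\R\\s*\\R")) {
            if (!block.trim().isEmpty()) {
                blocks.add(block);
            }
        }
        return blocks;
    }

    // ASSUMPTION: a token is a maximal run of non-whitespace characters.
    static List<String> tokenizeBlock(String block) {
        return Arrays.asList(block.trim().split("\\s+"));
    }

    // Tokenizes every block on a fixed-size thread pool, then concatenates
    // the per-block results in the original block order.
    static List<String> tokenizeParallel(String script, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (String block : splitIntoBlocks(script)) {
                Callable<List<String>> task = () -> tokenizeBlock(block);
                futures.add(pool.submit(task));
            }
            List<String> tokens = new ArrayList<>();
            for (Future<List<String>> future : futures) {
                // Reading futures in submission order preserves token order.
                tokens.addAll(future.get());
            }
            return tokens;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while tokenizing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A block failed to tokenize", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

A caller could pass `Runtime.getRuntime().availableProcessors()` as the `threads` argument to size the pool to the machine. Collecting the futures in submission order keeps the output deterministic no matter which block finishes first, at the cost of waiting on slow early blocks before merging later ones.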