Implementation of a LLaDA-inspired Masked Diffusion Model for Text using PURE BYTE-LEVEL TOKENIZATION (cuz why not) and Mixed Precision Training for speed.
nlp machine-learning diffusion langauge-model llm small-language-models diffusion-lm small-llm masked-diffusion llada masked-diffusion-lm diffusion-llm diffusion-langauge-model masked-diffusion-llm
-
Updated
May 18, 2025 - Python