Live now
Qwen3.5 0.8B, 4B, and 9B at 2.5-bit
Perplexity holds within 2% of original and the drop ships with kernels for vLLM.
0.8B / 4B / 9B · 2.5-bit · 1.9 GB on 4B
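A quick sanity check on the size figure: at 2.5 bits per weight, the weight tensor of a 4B-parameter model alone comes to about 1.25 GB, so the 1.9 GB figure above presumably also covers components kept at higher precision (e.g. embeddings and quantization scales). A minimal sketch of the arithmetic, using decimal gigabytes:

```python
def quantized_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Back-of-envelope size of the weight tensor alone, in decimal GB.

    Ignores higher-precision layers, quantization scales/zero-points,
    and serving overhead, all of which add to the on-disk footprint.
    """
    return n_params * bits_per_weight / 8 / 1e9

# Weights-only estimate for the 4B model at 2.5-bit:
print(round(quantized_size_gb(4e9, 2.5), 2))  # 1.25
```

The gap between this lower bound and the quoted 1.9 GB is an assumption on our part, not something the release note spells out.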
Read the release note ↗
AI so cheap, you'll cry.
State-of-the-art inference. Embarrassingly affordable.
The next drop is already staged; validation is finishing across realistic batch sizes, packaging, and serving checks.
Watch the blog ↗