Tether CEO Paolo Ardoino has announced a new version of QVAC Fabric from the Tether AI team. According to ChainCatcher, the update integrates the BitNet LoRA framework, enabling training and inference of billion-parameter large models on consumer-grade GPUs and smartphones.
The updated QVAC Fabric LLM marks the first instance of BitNet LoRA fine-tuning and inference running cross-platform on AMD, Intel, Apple Metal, and mobile GPUs. On flagship devices, GPU inference is 2 to 11 times faster than on CPUs, while memory usage is reduced by up to 90% compared to full-precision models. The Tether team has fine-tuned models of up to 3.8 billion parameters on flagship smartphones such as the Pixel 9, S25, and iPhone 16, and fine-tuned models of up to 13 billion parameters on the iPhone 16. The related code has been open-sourced on GitHub.
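The report does not include implementation details, but the memory savings follow from the general BitNet-plus-LoRA recipe: the base weights are frozen and quantized to ternary values (roughly 1.58 bits per weight instead of 32), and only small low-rank adapter matrices are trained. The sketch below illustrates that structure in NumPy; all layer sizes, names, and the quantization scheme are illustrative assumptions, not QVAC Fabric's actual code.

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, rank = 64, 64, 4  # hypothetical layer sizes; `rank` is the LoRA rank

# Frozen base weight, quantized BitNet-style to ternary values {-1, 0, +1}
# with a single float scale per matrix -- this is where most memory is saved.
w_full = rng.normal(size=(d_out, d_in)).astype(np.float32)
scale = float(np.mean(np.abs(w_full)))
w_ternary = np.clip(np.round(w_full / scale), -1, 1).astype(np.int8)

# Trainable LoRA adapters: only A and B would receive gradients when fine-tuning.
A = rng.normal(scale=0.01, size=(rank, d_in)).astype(np.float32)
B = np.zeros((d_out, rank), dtype=np.float32)  # zero-init: training starts at the base model

def forward(x):
    base = (w_ternary.astype(np.float32) * scale) @ x  # dequantized frozen path
    return base + B @ (A @ x)                          # low-rank trainable update

x = rng.normal(size=d_in).astype(np.float32)
y = forward(x)

trainable = A.size + B.size  # 2 * rank * d adapter parameters
full = w_full.size           # d_out * d_in base parameters
print(trainable, full)       # the adapter is a small fraction of the base layer
```

With these toy sizes the adapters hold 512 parameters against 4,096 in the base matrix; at billion-parameter scale the same ratio, combined with ternary storage of the frozen weights, is what makes on-device fine-tuning of this kind plausible.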