Ahead of WWDC24, Apple released OpenELM, described as an "efficient language model with an open source training and inference framework," on the Hugging Face platform. The model's source code, pre-trained weights, and training recipes are available in Apple's GitHub repository.
According to reports, OpenELM uses a layer-wise scaling strategy to allocate parameters non-uniformly across the layers of the Transformer model, improving accuracy. For example, at a parameter budget of roughly one billion, OpenELM improves accuracy by 2.36% over OLMo while requiring only about half as many pre-training tokens.
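The core idea of layer-wise scaling is that each Transformer block gets its own width instead of repeating one identical block throughout the network. The following Python sketch illustrates that idea only; the linear interpolation rule and every constant in it are illustrative assumptions, not Apple's actual OpenELM configuration.

```python
def layerwise_widths(num_layers: int,
                     head_dim: int = 64,
                     min_heads: int = 4,
                     max_heads: int = 16,
                     min_ffn_mult: float = 1.0,
                     max_ffn_mult: float = 4.0):
    """Return an illustrative (num_heads, ffn multiplier) plan per layer.

    Shallow layers get fewer heads and narrower feed-forward blocks,
    deeper layers get more, so the parameter budget is spread
    non-uniformly across depth rather than duplicated per layer.
    """
    plan = []
    for i in range(num_layers):
        t = i / max(num_layers - 1, 1)  # 0.0 at the first layer, 1.0 at the last
        heads = round(min_heads + t * (max_heads - min_heads))
        ffn_mult = min_ffn_mult + t * (max_ffn_mult - min_ffn_mult)
        plan.append({"num_heads": heads,
                     "attn_dim": heads * head_dim,
                     "ffn_mult": round(ffn_mult, 2)})
    return plan


if __name__ == "__main__":
    for idx, cfg in enumerate(layerwise_widths(num_layers=8)):
        print(idx, cfg)
```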
Unlike the common practice of releasing only model weights and inference code for models pre-trained on private datasets, Apple's release includes the complete framework for training and evaluating the language model on public datasets, along with training logs, multiple checkpoints, and pre-training configurations.
Apple also released code for converting the model to the MLX library for inference and fine-tuning on Apple devices. This comprehensive release is intended to strengthen the open research community and pave the way for future open research work. (IT Home)
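For readers who want to try such a checkpoint locally, the community mlx-lm package exposes a simple load-and-generate interface on Apple-silicon Macs. The sketch below is an assumption-laden example, not Apple's documented workflow: the repository ID is a placeholder for an MLX-converted OpenELM checkpoint, and it presumes a compatible tokenizer ships with that checkpoint; consult the model card for the actual setup.

```python
# Hypothetical example of running an MLX-converted OpenELM checkpoint
# with the mlx-lm package on an Apple-silicon Mac (pip install mlx-lm).
from mlx_lm import load, generate

# Placeholder repository ID; an MLX-format conversion of OpenELM
# (e.g., produced with Apple's conversion code mentioned above) is assumed.
model, tokenizer = load("apple/OpenELM-270M-Instruct")

# Generate a short continuation from a plain-text prompt.
text = generate(
    model,
    tokenizer,
    prompt="Once upon a time there was",
    max_tokens=64,
)
print(text)
```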