In what reads as a reaffirmation of his long-held stance on open-sourcing AI models, Elon Musk has made a choice quite contrary to Altman’s. On March 17, 2024, Musk announced the open-sourcing of Grok-1, making it the largest openly available language model to date at a staggering 314 billion parameters, well beyond the 175 billion parameters of OpenAI’s GPT-3.
Interestingly, the cover image for the Grok-1 open-sourcing announcement was generated by Midjourney, a small instance of “AI helping AI.”
Musk, who has been critical of OpenAI’s lack of openness, naturally took a veiled jab on social media, saying, “We want to understand more about the open parts of OpenAI.”
Grok-1 is released under the Apache 2.0 license, which covers both the model weights and the architecture. Users are therefore free to use, modify, and distribute the software for personal and commercial purposes alike, an openness that encourages broader research and application development. Since its release, the project has garnered 6.5k stars on GitHub, and its popularity continues to climb.
The project documentation explicitly notes that, because of Grok-1’s sheer size (314B parameters), a machine with ample GPU memory is needed to run the model with the provided example code. Users estimate this may require roughly 628 GB of GPU memory.
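That estimate is consistent with simply holding the weights in 16-bit precision at 2 bytes per parameter. As a back-of-the-envelope check (purely illustrative arithmetic, not code from the repository):

```python
# Rough GPU-memory estimate for holding Grok-1's weights alone
# (ignores activations and KV cache); illustrative arithmetic only.
TOTAL_PARAMS = 314e9  # 314 billion parameters

def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    return num_params * bytes_per_param / 1e9

print(f"bf16/fp16 weights: ~{weight_memory_gb(TOTAL_PARAMS, 2):.0f} GB")  # ~628 GB
print(f"int8 weights:      ~{weight_memory_gb(TOTAL_PARAMS, 1):.0f} GB")  # ~314 GB
```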
The repository also notes that its implementation of the MoE (Mixture-of-Experts) layer is not efficient; this implementation was chosen deliberately to avoid the need for custom kernels when validating the model’s correctness.
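To make that trade-off concrete, here is a minimal sketch (in JAX, which the project’s training stack is built on) of the kind of “dense” MoE feed-forward layer such a note describes: every expert is evaluated for every token, and the router weights simply zero out the unused outputs. All function names, shapes, and hyperparameters below are illustrative assumptions, not xAI’s actual implementation.

```python
import jax
import jax.numpy as jnp

def naive_moe_layer(x, gate_w, expert_w1, expert_w2, top_k=2):
    """Naive (dense) Mixture-of-Experts feed-forward layer.

    x:         (tokens, d_model)          token representations
    gate_w:    (d_model, n_experts)       router weights
    expert_w1: (n_experts, d_model, d_ff) first linear layer of each expert
    expert_w2: (n_experts, d_ff, d_model) second linear layer of each expert

    Every expert is computed for every token; outputs of non-selected
    experts are zeroed out by the routing mask. No custom scatter/gather
    kernels are needed, but the compute cost is n_experts/top_k times
    higher than a sparse dispatch would be.
    """
    n_tokens = x.shape[0]

    # Router: softmax over experts, keep only the top-k weights per token.
    logits = x @ gate_w                              # (tokens, n_experts)
    probs = jax.nn.softmax(logits, axis=-1)
    top_vals, top_idx = jax.lax.top_k(probs, top_k)  # (tokens, top_k)
    mask = jnp.zeros_like(probs).at[
        jnp.arange(n_tokens)[:, None], top_idx
    ].set(top_vals)                                  # (tokens, n_experts)
    # Renormalize so the selected routing weights sum to 1 per token.
    mask = mask / (mask.sum(axis=-1, keepdims=True) + 1e-9)

    # Evaluate ALL experts densely: (n_experts, tokens, ...).
    hidden = jax.nn.gelu(jnp.einsum("td,edf->etf", x, expert_w1))
    expert_out = jnp.einsum("etf,efd->etd", hidden, expert_w2)

    # Mix expert outputs with the (mostly zero) routing weights.
    return jnp.einsum("te,etd->td", mask, expert_out)

# Toy usage with made-up sizes, just to show the shapes involved.
key = jax.random.PRNGKey(0)
d_model, d_ff, n_experts, tokens = 16, 64, 8, 4
x = jax.random.normal(key, (tokens, d_model))
gate_w = jax.random.normal(key, (d_model, n_experts))
w1 = jax.random.normal(key, (n_experts, d_model, d_ff))
w2 = jax.random.normal(key, (n_experts, d_ff, d_model))
y = naive_moe_layer(x, gate_w, w1, w2)  # (tokens, d_model)
```

A production implementation would instead gather only the tokens routed to each expert, which is exactly what requires custom scatter/gather kernels to run efficiently on accelerators.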
Other popular large models that have been open-sourced include Meta’s Llama 2 and France’s Mistral. In general, open-sourcing a model enables large-scale testing and feedback from the community, which in turn speeds up iteration on the model itself.
Grok-1 is a Mixture-of-Experts (MoE) large model developed over roughly four months by xAI, Musk’s AI company. A brief overview of the model’s development journey:
After announcing the founding of xAI, the researchers first trained a prototype language model with 33 billion parameters (Grok-0), which approached the capabilities of LLaMA 2 (70B) on standard language-model benchmarks while using far fewer training resources.
The researchers then made significant improvements to the model’s reasoning and coding capabilities, culminating in Grok-1, released in November 2023. This more powerful, state-of-the-art (SOTA) language model achieved 63.2% on the HumanEval coding task and 73% on MMLU, surpassing all other models in its compute class, including ChatGPT-3.5 and Inflection-1.
What Sets Grok-1 Apart from Other Large Models?
xAI particularly emphasizes that Grok-1 is its own model trained from scratch, with pre-training concluding in October 2023, using a custom training stack built on JAX and Rust, and that it has not been fine-tuned for any specific task such as dialogue.
A unique and fundamental advantage of Grok-1 is its real-time knowledge of the world via the X platform, which enables it to answer spicy questions that most other AI systems reject. The training data for the released version of Grok-1 comes from internet data up to the third quarter of 2023 and from data provided by xAI’s AI trainers.
As a Mixture-of-Experts model with 314 billion parameters, roughly 25% of which are active for any given token, Grok-1’s immense parameter count gives it powerful language understanding and generation capabilities.
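Taking those figures at face value, only about 78.5 billion of the 314 billion weights participate in processing each individual token; a quick illustrative calculation:

```python
# Illustrative arithmetic from the figures quoted above.
total_params = 314e9    # total parameters in the MoE model
active_fraction = 0.25  # roughly 25% of weights active per token

active_params = total_params * active_fraction
print(f"~{active_params / 1e9:.1f}B active parameters per token")  # ~78.5B
```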
xAI has previously stated that Grok-1 will serve as the engine behind Grok for natural-language processing tasks such as question answering, information retrieval, creative writing, and coding assistance. Long-context understanding and retrieval, as well as multimodal capabilities, are among the directions the model will explore going forward.