MosaicML, an AI startup based in San Francisco, has released MPT-30B, a language model aimed at enterprise applications. According to the company, the model was trained at a fraction of the cost of its competitors, which could make it a more attractive option for enterprises looking to deploy natural language processing (NLP) models in applications like dialog systems, code completion, and text summarization.
Naveen Rao, the CEO and cofounder of MosaicML, explained that MPT-30B was trained at a cost of only $700,000, compared to the tens of millions of dollars required to train GPT-3. MosaicML used several techniques to optimize the model, including ALiBi, which enables long context lengths, and FlashAttention, which keeps GPU compute utilization high. The company also had access to Nvidia H100 GPUs, which increased per-GPU throughput by more than 2.4 times and shortened training time.
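For readers curious about how ALiBi supports longer contexts, the idea can be sketched in a few lines. Instead of learned position embeddings, ALiBi adds a fixed penalty to each attention score that grows linearly with the distance between the query and key tokens, with a different slope per attention head. The sketch below is illustrative only and is not MosaicML's implementation: the function names `alibi_slopes` and `alibi_bias` are hypothetical, the slope formula assumes a power-of-two head count, and the negative-infinity entries stand in for an ordinary causal mask.

```python
def alibi_slopes(num_heads):
    """Per-head slopes: the geometric sequence 2^(-8/n), 2^(-16/n), ..., 2^(-8).

    Assumption: num_heads is a power of two, which keeps the formula simple.
    """
    if num_heads <= 0 or num_heads & (num_heads - 1):
        raise ValueError("this sketch assumes a power-of-two head count")
    return [2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)]


def alibi_bias(num_heads, seq_len):
    """Return bias[h][i][j], to be added to attention scores before softmax.

    For key position j at or before query position i, the bias is
    -slope * (i - j): the further back the key, the larger the penalty.
    Future positions (j > i) get -inf, which is a plain causal mask
    rather than part of ALiBi itself.
    """
    neg_inf = float("-inf")
    return [
        [
            [-slope * (i - j) if j <= i else neg_inf for j in range(seq_len)]
            for i in range(seq_len)
        ]
        for slope in alibi_slopes(num_heads)
    ]


if __name__ == "__main__":
    # Print the bias rows for the first head on a 4-token sequence.
    for row in alibi_bias(num_heads=8, seq_len=4)[0]:
        print(row)
```

Because the penalty depends only on relative distance and has no learned parameters tied to a maximum length, the same rule can be applied to sequences longer than those seen in training, which is the property the article refers to.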
“MPT-30B adds better capabilities for summarization and putting more data into the prompt and having [the model] reason over that data,” Rao said. “So if that’s a requirement for you, that you care less about the economics of serving, then maybe the 30B is a better fit [than our 7B model].”
MosaicML allows businesses to train models on their own data using the company’s model architectures and then deploy those models through its inference API. This means that enterprises can build custom models at lower cost, and startups have already used MosaicML’s models and tools to build natural language frontends and search systems.
MosaicML’s release of MPT-30B and its model deployment tools highlights the company’s goal of making advanced AI more accessible. With MPT-30B available as an open-source model, alongside MosaicML’s model tuning and deployment services, the startup is positioning itself to challenge OpenAI in the market for large language model (LLM) technologies.
The company’s vision for the future of generative AI is to create a tool that can assist experts across various industries, accelerating their work without replacing them. “I think the future, at least for the next five years, is going to be about taking these techniques and making everyone who’s an expert already, even better,” Rao explained.