US enterprise software company Databricks has released Dolly 2.0, the next version of its large language model (LLM) with ChatGPT-like capabilities. According to the company, it is the first open source, instruction-following LLM fine-tuned on a freely licensed instruction dataset, which lets companies use the technology in their own commercial projects without paying for API access or sharing data with third parties.
In recent months, many language models similar to OpenAI’s GPT have been released that could be considered open by many definitions. One of these is LLaMA by Meta, which in turn inspired Alpaca, Koala, Vicuna and Dolly 1.0.
However, many of these “open” models remained bound by restrictions that rule out commercial use. One example is Alpaca, the Stanford team’s project, which was built on top of LLaMA 7B and trained on instruction data generated with GPT-3.5. OpenAI’s terms of use prohibit using output from its services to develop models that compete with the company.
Databricks set out to solve this problem. Dolly 2.0 is a 12-billion-parameter large language model based on EleutherAI’s open source Pythia model family and fine-tuned exclusively on a small instruction dataset (databricks-dolly-15k) created by Databricks employees. The license terms for this dataset allow it to be used, modified, and extended for any purpose, including academic or commercial applications.
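To make the idea of an instruction dataset more concrete, here is a minimal Python sketch of how one record might be turned into a single training string. It assumes records carry the fields instruction, context, response and category, as in the published databricks-dolly-15k data. The prompt template, the format_example helper and the sample record are hypothetical illustrations, not Databricks' actual training code or data.

```python
from typing import TypedDict


class InstructionRecord(TypedDict):
    """One instruction-tuning record (field names assumed from databricks-dolly-15k)."""
    instruction: str
    context: str
    response: str
    category: str


def format_example(record: InstructionRecord) -> str:
    """Render a record as one training string using a hypothetical template."""
    parts = [f"### Instruction:\n{record['instruction'].strip()}"]
    # The context field is optional; include it only when it is non-empty.
    if record["context"].strip():
        parts.append(f"### Context:\n{record['context'].strip()}")
    parts.append(f"### Response:\n{record['response'].strip()}")
    return "\n\n".join(parts)


# A made-up record for illustration; it is not taken from the dataset.
example: InstructionRecord = {
    "instruction": "Name the capital of France.",
    "context": "",
    "response": "Paris is the capital of France.",
    "category": "open_qa",
}

formatted = format_example(example)
print(formatted)
```

Fine-tuning then amounts to continuing to train the base model on many such strings, so that it learns to produce the response part when given the instruction part.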
The Databricks blog stresses that, like the original Dolly, version 2.0 is not state-of-the-art, but that it shows a surprisingly capable level of instruction following given the size of its training dataset. The post adds that the effort and cost required to create powerful artificial intelligence technologies are “significantly less than previously thought.”
You can download the Dolly 2.0 model from the Databricks Hugging Face page and instructions from GitHub. The company is also offering a webinar on April 25th to explain how organizations can use LLMs.
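For readers who want to try the model, the following Python sketch shows one way it could be loaded with the Hugging Face transformers library. It assumes the model is published under the identifier databricks/dolly-v2-12b, that the torch, transformers and accelerate packages are installed, and that enough GPU memory is available for a 12-billion-parameter model. The load_dolly helper is a hypothetical name used here for illustration.

```python
MODEL_ID = "databricks/dolly-v2-12b"  # assumed Hugging Face model identifier


def load_dolly(model_id: str = MODEL_ID):
    """Return a text-generation pipeline for Dolly 2.0.

    Imports are deferred so that defining this function does not require
    torch or transformers to be installed.
    """
    import torch
    from transformers import pipeline

    return pipeline(
        model=model_id,
        torch_dtype=torch.bfloat16,  # half-precision weights to reduce memory use
        trust_remote_code=True,      # the model repository ships its own pipeline code
        device_map="auto",           # place layers on available devices (needs accelerate)
    )


if __name__ == "__main__":
    # Downloads the model weights on first run, which takes significant time and disk space.
    generate_text = load_dolly()
    result = generate_text("Explain the difference between nuclear fission and fusion.")
    print(result[0]["generated_text"])
```

Because everything runs locally, prompts and outputs never leave the machine, which is the main attraction for organizations that cannot share data with third parties.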