The startup Stability AI, developer of the Stable Diffusion family of image-generating neural networks, has launched a service called Stable Audio. It is essentially Stable Diffusion for music: the service generates music from a text description.
The Stable Audio website offers several examples of music generated from prompts such as "Warm soft hug, comfort, low synths, twinkle, wind and leaves, ambient, peace, relaxed, water." The neural network also understands simple requests like "Drum solo."
All the demo tracks are purely instrumental; none of the compositions contain vocals. According to the developers, the neural network was trained on more than 800,000 excerpts of licensed music from the AudioSparks library.
As Stable Audio co-creator Zach Evans explains, the technology is technically similar to Stable Diffusion, but with one important difference: Stable Diffusion can be asked to work in the style of a specific artist, whereas Stable Audio cannot, for example, play in the style of The Beatles.
The current version of Stable Audio has about 1.2 billion parameters, roughly the same as the first version of Stable Diffusion. The free tier allows up to 20 generated tracks per month, each up to 20 seconds long; a paid plan raises the limit to 500 tracks per month, each up to 90 seconds.
You can try Stable Audio here, though the service is currently unstable: a lot of people are eager to try it.