Google Is Making An AI That Turns Text Descriptions Into Music

Google has detailed an AI bot that can create music from text inputs. Called MusicLM, it can even generate entire songs with human vocals.

Google is working on an AI bot called MusicLM that can create music from simple text inputs. AI has been entering public consciousness over the past several years, but interest exploded last year with the unveiling of OpenAI’s ChatGPT. The chatbot can do a plethora of things, including writing stories, generating original jokes, explaining complex scientific topics, solving math problems, and even offering therapy.

According to Google, MusicLM can generate high-fidelity music from text descriptions or sound snippets, and create songs in the specific genres and styles the user mentions. It can also build on a hummed or whistled melody. In a research paper, Google researchers said, “MusicLM casts the process of conditional music generation as a hierarchical sequence-to-sequence modeling task, and it generates music at 24 kHz that remains consistent over several minutes.” The researchers also say that the bot adheres to the input criteria better than any existing music-producing AI and outperforms them in audio quality.


AI-Generated Music


Google has uploaded a number of audio snippets that it says were created using MusicLM. The AI generated the 30-second audio clips from descriptions that include a genre or specific instruments. Some of the 5-minute-long clips were created entirely from one- or two-word descriptions, like ‘melodic techno,’ ‘swing,’ or ‘relaxing jazz.’ The results aren’t likely to make anyone forget Beethoven or Mozart, but they sound natural enough that it is hard to believe human composers did not write them.

Along with text prompts, MusicLM can also receive instructions using pictures. Users can set the experience level of the AI musician to fine-tune the output quality. The bot can even create music inspired by places or tailored to particular activities, like meditation or workouts. MusicLM can generate human vocals, but they sound distorted, and the English lyrics come across more like word salad than an actual song. Ed Sheeran and Taylor Swift don’t have to worry about impending competition just yet.

MusicLM is far from the first modern AI music generator. Earlier attempts include Riffusion, Dance Diffusion, and OpenAI’s Jukebox, but none of them has produced results as scarily impressive as MusicLM’s. With more time and training material, the new AI could become even more realistic and ‘human-like,’ but that could also land Google in legal trouble with musicians for using their music to train the AI model. That’s exactly what happened recently when three prominent artists sued Stability AI, Midjourney, and DeviantArt for alleged copyright violations. Potential legal troubles notwithstanding, it is likely only a matter of time before MusicLM comes up with creations that are truly indistinguishable from music created by human composers.

