MuseChat: AI-Aid Music Chatbot
by diosparkbrando in Circuits > Audio

Introducing MuseChat, an adaptive personal AI music chatbot that generates personalized music based on your current mood. By chatting with it naturally, much as you would with ChatGPT, you let it learn about your day and your mood, and it turns that into a personalized piece of music as the final product.
Why did I make this?
When the generative music AI SUNO was first released, I spent some time experimenting with it, and some of the songs it produced were quite enjoyable.
Different kinds of music elicit different emotions in us, so wouldn't it be interesting if we could compose our own music based on the emotion we're feeling right now? When you're angry, for example, you could listen to a song that soothes the urge for aggression. I saw potential for AI to handle this task, and I also wanted a way to show those emotions, so I decided to represent them as differently colored planets.
Structure
The chatbot has two main components: the frontend, where the user interacts with the chatbot and sees their own emotional planet, and the backend, the LLM agents that actually talk to the user and create the music.
Without further ado, let's jump into the project!
Feel free to try it out here: https://github.com/Spark-Zhao666/MuseChat
Be sure to plug in your own API keys!
Supplies
Since this project is (currently) entirely digital, I'll only give a rough rundown of the libraries and LLMs I used:
Frontend:
- Vue 3 (overall framework)
Backend:
- FastAPI (communication with the frontend)
- LangChain (implementation of the AI agents; I used DeepSeek's API in this case)
- SUNO v4 (API, for music generation)
Frontend: Main Display

As mentioned before, the frontend of the chatbot is built with the popular Vue 3 framework. One file handles the rendering of the 3D objects (the planets) and another handles the music display (the songs generated by SUNO). All dependencies are managed with npm.
In the project repository, main.js is the entry point, and App.vue holds the main display of the frontend.
Frontend: Planet Rendering




The main idea behind this part is that I've mapped roughly 22 nuanced emotions to 22 different planets. Each preset shows a planet with a different design: for example, the planet representing anger is a dark red, while the planet representing happiness has a lighter color.
To decide exactly which planet to display, the frontend receives the agent's verdict on the user's current emotion, inferred from the conversation, and shows the corresponding planet.
The planets are animated with tilting and rotation, and they even have halos that pulse along with the music.
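To give a sense of how such an emotion-to-planet mapping could be organized on the backend, here is a minimal Python sketch of a preset table. The emotion names, colors, and parameters below are illustrative placeholders of my own, not the actual presets used in MuseChat.

```python
# Hypothetical emotion-to-planet preset table (illustrative values only).
# The real project maps roughly 22 nuanced emotions to distinct planet designs.
PLANET_PRESETS = {
    "anger":     {"base_color": "#5a0f0f", "tilt_deg": 18, "rotation_speed": 1.6, "halo": "pulsing_red"},
    "happiness": {"base_color": "#ffd97a", "tilt_deg": 8,  "rotation_speed": 1.0, "halo": "soft_gold"},
    "calm":      {"base_color": "#7ec8e3", "tilt_deg": 5,  "rotation_speed": 0.6, "halo": "slow_blue"},
    # ...the remaining emotions would follow the same structure
}

def planet_params(emotion: str) -> dict:
    """Return rendering parameters for a detected emotion,
    falling back to a neutral preset if the emotion is unknown."""
    return PLANET_PRESETS.get(emotion, PLANET_PRESETS["calm"])

if __name__ == "__main__":
    print(planet_params("anger"))
```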
Frontend: Music Control

When the music is ready, the frontend receives data from the backend. Once the agent sends a play_music message, the frontend creates a music player and loads the music URL generated by SUNO. At the same time, it runs the track through a spectrum analyzer, producing the data used to drive the planet's halo.
Of course, the player resets as soon as the song finishes (the stop_music event). Currently, users can't control playback directly (start, stop, skip forward or backward); they can only listen along with the song the system generates.
Frontend: Communications


Yes, there is a module dedicated to communicating with the backend and fetching the essential information. In this case, emitter.js relays the events exchanged with the backend, including creating planets, playing music, receiving spectrum data, and ending the music.
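Since these event names have to match on both ends of the connection, here is a rough Python sketch of what the backend-to-frontend messages could look like. Only play_music and stop_music are event names taken from the project description; the other names and all field names are assumptions for illustration.

```python
import json

# Sketch of the JSON events pushed to the frontend over the WebSocket.
# "play_music" and "stop_music" come from the project; the rest is assumed.
def create_planet_event(emotion: str, params: dict) -> str:
    return json.dumps({"event": "create_planet", "emotion": emotion, "params": params})

def play_music_event(audio_url: str) -> str:
    return json.dumps({"event": "play_music", "url": audio_url})

def spectrum_event(frames: list[float]) -> str:
    return json.dumps({"event": "spectrum", "frames": frames})

def stop_music_event() -> str:
    return json.dumps({"event": "stop_music"})
```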
Backend: Main Backbone

Moving on to the backend: most of it is built on FastAPI, since it has good asynchronous capabilities and WebSocket support.
In particular, server/chat.py implements the API routes and the WebSocket interaction with the frontend, supporting conversation and music generation asynchronously.
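If you haven't used FastAPI's WebSocket support before, here is a minimal sketch of what a chat endpoint along these lines could look like. The route path and the handle_user_message helper are placeholders of my own, not the actual contents of server/chat.py.

```python
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()

async def handle_user_message(text: str) -> dict:
    """Placeholder for the agent pipeline: in the real project this is where
    the supervisor decides whether to chat, detect emotion, or make music."""
    return {"event": "chat_reply", "text": f"(agent reply to: {text})"}

@app.websocket("/ws/chat")
async def chat_socket(ws: WebSocket):
    await ws.accept()
    try:
        while True:
            user_text = await ws.receive_text()   # message typed by the user
            reply = await handle_user_message(user_text)
            await ws.send_json(reply)             # push the agent's response back
    except WebSocketDisconnect:
        pass  # client closed the page; nothing to clean up in this sketch
```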
Backend: Agents


All agents are implemented with LangGraph, and the model I used in this case is DeepSeek.
There are three agents handling different tasks. supervisor_agent is the "manager" node, responsible for dynamically assigning tasks to the other agents (consult_agent or generate_music_agent) according to the current state of the conversation (for example, whether an emotion has been detected or music has already been generated). consult_agent detects the user's emotion and music preferences using DeepSeek, and generate_music_agent produces the musical content used later. Once enough data has been gathered, the backend generates the planet's parameters from the corresponding preset and pushes them back to the frontend.
Communication between the agents is managed with a StateGraph, and Command objects carry state between the agents and decide the next action, enabling multi-agent cooperation.
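Below is a simplified LangGraph sketch of this supervisor pattern: a StateGraph with three nodes, where each node returns a Command telling the graph where to go next and what state to update. The state fields, node bodies, and routing conditions are my own approximation of the description above, not the project's actual graph.

```python
from typing import Literal, Optional, TypedDict

from langgraph.graph import END, START, StateGraph
from langgraph.types import Command


class ChatState(TypedDict, total=False):
    user_message: str
    emotion: Optional[str]
    music_url: Optional[str]


def supervisor_agent(state: ChatState) -> Command[Literal["consult_agent", "generate_music_agent", "__end__"]]:
    # Route to whichever agent handles the piece of state that is still missing.
    if not state.get("emotion"):
        return Command(goto="consult_agent")
    if not state.get("music_url"):
        return Command(goto="generate_music_agent")
    return Command(goto=END)


def consult_agent(state: ChatState) -> Command[Literal["supervisor_agent"]]:
    # In the real project a DeepSeek call infers this from the conversation.
    return Command(goto="supervisor_agent", update={"emotion": "calm"})


def generate_music_agent(state: ChatState) -> Command[Literal["supervisor_agent"]]:
    # In the real project this step triggers the SUNO request.
    return Command(goto="supervisor_agent", update={"music_url": "https://example.com/song.mp3"})


builder = StateGraph(ChatState)
builder.add_node("supervisor_agent", supervisor_agent)
builder.add_node("consult_agent", consult_agent)
builder.add_node("generate_music_agent", generate_music_agent)
builder.add_edge(START, "supervisor_agent")
graph = builder.compile()

print(graph.invoke({"user_message": "I had a rough day"}))
```

Because every node returns a Command, no fixed edges are needed between the agents: the supervisor keeps re-routing until both the emotion and the music URL are present, then ends the graph.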
Backend: Music Generation

Once the agents issue the appropriate action, it triggers the music generation function. Specifically, a request is sent to the SUNO API and the agents wait for the finished track. The request contains a prompt for SUNO, written by the agents based on the user's emotion and the content of the conversation.
When the music is ready, the link to the finished track and the spectrum data are sent to the frontend over the WebSocket.
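Here is a hedged sketch of what the generation step could look like with an async HTTP client. The endpoint URL, payload fields, and polling logic are placeholders only; you'll need to adapt them to the SUNO API (or wrapper) you actually have access to.

```python
import asyncio

import httpx

SUNO_API_URL = "https://example-suno-proxy.com/api/generate"  # placeholder endpoint
SUNO_API_KEY = "YOUR_API_KEY"                                 # supply your own key


async def request_song(prompt: str) -> str:
    """Send the agent-written prompt to the music API and poll until a track URL
    is ready. Endpoint, payload, and response fields are placeholders."""
    headers = {"Authorization": f"Bearer {SUNO_API_KEY}"}
    async with httpx.AsyncClient(timeout=30) as client:
        resp = await client.post(SUNO_API_URL, headers=headers,
                                 json={"prompt": prompt, "model": "v4"})
        resp.raise_for_status()
        task_id = resp.json()["task_id"]

        # Generation takes a while, so poll for the finished track.
        while True:
            await asyncio.sleep(5)
            status = await client.get(f"{SUNO_API_URL}/{task_id}", headers=headers)
            data = status.json()
            if data.get("status") == "complete":
                return data["audio_url"]
```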
Backend: Network Processing

Finally, FastAPI handles the WebSocket traffic so that the conversation and music generation tasks run asynchronously, allowing high concurrency and low latency.
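As a small illustration of that pattern, the toy sketch below runs a stubbed music generation call as a background asyncio task while the chat loop keeps going; it shows the asynchronous behavior only and is not the project's code.

```python
import asyncio


async def generate_song(prompt: str) -> str:
    """Stand-in for the SUNO request sketched in the previous step."""
    await asyncio.sleep(2)            # pretend generation takes a while
    return "https://example.com/song.mp3"


async def handle_chat() -> None:
    # The conversation loop keeps responding while the song is being made.
    for turn in ["hello", "I'm feeling down", "something mellow please"]:
        print(f"user: {turn}")
        await asyncio.sleep(0.5)      # simulate chat latency


async def main() -> None:
    # Music generation runs concurrently with the chat, so neither blocks the other.
    song_task = asyncio.create_task(generate_song("mellow lo-fi for a rainy evening"))
    await handle_chat()
    print("song ready at:", await song_task)


asyncio.run(main())
```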
Wrap Up and Final Thoughts

Again, this is a proof of concept and something I'd wanted to do for a long time; I finally found the chance to realize it.
In the future, I'll add a display-board system so users can see everyone else's planets. I'll also try to generate the planets in a more nuanced way, to make them truly personal.
This project is currently all software, but it has the potential to be transplanted into something like a smart speaker, allowing more interesting ways to experience our emotions.
This concludes the project. Let me know what you guys think!