Spotify AI DJ

Spotify DJ is an AI guide that knows your taste so well it can choose what music to play for you.

Role
  • Product design
  • Content programming design
  • Prototyping

Results
  • One of Time’s best inventions of 2023
  • 3 patents filed
  • Second largest source of music discovery among Spotify playlists

The Challenge

Spotify’s AI DJ started as a hack week pitch with a simple premise: a host who plays music you love, and uses that trust to take you somewhere new.

The pitch came out of a problem I’d been stuck on in the car. Spotify was losing to radio there, and none of our fixes (voice control, a car-specific card mode, custom hardware) were moving the needle. Watching user research, I realized the issue wasn’t the app at all. Drivers don’t reach for their phones, they use the buttons on the steering wheel. On radio, those buttons change the kind of music playing. A single click shifts you from classic rock to country. On Spotify, the same buttons just skipped to the next track in whatever playlist you’d already picked. It was easier to use radio to cultivate a specific mood.


My Role

Design Lead. I owned the product vision and character strategy for the DJ from the hack week pitch through launch. I cast and created Xavier as a named character rather than a faceless assistant, defined the content programming for the algorithm, designed the three-mode narration framework that gave writers and engineers a shared language for the DJ’s intent, and designed the LLM-as-judge system that kept his output safe and defensible.

The team grew with the project. It started as just me, sketching and prototyping. Then a PM, a content designer, and Xavier himself joined to build a scrappy prototype in the Spotify app for a diary study. By launch, the team had grown to a PM, a content designer, a visual designer, and sixteen engineers.

The hardest decision along the way was the interaction model: how users would signal to Xavier that they wanted something different. It changed a core assumption about how Spotify worked, reshaping the user journey and the app’s information architecture, and required VP approval to ship.


Core Design Components

Designing the DJ meant designing four key parts of the experience:

  • The interaction design system
  • The character and persona of the DJ
  • The narration strategy for what the DJ would say and why
  • The content programming for what the DJ played


Interaction Design

Interaction explorations
Prototypes of the directions explored: swipe to skip, the segment selector in the player controls, super skip, and the DJ button

The central design question was how users would signal to Xavier that they wanted something different. It sounds like a small problem. It wasn’t. The answer determined what the DJ actually was inside Spotify: an enhanced control, or a new kind of listening session. The decision touched the app’s core information architecture in ways that made it a VP-level decision.

We explored four directions, from mechanical to human. A Tinder-style swipe interaction that reframed a session as a series of accept/reject decisions. A below-the-fold segment selector that let users steer from the now playing screen. A “super skip” icon that extended Spotify’s existing skip control. And a dedicated DJ button that summoned Xavier back to talk.

The debate narrowed to two finalists and a clear philosophical split: Jump (the super skip) vs. Your DJ.

Jump was the functional answer. It leveraged a mental model users already had (skip) and extended Spotify’s existing design language. It was safer, more legible, and it scaled beyond the DJ to any future feature that needed a “bigger skip.” The cost was conceptual: users would have to learn that there were now two kinds of skip, and the DJ would feel like a souped-up control rather than your music friend.

Your DJ was the contextual answer. A bespoke button, unique to the DJ, that only appeared when Xavier was your current session and summoned him back to talk. It leveraged the mental model of the DJ himself. You don’t press a button to skip, you ask the host for something different. It was one concept to learn instead of two, but it only scaled with the DJ. If the DJ failed, the button failed with it.

Underneath the button debate was a bigger one about the app’s information architecture. Spotify is an app optimized for control: you browse to a playlist page, pick a track, and press play. The DJ inverted that: tap once on home, and music starts immediately. No playlist page, no track list, an AI would pick for you. That was a significant departure from how Spotify worked, and it was the part that made the decision a VP-level call.

Rather than advocate for one answer, my PM and I facilitated a discussion between the two VPs who needed to sign off. We laid out the tradeoffs honestly, articulating what each approach assumed, what it cost, and what it unlocked. We then let the argument happen in the open.

Super skip
The super skip design provided a functional control that leveraged the mental model of skip, extended Spotify’s visual language, and could scale beyond the DJ. However, it asked users to learn two concepts: a DJ and a new player control.
DJ button
The DJ button was playful, leveraged the mental model of a DJ, and asked users to learn a single concept: the DJ. However, it was a bespoke design asset that could only scale with the DJ.

The decision landed on Your DJ. The bet was that the DJ was worth treating as a new kind of object in the app, not a variation on an existing one, and that the speed of “tap once and listen” was worth giving up the control of the playlist page for this one session type.

In hindsight, that was the decision that made everything else in the case study possible. Xavier’s character only works if users believe he’s a host, not a control. The narration framework only matters if people are listening to him talk. The content programming only has room to breathe if users aren’t evaluating a tracklist.


The Character: Xavier

In early iterations of the DJ, we used a traditional text-to-speech voice, but it felt bland and mechanical. Users needed to feel like they were in good hands, and that required a personality, not just a playlist.

The DJ needed credibility. Not just a pleasant sound, but a voice whose taste you’d actually trust. I cast Xavier Jernigan, Spotify’s head of cultural partnerships and the host of The Get Up, a morning show Spotify had produced. Xavier had already done the job, and we saw the DJ as a way to scale his expertise to every Spotify listener. With a real person anchoring the character, we could define who our DJ was in the product. We defined five character traits:

  • Light Hearted — knows how to grab attention without being over the top
  • Knowledgeable — a music aficionado, but never a know-it-all
  • Extroverted — happy to be your guide
  • Inclusive — explains things anyone can understand
  • Humble — knows he won’t get everything right, and is always eager to get better

The rejected traits were as important as the accepted ones. Early drafts included funny, amusing, entertaining, opinionated, and encyclopedic. We cut all of them. Funny and amusing set a bar the system couldn’t reliably clear: a DJ who tries to be funny and misses is worse than one who never tries. Opinionated cut against trust; we wanted Xavier to feel like a guide, not a critic. Encyclopedic cut against warmth; we wanted him to know a lot without explaining the obvious. What was left was a character who could be consistently warm across millions of narration moments, without overreaching.

We used a journey map to identify the moments where the DJ would have the most impact, mapping what we wanted people to think, do, and feel at each one. One of the most important moments was the unboxing: the first time someone used the DJ and needed to understand what it was as a product.

DJ Framework
The diagram used to illustrate a member's journey with the DJ, and highlight key moments

Users needed to believe Xavier knew them before they’d trust his picks, and that introduction set the tone for everything that followed. We used the following narration to help people understand what the DJ was, and how it was different from other assistants.

"Hey, what’s going on, it’s really great to be here with you. I’m Xavier, my friends call me X, and from this moment on, I’m going to be your own personal AI DJ on Spotify. Yeah, I’m an AI, but listen, I don’t set timers, I don’t switch on your lights. I’m all about music, your music. I know what you listen to. I see that {user.top.artist.name} there. So I’m going to be here every day, playing those artists you got on rotation, going back into your history for songs you used to love, and I’m always on the lookout for new stuff, just to push your boundaries a little bit. I’m going to come back every few songs to change up the vibe. But if you’re ever not feeling it, there’s going to be a DJ button at the bottom of your screen. Tap that, and I’ll come back early to switch it up. Alright, enough talk, I mentioned {user.top.artist.name}, let’s get it going with that and some other artists in that zone."

The voice itself was its own craft problem. An audio engineer turned Xavier’s recorded performances into a TTS voice that could say anything the system generated, and our content designer directed the performance, ensuring that the pacing, warmth, and specific way Xavier speaks were captured by the DJ.


Narration as a Design System

It’s easy to imagine the DJ as a smarter playlist without narration. Xavier’s voice was the thing that made the DJ a DJ, and the narration system was how we made sure that voice was doing real work every time it showed up.

One of our core user insights was about taste. People often know the broad strokes of what they like: “I’m a pop fan,” “I listen to rock.” But those labels feel too big to describe how they actually listen. Pop fans don’t love all pop. Rock fans have a specific rock. People wanted to know the nuanced “tasting notes” that could help them better understand who they are as music fans. The DJ’s job was to name that specificity back to them, and in doing so, show them we understood their taste better than they could articulate it themselves. Once we’d earned that trust, we could use the same system to introduce them to artists they didn’t yet know.

The narration framework was how we did that at scale.

We gave writers and engineers three modes, each with a different purpose. Three isn’t a magic number. The real system had many more rules a content designer could apply based on a user’s relationship to the music and the particular narration being spoken, but three is the easiest way to grok the shape of the work:

  • Name the genre (orient the listener): “Let’s get started with some Mellow Gold.”
  • Celebrate the genre (add context and warmth): “Let’s rewind to the soft sounds of the 70s with some Mellow Gold.”
  • Educate (deepen the connection): “Mellow Gold combines soft rock and folk rock: clean production, harmonies, melodic compositions.”
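As a rough sketch of how rules like these could key a mode to a listener's relationship with the music, consider the following; the thresholds, function name, and inputs are all hypothetical, and the real system had many more rules than this:

```python
def pick_narration_mode(play_count: int, knows_genre: bool) -> str:
    """Hypothetical sketch: choose a narration mode from a listener's
    relationship to the music. The real rule set was much richer."""
    if play_count == 0 and not knows_genre:
        return "name"       # orient the listener to unfamiliar territory
    if play_count < 10:
        return "celebrate"  # add context and warmth to something known
    return "educate"        # deepen the connection for established fans
```

A content designer could tune thresholds like these per genre or per moment without touching the narration copy itself.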

Beyond expanding members’ understanding of their taste, we also wanted to help deepen their connection to their favorite artists and help them discover new ones through the lens of that artist’s unique story. You can appreciate Ed Sheeran’s music, but he might be more interesting, or you might be more likely to give him a chance, if you understand that he got his start busking on the streets of London. To do this at scale took two things. First, we created a writers room that brought diverse perspectives, deep music knowledge, and cultural expertise to craft bespoke narration stories around cultural moments and new releases that LLMs didn’t know about. Second, an LLM extended the writers room’s reach to every artist in the catalog.

Using LLMs to speak about real people at scale meant safety couldn’t be optional. We built a two-layer LLM-as-judge system, with humans in the loop, to ensure both member and creator safety.

  • Wholesomeness Judge. Ensured the DJ’s narration stayed within Spotify’s content policy, filtering for harmful or offensive output.
  • Defensibility Judge. Ensured the DJ’s statements were anchored in verifiable data, preventing hallucinations from reaching users.

The defensibility layer was particularly important for Xavier’s character. A DJ who confidently makes things up stops feeling trustworthy immediately. The judge system let us maintain Xavier’s confident voice while keeping him honest.
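The shape of that two-layer gate can be sketched in a few lines. The judge internals below are stubs (in production they would be LLM calls scored against content policy and a store of verifiable artist data), and every name is hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Verdict:
    approved: bool
    reason: str

def wholesomeness_judge(narration: str) -> bool:
    """Stub: in production, an LLM scores narration against content policy."""
    blocklist = {"hateful", "explicit"}  # hypothetical policy terms
    return not any(term in narration.lower() for term in blocklist)

def defensibility_judge(claims: list[str], grounded_facts: set[str]) -> bool:
    """Stub: in production, an LLM checks each claim against verifiable data."""
    return all(claim in grounded_facts for claim in claims)

def judge_narration(narration: str, claims: list[str],
                    grounded_facts: set[str]) -> Verdict:
    """Two-layer gate: wholesomeness first, then defensibility.
    Anything rejected is routed to human review, never spoken to a user."""
    if not wholesomeness_judge(narration):
        return Verdict(False, "failed content policy")
    if not defensibility_judge(claims, grounded_facts):
        return Verdict(False, "claim not anchored in verifiable data")
    return Verdict(True, "approved")
```

The ordering matters: a narration that fails policy never reaches the defensibility check, so the cheaper safety layer runs first.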


Content Programming

The DJ shipped with over 65,000 content pools behind it. The three strategies below are a representative sample of what a user might hear, not the whole library.

  • Familiar Favorites. Your favorite music, with some new discovery.
  • Nostalgia. Music from your past that you enjoyed, or that we predict you would have enjoyed.
  • Trending. New and trending music that’s relevant to you.

Most of these weren’t built from scratch. Spotify already had strong personalized content surfaces like Daily Mix and Today’s Top Hits. The DJ’s job was to turn them into programming: the same Daily Mix that used to sit on a shelf waiting for you to press play now became a set of tracks Xavier could introduce, contextualize, and sequence into a listening session.

DJ Content Programming Pools
The DJ's content programming used content pools from ~65,000 Spotify playlists. Each column represents one segment of the DJ, and moves from close to your taste to the edges of your taste as you read from left to right.

The sequencing was where the discovery thesis actually played out. The principle was simple: anchor in familiarity, move outward toward discovery, touch what was trending, then come back home. A single DJ set might pull five tracks from Daily Mix, mostly songs you loved, then transition to your top songs from 2017, and then land in an emerging pop artist that you haven’t heard of before, with a nugget of their story to pull you in.
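That arc can be sketched as a simple sequencer over named pools; the pool names, per-segment track count, and duplicate rule here are all hypothetical, a minimal sketch of the principle rather than the production system:

```python
def sequence_set(pools: dict[str, list[str]], per_pool: int = 2) -> list[str]:
    """Sketch of the sequencing arc: anchor in familiarity, move outward
    toward discovery, touch trending, then come back home."""
    arc = ["familiar", "nostalgia", "discovery", "trending", "familiar"]
    sequence: list[str] = []
    for name in arc:
        added = 0
        for track in pools.get(name, []):
            if track not in sequence:  # avoid repeats across segments
                sequence.append(track)
                added += 1
            if added == per_pool:
                break
    return sequence
```

Note the arc begins and ends in the familiar pool: the set closes by returning to songs the listener loves, which is what makes the discovery detour feel safe.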

Scaling that principle to 65,000 content pools was the real problem. If every new listening experience needed a team of engineers writing custom code, we’d never get past the first few. So we built a domain-specific language that let designers define a listening experience declaratively: which pools to draw from, which mutators to apply (filters, orderings, transforms), and which objectives to optimize for (artist separation, user relevance, discovery, duplicate avoidance). A set designer could write a definition, validate it against a cohort, inspect quality reports, iterate, and publish, all without touching engineering. What used to take weeks could take an afternoon. The first iteration of the DJ was a 250,000 line YAML file.
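The real DSL isn't public, but a set definition in that spirit might look like the following sketch; every pool name, mutator, and objective below is illustrative, not the actual schema:

```python
# Hypothetical shape of one listening-experience definition. The real
# system compiled YAML into production behavior; all names are made up.
SET_DEFINITION = {
    "pools": ["daily_mix", "top_songs_2017", "emerging_pop"],
    "mutators": [
        {"filter": "min_user_affinity", "threshold": 0.3},
        {"order": "familiar_to_discovery"},
    ],
    "objectives": ["artist_separation", "user_relevance",
                   "discovery", "avoid_duplicates"],
}

def validate(definition: dict) -> bool:
    """Minimal sanity check a set designer's workbench might run
    before publishing a definition (hypothetical)."""
    return bool(definition.get("pools")) and "objectives" in definition
```

The point of the declarative shape is that a designer edits data, not code: validation, cohort preview, and quality reports can all run against the definition before anything ships.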

That tool, and the system around it, is covered by a patent I co-invented. The invention isn’t really “a better sequencer,” it’s a piece of design infrastructure: a language and workbench that let a small team ship listening experiences at a scale that wouldn’t have been possible otherwise, closer to a primitive form of vibe coding.


Results

We launched in February 2023.

  • Named one of Time’s best inventions of 2023
  • Three patents filed on the underlying systems and the design
  • Second largest source of music discovery among Spotify playlists

The DJ started as a hack week pitch about the car. It shipped as a new kind of listening experience that changed how Spotify thought about AI.