Media Session API: How Spotify Web Player Integrates With OS Media Controls

Spotify offers a web player that works much like the full Spotify application, right inside your browser.

When using Spotify's web player on your laptop, you'll notice that you can control playback even if the browser tab is not active or the browser is minimized. You can seek forward or backward, skip to the next song, and pause directly from your operating system's media controls. It also works with the media buttons on your keyboard.

This isn't limited to desktops; you can also use the Spotify web player on your mobile phone. Even when your screen is locked, you'll still see the song you're playing on the lock screen and can control it through the operating system's media controls.

Media Controls On Different Operating Systems

You might wonder how a browser can do all this, since it goes beyond what a web page can normally reach.

Introducing Media Session API

The Media Session API lets web apps integrate with the operating system's media controls. The user can see what's currently playing and can pause, seek, or skip to the next item without opening the page that started it.

Think of it as a communication channel between the web app and the user's operating system. The web app simply says, "Hey there, I'm playing this media right now! Check out its title, the artist behind it, and the media cover. Plus, if the user wants to pause, play the next track, or work some other magic, just give me a heads-up."

Inspecting Media Session On Spotify Web

You can access the Media Session API through navigator.mediaSession in your web browser. For example, if you're playing a song on Spotify Web Player and you inspect navigator.mediaSession.metadata in the console, you'll see the info about that song: the title, artist, album, and even the artwork (that's the image for the song). Cool, right?

This is how your operating system knows what's being played and what image to show, and more.
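If you want to try this yourself, here is a small sketch you could paste into the browser console on a tab that is playing media. The describeMetadata helper is a made-up name for this example; it only reads the standard title, artist, album, and artwork properties.

```javascript
// Summarize a MediaMetadata-like object as a plain object for logging.
// `describeMetadata` is a hypothetical helper, not part of the API.
function describeMetadata(metadata) {
    if (!metadata) {
        return null; // Nothing has been registered with the media session yet
    }
    return {
        title: metadata.title,
        artist: metadata.artist,
        album: metadata.album,
        // Keep only the image URLs to make the output easy to read
        artwork: Array.from(metadata.artwork || [], (image) => image.src),
    };
}

// Only runs where the Media Session API exists (a supporting browser)
if (typeof navigator !== "undefined" && "mediaSession" in navigator) {
    console.log(describeMetadata(navigator.mediaSession.metadata));
}
```

If the page hasn't set any metadata, the helper returns null instead of throwing.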

Getting Started With Media Session API

To use the Media Session API, your page should be playing some media first, because the operating system's media controls reflect whatever is currently playing. If nothing is playing, there is typically nothing for those controls to show, so the metadata and handlers you set will have no visible effect.

You have flexibility in how you initiate media playback within your web app, and the browser will recognize it. To keep things simple, we'll use an <audio> tag for our media playback example.

<audio controls>
    <source src="nasserSpace.mp3" type="audio/mpeg">
    Your browser does not support the audio tag.
</audio>

<button>Play</button>

<script>
const button = document.querySelector("button");
const audio = document.querySelector("audio");

button.addEventListener("click", () => {
    // Start playback in response to the user's click
    audio.play();

    // Audio is playing, let's set media session metadata
});
</script>
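One detail worth knowing: in current browsers, play() returns a Promise that rejects if playback can't start (for example, when the browser blocks it). Here is a minimal sketch, assuming you want to wait for that Promise before touching the media session. startPlayback and its onStarted callback are names invented for this example.

```javascript
// Start playback and run `onStarted` only after the browser confirms it began.
// `player` is anything with a play() method that returns a Promise,
// such as an <audio> element. `startPlayback` is a hypothetical helper.
async function startPlayback(player, onStarted) {
    try {
        await player.play();
    } catch (error) {
        // For example, playback was blocked or the file could not be loaded
        console.warn("Playback did not start:", error.message);
        return false;
    }
    onStarted();
    return true;
}

// Browser-only wiring, skipped when there is no DOM
if (typeof document !== "undefined") {
    const playButton = document.querySelector("button");
    const player = document.querySelector("audio");
    if (playButton && player) {
        playButton.addEventListener("click", () => {
            startPlayback(player, () => {
                // A safe place to set media session metadata
            });
        });
    }
}
```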

Now We Can Set Our Metadata

Once your web app starts playing media, the browser detects this activity. You can then provide metadata that describes the currently playing content.

You set this metadata by assigning an instance of MediaMetadata to navigator.mediaSession.metadata. This object is a container for details about the media, which can include:

  • Title: The name of the media currently playing.

  • Artist: The name of the artist, group, or creator of the media.

  • Album: The title of the album containing this media.

  • Artwork: A list of images to show for the media, usually the same image in several sizes so the system can pick the best fit.

const button = document.querySelector("button");
button.addEventListener("click", () => {
    const audio = document.querySelector("audio");
    audio.play();

    navigator.mediaSession.metadata = new MediaMetadata({
        title: "Media Session API",
        artist: "NasserSpace",
        album: "Lesser-Known Browsers APIs",
        artwork: [
            {
                src: "https://raw.githubusercontent.com/nasserahmed009/nasserahmed009/main/icons/me.png",
                sizes: "96x96",
                type: "image/png",
            },
        ],
    });
});
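The example above provides a single 96x96 image. Since artwork is a list, you can offer several sizes and let the system choose. Below is a sketch under a made-up assumption: that your images live at URLs shaped like "<base>-<size>.png". Both buildArtwork and setTrackMetadata are hypothetical helpers, and the track object's fields are whatever your app uses.

```javascript
// Build the artwork list for MediaMetadata from a list of square sizes.
// The "<baseUrl>-<size>.png" URL pattern is an assumption for this sketch.
function buildArtwork(baseUrl, sizes) {
    return sizes.map((size) => ({
        src: `${baseUrl}-${size}.png`,
        sizes: `${size}x${size}`,
        type: "image/png",
    }));
}

// Publish a track's details to the media session, if the API is available.
function setTrackMetadata(track) {
    if (typeof navigator === "undefined" || !("mediaSession" in navigator)) {
        return false; // Media Session API is not available here
    }
    navigator.mediaSession.metadata = new MediaMetadata({
        title: track.title,
        artist: track.artist,
        album: track.album,
        artwork: buildArtwork(track.artworkBaseUrl, [96, 192, 512]),
    });
    return true;
}
```

You would call setTrackMetadata every time the track changes, so the system controls always match what is playing.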

Now the operating system can show what's playing

Responding to User's Actions

So far, we've let the operating system know what's playing. But that's not all: we also have to pay attention when the user interacts with their physical or on-screen media controls. For instance, if they hit the "next" button on their keyboard, we should switch to the next media. If they seek forward from the mobile notification center, our web app's media should seek accordingly. And even if they use a voice assistant, like Siri, to pause the media, we want that to work too. But how can we catch all these actions and make our web app respond?

Setting Up Action Handlers

When the user performs one of those actions, the browser notifies our web app of which action it was. We react to it by registering a callback with the setActionHandler method on the MediaSession instance.

The setActionHandler method accepts two parameters, "type" and "callback". The "type" parameter names the action we want to handle, and the "callback" parameter is the function that runs when the user performs that action.

Each action has its own handler, and the details passed to the callback function depend on the action.
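Not every browser recognizes every action name, and setActionHandler throws when it is given one it doesn't know. A common way to deal with that is to wrap each registration in try/catch. Here is a minimal sketch; registerActionHandlers is a name made up for this example, and the session argument is expected to be navigator.mediaSession.

```javascript
// Register several handlers at once on a MediaSession-like object.
// Returns the list of actions that were accepted.
// `registerActionHandlers` is a hypothetical helper, not part of the API.
function registerActionHandlers(session, handlers) {
    const registered = [];
    for (const [action, handler] of Object.entries(handlers)) {
        try {
            session.setActionHandler(action, handler);
            registered.push(action);
        } catch (error) {
            // Thrown when the browser does not recognize the action name
            console.warn(`Media session action "${action}" is not supported.`);
        }
    }
    return registered;
}

// Browser-only usage
if (typeof navigator !== "undefined" && "mediaSession" in navigator) {
    const player = document.querySelector("audio");
    registerActionHandlers(navigator.mediaSession, {
        play: () => player.play(),
        pause: () => player.pause(),
    });
}
```

This way, one unsupported action doesn't stop the rest of your handlers from being registered.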

Play/Pause Actions

When the user presses "pause" on their keyboard or from the media controls, it triggers a "pause" event. We can catch this event and pause the audio that's currently playing.

navigator.mediaSession.setActionHandler("pause", () => {
    audio.pause();
    // Other code to perform on "pause"
});

The same thing goes for the "play" action too. If the user hits "play," it triggers a "play" event, and we can react accordingly to play the audio.

navigator.mediaSession.setActionHandler("play", () => {
    audio.play();
    // Other code to perform on "play"
});
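The media session also has a playbackState property ("none", "paused", or "playing") that you can set yourself to tell the browser whether your app is playing. The sketch below mirrors the player's own play and pause events onto it. syncPlaybackState is a hypothetical helper name, and whether you need this at all depends on how your app plays media.

```javascript
// Mirror a player's play/pause events onto a MediaSession-like object.
// `player` is any EventTarget that fires "play" and "pause" events,
// such as an <audio> element.
function syncPlaybackState(player, session) {
    player.addEventListener("play", () => {
        session.playbackState = "playing";
    });
    player.addEventListener("pause", () => {
        session.playbackState = "paused";
    });
}

// Browser-only usage
if (typeof navigator !== "undefined" && "mediaSession" in navigator) {
    const player = document.querySelector("audio");
    if (player) {
        syncPlaybackState(player, navigator.mediaSession);
    }
}
```

Because it listens to the player's events, the state stays correct no matter where the play or pause came from: your own button, the keyboard, or the system controls.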

Previous Track/Next Track Actions

These actions are triggered when the user asks for the previous or the next track.

navigator.mediaSession.setActionHandler("previoustrack", () => {
    // Logic to play the previous track
});

navigator.mediaSession.setActionHandler("nexttrack", () => {
    // Logic to play next track
});
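What "play the next track" means is up to your app. As one possible sketch, here is a tiny playlist that wraps around at both ends. createPlaylist and the file names are invented for this example.

```javascript
// A tiny playlist that wraps around at both ends.
// `createPlaylist` is a hypothetical helper for this sketch.
function createPlaylist(tracks) {
    let index = 0;
    const move = (step) => {
        // Adding tracks.length keeps the result non-negative when going back
        index = (index + step + tracks.length) % tracks.length;
        return tracks[index];
    };
    return {
        current: () => tracks[index],
        next: () => move(1),
        previous: () => move(-1),
    };
}

// Hypothetical file names, for illustration only
const playlist = createPlaylist(["intro.mp3", "episode-1.mp3", "episode-2.mp3"]);

// Browser-only wiring
if (typeof navigator !== "undefined" && "mediaSession" in navigator) {
    const player = document.querySelector("audio");
    const playTrack = (src) => {
        player.src = src;
        player.play();
    };
    navigator.mediaSession.setActionHandler("nexttrack", () => playTrack(playlist.next()));
    navigator.mediaSession.setActionHandler("previoustrack", () => playTrack(playlist.previous()));
}
```

Remember to update the metadata as well whenever the track changes.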

Seek Forward/Seek Backward Actions

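When the user seeks forward or backward, the matching handler is called. The details passed to the callback may include a seekOffset, the number of seconds to skip. If it's missing, we fall back to 10 seconds: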

navigator.mediaSession.setActionHandler("seekbackward", (evt) => {
    const skipTime = evt.seekOffset || 10; // Skip 10 secs
    audio.currentTime = audio.currentTime - skipTime;
});

navigator.mediaSession.setActionHandler("seekforward", (evt) => {
    const skipTime = evt.seekOffset || 10; // Skip 10 secs
    audio.currentTime = audio.currentTime + skipTime;
});

Feel free to get creative and come up with your own ideas to make the user experience better. For instance, take a cue from Netflix: if the user hits the seek button several times in a row, you could make it skip forward by larger chunks.

[video-to-gif output image]

GIF Source: uxdesign.cc/anatomy-of-the-netflix-forward-..
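Here is one way the "larger chunks" idea could look. This is only a sketch with made-up numbers (a 10-second base step, a 60-second cap, and a one-second window between presses), and createSkipStepper is a hypothetical helper. Note that it ignores seekOffset and uses its own step instead.

```javascript
// Returns a function that gives the number of seconds to skip for each press.
// Presses that follow each other within `windowMs` grow the step, up to `maxStep`.
function createSkipStepper({ baseStep = 10, maxStep = 60, windowMs = 1000 } = {}) {
    let lastPress = -Infinity;
    let streak = 0;
    return function nextStep(now = Date.now()) {
        // Continue the streak only if this press came soon after the last one
        streak = now - lastPress <= windowMs ? streak + 1 : 1;
        lastPress = now;
        return Math.min(baseStep * streak, maxStep);
    };
}

// Browser-only wiring
if (typeof navigator !== "undefined" && "mediaSession" in navigator) {
    const player = document.querySelector("audio");
    const forwardStep = createSkipStepper();
    navigator.mediaSession.setActionHandler("seekforward", () => {
        player.currentTime = player.currentTime + forwardStep();
    });
}
```

With the defaults, three quick presses skip 10, then 20, then 30 seconds, and a press after a pause starts again at 10.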

Seek To Action

When the user uses the timeline on the media controls to jump to a specific point, the seekto action is triggered. The target time, in seconds, is given in the seekTime property of the details passed to the callback function.

navigator.mediaSession.setActionHandler("seekto", (evt) => {
    const seekTime = evt.seekTime;
    audio.currentTime = seekTime;
});
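Two related pieces of the API are worth a look here. The seekto details can carry a fastSeek flag, which signals that the user is still scrubbing, and setPositionState lets you report the duration and current position so the system timeline can show progress. The sketch below uses both; handleSeekTo and updatePositionState are hypothetical helper names, and support for these features varies between browsers.

```javascript
// Handle a "seekto" action for a media-element-like object.
function handleSeekTo(player, details) {
    if (details.fastSeek && typeof player.fastSeek === "function") {
        // The user is still scrubbing: favor speed over precision
        player.fastSeek(details.seekTime);
        return;
    }
    player.currentTime = details.seekTime;
}

// Report duration and position to a MediaSession-like object.
// Returns false when it cannot report (no support or unknown duration).
function updatePositionState(player, session) {
    if (typeof session.setPositionState !== "function" || !Number.isFinite(player.duration)) {
        return false;
    }
    session.setPositionState({
        duration: player.duration,
        playbackRate: player.playbackRate,
        position: player.currentTime,
    });
    return true;
}

// Browser-only wiring
if (typeof navigator !== "undefined" && "mediaSession" in navigator) {
    const player = document.querySelector("audio");
    navigator.mediaSession.setActionHandler("seekto", (details) => {
        handleSeekTo(player, details);
        updatePositionState(player, navigator.mediaSession);
    });
}
```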

Other Actions Available

But wait, there's more! The API defines other actions too, including hangup, nextslide, previousslide, skipad, togglecamera, and more.

We've covered the most common actions. The others might not be used as often, and support for them varies between browsers, but they are handled the same way as the actions we covered above.

Opening Doors to Innovation 💡

Beyond the basic use cases for the Media Session API, which already boost the user experience, we can use the action handlers to learn more about how our media is consumed. Imagine you have a podcast app: you can track the seekforward and seekbackward actions to see which parts listeners skip or replay, which is super useful for improving content. Or you can track the previoustrack and nexttrack actions, which can hint at which episodes listeners skip and which they come back to.
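As a rough sketch of that idea, you could wrap each handler so that every call is counted before the real work happens. countAction is a made-up helper, and the counts here only live in memory; where you send them is up to your app.

```javascript
// Wrap a handler so that each call is counted under its action name.
// `countAction` is a hypothetical helper for this sketch.
function countAction(counts, action, handler) {
    return (details) => {
        counts.set(action, (counts.get(action) || 0) + 1);
        return handler(details);
    };
}

// In-memory counters, for illustration only
const actionCounts = new Map();

// Browser-only wiring
if (typeof navigator !== "undefined" && "mediaSession" in navigator) {
    const player = document.querySelector("audio");
    navigator.mediaSession.setActionHandler(
        "seekforward",
        countAction(actionCounts, "seekforward", (details) => {
            player.currentTime = player.currentTime + (details.seekOffset || 10);
        })
    );
}
```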

Get creative with this API! Tune in to user actions and make the most of it.
