HTML5 Video Caption Cue Settings in WebVTT

Published: 2014-10-18 18:30 -0400

TL;DR Check out my tool to better understand how cue settings position captions for HTML5 video.

Having video be a part of the Web with HTML5 <video> opens up a lot of new opportunities for creating rich video experiences. Being able to style video with CSS and control it with the JavaScript API makes it possible to do fun stuff and to create accessible players and a consistent experience across browsers. With better support in browsers for timed text tracks in the <track> element, I hope to see more captioned video.

An important consideration in creating really professional looking closed captions is placing them correctly. I don’t rely on captions, but I do increasingly turn them on to improve my viewing experience. I’ve come to appreciate some attributes of really well done captions. Accuracy is certainly important. The captions should match the words spoken. As someone who can hear, I see inaccurate captions all too often. Thoroughness is another factor. Are all the sounds important for the action represented in captions? Captions will often include a “music” caption, but other sounds, especially those off screen, are often omitted. But accuracy and thoroughness aren’t the only factors to consider when evaluating caption quality.

Placement of captions can be equally important. The captions should not block other important content. They should not run off the edge of the screen. If two speakers are on screen you want the appropriate captions to be placed near each speaker. If a sound or voice is coming from off screen, the caption is best placed as close to the source as possible. These extra clues can help with understanding the content and action. These are the basics. There are other style guidelines for producing good captions. Producing good captions is something of an art form. A caption more than two rows long is usually too much, and rows ought to be split at phrase breaks. Periods should be used to end sentences and are usually the end of a single cue. There’s judgment necessary to have pleasing phrasing.

While there are tools for doing this kind of placement for television and burned-in captions, I haven’t found one for Web video. While I don’t yet have a tool to do this, in the following I’ll show you how to:

  • Use the JavaScript API to dynamically change cue text and settings.
  • Control placement of captions for your HTML5 video using cue settings.
  • Play around with different cue settings to better understand how they work.
  • Style captions with CSS.

Track and Cue JavaScript API

The <video> element has an API which allows you to get a list of all tracks for that video.

Let’s say we have the following video markup which is the only video on the page. This video is embedded far below, so you should be able to run these in the console of your developer tools right now.

<video poster="soybean-talk-clip.png" controls autoplay loop>
  <source src="soybean-talk-clip.mp4" type="video/mp4">
  <track label="Captions" kind="captions" srclang="en" src="soybean-talk-clip.vtt" id="soybean-talk-clip-captions" default>
</video>

Here we get the first video on the page:

var video = document.getElementsByTagName('video')[0];

You can then get all the tracks (in this case just one) with the following:

var tracks = video.textTracks; // returns a TextTrackList
var track = tracks[0]; // returns TextTrack

Alternately, if your track element has an id you can get it more directly:

var track = document.getElementById('soybean-talk-clip-captions').track;

Once you have the track you can see the kind, label, and language:

track.kind; // "captions"
track.label; // "Captions"
track.language; // "en"

You can also get all the cues as a TextTrackCueList:

var cues = track.cues; // TextTrackCueList

In our example we have just two cues. We can also get just the active cues (in this case only one so far):

var active_cues = track.activeCues; // TextTrackCueList

Now we can see the text of the current cue:

var text = active_cues[0].text;

Now the really interesting part is that we can change the text of the caption dynamically and it will immediately change:

track.activeCues[0].text = "This is a completely different caption text!!!!1";
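
The snippets above assume a cue is active at the moment you run them. Between cues the activeCues list is empty, and it is null while the track's mode is "disabled", so indexing into it can fail. Here is a minimal sketch of a guarded version. The name setActiveCueText is a hypothetical helper, not part of the API, and it relies only on the activeCues list and the text property already shown.

```javascript
// Hypothetical helper: replace the text of every currently active cue.
// Returns the number of cues changed (0 when nothing is active).
function setActiveCueText(track, newText) {
  var active = track.activeCues; // null while track.mode is "disabled"
  if (!active || active.length === 0) {
    return 0;
  }
  for (var i = 0; i < active.length; i++) {
    active[i].text = newText;
  }
  return active.length;
}

// In the browser console, with the track from above:
// setActiveCueText(track, "This is a completely different caption text!");
```

Returning a count makes it easy to tell from the console whether anything was on screen to change.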

Cue Settings

We can also then change the position of the cue using cue settings. The following will move the first active cue to the top of the video.

track.activeCues[0].line = 1;

The cue can also be aligned to the start of the line position:

track.activeCues[0].align = "start";
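
If you find yourself changing several settings at once, the assignments can be wrapped in a small function. This is only a sketch: applyCueSettings and the shape of its settings argument are invented here for illustration, while line, align, position, and size are the cue properties being assigned.

```javascript
// The cue settings this hypothetical helper knows how to copy.
var CUE_SETTING_NAMES = ['line', 'align', 'position', 'size'];

// Copy any of the known settings from a plain object onto a cue.
// Keys that are not in CUE_SETTING_NAMES are ignored.
function applyCueSettings(cue, settings) {
  CUE_SETTING_NAMES.forEach(function (name) {
    if (settings[name] !== undefined) {
      cue[name] = settings[name];
    }
  });
  return cue;
}

// In the browser console, with the track from above:
// applyCueSettings(track.activeCues[0], { line: 1, align: 'start' });
```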

Now for one last trick we’ll add another cue with the arguments of start time and end time in seconds and the cue text:

var new_cue = new VTTCue(1, 30, "This is the text of the new cue.");

We’ll set a position for our new cue before we place it in the track:

new_cue.line = 5;

Then we can add the cue to the track:

track.addCue(new_cue);

And now you should see your new cue for most of the duration of the video.

Playing with Cue Settings

The other settings you can play with include position and size. Position is the text position as a percentage of the width of the video. The size is the width of the cue as a percentage of the width of the video.
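
To make the interaction between position, size, and align a little more concrete, here is a rough model of how the horizontal extent of a cue box can be worked out from those three values. It follows one reading of the rendering rules in the WebVTT spec and assumes a horizontal, left-to-right cue with a numeric position. It ignores vertical text, an "auto" position, and any explicit position alignment. cueBox is a hypothetical function, and since browsers vary, treat its output as an approximation rather than a prediction.

```javascript
// Hypothetical model of a horizontal, left-to-right cue box.
// position and size are percentages of the video width (0 to 100).
// align is "start"/"left", "end"/"right", or anything else for centered.
function cueBox(position, size, align) {
  var anchor = 'center';
  if (align === 'start' || align === 'left') anchor = 'left';
  if (align === 'end' || align === 'right') anchor = 'right';

  // The box may not extend past either edge of the video,
  // so the requested size is capped by where the anchor sits.
  var maxWidth;
  if (anchor === 'left') {
    maxWidth = 100 - position;       // position is the left edge
  } else if (anchor === 'right') {
    maxWidth = position;             // position is the right edge
  } else if (position <= 50) {
    maxWidth = position * 2;         // position is the center
  } else {
    maxWidth = (100 - position) * 2; // position is the center
  }
  var width = Math.min(size, maxWidth);

  var left;
  if (anchor === 'left') {
    left = position;
  } else if (anchor === 'right') {
    left = position - width;
  } else {
    left = position - width / 2;
  }
  return { left: left, width: width };
}

// cueBox(50, 40, 'center') gives { left: 30, width: 40 }
// cueBox(10, 100, 'start') gives { left: 10, width: 90 }
```

The second example shows why a setting can appear to be ignored: a size of 100 cannot fit once the left edge is moved to 10, so the box is capped at 90.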

While I could go through all of the different cue settings, I found it easier to understand them after I built a demonstration of dynamically changing all the cue settings. There you can play around with all the settings together to see how they actually interact with each other.

At least as of the time of this writing there is some variability between how different browsers apply these settings.

Test WebVTT Cue Settings and Styling

Cue Settings in WebVTT

I’m honestly still a bit confused about all of the optional ways in which cue settings can be defined in WebVTT. The demonstration outputs the simplest and most straightforward representation of cue settings. You’d have to read the spec for optional ways to apply some cue settings in WebVTT.
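
As an illustration of that simplest form, the sketch below writes a cue out as a WebVTT cue block: a timing line of the form start --> end, followed on the same line by space-separated name:value settings, and then the cue text on the next line. formatTimestamp and cueToVtt are hypothetical names, the input is a plain object rather than a real VTTCue, and the code assumes line is a line number while position and size are percentages.

```javascript
// Left-pad a number with zeros to the given width.
function zeroPad(value, width) {
  var s = String(value);
  while (s.length < width) {
    s = '0' + s;
  }
  return s;
}

// Convert seconds (as used by the cue API) to a WebVTT hh:mm:ss.ttt timestamp.
function formatTimestamp(seconds) {
  var totalMs = Math.round(seconds * 1000);
  var ms = totalMs % 1000;
  var totalSeconds = Math.floor(totalMs / 1000);
  var s = totalSeconds % 60;
  var m = Math.floor(totalSeconds / 60) % 60;
  var h = Math.floor(totalSeconds / 3600);
  return zeroPad(h, 2) + ':' + zeroPad(m, 2) + ':' + zeroPad(s, 2) + '.' + zeroPad(ms, 3);
}

// Serialize a plain cue-like object; only settings that are present are written.
function cueToVtt(cue) {
  var parts = [formatTimestamp(cue.startTime) + ' --> ' + formatTimestamp(cue.endTime)];
  if (cue.line !== undefined) parts.push('line:' + cue.line);
  if (cue.position !== undefined) parts.push('position:' + cue.position + '%');
  if (cue.size !== undefined) parts.push('size:' + cue.size + '%');
  if (cue.align !== undefined) parts.push('align:' + cue.align);
  return parts.join(' ') + '\n' + cue.text;
}

// cueToVtt({ startTime: 1, endTime: 30, line: 5, text: 'Hello' })
// gives "00:00:01.000 --> 00:00:30.000 line:5\nHello"
```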

Styling Cues

In browsers that support styling of cues (Chrome, Opera, Safari), the demonstration also allows you to apply styling to cues in a few different ways. This CSS code is included in the demo to show some simple examples of styling.

::cue(.red){ color: red; }
::cue(.blue){ color: blue; }
::cue(.green){ color: green; }
::cue(.yellow){ color: yellow; }
::cue(.background-red){ background-color: red; }
::cue(.background-blue){ background-color: blue; }
::cue(.background-green){ background-color: green; }
::cue(.background-yellow){ background-color: yellow; }

Then the following cue text can be added to show red text with a yellow background:

<c.red.background-yellow>This cue has red text with a yellow background.</c>
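
Class names are attached to cue text with a class span of the form <c.name1.name2>...</c>, where each dot-separated name can be matched by a ::cue(.name) rule like the ones above. Here is a small sketch of building such a span from a list of class names. withCueClasses and escapeCueText are hypothetical helpers, and the escaping of &, <, and > is there because those characters are significant in WebVTT cue text.

```javascript
// Escape characters that would otherwise be read as markup in cue text.
function escapeCueText(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Wrap plain text in a WebVTT class span carrying the given class names.
// With no class names the escaped text is returned unwrapped.
function withCueClasses(text, classNames) {
  var escaped = escapeCueText(text);
  if (classNames.length === 0) {
    return escaped;
  }
  return '<c.' + classNames.join('.') + '>' + escaped + '</c>';
}

// In the browser console, with the track from earlier:
// track.activeCues[0].text = withCueClasses('Red on yellow', ['red', 'background-yellow']);
```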

In the demo you can see which text styles are supported by which browsers for styling the ::cue pseudo-element. There’s a text box at the bottom that allows you to enter any arbitrary styles and see what effect they have.

Example Video

Test WebVTT Cue Settings and Styling