This document describes an extension to HTML which explicitly identifies a transcript linked to a media object such as audio or video.

This is the 3 July Editor's Draft. The editor plans to request consensus to publish this as a First Public Working Draft.

While GitHub pull requests are welcome for specific proposed changes, the HTML Accessibility Task Force uses a publicly available tracker installation to track issues on this specification. Please do not use the GitHub issues associated with the repository, as issues raised there may not be tracked in a timely manner.

Introduction

[[HTML5]] allows the use of audio or video, and includes mechanisms for associating multiple timed tracks with them. However, where there is a transcript, which may not include timing information, there is no way to explicitly associate it with the media element it transcribes.

Throughout this document the terms must, should and may are to be interpreted as described in [[RFC2119]].

Use cases and requirements

Over a number of years use cases and requirements have been extensively discussed by the HTML Working Group, in particular by its accessibility and media task forces. The following section is intended to provide a brief summary of the key information.

Use cases

Saving bandwidth - inline transcript
A user chooses to read a transcript included in the same page as a media resource, because their connection will not support downloading the full media resource. Requires:
Explicit relation
Users who cannot determine from the default page formatting that part of the page is the transcript need a mechanism that allows an assistive technology to do so.
Delineation
Assistive technology needs a way to determine the extent of the transcript, i.e. where it begins and ends.
Optional consumption
Assistive technology users need to be able to skip over the transcript, for example when navigating through the page.
Inline transcript
This is assumed by this use case. For many users who are not relying on assistive technologies, seeing the transcript in the page is sufficient to enable this use case.
Using the transcript for accessibility
A user reads a linked transcript because the media resource in its original format is inaccessible to them due to a disability. Requires:
Explicit relation
The user needs a way to know that there is a transcript available. In certain cases, such as when they have to choose which kind of resource will be more accessible to them, they need a way for their assistive technology to determine that there is a transcript.
Optional consumption
It is important that the user not be forced to read the transcript every time they navigate the page - just as users are not forced to re-watch a full video every time they scroll past it.
Interactive transcript as controller
A user agent renders a transcript which includes timing information alongside the media resource. Navigating to a particular point in the transcript scrubs through the media resource to that point. Requires:
Explicit relation
The user agent needs to determine that a particular resource is a transcript.
Delineation
The user agent needs to identify exactly the content that is part of the transcript.
Format agnostic
Publishers need to be able to provide transcripts in a format the user agent can use as a controller.
Using a transcript to improve video/audio search
A search engine uses the explicit association of a transcript to collect textual information that can be reliably associated with an audio or video resource, to improve discoverability of the resource through text-based search. Requires:
Explicit relation
The search engine needs to determine that a particular resource is a transcript.
Increasing multilingual discovery of resources
A publisher produces multiple translated transcripts of a media resource in order to improve discoverability of and access to the media resource. Requires:
Multiple transcripts
The publisher needs to link multiple transcripts to the same media resource.
Linked transcript
Many publishers do not want to include multiple transcripts in different languages inline in a page.
Use what is there
A publisher links a script which is available as a PDF or Word document to provide a basic workable transcript for a media resource that would otherwise be inaccessible to some users. Requires:
Format agnostic
The publisher needs to be able to associate whatever resource is available.
Linked transcript
The publisher needs to be able to link resources which are not HTML. Content management workflows need to allow for resources kept and published separately.

Requirements

Delineation
It must be possible to determine which part of a larger resource such as an HTML page is the transcript included within that resource.
Explicit relation
It must be possible to unambiguously determine that a resource is a transcript for a given media resource.
Format agnostic
It must be possible to use an arbitrary format for a transcript.
Inline transcript
It must be possible to include the content of a transcript in the same page as a media resource.
Linked transcript
It must be possible to link a media resource to an external transcript.
Multiple transcripts
It must be possible to include more than one transcript for a media resource. It should be possible to differentiate transcripts for a given media resource, to allow easy selection of the appropriate transcript for a given use case.
Optional consumption
It should be possible for a user to choose whether or not to read the transcript. Note that this is particularly important for users interacting with their system through speech output, or for whom large amounts of text make content substantially more difficult to use.

Denoting a transcript

To meet the delineation requirement this specification defines a new transcript element as follows:

The transcript element

Categories:
Flow content.
Palpable content.
Contexts in which this element can be used:
Where flow content is expected.
Content model:
Flow content.
Content attributes:
Global attributes
Tag omission in text/html:
Neither tag is omissible
Allowed ARIA role attribute values:
Any role value.
Allowed ARIA state and property attributes:
Global aria-* attributes
Any aria-* attributes applicable to the allowed roles.
DOM interface:
interface HTMLTranscriptElement : HTMLElement {};

The transcript element represents a transcript for a media resource. It can contain any flow content.
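
As an illustration of the definition above, a minimal sketch of an inline transcript might look like the following. The file name interview.mp4, the id value and the transcript text are placeholders invented for this sketch. The transcript element is only proposed by this document, so current user agents will treat it as an unknown element.

```html
<video controls src="interview.mp4"></video>

<!-- The proposed transcript element marks where the transcript begins and ends -->
<transcript id="interviewText">
  <h2>Transcript</h2>
  <p><b>Interviewer:</b> Thank you for joining us.</p>
  <p><b>Guest:</b> It is a pleasure to be here.</p>
</transcript>
```

Because the content model is flow content, headings, paragraphs and lists can all be used to structure the transcript. Both the start tag and the end tag are written out, since neither is omissible.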

Linking transcripts

A transcript may include timing information, machine-readable or otherwise. The preferred solution places the link to the transcript inside the media element it transcribes, and adds a transcript element as a container for the transcript itself. The container can appear on the page in which the media object is embedded, which is common in practice, or can be used to separate multiple transcripts collected in a single page.

Extending track to allow kind="transcript"

This proposal adds transcript to the set of values defined for the kind attribute of the track element. This requires adding an entry to the table of values defined for the attribute in [[!HTML5]], as follows:

Keyword | State | Brief description
transcript | transcript | Tracks intended to permit use independent of media source. May be displayed by the user agent instead of, or supplementary to, the media resource.

An objection raised to this method is that it may require a change to the current definition of the track element in HTML5, which says that the element allows authors to specify explicit external timed text tracks for media elements. No change would be needed only if a transcript without timing information can be considered a "timed text track". However, that definition also appears to conflict with the allowed metadata state for tracks, so it will probably be changed anyway.

Example 1: Extending allowable track kinds

<video controls>
  <source src="video.rm">
  <!-- A link to a transcript within the same document -->
  <track kind="transcript" title="English transcript" href="#theText">
  <!-- A link to an external transcript in French uses hreflang -->
  <track kind="transcript" hreflang="fr"
         href="http://transcripts.example.fr/qqchose#laTexte"
         lang="fr" title="Transcription en français">
  <track kind="captions" src="#YouGetTheIdea,Right?" lang="ru">
</video>

<transcript id="theText">This is the English-language transcript... </transcript>

Acknowledgements

The editor would like to acknowledge the awe-filled ReSpec, GitHub, and BlueGriffon, as well as direct contributions to this document by:

Paul Cotton, Daniel Davis, Joan-Marie Diggs, Steve Faulkner, John Foliot, Edward O'Connor, Silvia Pfeiffer, Janina Sajka, Richard Schwerdtfeger, Cynthia Shelly, Léonie Watson, and W3C's HTML Media Task Force

The editor would like to apologise to anybody whose name was left out of this list, and invites corrections.

Appendix: Alternative approaches

Several other approaches to meeting the requirements have been considered. They are included here in outline, with some notes, for completeness. This appendix is expected to be removed before requesting advancement to Candidate Recommendation.

Alternative approach: create a new element

Add a new element to HTML representing a link to a transcript for the parent media resource. This requires choosing a name (in the following, relateTranscript is used as a placeholder to avoid conflicting with the proposed transcript container element) and defining the new element as follows:

The relateTranscript element

Categories:
None.
Contexts in which this element can be used:
As a child of a media element.
Content model:
Empty.
Content attributes:
Global attributes
src - URL of the transcript
type - the MIME type of the transcript
Tag omission in text/html:
No end tag
Allowed ARIA role attribute values:
This could be rendered as a live region controlled by the media resource, or as a control for the media resource.
Allowed ARIA state and property attributes:
rendering, interactive states?
Any aria-* attributes applicable to the allowed roles.
DOM interface:
interface HTMLRelateTranscriptElement : HTMLElement {
  attribute DOMString src;
  attribute DOMString type;
  attribute DOMString media;
};

Example 2: Using a relateTranscript element

<video controls>
  <source src="video.rm">
  <!-- A link to a transcript within the same document -->
  <relateTranscript title="English transcript" href="#theText">
  <!-- A link to an external transcript in French uses hreflang -->
  <relateTranscript hreflang="fr"
         href="http://transcripts.example.fr/qqchose#laTexte"
         lang="fr" title="Transcription en français">
  <track kind="captions" src="YouGetTheIdea?Right" lang="ru">
</video>

<transcript id="theText">This is the English-language transcript... </transcript>

Alternative approach: Use the source element

The source element represents a version of the media resource that can be presented as an alternative to others. A transcript fits this description.

This approach is not preferred as it will involve complex changes.

The source element currently allows a type attribute giving a MIME type and a media attribute giving a media query, which can be used to determine when to render a given version. However, although transcripts are likely to have MIME types that are different from those used for audio or video resources, relying on this difference as a heuristic seems a weak approach to identifying a transcript.
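
For illustration only, a sketch of how this approach might look is given below. The file names and the use of text/html as the transcript's MIME type are assumptions made for this example, not part of any proposal text.

```html
<video controls>
  <source src="video.webm" type="video/webm">
  <!-- Hypothetical: a transcript offered as one more alternative version.
       Only the type attribute hints that this is not a playable media resource. -->
  <source src="transcript.html" type="text/html">
</video>
```

A user agent would have to infer from the type value alone that the second source is a transcript, which is the heuristic described above as weak.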

Alternative approach: Use the a element with rel and for attributes

This meets the requirements, but requires defining a new value of rel, and changes to the for attribute.

Separating the link from the video code requires developers to include it in the visible content of the page, which leads many developers to try to hide it in the default presentation. A common result is that it is not available to people who need it, such as users with low vision, or is invisible but can be activated, confusing users.

Separating the link from the block of code can lead to it being lost when the source is copied to be used elsewhere.

Example 4: Using the a element with rel and for attributes

<p>
  <!-- A link to a transcript within the same document -->
  <a rel="transcript" for="theVideo" title="English transcript" href="#theText">Transcript below</a>,
  <!-- A link to an external transcript in French uses hreflang -->
  <a rel="transcript" for="theVideo" hreflang="fr"
     href="http://transcripts.example.fr/qqchose#laTexte"
     lang="fr" title="Transcription en français">transcription aussi disponible en français</a>.
</p>

<video controls id="theVideo">
  <source src="video.rm">
  <track kind="captions" src="YouGetTheIdea?Right" lang="ru">
</video>

<transcript id="theText">This is the English-language transcript... </transcript>

Alternative approach: Use an attribute

An attribute could be defined, analogous to the longdesc attribute for images.

This approach is not preferred as it makes it very difficult to fully meet the multiple transcripts requirement.

Allowing a space-separated list of URLs does not provide any information to help choose which transcript to link to, or use.

Example 5: Using a relateTranscript attribute

<video controls relateTranscript="#theText http://transcripts.example.fr/qqchose#laTexte">
  <source src="video.rm">
  <track kind="captions" src="YouGetTheIdea?Right" lang="ru">
</video>

<transcript id="theText">This is the English-language transcript... </transcript>