A transcript extension for HTML

Use cases and requirements

Over a number of years, use cases and requirements have been discussed extensively by the HTML Working Group, in particular by its accessibility and media task forces. This section briefly summarises the key information.

Use cases

Saving bandwidth - inline transcript

A user chooses to read a transcript included in the same page as a media resource, because their connection will not support downloading the full media resource. Requires:

Explicit relation: Users who cannot determine from the default page formatting that part of the page is the transcript need a mechanism that allows an assistive technology to do so
Delineation: Assistive technology needs a way to determine the extent of the transcript - i.e. where it begins and ends
Optional consumption: Assistive technology users need to be able to skip over the transcript, for example when navigating through the page
Inline transcript: This is assumed by this use case. For many users who are not relying on assistive technologies, seeing the transcript in the page is sufficient to enable this use case.

Using the transcript for accessibility

A user reads a linked transcript because the media resource in its original format is inaccessible to them due to a disability. Requires:

Explicit relation: The user needs a way to know that there is a transcript available. In certain cases such as where they have to make a choice based on what kind of resource will be more accessible to them, they need a way for their assistive technology to determine that there is a transcript.
Optional consumption: It is important that the user not be forced to read the transcript every time they navigate the page - just as users are not forced to re-watch a full video every time they scroll past it.

Interactive transcript as controller

A user agent renders a transcript which includes timing information alongside the media resource. Navigating to a particular point in the transcript scrubs through the media resource to that point. Requires:

Explicit relation: The user agent needs to determine that a particular resource is a transcript
Delineation: The user agent needs to identify exactly the content that is part of the transcript
Format agnostic: Publishers need to be able to provide transcripts in a format the user agent can use as a controller
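
The controller behaviour described above can be sketched as follows. This is a minimal illustration, not part of the proposal: the `TranscriptSegment` shape and both function names are hypothetical, and the sketch assumes the timing information has already been extracted from the transcript into a list sorted by start time.

```typescript
// Hypothetical shape for one timed segment of a transcript.
// The proposal does not define how timing information is represented.
interface TranscriptSegment {
  startSeconds: number; // offset into the media resource where this text begins
  text: string;
}

// Media time to seek to when the user activates the segment at `index`.
function seekTimeForSegment(segments: TranscriptSegment[], index: number): number {
  if (index < 0 || index >= segments.length) {
    throw new RangeError(`No transcript segment at index ${index}`);
  }
  return segments[index].startSeconds;
}

// Index of the segment that covers `currentTime`, or -1 if playback has not
// yet reached the first segment. Assumes `segments` is sorted by startSeconds.
function activeSegmentIndex(segments: TranscriptSegment[], currentTime: number): number {
  let low = 0;
  let high = segments.length - 1;
  let found = -1;
  // Binary search for the last segment starting at or before currentTime.
  while (low <= high) {
    const mid = (low + high) >> 1;
    if (segments[mid].startSeconds <= currentTime) {
      found = mid;
      low = mid + 1;
    } else {
      high = mid - 1;
    }
  }
  return found;
}
```

A user agent could pass the result of `seekTimeForSegment` to the media element's playback position, and use `activeSegmentIndex` in the other direction to highlight the current segment during playback.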

Using a transcript to improve video/audio search

A search engine uses the explicit association of a transcript to collect textual information that can be reliably associated with an audio or video resource, to improve discoverability of the resource through text-based search. Requires:

Explicit relation: The search engine needs to determine that a particular resource is a transcript

Increasing multilingual discovery of resources

A publisher produces multiple translated transcripts of a media resource in order to improve discoverability of and access to the media resource. Requires:

Multiple transcripts: The publisher needs to link multiple transcripts to the same media resource.
Linked transcript: Many publishers do not want to include multiple transcripts in different languages inline in a page.

Use what is there

A publisher links a script which is available as PDF or Word document to provide a basic workable transcript for a media resource that would otherwise be inaccessible to some users. Requires:

Format agnostic: The publisher needs to be able to associate whatever resource is available
Linked transcript: The publisher needs to be able to link resources which are not HTML. Content management workflows need to allow for resources kept and published separately.

Requirements

Delineation: It must be possible to determine which part of a larger resource such as an HTML page is the transcript included within that resource.
Explicit relation: It must be possible to unambiguously determine that a resource is a transcript for a given media resource
Format agnostic: It must be possible to use an arbitrary format for a transcript
Inline transcript: It must be possible to include the content of the transcript in the same page as a media resource
Linked transcript: It must be possible to link a media resource to an external transcript
Multiple transcripts: It must be possible to include more than one transcript for a media resource. It should be possible to differentiate transcripts for a given media resource to allow easy selection of the appropriate transcript for a given use case
Optional Consumption: It should be possible for a user to choose whether or not to read the transcript. Note that this is particularly important for users interacting with their system through speech output, or for whom large amounts of text make content substantially more difficult to use.
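
To illustrate why the multiple transcripts requirement asks for a way to differentiate transcripts, the following sketch selects one transcript by language. It is an assumption-laden example: the `TranscriptLink` shape and the `chooseTranscript` function are hypothetical, and language is only one of the properties that could be used to differentiate transcripts.

```typescript
// Hypothetical description of one transcript associated with a media resource.
interface TranscriptLink {
  url: string;
  lang: string; // language tag such as "en" or "fr-CA"
  title: string;
}

// Pick a transcript for the user's preferred languages, in order of preference.
// An exact tag match wins; otherwise a match on the primary subtag is accepted.
// Falls back to the first listed transcript, or undefined if there are none.
function chooseTranscript(
  links: TranscriptLink[],
  preferred: string[]
): TranscriptLink | undefined {
  for (const want of preferred) {
    const wanted = want.toLowerCase();
    const primary = wanted.split("-")[0];
    let loose: TranscriptLink | undefined;
    for (const link of links) {
      const lang = link.lang.toLowerCase();
      if (lang === wanted) {
        return link; // exact match on the full tag
      }
      if (loose === undefined && lang.split("-")[0] === primary) {
        loose = link; // remember the first primary-subtag match
      }
    }
    if (loose !== undefined) {
      return loose;
    }
  }
  return links[0];
}
```

Without language (or similar) information attached to each transcript, only the final fallback line would be possible, which is the situation the requirement aims to avoid.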

Linking transcripts

A transcript may include timing information, machine-readable or otherwise. The preferred solution places the link to the transcript within the media element for which it is a transcript, and adds a transcript element as a container for a transcript. The transcript element can be included in the page in which the media object is embedded, which is common in practice, or can serve to separate multiple transcripts collected in a single page.

Extending `track` to allow `kind="transcript"`

This proposal adds transcript to the set of values defined for the kind attribute of the track element. This requires adding an entry to the table of values defined for the attribute in [HTML5], as follows:

Keyword	State	Brief description
`transcript`	transcript	Tracks intended to permit use independent of media source. May be displayed by the user agent instead of, or supplementary to, the media resource.
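
As a rough sketch of what the extended keyword set means for a consumer of the kind attribute, the following example recognises the five keywords HTML5 defines plus the proposed transcript keyword. The names `PROPOSED_TRACK_KINDS` and `parseTrackKind` are hypothetical, and the sketch deliberately leaves the handling of missing or unrecognised values to the caller.

```typescript
// The five kind keywords defined in HTML5, plus the keyword this proposal adds.
const PROPOSED_TRACK_KINDS = [
  "subtitles",
  "captions",
  "descriptions",
  "chapters",
  "metadata",
  "transcript", // proposed addition
] as const;

type ProposedTrackKind = (typeof PROPOSED_TRACK_KINDS)[number];

// Return the matching keyword in lowercase, or undefined if the value is not
// one of the recognised keywords. Comparison ignores ASCII case, as is usual
// for enumerated attribute keywords.
function parseTrackKind(value: string): ProposedTrackKind | undefined {
  const keyword = value.toLowerCase();
  for (const kind of PROPOSED_TRACK_KINDS) {
    if (kind === keyword) {
      return kind;
    }
  }
  return undefined;
}
```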

An objection raised to this method is that it may require changing the current definition of the track element in HTML5, which says that the element allows authors to specify explicit external timed text tracks for media elements. The change would be needed unless a transcript with no timing information can be considered a "timed text track". However, that definition also appears to conflict with the metadata state allowed for tracks, so it will probably be changed anyway.

Example 1. Extending allowable `track` `kind`s

<video controls>
  <source src="video.rm">
  <!-- A link to a transcript within the same document -->
  <track kind="transcript" title="English transcript" href="#theText">
  <!-- A link to an external transcript in French uses hreflang -->
  <track kind="transcript" hreflang="fr"
         href="http://transcripts.example.fr/qqchose#laTexte"
         lang="fr" title="Transcription en français">
  <track kind="captions" src="#YouGetTheIdea,Right?" lang="ru">
</video>

<transcript id="theText">This is the English language
  transcript...
</transcript>
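
Example 1 uses two forms of link: a fragment pointing at a transcript element in the same document, and a URL pointing at an external resource. A consumer might tell them apart along the following lines. This is a simplified sketch with hypothetical names (`TranscriptTarget`, `classifyTranscriptHref`); it treats only a bare fragment as an inline transcript, does not resolve URLs against the document's address, and does not decode percent-encoded fragments.

```typescript
// Hypothetical result type: either an element in the current document,
// or a separate resource that has to be fetched.
type TranscriptTarget =
  | { kind: "inline"; elementId: string }
  | { kind: "linked"; url: string };

// Classify the value of a transcript link.
// Assumption: a value that starts with "#" refers to the current document.
function classifyTranscriptHref(href: string): TranscriptTarget {
  if (href.length > 1 && href.charAt(0) === "#") {
    return { kind: "inline", elementId: href.slice(1) };
  }
  return { kind: "linked", url: href };
}
```

With the values from Example 1, `#theText` is classified as inline with the element id `theText`, and the French transcript URL is classified as linked.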

Acknowledgements

The editor would like to acknowledge the awe-filled ReSpec, GitHub, and BlueGriffon, as well as direct contributions to this document by:

Paul Cotton, Daniel Davis, Joan-Marie Diggs, Steve Faulkner, John Foliot, Edward O'Connor, Silvia Pfeiffer, Janina Sajka, Richard Schwerdtfeger, Cynthia Shelly, Léonie Watson, and W3C's HTML Media Task Force

The editor would like to apologise to anybody whose name was left out of this list, and invites corrections.

Appendix: Alternative approaches

Several other approaches have been considered to meeting the requirements. They are included here in outline, with some notes, for completeness. This appendix is expected to be removed before requesting advancement to Candidate Recommendation.

Alternative approach: create a new element

Add a new element to HTML representing a link to a transcript for the parent media resource. This requires choosing a name - in the following we have used relateTranscript as a placeholder name, to avoid conflicting with the proposed transcript container element - and defining a new element definition as follows:

The `relateTranscript` element

Categories:

None.

Contexts in which this element can be used:

As a child of a media element.

Content model:

Empty.

Content attributes:

Global attributes

src - URL of the transcript

type - the MIME type of the transcript

Tag omission in text/html:

No end tag

Allowed ARIA role attribute values:

This could be rendered as a live region controlled by the media resource, or a control for the media resource.

Allowed ARIA state and property attributes:

rendering, interactive states?

Any aria-* attributes applicable to the allowed roles.

DOM interface:

interface HTMLRelateTranscriptElement : HTMLElement {
           attribute DOMString src;
           attribute DOMString type;
           attribute DOMString media;
};

Example 2: Using a `relateTranscript` element

<video controls>
  <source src="video.rm">
  <!-- A link to a transcript within the same document -->
    <relateTranscript title="English transcript" href="#theText">
  <!-- A link to an external transcript in French uses hreflang -->
    <relateTranscript hreflang="fr"
       href="http://transcripts.example.fr/qqchose#laTexte"
       lang="fr" title="Transcription en français">
  <track kind="captions" src="YouGetTheIdea?Right" lang="ru">
</video>

<transcript id="theText">This is the English language
  transcript...
</transcript>

Alternative approach: Use the `source` element

The source element represents a version of the media resource that can be presented as an alternative to others. This is what a transcript is.

This approach is not preferred as it will involve complex changes.

The element currently allows a MIME type attribute and a media query that can be used to determine when to render a given version. However, although transcripts are likely to have MIME types that are different from those used for audio or video resources, relying on this difference as a heuristic seems a weak approach to identifying a transcript.
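
The weakness of a MIME-type heuristic can be made concrete with a small sketch. The function name `looksLikeTranscript` is hypothetical, and the rule it implements (anything that is not audio/* or video/* is a transcript) is an assumed reading of the heuristic discussed above, not something this document specifies.

```typescript
// Naive heuristic: treat any source whose MIME type is not audio/* or video/*
// as a transcript. Shown only to illustrate why this is unreliable.
function looksLikeTranscript(mimeType: string): boolean {
  const topLevel = mimeType.trim().toLowerCase().split("/")[0];
  return topLevel !== "audio" && topLevel !== "video";
}
```

The heuristic gives the intended answer for `text/html` and `video/mp4`, but it also reports `application/ogg`, which is a media type, as a transcript. An explicit relation avoids this kind of misclassification.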

Alternative approach: Use the `a` element with `rel` and `for` attributes

This meets the requirements, but requires defining a new value of rel, and changes to the for attribute.

Separating the link from the video code requires developers to include it in the visible content of the page, which leads many developers to try to hide it in the default presentation. A common result is that the link is not available to people who need it, such as users with low vision, or that it is invisible but can still be activated, which confuses users.

Separating the link from the block of code can lead to it being lost when the source is copied to be used elsewhere.

Example 4: Using the `a` element with `rel` and `for` attribute

<p>
  <!-- A link to a transcript within the same document -->
    <a rel="transcript" for="theVideo" title="English transcript" href="#theText">Transcript below</a>,
  <!-- A link to an external transcript in French uses hreflang -->
    <a rel="transcript" for="theVideo" hreflang="fr"
       href="http://transcripts.example.fr/qqchose#laTexte"
       lang="fr" title="Transcription en français">transcription aussi disponible en français</a>.
</p>

<video controls id="theVideo">
  <source src="video.rm">
  <track kind="captions" src="YouGetTheIdea?Right" lang="ru">
</video>

<transcript id="theText">This is the English language
  transcript...
</transcript>

Alternative approach: Use an attribute

An attribute could be defined, analogous to the longdesc attribute for images.

This approach is not preferred as it makes it very difficult to meet the multiple transcripts requirement in full.

Allowing a space-separated list of URLs does not provide any information to help choose which transcript to link to, or use.

Example 5: Using a `relateTranscript` attribute

<video controls relateTranscript="#theText http://transcripts.example.fr/qqchose#laTexte">
  <source src="video.rm">
  <track kind="captions" src="YouGetTheIdea?Right" lang="ru">
</video>

<transcript id="theText">This is the English language
  transcript...
</transcript>
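
The limitation of a space-separated list can be seen by sketching how a consumer would read the attribute in Example 5. The function name `parseTranscriptList` is hypothetical; the sketch assumes the value is split on ASCII whitespace, as is usual for space-separated attribute values in HTML.

```typescript
// Split a space-separated attribute value into its URLs.
// Empty tokens caused by leading, trailing, or repeated whitespace are dropped.
function parseTranscriptList(attributeValue: string): string[] {
  return attributeValue
    .split(/[ \t\n\f\r]+/)
    .filter((url) => url.length > 0);
}
```

Applied to the value in Example 5, this yields two bare URL strings. Nothing in the result indicates the language, format, or title of either transcript, so a user agent has no basis for choosing between them without fetching each one.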

Introduction

Use cases and requirements

Use cases

Requirements

Denoting a transcript

The `transcript` element

Linking transcripts

Extending `track` to allow `kind="transcript"`

Example 1. Extending allowable `track` `kind`s

Acknowledgements

Appendix: Alternative approaches

Alternative approach: create a new element

The `relateTranscript` element

Example 2: Using a `relateTranscript` element

Alternative approach: Use the `source` element

Alternative approach: Use the `a` element with `rel` and `for` attributes

Example 4: Using the `a` element with `rel` and `for` attribute

Alternative approach: Use an attribute

Example 5: Using a `relateTranscript` attribute
