CSS Speech polyfill demo

This is a simple demonstration of using the SpeechSynthesis API to speak some content aloud.

Tab to (or otherwise focus) each statement below to hear it spoken.

This sounds normal.

This sounds loud.

This sounds quiet.

This sounds fast.

This sounds slow.

This sounds high.

This sounds low.

This says nothing.

This specifies a voice family.

Below is the (editable!) CSS that controls the speech

How it works

The first step is relatively simple: when one of the paragraphs above receives keyboard focus, we use the SpeechSynthesis API to speak its content aloud.
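A minimal sketch of that first step, assuming a browser that implements the Web Speech API (the speechSynthesis global and the SpeechSynthesisUtterance constructor). The name speakElementText is hypothetical, not taken from the demo's source, and cancelling any in-progress speech first is a design choice of this sketch, so that focusing a new paragraph interrupts the previous one rather than queueing behind it.

```javascript
// Hypothetical helper: speak an element's text with the Web Speech API.
// Relies on the browser globals speechSynthesis and SpeechSynthesisUtterance.
function speakElementText(element) {
  // Drop anything still speaking or queued, so a newly focused
  // paragraph interrupts the previous one instead of waiting its turn.
  speechSynthesis.cancel();
  const utterance = new SpeechSynthesisUtterance(element.textContent.trim());
  speechSynthesis.speak(utterance);
  return utterance;
}
```

In a page this could be attached with something like paragraph.addEventListener("focus", () => speakElementText(paragraph)), provided the paragraphs are focusable (paragraphs are not focusable by default, so they would need tabindex="0" or similar).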

The second step is a little more involved. This page isn't really a demo of the SpeechSynthesis API; there are plenty of those, and that's all known science already. It's really a demo of (and a semi-polyfill for) the CSS Speech module, which lets authors control the way HTML content is spoken aloud by assistive technology. That requires a little more explanation.

How the important bit works

The heavy lifting here is done by Philip Walton's Polyfill.js, which handles CSS declarations that are unknown to the browser. Browsers throw away declarations they don't understand when they parse a stylesheet (CSS error handling requires it), so unknown properties never show up in the CSS Object Model and there's nothing you can do about that. Polyfilling CSS is therefore hard, and involves parsing the actual CSS text. Polyfill.js does that, but it's an inherently limited technique, and Walton explains those limits in explicit detail on the page linked above. So don't use this in production unless you understand the limitations. Note also that Polyfill.js needs to be included in the page explicitly; the CSS Speech polyfill does not currently bundle it.
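To show why parsing the raw text is needed at all, here is a deliberately naive sketch of the idea. It is not Polyfill.js's API or algorithm: the function extractSpeechRules and its property list are hypothetical, and it ignores @media blocks, braces inside strings, and specificity, which are exactly the kinds of limitations Walton describes. It reads stylesheet text and keeps only the speech properties the browser would have discarded.

```javascript
// Naive text-based extraction of CSS Speech declarations (hypothetical sketch,
// not Polyfill.js). Does not handle @media/@supports, strings containing
// braces, or selector specificity.
const SPEECH_PROPERTIES = new Set([
  "speak", "voice-volume", "voice-rate", "voice-pitch", "voice-family",
]);

function extractSpeechRules(cssText) {
  const rules = [];
  // Strip comments first so they can't confuse the rule pattern.
  const text = cssText.replace(/\/\*[\s\S]*?\*\//g, "");
  // "selector { body }" with no nested braces.
  const rulePattern = /([^{}]+)\{([^{}]*)\}/g;
  for (const [, selector, body] of text.matchAll(rulePattern)) {
    const declarations = {};
    for (const declaration of body.split(";")) {
      const colon = declaration.indexOf(":");
      if (colon === -1) continue;
      const property = declaration.slice(0, colon).trim().toLowerCase();
      const value = declaration.slice(colon + 1).trim();
      // Keep only the properties the browser itself would have dropped.
      if (SPEECH_PROPERTIES.has(property) && value) {
        declarations[property] = value;
      }
    }
    if (Object.keys(declarations).length > 0) {
      rules.push({ selector: selector.trim(), declarations });
    }
  }
  return rules;
}
```

In a browser, the raw text could come from a style element's textContent (or from fetching a linked stylesheet), because the CSSOM copy of the sheet has already lost these declarations.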

Given that, the rest is a small matter of implementation, done in css_speech_polyfill.js: for each of the new properties defined by the CSS Speech specification, there's a small JavaScript function which applies that property to a SpeechSynthesisUtterance. This is done in a very basic way. In particular, most of the new properties can take a keyword value (voice-pitch can be x-low or high, among others) but can also take a numeric value; this version of the polyfill doesn't support numeric values, because it's a demo. That's the next stage of implementation, I suspect.
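A sketch of keyword-only mapping, assuming the declarations have already been extracted into a plain object. The numbers are illustrative guesses, not taken from the spec or from css_speech_polyfill.js: the CSS Speech keywords map to implementation-dependent levels, whereas SpeechSynthesisUtterance takes volume from 0 to 1, rate from 0.1 to 10 and pitch from 0 to 2, each defaulting to 1. The names KEYWORD_MAPS and applySpeechDeclarations are hypothetical. As in the demo, numeric values such as +6dB or 120% are simply ignored.

```javascript
// Illustrative keyword -> SpeechSynthesisUtterance mappings (hypothetical values).
// The Web Speech API caps volume at 1, so the volume scale has to sit at or
// below the default rather than centring on it.
const KEYWORD_MAPS = {
  "voice-volume": {
    prop: "volume",
    values: { silent: 0, "x-soft": 0.2, soft: 0.4, medium: 0.6, loud: 0.8, "x-loud": 1 },
  },
  "voice-rate": {
    prop: "rate",
    values: { "x-slow": 0.5, slow: 0.75, medium: 1, normal: 1, fast: 1.5, "x-fast": 2 },
  },
  "voice-pitch": {
    prop: "pitch",
    values: { "x-low": 0.2, low: 0.6, medium: 1, high: 1.4, "x-high": 1.8 },
  },
};

const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Apply keyword values from `declarations` (e.g. { "voice-pitch": "high" })
// to `utterance`, leaving anything unrecognised untouched.
function applySpeechDeclarations(utterance, declarations) {
  for (const [property, rawValue] of Object.entries(declarations)) {
    if (!has(KEYWORD_MAPS, property)) continue; // not handled by this sketch
    const mapping = KEYWORD_MAPS[property];
    const keyword = String(rawValue).trim().toLowerCase();
    if (has(mapping.values, keyword)) {
      utterance[mapping.prop] = mapping.values[keyword];
    }
    // Anything else, including numeric values such as "+6dB" or "120%", is ignored.
  }
  return utterance;
}
```

Keeping the numbers in one table makes them easy to tune by ear, and leaves a single place to hook in numeric-value parsing later.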

The way this happens in practice is that the page has a simple JS function, speakAloud(), defined in the JavaScript below, which uses the speechSynthesis API to speak the content of the relevant elements. It is called by a focusin event listener attached to the parent element. One listener on the parent is tidier than attaching a focus listener to each relevant element, and it has to be focusin rather than focus because the focus event doesn't bubble. None of this is critical, though. That code is something anybody might write to make a page speak a chosen paragraph aloud; it has nothing to do with CSS Speech. The CSS Speech part involves only one change: we have added a single line to speakAloud(), after creating the SpeechSynthesisUtterance but before speaking it, which calls our polyfill to do its nefarious work on that utterance.
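A sketch of that wiring, under stated assumptions: this speakAloud is a reconstruction rather than the page's actual code, and applySpeechStyles(element, utterance) is a hypothetical stand-in for the one-line polyfill call, passed in as a parameter so the sketch does not depend on the polyfill's real API. listenForFocus is likewise a hypothetical name. It uses the browser's speechSynthesis and SpeechSynthesisUtterance globals.

```javascript
// Reconstruction (not the demo's real code) of speakAloud with a polyfill hook.
// `applySpeechStyles` is a hypothetical callback standing in for the polyfill call.
function speakAloud(element, applySpeechStyles) {
  speechSynthesis.cancel();
  const utterance = new SpeechSynthesisUtterance(element.textContent.trim());
  // The one CSS Speech-specific line: let the polyfill adjust the utterance
  // after it is created but before it is spoken.
  if (applySpeechStyles) applySpeechStyles(element, utterance);
  speechSynthesis.speak(utterance);
  return utterance;
}

// One delegated listener on the parent. focusin bubbles up from the focused
// child; focus does not bubble, so an ordinary (non-capturing) focus
// listener on the parent would never fire for its children.
function listenForFocus(container, applySpeechStyles) {
  container.addEventListener("focusin", (event) => {
    if (event.target !== container) speakAloud(event.target, applySpeechStyles);
  });
}
```

On a page, something like listenForFocus(document.querySelector("#statements"), polyfillHook) would wire it up; both the #statements id and polyfillHook are placeholders.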

The actual CSS for the page (well, the CSS for the voice stuff) is made directly editable by the user through judicious use of contenteditable, a thoroughly disreputable trick that I picked up from Terence Eden and Ana Tudor. Sadly, Polyfill.js is not clever enough to detect when the content of a CSS stylesheet changes, so `CSSSpeechPolyfill` has a secret reset() method, which discards all current rules and re-reads everything; we quietly call it whenever the CSS gets edited.
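A sketch of the edit-then-reset loop, assuming the editable CSS lives in a style element that carries the contenteditable attribute and has been made visible with a rule such as style { display: block; }; that is one common form of the trick, not necessarily the exact markup used here. The reset() call matches the method described above; the name watchEditableStyles is hypothetical. Editing a contenteditable element fires input events, and because the browser re-parses a style element whenever its text changes, the native styles update by themselves; only the polyfill needs to be told.

```javascript
// Hypothetical helper: make the polyfill re-read the CSS after every edit.
// `polyfill` is expected to expose reset(), as CSSSpeechPolyfill does here.
function watchEditableStyles(styleElement, polyfill) {
  // contenteditable elements fire "input" on each change to their content.
  styleElement.addEventListener("input", () => {
    // Polyfill.js can't see the change itself, so discard and re-read all rules.
    polyfill.reset();
  });
}
```

For example: watchEditableStyles(document.querySelector("style[contenteditable]"), CSSSpeechPolyfill). Re-reading on every keystroke is the simplest option; with a large stylesheet it might be worth waiting for a pause in typing instead.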