Tab to (or otherwise focus) the statements below to hear them spoken.
This sounds normal.
This sounds loud.
This sounds quiet.
This sounds fast.
This sounds slow.
This sounds high.
This sounds low.
This says nothing.
This specifies a voice family.
This is a simple demonstration of using the SpeechSynthesis API to speak some content aloud.
The first step of this is relatively simple: when one of the paragraphs above receives keyboard focus, we use the SpeechSynthesis API to speak the content of that paragraph aloud.
The second step is a little more involved. This page isn’t really a demo of the SpeechSynthesis API; there are plenty of those and that’s all known science already. It's really a demo of (and a semi-polyfill for) the CSS Speech module, which allows authors to control the way that HTML content is spoken aloud by assistive technology. So that requires a little more explanation.
The heavy lifting here is done by Philip Walton's Polyfill.js, which handles CSS declarations that are unknown to the browser. For a bunch of reasons, unknown properties don't show up in the CSS Object Model and there’s nothing you can do about that, so polyfilling CSS is hard and involves parsing the actual CSS text. Polyfill.js does that, but it's an inherently limited technique, and Walton goes into explicit detail about that on the page linked above. So this should not be used in production unless you understand the limitations. Note also that Polyfill.js needs to be explicitly included in the page; at this point, the polyfill itself does not bundle it.
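To give a feel for what "parsing the actual CSS text" involves, here is a deliberately naive sketch (not how Polyfill.js does it) that scans raw stylesheet text for CSS Speech declarations. The function name `extractSpeechRules` and the chosen subset of properties are assumptions for illustration. It does not understand at-rules (rules inside `@media` are picked up without their condition), braces inside strings, or specificity, which is exactly the kind of limitation Walton warns about.

```javascript
// Naive illustration only: find CSS Speech declarations in raw CSS text.
// Browsers drop unknown properties such as voice-pitch from the CSSOM, so
// the stylesheet text is the only place they survive.

// A hypothetical subset of CSS Speech properties this sketch looks for.
const SPEECH_PROPERTIES = new Set([
  "speak",
  "voice-family",
  "voice-pitch",
  "voice-rate",
  "voice-volume",
]);

function extractSpeechRules(cssText) {
  const rules = [];
  // Strip /* comments */ so commented-out declarations are not picked up.
  const text = cssText.replace(/\/\*[\s\S]*?\*\//g, "");
  // Match innermost "selector { declarations }" blocks. At-rules like @media
  // are not understood: their inner rules match without the condition.
  const blockPattern = /([^{}]+)\{([^{}]*)\}/g;
  for (const [, selectorText, body] of text.matchAll(blockPattern)) {
    const selector = selectorText.trim();
    for (const declaration of body.split(";")) {
      const colon = declaration.indexOf(":");
      if (colon === -1) continue; // empty or malformed declaration
      const property = declaration.slice(0, colon).trim().toLowerCase();
      const value = declaration.slice(colon + 1).trim();
      if (SPEECH_PROPERTIES.has(property)) {
        rules.push({ selector, property, value });
      }
    }
  }
  return rules;
}
```

In a page, `cssText` could simply be the `textContent` of the `<style>` element holding the voice rules.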
Given that, the rest is a small matter of implementation, done in `css_speech_polyfill.js`: for each of the new declarations required by the CSS Speech specification, there's a small JavaScript function which applies that change to a `SpeechSynthesisUtterance`. This is done in a very basic way; in particular, most of the new declarations can take a keyword value (so `voice-pitch` can be `x-low` or `high` or others, for example) but can also take a numeric value, and this version of the polyfill doesn't support the numeric values, because it's a demo. That's the next stage of implementation, I suspect.
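The per-declaration functions in `css_speech_polyfill.js` aren't reproduced here, so the following is only a sketch of the idea under stated assumptions: keyword values are looked up in a table and written to the matching `SpeechSynthesisUtterance` property (`volume` runs from 0 to 1, `rate` from 0.1 to 10, `pitch` from 0 to 2). The function name `applySpeechDeclaration` and every number in the tables are hypothetical choices, and numeric CSS values are rejected, mirroring the demo's keyword-only support.

```javascript
// Hypothetical keyword tables: the numbers are illustrative guesses chosen to
// fit SpeechSynthesisUtterance's ranges, not values from the CSS Speech spec.
const SPEECH_KEYWORDS = new Map([
  ["voice-volume", { target: "volume", values: new Map([
    ["silent", 0], ["x-soft", 0.2], ["soft", 0.4],
    ["medium", 0.6], ["loud", 0.8], ["x-loud", 1],
  ]) }],
  ["voice-rate", { target: "rate", values: new Map([
    ["x-slow", 0.5], ["slow", 0.75], ["normal", 1],
    ["medium", 1], ["fast", 1.5], ["x-fast", 2],
  ]) }],
  ["voice-pitch", { target: "pitch", values: new Map([
    ["x-low", 0.25], ["low", 0.5], ["medium", 1],
    ["high", 1.5], ["x-high", 2],
  ]) }],
]);

// Apply one CSS Speech declaration to an utterance-like object.
// Returns true if the declaration was understood and applied.
function applySpeechDeclaration(utterance, property, value) {
  const entry = SPEECH_KEYWORDS.get(property.trim().toLowerCase());
  if (!entry) return false; // not one of the properties handled here
  const amount = entry.values.get(value.trim().toLowerCase());
  if (amount === undefined) return false; // numbers like "20Hz" are unsupported
  utterance[entry.target] = amount;
  return true;
}
```

In the page, `utterance` would be a real `SpeechSynthesisUtterance` created just before `speechSynthesis.speak()`; because the function only sets properties, a plain object works as well, which keeps the mapping easy to check outside the browser.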
The way this happens in practice is that our page has a simple function, `speakAloud()`, defined in the JavaScript below, which uses the speechSynthesis API to speak the content of relevant elements. It is called by a `focusin` event listener attached to the parent element. It's done that way, rather than by attaching `focus` event listeners to each relevant element, because it's tidier, and it's `focusin` rather than `focus` because the `focus` event doesn't bubble. But this is not critical. That code is something that anybody might write to make a page speak a chosen paragraph aloud; it has nothing to do with CSS Speech. This speech demo involves only one change: to that `speakAloud` function we have added one single line, after generating a `SpeechSynthesisUtterance` but before speaking it, which calls our polyfill to do its nefarious work on the created utterance.
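The page's own `speakAloud()` isn't included in this text, so here is a sketch of the pattern just described, under two assumptions: the statements can receive focus (for example via `tabindex="0"`), and `applyCssSpeech(utterance, element)` is a hypothetical stand-in for the one-line polyfill call, whose real name isn't given. Passing it in as a parameter keeps the speech code independent of CSS Speech, which is the point the paragraph makes.

```javascript
// Build a speakAloud() function. `applyCssSpeech` is a hypothetical hook
// standing in for the polyfill call; its real name is not given in the text.
function makeSpeakAloud(applyCssSpeech) {
  return function speakAloud(element) {
    const utterance = new SpeechSynthesisUtterance(element.textContent);
    // The one extra line: let the polyfill adjust the utterance
    // (pitch, rate, volume, voice) from the element's CSS Speech rules.
    applyCssSpeech(utterance, element);
    speechSynthesis.cancel(); // stop any statement that is still being read
    speechSynthesis.speak(utterance);
  };
}

// One listener on the parent instead of one per paragraph. It has to be
// focusin, because the focus event does not bubble up to the parent.
function attachSpeaker(container, speakAloud) {
  container.addEventListener("focusin", (event) => {
    if (event.target !== container) speakAloud(event.target);
  });
}
```

In the page this might be wired up as `attachSpeaker(document.querySelector("#statements"), makeSpeakAloud(applyCssSpeech))`, where the `#statements` id is likewise hypothetical.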
The actual CSS for the page (well, the CSS for the voice stuff) is made directly editable by the user through judicious use of `contenteditable`, a thoroughly disreputable trick that I picked up from Terence Eden and Ana Tudor. Polyfill.js is, sadly, not clever enough to detect when the content of a CSS stylesheet changes, so we have a secret `reset()` method on the `CSSSpeechPolyfill` which discards all current rules and re-reads everything, and which we quietly call when the CSS gets edited.
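A minimal sketch of the editable-stylesheet trick, assuming the voice CSS lives in a `<style>` element placed in the page body (inside `<head>` it would stay hidden, since `<head>` is not displayed) and that `polyfill` is the `CSSSpeechPolyfill` object with the `reset()` method mentioned above. The function name `makeStyleEditable` is hypothetical, and whether to call `reset()` on every keystroke or debounce it is a judgment call; this version keeps it simple.

```javascript
// Make a <style> element visible and user-editable, and re-run the polyfill
// whenever its text changes. `polyfill` is assumed to be CSSSpeechPolyfill.
function makeStyleEditable(styleElement, polyfill) {
  // <style> is display: none by default; showing it and setting
  // contenteditable is the "disreputable trick".
  styleElement.style.display = "block";
  styleElement.style.whiteSpace = "pre"; // keep the CSS's line breaks visible
  styleElement.setAttribute("contenteditable", "true");
  // The browser re-parses an edited <style> for properties it knows, but
  // Polyfill.js will not notice, so ask it to discard and re-read its rules.
  styleElement.addEventListener("input", () => polyfill.reset());
}
```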