Tony Aubé writes interestingly about how No UI is the New UI:
Out of all the possible forms of input, digital text is the most direct one. Text is constant, it doesn’t carry all the ambiguous information that other forms of communication do, such as voice or gestures. Furthermore, messaging makes for a better user experience than traditional apps because it feels natural and familiar. When messaging becomes the UI, you don’t need to deal with a constant stream of new interfaces all filled with different menus, buttons and labels. This explains the current rise in popularity of invisible and conversational apps, but the reason you should care about them goes beyond that.
He’s talking here about “invisible apps”: Magic and Operator and to some extent Google Now and Siri; apps that aren’t on a screen. Voice or messaging or text control. And he’s wholly right. Point and click has benefits — it’s a lot easier to find a thing you want to do, if you don’t know what it’s called — but it throws away all the nuance and skill of language and reduces us to cavemen jabbing a finger at a fire and grunting. We’ve spent thousands of years refining words as a way to do things; they are good at communicating intent¹. On balance, they’re better than pictures, although obviously some sort of harmony of the two is better still. Ikea do a reasonable job of providing build instructions for Billy bookcases without using any words at all, but I don’t think I’d like to see their drawings of what “honour” is, or how to run a conference.
The problem is that, until very recently, and honestly pretty much still, a computer can’t understand the nuance of language. So “use language to control computers” meant “learn the computer’s language”, not “the computer learns yours”. Echo, Cortana, Siri, Google Now, Mycroft are all steps in a direction of improving that; Soli is a step in a different direction, but still a valuable one. But we’re still at the stage of “understand the computer’s language”, although the computer’s language has got better. I can happily ask Google Now “what’s this song?”, or “what am I listening to?”, but if I ask it “who sang this?” then my result is a search rather than music identification. Interactive fiction went from “OPEN DOOR” to being able to understand a bewildering variety of more complex statements, but you still have to speak in “textadventurese”: “push over the large jewelled idol” is fine, but “gently push it over” generally isn’t. And tellingly IF still tends to avoid conversations, replacing them with conversation menus or “tell Eric about (topic)”.
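The “textadventurese” restriction above can be made concrete with a toy sketch. This is not how any real IF engine works (Inform and friends are far more sophisticated); it is just an illustration, with an invented vocabulary, of a parser that demands a fixed VERB [PARTICLE] [ADJECTIVES] NOUN shape, so “push over the large jewelled idol” parses but “gently push it over” does not:

```python
# Toy sketch of a "textadventurese" grammar. All vocabulary is invented
# for illustration; real IF parsers are considerably richer than this.

VERBS = {"push", "open", "take"}
PARTICLES = {"over"}
ADJECTIVES = {"large", "jewelled"}
ARTICLES = {"the", "a"}
NOUNS = {"idol", "door", "it"}

def parse(command):
    """Return (verb, noun) if the command fits the grammar, else None."""
    words = command.lower().split()
    # The grammar demands the verb come first, so "gently push..." fails.
    if not words or words[0] not in VERBS:
        return None
    verb, rest = words[0], words[1:]
    # An optional particle may follow the verb ("push over ...").
    if rest and rest[0] in PARTICLES:
        rest = rest[1:]
    # Articles and known adjectives are skipped, not understood.
    rest = [w for w in rest if w not in ARTICLES | ADJECTIVES]
    # Exactly one known noun must remain.
    if len(rest) == 1 and rest[0] in NOUNS:
        return (verb, rest[0])
    return None

print(parse("push over the large jewelled idol"))  # ('push', 'idol')
print(parse("gently push it over"))                # None
```

The point is that the player is learning the computer’s language: any phrasing outside the fixed pattern, however natural, simply fails to parse.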
“User interface” doesn’t just mean “pixels on a screen”, though. “In a world where computers can see, listen, talk, understand and reply to you, what is the purpose of a user interface?”, asks Aubé. The computer seeing you, listening to you, talking to you, understanding you, and replying to you is the user interface.
In that list, currently:
- seeing you is hard and not very reliable (obvious example: Kinect)
- listening to you is either easy (if “listening” and “hearing” are the same thing) or very difficult (if “listening” implies active interest rather than just passively recording everything said around it)
- talking to you is easy, although (as with humans) working out what to say is not, and it’s still entirely obvious that a voice is a computer
- understanding you is laughably incomplete and is obviously the core of the problem, although explaining one’s ideas and being understood by people is also the core problem of civilisation and we haven’t cracked that one yet either
- replying to you requires listening to you, talking to you, and understanding you.
Replying by having listened, talked, and understood works fine if you’re asking “what’s this song?” But “Should I eat this chocolate bar?” is a harder question to answer. It’s hard mainly because of an important thing that isn’t even on that list: knowing you. Which is not the same thing as “knowing a huge and rather invasive list of things about your preferences”, and is also not something a computer is good at. In fact, if a computer were to actually know you then it wouldn’t collect the huge list of trivia about your preferences, because it would know that you find it a little bit disquieting. If a friend of mine asks “should I eat this chocolate bar?”, what do I consider in my answer? Do I like that particular one myself? Do I know if they like it? Do lots of other people like it? Are they diabetic? Are they on a diet? Do they generally eat too much chocolate? Did they ask the question excitedly or resignedly? Have they had a bad day and need a pick-me-up? Do I care?
That list of questions I might ask myself before replying starts off with things computers are good at knowing (did the experts rate Fry’s Turkish Delight on MSN?) and ends up with things we’re still a million, million miles away from being able to analyse. Does the computer care? What does it even mean to ask that question? But we can do the first half, so we do do it… and that leads inevitably to the disquieting database collection, the replacement of understanding with a weighted search over all knowledge. Like making a chess champion by just being able to analyse all possible games. Fun technical problem, certainly. Advancement in our understanding of chess? Not so much.
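That “weighted search over all knowledge” can be caricatured in a few lines. This is purely illustrative — every fact and weight below is invented — but it shows the shape of the trick: score only the questions that can be looked up, and there is simply no weight for “do I care?”:

```python
# Caricature of "replacing understanding with a weighted search".
# All facts and weights are invented for illustration.

def should_eat(facts, weights):
    """Weighted sum over lookup-able facts; 'knowing you' never appears."""
    score = sum(weights[k] * facts.get(k, 0) for k in weights)
    return score > 0

weights = {
    "expert_rating": 1.0,       # did the experts rate it well?
    "crowd_likes_it": 0.5,      # do lots of other people like it?
    "asker_is_diabetic": -5.0,  # are they diabetic?
    "on_a_diet": -1.0,          # are they on a diet?
    # "had_a_bad_day", "asked_resignedly", "do_I_care": no weights exist,
    # because these aren't in any database.
}

facts = {"expert_rating": 1, "crowd_likes_it": 1, "on_a_diet": 1}
print(should_eat(facts, weights))  # True
```

A bigger database and cleverer weights make the lookup half better, but they never touch the second half of the list of questions.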
“When I was fifteen years old, I missed a period. I was terrified. Our family dog started treating me differently - supposedly, they can smell a pregnant woman. My mother was clueless. My boyfriend was worse than clueless. Anyway, my grandmother came to visit. And then she figured out the whole situation in, maybe, ten minutes, just by watching my face across the dinner table. I didn’t say more than ten words — ‘Pass the tortillas.’ I don’t know how my face conveyed that information, or what kind of internal wiring in my grandmother’s mind enabled her to accomplish this incredible feat. To condense fact from the vapor of nuance.”
That’s understanding, and thank you Neal Stephenson’s Snow Crash for the definition. Hell, we can’t do that, most of us, most of the time. Until we can… are apps controlled with words doomed to failure? I don’t know. I will say that point-and-grunt is not a very sophisticated way of communicating, but it may be all that technology can currently understand. Let’s hope Mycroft and Siri and Echo and Magic and Operator and Cortana and Google Now are the next step. Aubé’s right when he says this: “It will push us to leave our comfort zone and look at the bigger picture, bringing our focus on the design of the experience rather than the actual screen. And that is an exciting future for designers.” Exciting future for people generally, I think.
1. And I leave completely aside here that French is not English is not Kiswahili, although this is indeed a problem for communication too. ↩