Conversational Information Architectures

People are talking about talking to computers. Companies are investing in chat, messaging, and voice interfaces that let users interact using natural language. We’re being told to prepare for a future without screens because “advances in technology, especially in AI, are increasingly making traditional UI irrelevant.”

I’m skeptical. In the hype cycle for bots, we’re past the peak of inflated expectations and heading into the trough of disillusionment. But I bought an Amazon Echo anyway, because I’m interested in wrangling with conversational information architectures as we slog our way up the slope of enlightenment towards a more intelligent future.

Let There Be Light

We’ve had our Echo for a month, and I admit I do enjoy surfing Spotify and Prime Music from the couch with my eyes closed, after asking Alexa to set a timer in case I fall asleep. However, all too often, our conversation goes like this:

Peter (in the dark): “Alexa, turn on the lamp.”
Alexa (turns on the lamp): “Okay.”
Susan (yells from the kitchen): “What?”
Peter (yells back): “Nothing, I was just talking to Alexa.”
Alexa: “I’m not quite sure how to help you with that.”
Susan (yells from the kitchen): “Don’t forget to thank Alexa.”
Alexa: “Sorry, I don’t know the answer.”

Alexa’s speech-to-text translation is impressive (although she absolutely refuses to recognize my last name), but even when she understands my intent, using her can be as convoluted and socially awkward as writing about her (or “it”).

But Alexa’s biggest problem is she makes me feel dumb. She forces me to admit I can’t remember the name of one of my favorite songs by Vienna Teng. On Spotify, I can search for Vienna Teng, select Warm Strangers, and play Harbor without pause. That’s the magic of recognition over recall. And it’s hard to do with a voice interface.

Alexa will list Vienna’s albums and the songs on Warm Strangers if I ask her right, but now we’re back to the biggest problem. What can I ask? What’s the right syntax? Apparently, machine learning means humans learning to speak machine. Alexa can’t pass the Turing Test, never mind the Winograd Schema Challenge. She can’t understand context or meaning. She has no common sense. She can’t engage in a real conversation. So the cognitive burden falls on us. She makes us feel dumb.

It might help if Alexa had a discernible information architecture. If we could select a category such as music, news, or shopping, it might be easier to navigate each subset of use cases and commands. Alexa might invite us to create songlines and memory palaces in symphonic acts of placemaking that help us to understand and remember.
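To make the idea of a browsable command structure concrete, here is a minimal sketch in Python. It is not how Alexa works: the categories come from the paragraph above, but the phrases and the helper names `list_categories` and `list_commands` are hypothetical, invented only to show how grouping commands by category turns recall into recognition.

```python
# Hypothetical command catalog. The categories come from the text above;
# the phrases are invented for illustration and are not real Alexa syntax.
COMMANDS = {
    "music": ["play <song> by <artist>", "list albums by <artist>", "list songs on <album>"],
    "news": ["read the headlines", "play my news briefing"],
    "shopping": ["add <item> to my list", "reorder <item>"],
}


def list_categories():
    """Return the top-level categories a user could browse."""
    return sorted(COMMANDS)


def list_commands(category):
    """Return the example phrases for one category, or an empty list if unknown."""
    return COMMANDS.get(category.lower(), [])
```

With a structure like this, “What can I ask?” has an answer: pick a category, then hear or see the short list of phrases that belong to it.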

And Alexa might take a page from Siri by embracing multimodal interaction. When I ask Siri about Vienna Teng’s albums, she answers visually with iTunes on my iPhone.

Imagine what Alexa might do with a large, flat-panel display on the living room wall. While audio-only has advantages (e.g., multi-room use, accessibility for the blind), Alexa can’t make us into badass users without screens. Ask isn’t enough. She must also mix audiovisual input and output to enable better browse and search. Alexa needs a cross-channel information architecture that embraces multimodal interaction.

After owning an Echo for a month, we’ve found Alexa hard to use and easily forgotten. It’s not enough for new technology to be cool. It must be better than the alternatives. For now, our phones, tablets, and laptops offer a better experience. But I believe Alexa will soon see the light with a little help from her friends in information architecture.

Conversational Ecosystems

We can’t talk about talking to computers without chatting about messaging. Now that teenagers have taught us to txt, businesses are reframing SMS as a natural language platform for everything. To avoid being confused by claims that “intelligent assistants” can help us do anything, it’s worth exploring the taxonomy of conversational apps.

First, there are personal concierge services like Operator and Magic that are staffed by humans. They may be useful to folks with too much money. Second, there are chatbots like Poncho, Quartz, and Tay. They may be fun for folks with too much time. Third, there are hybrids like M that are powered by people plus artificial intelligence. They may make Facebook richer by shifting expressions of user intent from search to messaging.

These tools are intriguing, but they won’t replace traditional UIs. Many will use Rich Communication Services (RCS) to import graphical user interfaces into the chat window. It’s often easier to select options from a menu than from a sentence. Mostly, chat will occupy a specialized niche in the wider conversational ecosystem.

For example, CourtBot is a brilliant project from Code for America that lets folks who receive traffic citations check the status of court dates and pay fines via SMS. But the bot is not a lone hero. The real story is a cross-channel customer journey that involves police officers, a piece of paper, the bot, a call center, a website, and the courthouse. User satisfaction is achieved when the parts fit together to create a whole ecosystem.

Deus Ex Machina

We humans are great at making a mess. I should know. As an information architect, I spend my time making sense of co-created chaos. And I’ve learned that organization isn’t enough. A good information architecture engages people in conversation. Consider a search for pokémon on Amazon. The results show examples while the facets ask questions: are you looking for Toys & Games or Books or Movies? Choose Toys and Amazon asks if you want Puppets or Puzzles and whether Size or Price matters. To find what we need or to get things done, a website is a better conversationalist than any bot.
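The back-and-forth of faceted navigation can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Amazon’s implementation: the toy catalog, its field names, and the helpers `facet_counts` and `refine` are all hypothetical.

```python
from collections import Counter

# Hypothetical toy catalog; items and field names are invented for illustration.
CATALOG = [
    {"title": "Pikachu hand puppet", "department": "Toys & Games", "type": "Puppets", "price": 12.99},
    {"title": "Pokémon jigsaw puzzle", "department": "Toys & Games", "type": "Puzzles", "price": 9.50},
    {"title": "Pokémon handbook", "department": "Books", "type": "Guides", "price": 7.25},
    {"title": "Pokémon: The First Movie", "department": "Movies", "type": "DVD", "price": 14.00},
]


def facet_counts(items, facet):
    """Count how many items fall under each value of a facet."""
    return Counter(item[facet] for item in items)


def refine(items, facet, value):
    """Keep only the items whose facet matches the chosen value."""
    return [item for item in items if item[facet] == value]


# The site "asks" a question by showing the available departments.
departments = facet_counts(CATALOG, "department")

# The user "answers" by choosing one, and the site asks the next question.
toys = refine(CATALOG, "department", "Toys & Games")
toy_types = facet_counts(toys, "type")
```

Each call to `facet_counts` is the site’s turn in the conversation, and each call to `refine` is ours. The choices are always visible, so we recognize an answer rather than having to recall the right phrase.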

Of course, a bigger story lurks behind the bots. Watson, DeepFace, self-driving cars, and the chatbots are all signs that deep learning is bringing an end to the AI winter. While I do believe cognifying is among the most disruptive forces shaping our future, I’m not holding my breath for the deus ex machina of superintelligence. Cognition and conversation are more complex, contextual, messy, and embodied than we may know.

So we must make sense of our own mess. That’s why AI needs IA. A weak grasp of natural language will only go so far. Alexa doesn’t understand meaning or context, so our “conversations” require organization, placemaking, and multimodal interaction. The textbots are just as limited. Messaging apps depend upon the structural design of cross-channel user experiences. Bots are but a part of the conversational ecosystem.

The bots are getting better, and some see Her and Ex Machina just around the corner. I’m skeptical. And happy. Instead of extinction, we get to wrangle with conversational information architectures in a colorful array of crossmodal, multisensory contexts. Of course, I could be wrong. The singularity might be near. What do you think, Alexa?

by Peter Morville