Your kids may have grown up with smartphones, but that doesn’t mean they will expect everything to work on small screens in the future. Like me, you probably began your digital life by typing commands at a DOS prompt. I doubt you would be impressed if someone asked you to use that kind of interface today. The next big shift in interface design is the move toward more natural interactions.
The more natural the interface, the more likely we are to forget about the algorithmic machinery hard at work in the background. Our bodies are becoming interfaces. Whether it be smart speakers or sensors, smart tattoos or augmented reality glasses, we are learning to sense and respond to data in a more intuitive way. AI systems will become better not only at understanding us but also at deciphering our emotional states. Rana el Kaliouby from Affectiva has done some fascinating work in this area. In her words, “I saw a future where you could frown at your digital device, it could recognise your frustration, and use this input to create a better user experience.”
As they collect more and more data from our interactions, tomorrow’s digital platforms will do a better job of understanding our intentions and responding rapidly to our unarticulated desires. We will talk rather than type, smile rather than swipe. Instead of automation creating more standardized experiences, your future customers will expect you to leverage machine learning to create more natural, personalized, human-level interactions. And yes, all of this will undoubtedly create entirely new privacy challenges for leaders and legislators - as will navigating the narrow path between an AI-driven convenient world and a surveillance state.
In the last year, we have made dramatic progress creating systems that not only understand our spoken or written requests, but can respond in a very natural way. Natural language processing uses deep neural networks to capture the complexity of human conversation and text, and can generate truly astonishing results. A great example is Talk to Transformer, which is powered by a 1.5-billion-parameter model called GPT-2, originally created by OpenAI. The tool has been used to generate stories, poems and even business plans. The latest version of this model, GPT-3, is 100 times larger, contains 175 billion parameters and reportedly cost around $12 million to train. It is currently being used to generate lifelike human responses and insights in everything from games to legal research, customer service and education.
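The core idea behind these systems can be seen in miniature: a language model repeatedly predicts a plausible next word given the words so far. The sketch below is a toy bigram model over a small, hypothetical corpus — not GPT-2 itself, which uses a deep neural network with billions of parameters — but the generate-one-word-at-a-time loop is the same shape.

```python
import random
from collections import defaultdict

# Toy illustration only: a bigram "language model" trained on a tiny,
# hypothetical corpus. Real systems like GPT-2 learn next-word
# probabilities with deep neural networks over vast text collections.
corpus = ("the interface feels natural . the interface understands "
          "speech . natural speech feels human").split()

# Record which words have been observed to follow each word.
follows = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev].append(nxt)

def generate(start: str, length: int = 6, seed: int = 0) -> str:
    """Generate text by repeatedly sampling a plausible next word."""
    rng = random.Random(seed)
    words = [start]
    for _ in range(length):
        candidates = follows.get(words[-1])
        if not candidates:  # no observed continuation; stop early
            break
        words.append(rng.choice(candidates))
    return " ".join(words)

print(generate("the"))
```

Scaled up by many orders of magnitude, with neural networks replacing the lookup table, this same next-token loop is what lets large models produce stories, poems and business plans.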
There is, however, a fine line between useful and unsettling. Consider Google Duplex, a tool that lets users delegate phone tasks, like scheduling appointments, to Google’s AI. During the pandemic, it was even used to check the availability of toilet paper supplies. People called by the service can speak normally, just as they would to another person, without having to adapt their speech to be recognizable to a machine. In fact, it is not even obvious that you are talking to a machine. And that, for many, was the most disconcerting aspect of the product demo.
At the launch, Google’s director of Augmented Intelligence Research, Greg Corrado, explained that over the next decade he expected to see the development of artificial emotional intelligence that would allow products to have much more natural and fluid human interactions. The most shocking moment was when, in response to a question from the human on the other end of the phone, the algorithmic voice paused and said “hmmm” before continuing. According to a Google blog post, this was deliberate. The system was designed to sound more natural by incorporating typical speech disfluencies and fillers (e.g., hmmms and aaahs) to mimic what people often do when they are gathering their thoughts. But is this a good idea?
There is a difference between creating natural interfaces that are human-level and don’t require us to modify our behavior to use them, and tricking people into thinking they are engaging with a human. Think of it as the difference between human-level and human-like. Human-like interfaces often risk failing the Uncanny Valley test, a term coined in 1970 by a Japanese robotics professor named Masahiro Mori.
Mori argued that as the appearance of a robot becomes more human-like, our emotional response to the robot becomes increasingly positive and empathetic, but there is a point at which our response quickly turns to strong revulsion. However, as the robot starts to become harder to distinguish from a real human, our emotional response then starts to become positive again and our empathy levels approach those we would display toward another human.
While Google may very well succeed in creating an interface that is indistinguishable from a human, the real question is, should it? In the future, we may insist that AIs disclose to users that they are machines, not humans. Human-like interfaces that attempt to simulate, imitate, and ultimately deceive us into thinking they are humans may engender distrust, suspicion, and even fear.
Human-level interfaces that understand our natural speech, recognize our faces, respond to our emotional states, and even track our gestures will be useful. They will allow us to effortlessly communicate our intentions and accomplish our objectives without resorting to command interfaces and workflows. And in doing so, we may start to see algorithms as an extension of ourselves.
This article is excerpted from ‘The Algorithmic Leader: How to be smart when machines are smarter than you’ - available now on Amazon and Audible.