Voice is the new Remote Control

The most popular entertainment device ever created is, somehow, largely untouched by technology.

I’m talking, of course, about TV.

The average American watches four hours of television per day, and yet our televisions and what we watch on them would not look unfamiliar to a time traveler from 1950.

Today, the screens are bigger, the hardware is thinner, the colors are more colorful, and the shows are (occasionally) time-shifted, but otherwise, many aspects of TV have remained the same! We’re still tuning in to scripted dramas, sitcoms, and adventure shows, with the biggest innovations coming in reality TV, 24/7 news, and ESPN.

The lessons of the internet are completely lost on TV—content isn’t personalized, user-generated, interactive, skimmable, social, or intertwined with commerce.

On the internet, we have YouTube, Instagram, Twitter, Twitch, Snap, TikTok, and Clubhouse. Novel content formats seem to materialize each year. And yet, TV has reaped very few gains from the internet era.

Why?

I believe that the most important feature missing from television today is interactivity, i.e., the ability to shift from a one-way broadcast medium to a two-way entertainment medium.

Without robust interactivity, TV can’t take advantage of any of the lessons we’ve learned on PCs, the Web, or mobile devices, which were all enabled by highly responsive, “revolutionary” user interfaces.

What revolutionary interface could enable TV?

It’s hard to make TV interactive because it’s hard to make a remote control that allows for more than basic functionality. But a new era is dawning—one in which viewers will be able to do much more with their TVs while sitting or standing 10 feet away.

This era will be defined by a more flexible and intuitive interface than the remote: the human voice.


Voice is everything a remote control is not: always available, flexible, natural, and emotional.

There’s no need to dig it out of the couch cushions or t-a-p-i-n-o-n-e-l-e-t-t-e-r-a-t-a-t-i-m-e to get to your favorite show or movie. And two-year-olds can use it, alongside 92-year-olds.

Voice control enables a huge range of input on a device that has been previously hamstrung by the weak and piddling remote control.

Practically every TV show in existence could be replicated and expanded upon with robust forms of voice interaction. Imagine:

  • Interactive stories à la Netflix’s Bandersnatch where the audience controls the characters’ actions by yelling at them

  • Game shows turned into massively multiplayer variations of HQ Trivia with shoutable multiple choice answers

  • Talk shows taking live questions from their home audience

  • New versions of sports where fans call the plays or vote on real-time trades

  • Judge Judy with viewers voting on the outcome of the case: think of the courtroom version of American Idol!

These ideas only scratch the surface. Every TikTok format and YouTube channel genre should and would have its own interactive variation on a voice-controlled TV. Dance competitions, memes, vlogs, unboxings, tech reviews, and gaming content would all have a place on a voice-controlled TV with infinite channels. The "long tail" of internet video, souped up with interactive features like chat, polls, quizzes, and shopping, would begin to disrupt the hegemony of fixed linear television content formats.

Beyond video broadcasts, televisions with voice-controlled operating systems will revolutionize the gaming industry. Casual experiences that have struggled to succeed with console gamers, like bingo, Boggle, and blackjack, will be accessible to players who have historically struggled with the complexity and fast-twitch motor requirements of AAA games today (myself included!).

One interesting avenue of development will be new content formats that are focused on smaller voice-controlled devices with video screens—the “Echo Show” form factor, for lack of a better term.

Where do these go that TVs don't?

I mentioned in an earlier essay that “big screens are immersive, but small screens can go everywhere”. The kitchen and the bathroom seem like obvious rooms where there are fewer TVs, but more Echo Shows. Will we see a surge in interactive cooking shows? Shower sing-alongs? Live bedtime story readings? More illicit content? I have no idea, but the opportunities seem boundless.

Smaller devices also incorporate a key feature that TVs mostly lack—a camera. A camera is the necessary hardware for a robust user-generated content ecosystem, as well as for social networking. Once a majority of voice-controlled devices with screens have cameras, the world will see a Cambrian explosion of user-generated content targeted to the home.

This new generation of home content will also be more “multiplayer”, in that it will be meant for more than one person to enjoy at once. While TikTok and YouTube can achieve insane levels of personalization based on a single viewer’s habits, interactive TV will need to appeal to multiple viewers at once (including parents and kids simultaneously). I expect we will see a resurgence of the more generic “family-friendly” content genres that play on network TV today.


So how far are we from this future becoming a reality? I suspect closer than many might think, maybe only one to three years. Amazon’s Fire TV Cube and Echo Show devices already have most of the necessary hardware, and Google isn’t too far behind with its Chromecast and Nest Hub. Apple is certainly lagging, but it recently announced that HomePod can now control Apple TV, and I expect more news along these lines in the next 18 months.

The software isn’t quite as good as it needs to be, but it is very close, and if we’ve learned anything over the last five decades, it’s that software platforms go from “not quite good enough” to “billions of users” much faster than one might think. Much of the heavy lifting to realize this future will be done by developers and creators like those at our company, Volley. We are excited for the challenge, so please let us know if you'd like to join. :)

“Interactive TV”, like most good internet ideas, went through a brief hype cycle in the 1990s and then fizzled out. But to paraphrase Marc Andreessen, there are no bad ideas in tech, just ideas that are too early.

Voice control is the skeleton key that will unlock ubiquitous interactive television. One-third of the average human attention span is going to be reallocated to new forms of entertainment over the next decade or two, and the ramifications will be dramatic.
