Kinect in focus: Xbox's app chief talks Smart Homes & Cortana

By Chris Davies/April 5, 2014 11:45 am EST

When you have a product like Kinect, so closely associated with gaming, how do you convince everybody else that they should be installing a motion-tracking camera in the home? Microsoft is looking to smart home technology and health, among other things, to do just that with Kinect for Windows v2, though a stealthy spread through Cortana and smartphones may be just as vital. We caught up with Microsoft's Michael Mott, general manager of Xbox applications and developer relations, to find out how virtual assistants and home automation could make Kinect-tech the next must-have.

Kinect is undoubtedly a vital part of the Xbox One proposition – you can't buy the console without it – but its Kinect for Windows counterpart project has always been less well known. The new Kinect v2 sensor bar will be coming to Windows desktops and notebooks this summer, Microsoft announced at Build 2014 this week, bringing the higher-resolution sensor, better skeletal tracking, and other improvements to Windows 8.

Explaining Kinect to gamers is relatively straightforward. Move your arms and you can swing a sword, for instance, or reach out and grab a virtual object. Trying to sell that in a broader sense may be trickier, and Mott doubts that there'll be any one single "killer app" that breaks the market.

"There's probably not one, but there will be three or four of those that we think are just going to delight people," he explained. "Communications is definitely one of them, enhanced communication and even entertainment communication. I think creativity is another one of them, and I definitely think fitness and wellness is yet another one. And then there's home automation."

Right now, the idea of the consumer smart home may be in its relative infancy, but Mott believes the rise of the Internet of Things – individual devices, whether they be smoke detectors, or thermostats, or lights, or locks, being connected online – and of products like Philips' hue wireless bulbs and Nest's Learning Thermostat will give people a framework within which motion-sensing technology like Kinect can clearly demonstrate its potential.

"With more devices that are now aware and controllable through software – and we're doing it with the Internet of Things here at Microsoft – developers will be able to plug into that ecosystem," Mott said. "So, with the hue lights for example, there's no reason why you can't just be sitting down and say "dim the lights." Or say, "hue, match my mood" and hold up the color of your drink, so you want your mood to go Mai Tai that night. So that's more in the way of "Hey, I'm engaged but I'm just controlling the environment.""

The stage beyond that, however, may be even more compelling. Right now, interactions with Kinect are usually done explicitly: the user directly engages with the sensor-bar for some reason. However, an intelligent smart home needn't require that sort of one-on-one instruction.

"The other [possibility] is just being aware of what's going on," Mott theorized. "If there's two of you on the couch, what does it play? Does it bring up on screen something that's relevant to the both of you? And most of that, as different screens can light up because they're aware, you'd see different experiences. Another one is, I have Sonos at home... if I walk in the room, it would be cool to have my theme music come on, a little AC/DC to get going in the morning, perhaps!"

Microsoft itself is unlikely to be directly coding Sonos and hue integration for Kinect, though the company did namecheck its own Internet of Things work briefly during the Build opening keynote. Instead, it'll be left to developers to finesse the motion-sensing system into something with smart home appeal, which Mott has no doubt will take place.

"I think it's going to be fun," he said. "That's the nice thing: my creativity is not as expansive as I'd like it to be. But once you give developers the tools, and a common platform, they'll just go, and you'll see some magical things. So that's what we're excited about: our preview program, and 2,000 developers just unraveling and recasting something exciting with the technology, and I think home automation is a sweet spot for it."

For it to be worth developers' time, of course, Kinect for Windows v2 has to be in more homes. Mott concedes that the number of people who actually know about the desktop version of the sensor is only a fraction of those familiar with it for Xbox 360 and Xbox One, though he's not sure bundling it with new PCs and tablets makes the most sense.

"I can imagine [verticals] would do that; sell their solution as from a healthcare perspective, delivering to a consumer and saying "hey, here's your tablet, and your Kinect, and now you're in our physical therapy program." I don't see us necessarily bundling that in all situations," he said. A good example of how that could work is Reflexion Health, which uses Kinect to guide physiotherapy patients through their exercises at home and feeds details of their ongoing performance back to their specialist so that the regimen can be finessed according to their abilities.

Kinect and a Windows tablet might not be peanuts, Reflexion Health CEO Spencer Hutchins conceded, but they're still a fraction of the cost of a course of physiotherapy. That course could well be shortened with more appropriate and compelling exercises, too, while the setup process could be integrated into existing out-patient care.

Microsoft's other big demo for Kinect for Windows v2 was Freak'n Genius, which created a version of its motion-controlled animation studio for the platform in about a week. Photos can be animated – CEO Kyle Kesterson quickly pieced together a scene with Jack Nicholson and Microsoft chief Satya Nadella, complete with jabbering mouths and goofy eyes – simply by "teaching" them through your own movements in front of the sensor.

Freak'n Genius sees clipart and theme packs as its monetization model, selling, for instance, holiday packs of props and scenes through in-app purchases, rather than charging for the software upfront. It's already met with a positive response in schools, Kesterson told us, where kids have quickly picked up the basic interface and find creating educational presentations using Kinect animation more involving.

However, that still requires users to actually have a sensor themselves, something Mott thinks the continuing development of motion-tracking technology will address. "I can see both solutions-bundling but then the hardware, over time, coming down to a place where the cost makes sense, it can be competitive, and then the experiences are worth it."

"Instant sign-on because it's able to see who you are. Skype that actually will follow your family, so if you're talking and then your wife or your brother is talking, it can immediately move to whoever is talking over there. Kinect is an interface that either delivers something completely new and exciting, or it makes things that you already do today much easier, and more intuitive."

Perhaps one of the easiest ways to ease motion-sensing into consumers' worlds is to use devices they're already engaging with: smartphones and tablets. Mott certainly sees that as a clear gateway, particularly as Microsoft's "Natural User Interface" (NUI) program expands.

"I think there's two things," he explained. "We've been pioneering this Natural User Interface, and it's been delivered to you and to customers mostly through Xbox and through a device. But we know and we've seen what happened with devices that they get smaller, they get cheaper, they get embedded, and then they get spread across. The camera's a perfect example: you used to get a bad camera, now you get a 41-megapixel camera in your Nokia."

"So I could see the hardware itself and some of the power that the microphone array and the cameras and sensors coming down in cost," he predicted. "The other piece is the software, and the software just gets smarter about understanding what kind of quality of data you've got. And even if it's imperfect, it can now algorithmically understand that, "okay, well if you're moving your arm this far, and I miss something along the way, you're probably going to move your arm the rest of the way.""

That combination of smarter software allowing for more rudimentary sensors, and the hardware itself coming down in price, could eventually mean the same sort of motion tracking that Kinect v2 offers, but from webcams and other embedded chips in a broader range of devices.

"I think software will allow us to translate both what it captures and how it projects that into whatever application better," Mott said. "And that's why, when [EVP of Operating Systems] Terry Myerson talked about Kinect being the future, I think that was his shorthand for saying the natural user interface that we're building – through a combination of hardware that will come down in cost and move across devices, and software that will get smarter and be more ubiquitous across platforms – that's where I think you'll see these things move into phones, and tablets, and so forth."

Falling prices, to the point where you can have distributed tracking around your home or office rather than just when you're sat in front of Kinect, are arguably something Microsoft can simply expect to happen. Its own work on software, though, looks likely to call on a now-familiar voice to get us talking to our smart environments.

Cortana, the Halo-named virtual assistant announced as part of Windows Phone 8.1 this week, will make her debut on smartphones, but there's no reason it'll stay that way. In fact, Mott points out, much of what's powering and shaping Cortana is the same as what's going on behind the scenes in Kinect.

"I think that the nice thing is that the walls are coming down," he pointed out. "Which means that, when you build a conversational, voice-driven interface, it can work on all of your devices. So I think it's on the phone today because that's where people can most readily get the benefit, whereas I think when we go forward people will start to expect it."

While Windows Phones are "the device that's most relevant" for Cortana initially, Mott suggests, eventually it's likely to "seamlessly" spread across platforms.

"A lot of the technology we use for voice recognition and translating that into action is the same underlying technology that's behind Cortana. So I think you'll see us do that; I think you'll see the same thing that we do, everything we're learning on the TV with Kinect, is then translating into the natural user interface capabilities that we can put on the phone. For example, on Xbox, you sign in based on facial recognition. Your phone should just unlock when you look at it. Wouldn't that be nice? It seems obvious."

As Google found with Face Unlock on Android, and as other attempts to bring motion to computing have discovered, "obvious" doesn't necessarily mean "easy"; Kinect for Windows v2 still faces an uphill struggle to break into the mainstream. Nonetheless, Mott argues, there's already a good example of a device that initially left consumers cold but eventually worked its way into the general consciousness.

"It will ultimately be not unlike what happened with tablets," the executive predicts. "It wasn't one thing, it was "it can do all of these things!" Video, and games, and productivity, and music, and all those other common things. It's going to take us some work to be able to communicate to customers that you're going to want to have this device, because it's going to deliver all of this magical experience, but that's where the power of the developers comes in. They're the ones who are really going to showcase why it's so magical."