Apple has pulled back the curtain on Siri, and specifically on the “Hey Siri” voice trigger feature that, from iPhone 6 on, allowed the assistant to respond only to its owner. In a new article published in Apple’s Machine Learning Journal, the Siri team details a paper it submitted to the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), which kicks off today.
Turns out, differentiating between users on a low-power core that can be always-listening even with the iPhone in standby is trickier than you might expect. The “Hey Siri” phrase itself came with both positives and negatives attached, too. On the one hand, the team points out, users were already familiar with the phrasing: indeed, some were already saying it even when manually triggering Siri using the home button.
However, it’s also a fairly short phrase. That gives Siri very little audio from which to extract everything it needs to confirm both that the trigger was intentional and that the correct user is the one speaking.
While Siri has an explicit training period when the assistant is first set up, the five sample phrases the user is asked to repeat do have some drawbacks. For example, there’s very little “environmental variability,” the Siri team points out. In everyday use, real-world conditions are seldom so ideal for clean recognition.
To counter that, Siri also keeps learning the user’s voice over time. That’s known as “implicit enrollment,” and it allows the assistant to learn how the user sounds in real-world environments. The end result is a user profile of speaker vectors, seeded with the five initial utterances from the training phase and then extended by up to 35 more from these implicit enrollments. Meanwhile, by saving the “Hey Siri” portion of the audio itself (on the device, rather than in the cloud, for privacy reasons), the iPhone can go back and re-analyze old triggers when new algorithms are released.
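The profile described above can be sketched in a few lines of Python. This is only an illustration, not Apple’s implementation: the 5 and 35 slot counts come from the article, but the class name `SpeakerProfile`, the averaged cosine-similarity scoring, and both threshold values are assumptions made for the example.

```python
import math

EXPLICIT_SLOTS = 5       # utterances recorded during setup (from the article)
IMPLICIT_SLOTS = 35      # extra vectors learned in everyday use (from the article)
ACCEPT_THRESHOLD = 0.6   # hypothetical: minimum score to accept the speaker
ENROLL_THRESHOLD = 0.8   # hypothetical: stricter bar before learning a new vector


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SpeakerProfile:
    """Hypothetical on-device profile: explicit vectors plus implicit ones."""

    def __init__(self, explicit_vectors):
        if len(explicit_vectors) != EXPLICIT_SLOTS:
            raise ValueError("expected exactly %d setup utterances" % EXPLICIT_SLOTS)
        self.explicit = [list(v) for v in explicit_vectors]
        self.implicit = []

    def score(self, vector):
        """Average similarity of a new vector against every stored vector."""
        stored = self.explicit + self.implicit
        return sum(cosine(vector, s) for s in stored) / len(stored)

    def handle_trigger(self, vector):
        """Return True if the speaker is accepted; learn from confident matches."""
        s = self.score(vector)
        # Implicit enrollment: keep only high-confidence matches, up to the cap.
        if s >= ENROLL_THRESHOLD and len(self.implicit) < IMPLICIT_SLOTS:
            self.implicit.append(list(vector))
        return s >= ACCEPT_THRESHOLD
```

Using a stricter threshold for enrollment than for acceptance is a design choice in this sketch: it keeps borderline matches, which might belong to someone else, from gradually polluting the profile.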
As with earlier machine learning insights from Apple’s journal, the detail itself might be a little too rich if you’re not into AI and audio recognition. The journal was launched in mid-2017, as part of Apple’s new-found openness about some of the research going on inside the traditionally secretive company. If that also encourages new engineers to apply to join the firm’s machine learning divisions, all the better.
As for what comes next, the Siri team has a few suggestions. For one, there are obvious situations where Siri – like all voice recognition systems – still struggles to identify the speaker. Those include larger rooms, where reverberation becomes an issue, and noisy environments, such as in strong wind or in a car.
However, another goal is to do away with the training requirement altogether. “Looking even further ahead,” the team writes, “we imagine a future without any explicit enrollment step in which users simply begin using the ‘Hey Siri’ feature from an empty profile, which then grows and updates organically as more ‘Hey Siri’ requests come in.”