Security researchers have succeeded in embedding commands directly into music and recordings of speech, raising the spectre of successful attacks against home-based digital assistants.
Nicholas Carlini and David Wagner from the University of California, Berkeley, devised a way to take any existing audio waveform and produce another one that is 99.9 percent similar.
The near-identical waveform can be used to trick speech recognition systems that use neural networks to transcribe audio into letters, which are then assembled into commands that control digital assistants.
The attack can embed up to 50 characters of transcribed text per second of audio, the researchers said.
Testing against Mozilla's open source DeepSpeech voice recognition implementation, Carlini and Wagner achieved a 100 percent success rate without having to resort to large amounts of distortion, a hallmark of past attempts at creating audio attacks.
Music can be transcribed as arbitrary speech, and human beings cannot hear the targeted attacks play out.
Carlini and Wagner's research builds on past work that has shown that deep learning for computer image recognition is vulnerable to adversarial perturbation.
The researchers have now demonstrated that automatic speech recognition, too, is vulnerable to such attacks.
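To give a feel for what an adversarial perturbation is, here is a minimal sketch in Python. It is a toy under stated assumptions, not the researchers' method and not DeepSpeech: the "recogniser" is a hypothetical linear scorer with made-up weights, the "music" is a plain sine wave, and the function names (craft_adversarial, distortion_db) are invented for illustration. It only shows the general idea that a very small, carefully aimed change to every sample can flip a model's decision.

```python
import math

def score(weights, samples):
    """Toy linear 'recogniser': a positive score means the command is heard."""
    return sum(w * x for w, x in zip(weights, samples))

def sign(v):
    return 1.0 if v > 0 else -1.0

def craft_adversarial(weights, samples, margin=1.0):
    """Return a copy of samples nudged just far enough to flip the decision."""
    s = score(weights, samples)
    # Per-sample step size that pushes the score past zero by `margin`.
    eps = (abs(s) + margin) / sum(abs(w) for w in weights)
    direction = -sign(s)  # move against the current decision
    return [x + direction * eps * sign(w) for w, x in zip(weights, samples)]

def distortion_db(original, adversarial):
    """Peak perturbation relative to peak signal, in decibels."""
    peak_delta = max(abs(a - o) for o, a in zip(original, adversarial))
    peak_signal = max(abs(o) for o in original)
    return 20 * math.log10(peak_delta / peak_signal)

n = 1000
audio = [math.sin(2 * math.pi * 5 * i / n) for i in range(n)]  # stand-in "music"
weights = [math.cos(0.37 * i) for i in range(n)]               # stand-in model (assumed)

adv = craft_adversarial(weights, audio)
before = score(weights, audio) > 0
after = score(weights, adv) > 0
print("decision before:", before, "after:", after)
print("distortion (dB):", round(distortion_db(audio, adv), 1))
```

In this toy the decision flips even though the added signal sits far below the level of the original audio. A real speech recogniser is a deep neural network rather than a linear scorer, so finding the perturbation takes an iterative search rather than a single closed-form step.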
As a practical example, the pair embedded voice commands into a tune called the CommanderSong, and succeeded in tricking speech recognition systems into taking actions such as calling phone numbers, turning on Global Positioning System (GPS) features, and attempting to set up a credit card payment plan.
"The song carrying the command could spread through radio, TV or even any media player installed in portable devices like smartphones, potentially impacting millions of users in long distance," the researchers wrote.
By demonstrating the feasibility of subliminal commands hidden in music and other audio recordings, the researchers hope to raise awareness of the potential dangers and encourage the industry to better secure speech recognition.