Malicious Apps on Alexa and Google Home Capable of Spying on Users or Stealing Their Passwords

Smart speakers are becoming increasingly popular in many different settings, and flaws discovered by the German company Security Research Labs (SRLabs) pose a potential risk to the people who use them. A malicious app that takes advantage of these flaws could fool the user into thinking the app had stopped while it continues to listen for an extended period. The app could also prompt for a password in a way that appears to be a system function.

Apps on these devices start in response to a phrase (the invocation name) selected by the developer. On Amazon’s Alexa, the apps are called “skills,” and on Google Home they are called “actions.” An example phrase is “Hey Google/Alexa, turn on my Horoscope,” which would start the Horoscope app. Once a skill or action is running, the user controls it through “intents,” which are set phrases that usually contain slot values for custom variables. The slot values are often forwarded to the developer’s server. For example, “Tell me my horoscope for today” is an intent in which the slot value is “today.”

When a user wants a skill or action to end, they can tell the speaker to stop, but SRLabs discovered that certain apps can work around this intent and continue listening. The researchers modified the “stop” intent so that the skill kept running rather than shutting down, even though the user still hears the “Goodbye” message. They achieved this by appending the character sequence U+D801, dot, space to the response for that intent. Because the speaker cannot pronounce the sequence, it stays quiet while the app continues to listen to whatever conversation is taking place, and the silence can be extended by appending the sequence repeatedly.
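
The relationship between an intent and its slot value, as described above, can be illustrated with a minimal sketch. This is not how Alexa or Google Home implement matching. The pattern, the intent name `HoroscopeIntent`, the slot name `day`, and the function `match_intent` are all hypothetical, chosen only to mirror the horoscope example.

```python
import re
from typing import Optional

# Hypothetical sample utterance for a horoscope app. Real platforms define
# intents in their own interaction-model formats, not as regular expressions.
HOROSCOPE_PATTERN = re.compile(
    r"^tell me my horoscope for (?P<day>\w+)$", re.IGNORECASE
)


def match_intent(utterance: str) -> Optional[dict]:
    """Return the intent name and slot values if the utterance matches."""
    m = HOROSCOPE_PATTERN.match(utterance.strip())
    if m is None:
        return None
    # The slot value is the variable part of the phrase. In a real app this
    # is the data that may be forwarded to the developer's server.
    return {"intent": "HoroscopeIntent", "slots": {"day": m.group("day").lower()}}


if __name__ == "__main__":
    print(match_intent("Tell me my horoscope for today"))
```

The point of the sketch is that whatever the user says in the slot position is captured as data, which is why an intent with a broad slot can double as a recording mechanism.
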
Specific words can trigger a second intent that allows an attacker to record sentences as slot values, essentially acting as a backup spying method. The design of Google Home could allow an app to spy on users for even longer by looping on the user’s speech and sending the stream to an attacker. Google Home listens for vocal input for around nine seconds and then pauses for a short period of time. It repeats this three times before deactivating the action, but whenever speech is detected, the count resets.

The intents could be changed after the apps passed their initial review from Google and Amazon, and the modifications did not trigger a second verification.

SRLabs found another tactic, which involves phishing for passwords. The same Unicode characters are used to silence the speaker, but instead of eavesdropping, the app has the speaker play a message saying, “An important security update is available for your device. Please say start update followed by your password.” Many users could mistake this for a real request, even though neither Amazon nor Google makes requests of this nature through the speaker. If the user replies, anything they say is converted into text and sent to the attacker’s server.

These issues were reported to Google and Amazon, and both companies claim to have updated their policies to prevent them in the future.
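
The Google Home listening behavior described above (windows of roughly nine seconds, three silent windows before deactivation, and a counter that resets on speech) can be modeled with a short sketch. This is a simplified illustration under those stated numbers, not the device’s actual logic. The function name `session_length` and both constants are assumptions, and the short pause between windows is ignored.

```python
from typing import Iterable

WINDOW_SECONDS = 9       # approximate listening window cited in the article
MAX_SILENT_WINDOWS = 3   # consecutive silent windows before deactivation


def session_length(speech_per_window: Iterable[bool]) -> int:
    """Return roughly how many seconds the action stays active.

    Each element says whether speech was detected in that listening window.
    If the input ends before three consecutive silent windows occur, the
    time elapsed so far is returned.
    """
    silent = 0
    elapsed = 0
    for heard in speech_per_window:
        elapsed += WINDOW_SECONDS
        # Any detected speech resets the silent-window counter.
        silent = 0 if heard else silent + 1
        if silent == MAX_SILENT_WINDOWS:
            break
    return elapsed


if __name__ == "__main__":
    # Speech in the third window restarts the count, doubling the session.
    print(session_length([False, False, True, False, False, False]))
```

Under this model, a room where someone speaks at least once every three windows keeps the action alive indefinitely, which is the property the looping technique relies on.
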

Analyst Notes

Smart speaker users should be aware that malicious apps can affect their devices. Apps for smart speakers should be installed with the same level of caution that is used when installing apps on smartphones. SRLabs had additional recommendations for Google and Amazon, which include: “implementing better protection, starting with a more thorough review process of third-party Skills and Actions made available in their voice app stores. The voice app review needs to check explicitly for copies of built-in intents. Unpronounceable characters like ‘[U+D801].’ and silent SSML messages should be removed to prevent arbitrary long pauses in the speakers’ output. Suspicious output texts including ‘password’ deserve particular attention or should be disallowed completely.”
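
Two of the review checks recommended above lend themselves to a small illustration: flagging unpronounceable surrogate code points such as U+D801, and flagging output text that mentions “password.” The sketch below is a hypothetical helper, not a tool used by Amazon, Google, or SRLabs. The function name `review_output_text` and the word list are assumptions, and it does not cover silent SSML messages or copies of built-in intents.

```python
import unicodedata
from typing import List

# Word list taken from the SRLabs recommendation; a real review process
# would presumably use a broader list.
SUSPICIOUS_WORDS = ("password",)


def review_output_text(text: str) -> List[str]:
    """Return a list of findings for one piece of voice app output text."""
    findings = []
    # Unicode category "Cs" covers surrogate code points such as U+D801,
    # which have no pronunciation on their own.
    if any(unicodedata.category(ch) == "Cs" for ch in text):
        findings.append("unpaired surrogate code point")
    lowered = text.lower()
    for word in SUSPICIOUS_WORDS:
        if word in lowered:
            findings.append(f"suspicious word: {word}")
    return findings


if __name__ == "__main__":
    print(review_output_text("Goodbye\ud801. "))
    print(review_output_text("Please say start update followed by your password"))
```

A check like this would only be useful if it ran on every update to an app’s output, since the article notes that changes made after the initial review did not trigger a second verification.
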