Using IM-Visor to stop untrusted IME apps from stealing sensitive keystrokes

Third-party IME (Input Method Editor) apps are often the preferred means of interaction for Android users’ input. In this paper, we first discuss the insecurity of IME apps, including Potentially Harmful Apps (PHAs) and malicious IME apps, which may leak users’ sensitive keystrokes. Current defense systems, such as I-BOX, are vulnerable to the prefix-substitution attack and the colluding attack due to their post-IME nature. We provide a deeper understanding that all designs with the post-IME nature are subject to the prefix-substitution and colluding attacks. To remedy the flaws of post-IME systems, we propose a new idea, pre-IME, which guarantees that the “Is this touch event a sensitive keystroke?” analysis will always access user touch events prior to the execution of any IME app code. We design an innovative TrustZone-based framework named IM-Visor which has the pre-IME nature. Specifically, IM-Visor creates an isolation environment named STIE as soon as a user intends to type on a soft keyboard; the STIE then intercepts, translates and analyzes the user’s touch input. If the input is sensitive, the translation of the keystrokes is delivered to user apps through a trusted path. Otherwise, IM-Visor replays non-sensitive keystroke touch events for IME apps or replays non-keystroke touch events for other apps. A prototype of IM-Visor has been implemented and tested with several of the most popular IMEs. The experimental results show that IM-Visor has small runtime overheads.

Keywords: TrustZone, Android app security, User privacy

*Correspondence: wangyazhe@iie.ac.cn
State Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, People’s Republic of China
School of Cyber Security, University of Chinese Academy of Sciences, Beijing, People’s Republic of China
Full list of author information is available at the end of the article
© The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Tian et al. Cybersecurity (2018) 1:5

Introduction

Nowadays, people are experiencing a booming growth of Android smartphone apps and enjoying their convenience. According to Google Play’s statistics (Google 2007), the number of available apps in the Google Play Store was most recently placed at 3.3 million apps in September 2017, after surpassing 1 million apps in July 2013. The most popular apps in the real world can be categorized into six groups: tools, communication, social interaction, efficiency, anime and sports. For example, the ES app, a tool app, can help users transfer files from a smartphone to a desktop PC.

An Input Method Editor (IME) app is a user-installed app that provides a soft keyboard to receive user input on mobile devices. As shown in Fig. 1, a default IME app appears when a user intends to type characters in a user app (e.g., typing a location name in a map search app). Besides the default IME app, there are many kinds of third-party apps in the Android market that a user can download. These third-party apps can provide value-added features to a user app, such as cloud-based auto-correction, word association and clipboard.

Fig. 1 IME apps in the real world. An IME app is capable of providing a soft keyboard for user input. Typically, there are many kinds of third-party apps in the Android market that a user can download according to his or her own preference

Although IME apps provide great convenience to users, they can introduce serious security problems. An attacker can use this kind of app to steal users’ sensitive keystrokes. As shown in Fig. 2, keystroke processing in Android works as follows. When a user types a character (e.g., a “K”) on a soft keyboard, the touch screen driver receives a coordinate (x, y), and the event subsystem converts it into a touch event. Then, the input dispatcher thread sends the event to the target IME app. Finally, the IME app translates the event into a character (i.e., a “K”) and sends it to the target user app. After sending, an IME app can still revisit the buffer of that user app. Here, we can see that an IME app is always the first service to receive (sensitive or non-sensitive) keystrokes prior to a user app. Hence, if a user is typing sensitive information (e.g., password, bank number, etc.) in a user app, a malicious IME app can work as a key logger to record and translate sensitive keystrokes, then store them in the local file system or send them to a remote server. This is a typical man-in-the-middle attack. There are several ways to construct such malicious IME apps, and repackaging is a common one that has been widely used by attackers (Zhou and Jiang 2012). Besides malicious IME apps, there are also threats posed by potentially harmful IME apps (PHAs). Without users’ consent, they collect sensitive keystrokes and send them to an ad network that performs targeted advertising based on the keywords in user inputs.

Fig. 2 Keystroke processing in Android. An IME app obtains the coordinate (x, y) from the touch screen driver, translates it into a character, then sends it to a user app. It is always the first service to receive (sensitive or non-sensitive) keystrokes prior to a user app

To get rid of the above attacks, researchers have recently proposed post-IME defenses. Figure 3 shows the workflow of I-BOX (Chen et al. 2015), which is a well-known post-IME defense. Specifically, it saves the process state of an IME app periodically and analyzes the translated keystrokes from IME apps each time an input transaction happens. If sensitive ones are found, it lets the IME app “forget” the sensitive keystrokes through a process roll-back. I-BOX always checks whether a roll-back is needed after the IME has already processed the keystrokes. The salient feature of the post-IME nature is that sensitive keystrokes appear in the dynamically allocated memory of an IME app at least once.

Fig. 3 The workflow of a post-IME defense, namely I-BOX. It always checks whether a roll-back is needed after the IME has already processed keystrokes. The salient feature of the post-IME nature is that sensitive keystrokes appear in the dynamically allocated memory of an IME app at least once

Although post-IME defenses can prevent sensitive data leakage in most common cases, there are still three security holes (discovered so far) in current defense systems: the prefix-substitution attack, the colluding attack and the sandbox bypassing attack (newly discovered).

Prefix Substitution Attack. Figure 4 is an example of a prefix-substitution attack on I-BOX. The policy engine in I-BOX is a state machine that detects whether the output string of an IME app is sensitive. Suppose the current input is sensitive data, but the IME app developers use obfuscated code to replace the prefix of the typed string with a non-sensitive one. Then the policy engine is fooled and the roll-back is not triggered. So the sensitive data obtained by the IME is not cleaned and can still be sent to a remote server.

Fig. 4 Prefix-substitution attack. I-BOX uses a policy engine to search for substrings in the output of an IME app. Malicious IME apps can obfuscate a sensitive string into a non-sensitive string to fool I-BOX and leak it to a remote server

Colluding Attack. Figure 5 is an example of a colluding attack on I-BOX. Because a post-IME design does nothing until it gets some output from the IME, an IME app only needs to send the sensitive text to a colluding app before it commits any text to a user app. So it is really easy to launch this attack in the real world.

Fig. 5 Colluding attack. To launch such an attack, an IME app just needs to send sensitive text to a colluding app before it commits any text to a user app

Sandbox Bypassing Attack. The “revisit” threat was discovered by us, and I-BOX was not aware of it. It is a threat for both post-IME and pre-IME defenses. I-BOX regards the user input process as a transaction, which begins when a user starts to enter input and ends when the input session ends. When a user types sensitive data with a third-party IME, the current transaction is marked as sensitive by I-BOX. During this sensitive transaction, I-BOX believes that the network restriction and the roll-back can prevent sensitive keystroke leakage. However, the sensitive text exists not only in the IME app but also in the buffer of the user app. The roll-back only cleans the sensitive text in the IME app and leaves the copy in the user app. Since some functions, such as getTextBeforeCursor in BaseInputConnection, can be used to revisit the buffer of a user app, an IME app can launch a sandbox bypassing attack by calling these revisit APIs at the beginning of the next transaction. If the user app has not flushed the buffer yet, the IME can obtain the sensitive text committed in the last transaction. As a result, the sandbox of I-BOX is bypassed. Figure 6 shows how the sandbox bypassing attack works. It is worth noting that the bypass attack does not always succeed: it exists only when the user app does not flush the buffer.

Fig. 6 Sandbox bypassing attack. The pink and blue colors represent two different states of an IME app. As shown in blue, after the roll-back, an IME app can still access the user app’s data buffer for sensitive text through some revisit APIs and leak it to a remote server at the beginning of the next input transaction

Problem statement. How can we fill the three security holes by providing the following security property: the analysis of whether a touch event is a sensitive keystroke will always access the touch event prior to the execution of any IME app code?

This work seeks to solve the above problem by designing, implementing and evaluating the first pre-IME defense based on three key ideas. The defense should ensure that touch events are intercepted before arriving at the system (Key idea 1). Sensitive touch events are never sent to IME apps (Key idea 2). Insensitive touch events should be replayed (Key idea 3).

Challenges. To leverage the above three key ideas, we face three main challenges. First, in existing mobile devices, an IME app is the first service to receive (sensitive or non-sensitive) keystrokes from the Android event subsystem, and it translates them to text. Distinct from a post-IME design, which does a roll-back after the IME has translated the keystrokes, how can a pre-IME design intercept and isolate sensitive keystrokes ahead of IME translation? This is called the “Isolation ahead of IME translation issue” (Challenge 1). Second, after we succeed in intercepting and isolating those sensitive keystrokes, how can we build a trusted path for user apps to access these sensitive keystrokes? We call it the “Trusted path issue” (Challenge 2). Finally, recalling the reason, given at the start of this section, why users have incentives to use IMEs, an IME app does provide convenience and extra benefits. In a pre-IME design, how can we retain the value-added features for user apps? We call it the “Benefits retaining issue” (Challenge 3).

To address Challenge 1, we leveraged TrustZone and achieved interception ahead of IME translation. The isolation mechanism includes detection of soft keyboards, initialization of the STIE (Secure Typing Isolation Environment), touch event processing and keystroke translation, and sensitiveness analysis. To address Challenge 2, we built a trusted path for sensitive keystrokes to be transferred to the user app by creating a new IPC between the commit-proxy and the user app. To address Challenge 3, we proposed a keystroke replay mechanism.

Our main contributions are summarized as follows.

We propose a new idea, “pre-IME”, which guarantees that the “Is this touch event a sensitive keystroke?” analysis will always access user touch events prior to the execution of any IME app code.

We provide a deeper understanding that all designs with the post-IME nature are subject to the prefix-substitution and colluding attacks. With respect to these two attacks, designs with the pre-IME nature have a clear security advantage over post-IME designs. This is a key new insight of this work.

We build a concrete pre-IME defense named IM-Visor, which leverages TrustZone to isolate sensitive keystrokes before the IMEs could access them. IM-Visor resolves three main challenges: the “Isolation ahead of translation issue”, the “Trusted path issue” and the “Benefits retaining issue”.

By noticing that sensitive keystrokes can generally flow both ways (i.e., from IME apps to user apps and from user apps to IME apps), we discover a new sandbox bypassing vulnerability of I-BOX.

We perform a thorough evaluation of IM-Visor. We test a set of popular IME apps and the related user apps, and no sensitive keystroke leakage caused by IME apps is found. The experimental results show that IM-Visor has small runtime overheads.

Background

Android IME, Input Method Framework (IMF) and event subsystem

The Android IMF arbitrates interaction between applications and the current input method (InputMethodManager 2016). A user app can use the standard TextView or its subclass to interact with an IME app. InputMethodManagerService (IMMS) in the IMF is a global system service that manages the interaction across the above processes. When a user touches the TextView of a user app, IMMS starts an IME app. What’s more, some functions in IMMS, such as showSoftInput and hideCurrentInputLocked, control when a soft keyboard is shown or hidden. If a user types on the soft keyboard, TouchInputMapper in the Android event subsystem is the first entity to handle user touch events. After TouchInputMapper has processed them, an input dispatch thread in WindowManagerService (WMS) (Windowmanager) is responsible for dispatching keystrokes to the active IME app. The IME then translates the keystrokes to text and commits it to a user app through BaseInputConnection (BIC) (InputConnection). BIC is the connection between a user app and an IME app. BIC provides some functions, such as getTextBeforeCursor and getSelectedText, for IME apps to revisit the data buffer in a user app. These functions exist because an IME app may need to change some characters before finally committing, or may just want to verify what was committed. In this paper, we put hooks in some functions in the IMF and the event subsystem so that the “Is this touch event a sensitive keystroke?” analysis can be invoked before the IMEs access keystrokes.

TrustZone

Processor state isolation. As a hardware-level security isolation mechanism, TrustZone provides the Secure Monitor Call (SMC) instruction for the processor to enter the secure world from the normal world. The SMC instruction is a privileged instruction that is invoked in the normal world. A program in the secure state can access resources across the system, including I/O, memory, etc. A normal-world program has a lower execution privilege.

I/O device and memory isolation. A major feature of TrustZone is that it can flexibly configure the secure state of I/O devices using software. This function involves the TrustZone Protection Controller (TZPC) and the TrustZone Address Space Controller (TZASC). TZASC allows the DRAM of the mobile device to be partitioned into secure and non-secure areas. In existing mobile devices, the touch screen and display controller are usually configured as non-secure.

Trustlets. An application in the secure world is known as a trustlet. It can access normal-world memory, but not vice versa. Considering the TCB size of the secure world, a trustlet is usually designed to provide some higher-security operation such as displaying a trusted UI or encryption.

Threat model and assumptions

Threat model

As mentioned, in the Android IMF, user apps have incentives to use an IME app to access a soft keyboard because of its extra benefits. However, an IME app is capable of logging and uploading whatever a user types on a soft keyboard. So there is a risk of sensitive keystroke leakage through third-party IMEs. A current defense with the post-IME nature intends to discover sensitive input by analyzing the output of an IME app and cleans it with a roll-back. However, an IME app can fool the defense by committing a replaced text (Prefix-substitution Attack) or by leaking sensitive keystrokes to a colluding app before the analysis is triggered (Colluding Attack). In the “Introduction” section, we have pointed out that all designs with the post-IME nature are subject to the above two attacks. A key motivation of our work is to build a more secure defense that gets rid of them. Besides, we discover a new data leakage path from a user app to an IME app through some revisit APIs (Sandbox Bypassing Attack). So the “revisit” is another threat among our security concerns.

It is possible that a malicious user app colludes with an IME app to steal sensitive keystrokes. However, we consider this out of the scope of this paper. Because a user app can get whatever a user types on a soft keyboard, it has no need to steal sensitive keystrokes through a hacked IME app. Besides, from an attacker’s point of view, it is much easier to attack a single IME app than to attack all kinds of user apps that use an IME keyboard. If an IME app is hacked, all user apps are hacked, since an IME app processes all of a user’s input in modern mobile devices.

Assumptions

As third-party IME apps may cause sensitive keystroke leakage, we consider all third-party IMEs (i.e., malicious ones and PHAs) as untrusted. The goal of IM-Visor is to prevent IME apps from accessing data when a user types sensitive keystrokes, so we assume that the user app which employs an IME app for keystroke translation is trusted. Although there are lots of attacks targeting user apps (Zhou and Jiang 2012; Suarez-Tangil et al. 2013), such threats are not in the scope of this work. We assume that the Android System Server and the kernel are not on the target list of attackers. As IM-Visor is a security scheme based on TrustZone, we assume the device is equipped with TrustZone and that TrustZone has been correctly implemented on the device. Considering that TrustZone is an isolation solution with hardware support, we assume the hardware of the device is trusted. Hardware attacks which may prevent the normal operation of TrustZone are out of the scope of this work.

Although the OS is not on the list of attack targets, we still use TrustZone to implement our defense, considering the following facts. First, TrustZone is widely deployed. Data from Samsung shows that millions of modern devices are outfitted with TrustZone (Azab et al. 2014). We hypothesize that more and more devices will use ARM TrustZone in the future. Second, the Trusted Execution Environment (TEE) is already deployed, so there is less need to reinvent the wheel. Compared to adding system or kernel code, it is more convenient to put our critical code in a trustlet in the secure world and only put some hooks in Android. Third, it ensures minimal kernel modification. In our design, only a TrustZone driver needs to be installed in the kernel. No kernel instrumentation is needed. Fourth, testing with the most popular IMEs shows no significant impact on system overheads. Fifth, TrustZone does reduce our attack surface. For example, using a gravity sensor to launch a side-channel attack is possible when a user types on a soft keyboard. TrustZone can configure the related hardware as secure to thwart such an attack.

Overview

Figure 7 shows the system components of IM-Visor, which include a Secure Typing Isolation Environment (STIE) in the secure world, a system service named commit-proxy, a daemon thread named replay executor in the event subsystem and some hooks. The STIE includes two parts: secure hardware drivers and a trustlet named pre-IME guard. As mentioned in the “Introduction” section, in order to create a defense with the pre-IME nature, there are three main challenges: the “Isolation ahead of IME translation issue”, the “Trusted path issue” and the “Benefits retaining issue”. Now we give a high-level overview of how IM-Visor resolves them.

Fig. 7 IM-Visor consists of the STIE in the secure world, a new service named commit-proxy, a daemon thread named replay executor in the event subsystem and some hooks

Isolation ahead of IME Translation Issue. In existing mobile devices, an IME app is the first entity to receive user touch events, and it then translates keystrokes to text. To achieve a pre-IME design, we must recognize sensitive keystrokes and isolate them before an IME app could access them.

One possible way is to leverage TrustZone to implement a trusted IME app with a trusted GUI. When users intend to type sensitive data, let them switch to the trusted IME. However, this approach brings two disadvantages. First, it is a burden for users to constantly keep this switch in mind. Second, a friendly trusted GUI means a lot of extra coding work, such as efficient graphics rendering. So we have to look for a new approach.

In light of the fact that keystrokes are pre-processed by the event subsystem before an IME app could access them, we put some hooks in the event subsystem and leverage TrustZone to achieve the pre-IME nature. The subsystem hooks make SMC calls and jump to the secure world. In the secure world, IM-Visor provides the STIE, in which the touch screen and display devices are controlled only by the secure world. For touch input, we implement a separate touch driver in the secure world. Hence, whenever a touch input interrupt arrives, IM-Visor is the first to access keystrokes, prior to the execution of any IME app code. The pre-IME Guard receives keystrokes, translates them and analyzes whether the character string is sensitive. For flexibility and efficiency, the STIE is created only when a user intends to type on a soft keyboard (see the “STIE initialization” section).

Compared to the development of a trusted IME app, the STIE helps IM-Visor avoid the above two disadvantages. First, as the STIE is initialized automatically when a user intends to type on a soft keyboard, the user does not have to keep the keyboard switch in mind. Second, because the STIE reuses the UI of a soft keyboard and isolates touch input, no trusted GUI library is needed.

Trusted path issue. As mentioned in the “Introduction” section, after the isolation of sensitive keystrokes, we must build a trusted path from the pre-IME Guard to a user app (Zhou et al. 2012; trustonic). Obviously, we cannot use untrusted IME apps to commit sensitive text, as this violates our security principle. So we have to find another data path isolated from IME apps. In light of the fact that the TextView of a user app uses a local binder named IInputContext.Stub to receive text, we put some hooks in the IMF and create a new connection between a user app and our newly added service named commit-proxy. In other words, we create a new inter-process communication (IPC) channel between a user app and the commit-proxy to commit sensitive text.

Fig. 8 Workflow and data paths under the IM-Visor protection. For sensitive keystrokes, as shown in red, a trusted path from the secure touch screen to a user app is created by the STIE and commit-proxy. For non-sensitive keystrokes, the Replay Executor dispatches them to the targeted IME app

Benefits retaining issue. As an IME app does provide convenience and extra benefits, in a pre-IME design we must retain the value-added features for user apps. The key idea of IM-Visor is to replay a keystroke as soon as the pre-IME Guard determines it to be non-sensitive and to let the IMEs work on non-sensitive keystrokes. To achieve this, we design a replay executor running in the System Server process for replay. Specifically, the Replay Executor gets touch event coordinates from the pre-IME Guard and encapsulates them into the Android touch event format, then triggers the event subsystem to dispatch events to IME apps.
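The three data paths named in this overview can be illustrated with a small routing sketch. This is only an illustration under stated assumptions, not IM-Visor's code: the names Route, TouchEvent and PreImeGuardSketch are hypothetical, the sensitivity check is a caller-supplied callback standing in for the pre-IME Guard's analysis, and a plain in-memory queue stands in for the hand-off to the Replay Executor.

```python
from collections import deque
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Deque, List, Optional


class Route(Enum):
    TRUSTED_PATH = "trusted path via commit-proxy"   # sensitive keystroke
    REPLAY_TO_IME = "replayed to IME app"            # non-sensitive keystroke
    REPLAY_TO_OTHER_APP = "replayed to other app"    # touch that is not a keystroke


@dataclass(frozen=True)
class TouchEvent:
    x: int
    y: int
    key: Optional[str]  # translated character, or None if the touch is not a keystroke


class PreImeGuardSketch:
    """Hypothetical stand-in for the routing decision made before any IME code runs."""

    def __init__(self, is_sensitive: Callable[[str], bool]) -> None:
        self.is_sensitive = is_sensitive
        self.committed: List[str] = []                   # text sent over the trusted path
        self.replay_queue: Deque[TouchEvent] = deque()   # events handed to the replay step

    def handle(self, event: TouchEvent) -> Route:
        if event.key is None:
            # Not a keystroke: replay it so that other apps still receive the touch.
            self.replay_queue.append(event)
            return Route.REPLAY_TO_OTHER_APP
        if self.is_sensitive(event.key):
            # Sensitive keystroke: commit over the trusted path, never replay it.
            self.committed.append(event.key)
            return Route.TRUSTED_PATH
        # Non-sensitive keystroke: replay it so that the IME keeps its features.
        self.replay_queue.append(event)
        return Route.REPLAY_TO_IME


# Toy policy (an assumption for the demo only): treat digits as sensitive.
guard = PreImeGuardSketch(is_sensitive=lambda key: key.isdigit())
routes = [
    guard.handle(event)
    for event in (
        TouchEvent(10, 900, "7"),
        TouchEvent(60, 900, "a"),
        TouchEvent(200, 100, None),
    )
]
```

In this sketch only the events placed in replay_queue would ever become visible to an IME app; the sensitive character is recorded in committed and is never queued for replay.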
Another issue related to replay is that we must replay non-keystroke touch events for the other apps. keystroke would be preprocessed by event subsys- tem before any IME app could access it. Specifically, Design and implementation TouchInputMapper in event subsystem is the class Workflow of IM-Visor for touch event processing. InputMethodManagerService As a pre-IME design, IM-Visor always recognizes and iso- (IMMS) in the IMF is a global system service that man- lates sensitive keystrokes before the IMEs could access ages the interaction across IME apps and user apps. them. To achieve this, whenever a user intends to type Anytime a user app requests a soft keyboard, IMMS in a soft keyboard, the STIE will be initialized to inter- would ask an IME app to show a soft keyboard by calling cept touch events and analyze whether it is a sensitive showSoftInput. keystroke. From the perspective of how touch events (i.e., keystrokes or non-keystrokes) are handled, Fig. 8 shows STIE initialization the workflow of IM-Visor after the STIE has been initial- The primary technical challenge of the STIE initialization ized. The red data path indicates the trusted path from is guaranteeing that IM-Visor is always aware of when a touch screen to a user app. On the other hand, as shown user is typing in a soft keyboard prior to the execution in green color, when non-sensitive touch events (i.e., non- of any IME app code. If IM-Visor can create the STIE as sensitive keystrokes or non-keystrokes) are found, the soon as a user firstly puts his or her finger on a soft key- pre-IME Guard asks the Replay Executor to replay the cor- board in normal world, then the pre-IME Guard is able to responding touch event to the targeted apps (e.g., IME intercept user keystrokes from the start of input and the apps or other apps). pre-IME nature can be ensured. 
To address this challenge, the key idea is to check whether a soft keyboard has been Address challenge 1: isolation ahead of IME translation shown up each time a touch event arrives in event sub- At first, let’s recall some backgrounds about the IMF system. In modern mobile devices with a touch screen, we andevent subsystemin“Android IME, Input Method assume that a user intends to type text when he or she Framework (IMF) and event subsystem”section.A taps on touch screen after a soft keyboard has been shown Tian et al. Cybersecurity (2018) 1:5 Page 8 of 17 up. And the keyboard display information is maintained user is typing, the pre-IME Guard takes two steps. Step 1) in secure world. The display controller is reconfigured as secure by Trust- Figure 9 shows how we initialize the STIE. The first user Zone TZPC so that normal world cannot change it. This tap on the edit box of a user app will ask IMMS to start up is important because display controller provides infor- an IME app. This process in fact invokes two hooks: sync mation about the start region of framebuffer.Ifthe and showSoftInput. IM-Visor will ignore the touch but display controller is not controlled by secure world, nor- update keyboard display information in secure world. At mal world software can deceive the pre-IME Guard into this moment, the STIE has not been initialized yet. Then awrong framebuffer and translated in a wrong soft the user may taps on a soft keyboard. This behaviour of keyboard layout.Step2)After atouch eventhappened, course invokes sync again. At this moment, the STIE the pre-IME Guard reads framebuffer and checks cor- must be initialized, because tapping on an IME soft key- rectness of the layout. As a proof-of-concept prototype, board is obviously a keystroke. We reconfigure peripherals IM-Visor preloads the layout information of popular IME like display controller and touch screen as secure. 
Then apps and determines whether the layout is correct by com- the pre-IME Guard receives touch events directly through paring the hash of the current layout with the preloaded secure touch screen. standard one. As a future work, in step 2, instead of the “preload&check” way, we will obtain the current layout by Touch event processing and keystroke translation an efficient optical character recognition (OCR). In order to intercept user keystrokes in secure world, the Leaving framebuffer in normal world is not a secure touch screen is reconfigured to be only accessed by secure concern as it seems. Supposing an untrusted IME app world, and a separate touch screen driver is implemented intends to figure out which keystroke a user types, it also in secure world. As a result, anytime a touch interrupt needs the above two pieces of information. But sensi- arrives,thedriverwillbethe firsttoreceive usertouch tive touch coordinates only stay in secure world. So an coordinates. In order to figure out which keystroke a user IME app cannot succeed in finding sensitive keystrokes types, we need two pieces of information: touch coordi- without touch coordinates. nates and the current soft keyboard layout. The touch After identifying a keystroke, the pre-IME Guard will screen driver in secure world provides a secure way to translate it into a character. Now we give an example of obtain touch coordinates. Now we explain how the pre- keystroke translation. For Latin language, every keystroke can directly correspond to a character, but for non-Latin IME Guard gets the soft keyboard layout securely. The languages candidate words often need to be shown. Here, soft keyboard layout is a piece of display data in nor- mal world that an IME app puts in framebuffer. And we only discuss Latin language translation with a qwerty framebuffer is a region of memory which is allo- keyboard. Supposing the user types “a” in the soft key- cated by a Linux display driver. 
The display controller is board, then secure touch screen gets the touch point a peripheral to generate the necessary control signals for C(x, y). The preloaded keyboard layout helps the pre- data display. To obtain the soft keyboard layout on which a IME Guard determine whether this point falls in the geo Fig. 9 STIE initialization. Hooks in the IMF and event subsystem are invoked to notify the pre-IME Guard in secure world to initialize the STIE. In modern mobile devices with a touch screen, we assume that a user intends to type text when he or she taps on touch screen after a soft keyboard has been shown up Tian et al. Cybersecurity (2018) 1:5 Page 9 of 17 range of “a” key button, which is defined by the top-left startInput will be invoked. The pre-IME Guard cre- point A(x1, y1) and the bottom-right point B(x2, y2). If ates a token to IMMS, which contains a unique id to it falls in, the pre-IME Guard translate the keystroke as a identify the current user app. Then IMMS requests the character “a”. commit-proxy to bind the user app with two parameters (token, InputConnection). When the commit-proxy Sensitive keystroke analysis receives this bind request, it makes a SMC call to check In order to analyze whether keystrokes are sensitive, we whether the id in token is valid. If it is valid, the commit- accept the I-BOX’s policy engine, which enforces a spe- proxy will add the new connection. Otherwise the bind cific context-based policy and a specific prefix-matching request will be refused. If any sensitive string is found, policy. In the IMF, text fields in user apps have different the pre-IME Guard sends sensitive string to the commit- types, such as dates and passwords. IM-Visor can lever- proxy by a shared memory, and then the commit-proxy age these information to decide whether current input is will commit it with the above new IPC. sensitive or not. Specifically, the hook startInput in IMMS can provide information of text fields. 
If the current edit box works for passwords (or something similarly sensitive), the pre-IME Guard will know it from the start of a soft keyboard display and treat all following keystrokes as sensitive. This is called the “Context-based Policy”. User activities such as logging in are a typical case in which IM-Visor can enforce such a policy. For a general user input stream, after translating keystrokes to a string, IM-Visor leverages prefix-matching to search all possible substrings when a new char is typed (Aho and Corasick 1975). The sensitive data set used for searching is defined by users. As a user could consider large numbers of data instances as sensitive, IM-Visor uses a trie-like structure to maintain it in secure world. This is called the “Prefix-matching Policy”.

Address challenge 2: trusted path

After isolating and translating sensitive keystrokes, we should commit them to the targeted user app. Obviously, as a pre-IME defense, we cannot use untrusted IME apps to commit sensitive strings. So we have to find another data path isolated from IME apps. Our main idea here is to add an independent system service that can commit sensitive strings from secure world to a user app over the trusted path.

Normally, the edit box of a user app uses a local binder named IInputContext.Stub to receive char strings. The client of IInputContext.Stub is initialized in the IMMS at the start of input. In light of this fact, we add some code in the IMF to make the IMMS create an extra binder client for our newly added service, the commit-proxy. The commit-proxy is then capable of committing sensitive strings to the user app. Because the new IPC and the new service are independent of any IME app, a sensitive string on this data path cannot be accessed by any IME.

Create a new IPC. Figure 10 shows how the commit-proxy creates a new IPC with a user app. When a user taps on the edit box, the user app asks IMMS to call back functions in the current active IME app. Hooks in […]

Fig. 10 How a user app binds the commit-proxy. Note that a token is necessary to detect the legitimacy of a user app for the binding

Address challenge 3: benefits retaining

To retain the extra benefits of IME apps (e.g., auto-correction and word association), one possible way is to implement the value-added features in IM-Visor. However, this makes all IME apps useless and means a lot of extra coding work for IM-Visor (e.g., cloud-based word association and a local trusted GUI lib). We look for a more efficient and elegant approach.

Our key idea here is to design a replay mechanism and let IME apps work for non-sensitive keystrokes. We designed a daemon thread named the Replay Executor, running in the System Server process, to replay touch events. If some touch events need to be replayed, the pre-IME Guard puts them in a shared memory and then the Replay Executor reads and replays them. We explain the details as follows. In the Android system, every activity or service maintains a thread loop to receive touch events or other input events through an input channel. If the Replay Executor intended to replay a touch event directly, it would need to maintain the input channels and select which activity or service will receive the touch event. The selection is based not only on touch coordinates but also on window layouts and the current window focus.

The key point is that the Replay Executor only receives touch events from the pre-IME Guard and then triggers the event subsystem to complete the “maintain&selection”. As mentioned in the “Android IME, Input Method Framework (IMF) and event subsystem” section, an input dispatch thread in WindowManagerService is responsible for dispatching touch events. In most cases, the input dispatch thread sleeps on an input event queue. When a touch event needs to be dispatched, it wakes up and dequeues an event, then selects an activity or service for dispatching. If we can handle this input event queue and wake up the thread when a replay is needed, we are able to let the event subsystem do the “maintain&selection” work. This is exactly how the Replay Executor works. The context of WindowManagerService provides the event queue of the input dispatch thread. The Replay Executor encapsulates non-sensitive keystrokes in the required Android touch event format and enqueues them. Then it simply wakes up the dispatch thread to do the rest of the work for our replay. As a related issue, non-keystroke touch events can also be replayed in this way.

Minor challenge 4: buffer revisiting threat

As discussed in the “Threat model” section, we discover a new data leakage path from a user app to an IME app through some revisit APIs. To prevent this threat, we hook all revisit APIs (see Listing 1) and analyze the revisited char string again to detect and block sensitive text when a third-party IME app revisits the buffer of a user app.

Implementation

We have implemented IM-Visor on a Samsung 4412 development board equipped with ARM TrustZone. The Android and kernel versions on the board are 4.0 and 3.0.2, respectively.

Pre-IME guard and services. Specifically, the pre-IME Guard runs as a trustlet in secure world and Android runs in normal world. The commit-proxy is a system service in the Android System Server process. The Replay Executor is a daemon thread running in the System Server process. Both of them passively wait to receive data from the pre-IME Guard. When a user types in the STIE, the pre-IME Guard receives keystrokes from the touch screen and translates them into a char string. Depending on its sensitivity, we return it through the green path or the red path (see Fig. 8).

Hooks in the IMF and event subsystem. To minimize system overhead, we have to hook as little as possible. Specifically, only three classes in Android have been hooked: InputMethodManagerService, BaseInputConnection and TouchInputMapper. In order to jump into secure world, a TrustZone driver is installed in the Linux kernel. Hooks make SMC calls through the newly installed driver. When these hooks are invoked, the pre-IME Guard can intervene in the data flows in the Android IMF. Listing 1 shows all the hooks we put in Android.

Secure touch screen and display controller reconfiguration. When a user intends to type on an IME soft keyboard, some reconfiguration should be done for the STIE initialization. We reconfigure the Interrupt Security Register (ICDISR), the Priority Mask Register (ICCPMR) and the Enable Set Register (ICDISER) to make the touch input a secure interrupt and mask all non-secure interrupts. In the CPU Interface Control Register (ICCICR), FIQEn and EnableS are set to 1 to enable the FIQ interrupt. The FIQ bit in the Secure Configuration Register (SCR) is also set to 1 to ensure that FIQ interrupts are routed to TrustZone monitor mode. Besides, the touch screen and display controller are set to be secure peripherals with the TZPC. As a proof-of-concept prototype, we only implement single-touch in the separate touch driver and leave multi-touch as future work.

Evaluation

Security evaluation

Malicious IME apps and PHAs will upload user sensitive data to remote servers, which causes harm to users. To evaluate the defense effectiveness of IM-Visor against malicious IME apps, we construct malicious IME apps by repackaging some popular ones to make them send sensitive keystrokes to a remote server. We want to see whether they can still leak sensitive keystrokes under IM-Visor protection. To evaluate the defense effectiveness of IM-Visor against PHAs, we analyze the network packets of commonly used IME apps. As we can see, without IM-Visor, these IME apps may send user sensitive keystrokes outside, and with IM-Visor, user sensitive keystrokes are not found in their network packets.

Defense against malicious IME Apps

We use repackaging to design malicious IME apps, and the targets of repackaging are popular third-party IME apps. As many IME apps have their own defenses against repackaging, the difficulty of repackaging differs between IME apps. Some IME apps have only simple defenses which are not difficult to crack, and some construct complex solutions which take much time to repackage. We repackaged three IME apps: Sogou IME, which is the most popular third-party IME, and QQ IME and TouchPal IME, which are also very popular third-party IME apps. We get the smali code for each app after decompiling the APK file and then add some code in several critical locations in the smali file. For example, as most IME apps use the Android IMF, which provides various classes and APIs, we can hook the API commitText to intercept all the user input. The added code is inserted at the location of commitText, and its functionality is to upload each user keystroke to a remote server over socket connections. Finally, the modified code is recompiled, signed and installed on the terminal. After installation, the repackaged IME can be set as the default IME by modifying the system settings. Now we open an app which needs a user to type a user name and password, and we find that the entered user name and password are sent to the remote server while the user is typing.

[…] intercept the network packets of Sogou IME app. Figure 12 shows the intercepted packets. This indicates that sensitive keystrokes have been leaked out by the IME app.
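The prefix-matching policy from the “Sensitive keystroke analysis” section can be illustrated with a small streaming matcher over a trie. The Python sketch below is only an illustration under stated assumptions, not the authors' trustlet code: the class name `PrefixMatcher`, the shape of its return value, and the naive shift-by-one fallback (used here in place of Aho-Corasick failure links) are all hypothetical. It holds back typed characters while they still form a prefix of some user-defined sensitive item, reports the item once it is complete, and releases characters as soon as they can no longer start a match.

```python
from typing import Dict, Iterable, Optional, Tuple


class PrefixMatcher:
    """Streaming matcher over a trie of user-defined sensitive strings."""

    _END = "$end"  # marker key: a sensitive item ends at this node

    def __init__(self, sensitive_items: Iterable[str]) -> None:
        self._root: Dict = {}
        for item in sensitive_items:
            node = self._root
            for ch in item:
                node = node.setdefault(ch, {})
            node[self._END] = True
        self._pending = ""  # typed characters that are still held back

    def _lookup(self, text: str) -> Optional[Dict]:
        """Return the trie node reached by text, or None if nothing starts with it."""
        node = self._root
        for ch in text:
            node = node.get(ch)
            if node is None:
                return None
        return node

    def feed(self, ch: str) -> Tuple[str, str]:
        """Consume one translated character.

        Returns (released, sensitive): released holds characters that may be
        replayed to the IME app; sensitive is a completed sensitive item
        (or "") that should take the trusted path instead.
        """
        candidate = self._pending + ch
        released = ""
        # Naive fallback: drop leading characters until the rest is a prefix again.
        while candidate and self._lookup(candidate) is None:
            released += candidate[0]
            candidate = candidate[1:]
        node = self._lookup(candidate) if candidate else None
        if node is not None and self._END in node:
            self._pending = ""
            return released, candidate
        self._pending = candidate
        return released, ""


# Hypothetical sensitive data set, reusing the two short pieces from the paper.
matcher = PrefixMatcher(["13204", "69299"])
for typed in "hi13204":
    print(typed, matcher.feed(typed))
```

In this sketch an item that is a proper prefix of another item is reported as soon as the shorter one completes. Nothing is released while a prefix of a sensitive item is still pending, which mirrors the display latency the paper discusses for the prefix-matching policy and explains why storing long items as shorter pieces shortens the wait.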
To verify the defense effectiveness of IM-Visor, we repeat the above operations several times on the development board with IM-Visor. Figure 11 shows the defense effectiveness of IM-Visor against malicious IME apps. It clearly shows that IM-Visor can prevent malicious third-party IME apps from stealing sensitive data.

Fig. 11 The defense of IM-Visor on repackaged IME apps. The repackaged IME apps are capable of uploading user names and passwords to the remote server without IM-Visor. However, with the IM-Visor protection, they cannot leak out sensitive information. a Email log-in using the repackaged Sogou IME. b WeChat log-in using the repackaged QQ IME. c AliPay log-in using the repackaged TouchPal IME

Defense against PHAs

Most commercial-off-the-shelf (COTS) IME apps actually collect user input to improve user experience by analyzing user input habits or to do targeted advertising. To verify this, we use Wireshark to intercept the network packets when users enter data using IME apps. After experiments on commonly used IME apps, we indeed find that a continuous sequence of packets is captured by Wireshark while the user is typing. For further verification, we need to analyze the content of the captured packets. Although some IME apps such as Baidu IME and iFly IME use encryption to prevent content analysis, there are still other IME apps which upload users’ input in plain text over HTTP. After experiments, IME apps including Sogou (v8.0), QQ (v5.4.0), Octopus (v4.2.6) and TouchPal (for pad, v5.4.5) have drawn our attention. Taking the Sogou IME app as an example, after typing the word “password” in the SMS, we use Wireshark to […]

With IM-Visor, these potentially harmful IME apps can no longer access the user input when the input is sensitive, as we do not see Wireshark capturing any packets containing sensitive keystrokes.

Fig. 12 The analysis on Sogou IME app’s network packets. The leaked data “password” appears in one of Sogou IME app’s packets

Correctness evaluation

IM-Visor is a pre-IME design; it intervenes in the communication between user apps and IME apps. As an IME app cannot trigger input by itself, it must be employed by a user app which has edit boxes. So in this section, we need to test whether user apps and IME apps can run normally with IM-Visor deployed.

First, we need to test whether user apps and IME apps can run without crashing. To do this, we first download and install the top 10 IME apps from the Android Market. Then we use the Android automated testing tool MonkeyRunner to download 100 user apps from the Android Market. As the touch events triggered by MonkeyRunner are random, we restrict the screen area where touch events can happen based on a location analysis of edit boxes in many user apps. In this way, MonkeyRunner can trigger more keystrokes. For each IME app, we use MonkeyRunner to install and run these 100 user apps. After the experiments, we find that only 3 user apps crashed and none of the 10 IME apps crashed. For the 3 crashed user apps, we manually ran them on the development board without IM-Visor; however, they still crashed. So we think these 3 user apps crashed because of their poor compatibility with our development board.

Besides the crash problem, we also need to test whether IM-Visor can guarantee that a user app is able to run without any missing or disordered input data (the input data come from IM-Visor and IME apps). For each user app, we design several different use cases, and for each use case, we test with some commonly used IME apps including Sogou, QQ, TouchPal, Baidu and iFly. We have tested 10 typical user apps including the Email Client and SMS. For the Email Client we design two use cases: normal log-in and resumed log-in (i.e., the user is typing, then picks up a phone call and resumes logging in after hanging up). For the SMS, we also design two use cases: normal text-edit and resumed text-edit. After running the experiments 20 times, we manually verify and find that a user app can work normally without any missing or disordered data.

Usability evaluation

In this evaluation, we need to test how long it takes for a user app to get user input data. This refers to the duration from the time when the first keystroke in the test string happens to the time when the full string is committed to the user app. The IME apps used for the test are Sogou IME, Baidu IME, iFly IME, QQ IME and TouchPal IME. The sensitive data set used for the test contains phone numbers, ID numbers, bank card numbers, and email addresses. We select a phone number (11 characters) and an email address (19 characters) for the sensitive data test.

Excluding irregular touches (e.g., fumbling phones) and multi-touch behaviour (e.g., zooming gestures), our test is focused on the most common case, in which a user types characters on a qwerty soft keyboard with single-touch behaviour. The user app used for the test is SMS. When a user types text in SMS, the translation results of keystrokes will be analyzed by IM-Visor to decide whether the keystrokes are sensitive. The outcomes can be classified into two types: sensitive keystrokes and non-sensitive keystrokes.

Sensitive keystrokes. We choose a phone number of 11 characters and an email address of 19 characters which are in the sensitive data set. Then for each IME app, we type the phone number and email address 50 times separately and calculate the average elapsed times, respectively. The results are shown in the left half of Table 1.

Table 1 Elapsed time for the user app to get the data. We compare the time without/with IM-Visor (all times in ms)

            Sensitive keystrokes                      Non-sensitive keystrokes
            Phone number       Email address          Phrases of 15 characters   Phrases of 25 characters
IME app     Without   With     Without   With         Without   With             Without   With
Sogou       6143      6256     11028     11132        8565      9315             14658     15972
Baidu       6090      6192     10960     11063        8543      9316             14632     15933
iFly        6302      6408     11332     11433        8890      9632             15130     16456
QQ          6085      6184     10971     11079        8507      9275             14601     15899
TouchPal    6098      6112     10948     11061        8513      9269             14613     15925

Based on the results for sensitive keystrokes in Table 1, we find that the elapsed time taken for the user app to get user input data with IM-Visor is 1.84% longer than the time without IM-Visor deployed. This is mainly due to the overhead of world switches between secure world and normal world. The secure kernel we port is a Linux-like kernel; it takes about 110 ms to switch from user mode in secure world (the context of the pre-IME Guard) to user mode in normal world (the context of the Java hooks).

One additional issue is about user experience. For the whole sensitive phone number, although IM-Visor brings only 1.84% reception latency, the display latency may be user-perceptible. Recalling the policies in the “Sensitive keystroke analysis” section, IM-Visor enforces two different policies (i.e., the context-based policy and the prefix-matching policy) to analyze user input. With the context-based policy, IM-Visor will treat every single number as sensitive and commit it to a user app one by one, so the display latency is non-perceptible. With the prefix-matching policy, the display latency is user-perceptible for sensitive keystrokes. For a sensitive numeric string like a phone number, the prefix-matching of IM-Visor cannot determine the sensitiveness of the input until the last number has been typed. Hence, from the view of a user, no character is displayed until the last number of the whole sensitive data has been typed. To strike a balance between user privacy and experience, long sensitive strings in the user-defined sensitive data set will be maintained in the form of shorter pieces to alleviate the uncomfortable display latency. For example, a sensitive phone number “1320469299” will be automatically maintained in the form of two shorter pieces like “13204” and “69299”.

Non-sensitive keystrokes. We select some phrases from the commonly used sentences set (Braden 1969). In order to facilitate the average time calculation, we select 50 different phrases of 15 characters and calculate the average time to input these 50 phrases. Then we select 50 different phrases of 25 characters and calculate the average time. The sensitive data set is the same as in the above test, that is, a phone number of 11 characters and an email address of 19 characters. The results are shown in the right half of Table 1.

Based on the results for non-sensitive keystrokes in Table 1, we find that the elapsed time taken for the user app to get user input data with IM-Visor is 9.5% longer than the time without IM-Visor deployed. This is also mainly due to the overhead of world switches between secure world and normal world. Under the prefix-matching policy, the world switch for a sensitive string only needs to happen twice (i.e., from normal world to secure world, and back from secure world to normal world), but for a non-sensitive string, this switch may happen many times as the replay mechanism results in a switch from secure world to normal world, so the time taken by IM-Visor for non-sensitive keystrokes is usually longer than that for sensitive keystrokes.

For non-sensitive keystrokes, whether the display latency is user-perceptible depends on how the prefix of the non-sensitive typed string matches the prefix of the user-defined sensitive data set (i.e., phone numbers in our case). If there is no long common prefix between the non-sensitive string and items of the sensitive data set, the display latency is non-perceptible. Otherwise, it is perceptible.

Non-keystroke touch events. The above evaluation is about keystrokes, but there are also non-keystroke touch events which will be intercepted by IM-Visor. With the display information in secure world, we optimized the secure kernel to avoid trapping into user mode in secure world when a non-keystroke touch event happens; that is, when no keyboard is shown, the secure kernel returns to normal world immediately without trapping into the pre-IME Guard. The optimized world switch here takes only 27 ms, and it does not affect how the Android touch event system distinguishes user gestures, as the default timeout of a long press in Android is 500 ms.

Performance evaluation

In order to test the performance impact of IM-Visor on the Android system, we use the CaffeineMark benchmark and compare the result to original Android. CaffeineMark is a popular Android benchmarking tool that runs a series of tests and gives an assessment score (John and Eeckhout 2005). We run the benchmark 15 times, each time with a reboot to eliminate the impact caused by different system workloads, then calculate the average score. The results are in Fig. 13. Overall, IM-Visor performs only 1.53% worse than stock Android. This is mainly because the IME is an event-driven service, which keeps IM-Visor idle most of the time.

Discussion and limitation

Discussion

We leverage TrustZone to implement the first “pre-IME” defense by hooking the Android framework. Recalling the
goal of an attacker is to steal sensitive keystrokes, another option is to implement it entirely inside the OS kernel without using TrustZone. However, it would be insecure for the following reasons. First, an IME app may get the coordinates directly from the Linux driver interface “/dev/input/event0”, which may result in the leakage of sensitive keystrokes. Without the secure touch screen driver in the STIE, the “pre-IME” property cannot be guaranteed unless the Linux kernel touchscreen driver is trusted. Second, the keyboard layout is key to figuring out which keystroke the user is typing; without reconfiguring the display controller through TrustZone, it cannot be ensured that the framebuffer is the right one in the current typing environment.

Fig. 13 CaffeineMark results for original Android and Android with IM-Visor

Limitation

Although IM-Visor has made the first “pre-IME” attempt to prevent sensitive keystroke leakage against third-party IMEs, some limitations still exist in its design and implementation.

SystemServer attacks. Recently, some vulnerabilities have been discovered to attack System Server (Horn 2014; Huang et al. 2015b; Ren et al. 2015; Shao et al. 2016). But none of them can achieve control flow hijacking, so malicious code cannot modify hooks in System Server to stop IM-Visor from intercepting touch events.

GUI attacks. A malicious app may mimic the user app’s UI to mount phishing or click-jacking. However, at present, there are quite a few prior systems which can detect such attacks (Bianchi et al. 2015; Akhawe et al. 2014; Huang et al. 2015a).

Side channel attacks. Malicious apps may use the gravity sensor and acceleration sensor to launch a side channel attack. IM-Visor provides the STIE for user typing, in which we can also reconfigure those related sensors as secure peripherals to thwart such threats (Nahapetian 2016; Aviv et al. 2012).

Related work

Defense against Android third-party IME apps is a relatively new problem. I-Box (Chen et al. 2015) tries to establish a sandbox mechanism for third-party IME apps, analyzing the user keystrokes to determine whether to roll back the IME app. As a post-IME design, I-Box is vulnerable to the prefix-substitution attack and the colluding attack. In contrast, IM-Visor is a defense with the pre-IME nature and it can defend against the above attacks. Also, I-Box does not notice the “Buffer revisiting threat”, so it can be defeated by a sandbox bypassing attack. With hooks in revisiting APIs, IM-Visor can block the data leakage path from a user app to an IME app.

To implement secure password entry, ScreenPass (Liu et al. 2013) designs a trusted software keyboard to enter the password. The use of the trusted keyboard in ScreenPass is guaranteed by using Optical Character Recognition (OCR), but the OCR itself can be cracked by attackers, so the security of ScreenPass cannot be guaranteed. What’s more, using a new keyboard instead of the original keyboard will inevitably harm the user experience, and the likelihood that the user will adopt the new keyboard cannot be guaranteed. In contrast, IM-Visor adopts TrustZone to provide secure isolation, so the security of IM-Visor can be guaranteed. Also, IM-Visor reuses the original UI of an IME soft keyboard.

For password and other privacy data protection, researchers have also tried other solutions. Taint-tracking (Kang et al. 2011) is a commonly used method. Taint-tracking tracks the sensitive information flow in the target app and sets appropriate strategies to prevent the outflow and abuse of sensitive data. TaintDroid (Enck et al. 2010) is the first taint-tracking method used in Android and it tracks the flow of sensitive data by tagging these data. ScreenPass (Liu et al. 2013) also uses taint-tracking to monitor the password flow to prevent illegal outflow. SpanDex (Cox et al. 2014) tracks how password information flows in an app, and compared to the previous work, SpanDex focuses on the implicit information flow in apps. Although the taint-tracking method can get detailed information about sensitive data circulation, it is not very suitable for tracking sensitive keystroke leakage. IME apps usually use native code in their key functions, such as the sending of sensitive inputs, but taint-tracking cannot track the data flow in native code. Regulating ARM (Brasser et al. 2016) thwarts sensitive information leakage through misused sensors or peripherals on smart personal devices. It replaces the original peripheral drivers by a remote update when a user enters restricted spaces such as a federal building, and doesn’t cancel the enforcement of usage policies until the user checks out. App Guardian (Zhang et al. 2015) thwarts the runtime information gathering of malicious apps by blocking the runtime monitoring attempt. To realize this, App Guardian pauses the malicious app when a sensitive app is running. In contrast, IM-Visor will not pause the normal run of malicious IME apps, which results in little impact on the Android system. Screenmilker (Lin et al. 2014) constructs an app which exploits the malicious use of the Android ADB capabilities to monitor the screen and pick up a user’s password when he or she is typing. Then it presents a mitigation mechanism that controls the exposure of the ADB capabilities only to authorized apps. While IM-Visor and Screenmilker both aim to protect sensitive keystrokes, there are substantial differences: the threat in Screenmilker is caused by flaws in the Android permission system, whereas IM-Visor regards IME apps as the threat. The complicated construction of the attacks in Screenmilker makes the attacks difficult to apply widely, while the attacks in IM-Visor commonly exist and can be built using repackaging.

In recent years, TrustZone has attracted a lot of research and applications in many aspects. Some researchers aim to improve the security and usability of TrustZone. SecReT (Jang et al. 2015) mainly solves the establishment of secure communication between the Rich Execution Environment (REE) and the Trusted Execution Environment (TEE). ICE (Sun et al. 2015b) runs the secure code in the non-secure domain by designing isolated secure environments to restrict the code size of the TEE environment. Besides the above ones, more researchers aim to apply TrustZone to protect sensitive kernel operations and sensitive services. Hypervision (Azab et al. 2014) uses TrustZone to reinforce the Linux kernel by replacing sensitive instructions in the Linux kernel and controlling access to sensitive kernel data. TrustOTP (Sun et al. 2015a) uses TrustZone to protect the full process from generation to use for one-time keys. TrustDump (Sun et al. 2014) is a TrustZone-based memory acquisition mechanism to detect and prevent the newest malware, and the isolation between the OS and the memory acquisition tool is achieved by TrustZone. These solutions focus on the underlying system, especially the kernel, and they have little relation to the Android framework. In contrast, IM-Visor makes many modifications to the Android framework besides the kernel. AdAttester (Li et al. 2015) uses TrustZone to secure online mobile ad attestation, leveraging the secure world of TrustZone to implement unforgeable clicks and verifiable display. (Marforio et al. 2014) uses TrustZone to ensure the trusted execution environment for the payment process. Similar to these two solutions, IM-Visor aims to protect one certain functional service in Android, but IM-Visor is more comprehensive, as the trustlet in IM-Visor needs to complete some functional operations and needs more interaction with the Android framework, while the trustlets in the other two solutions mainly complete operations such as signature and encryption.

Conclusion

In this paper, we discuss the insecurity of IME apps, including the Potentially Harmful Apps (PHAs) and malicious IME apps. We provide a deeper understanding that all the designs with the post-IME nature are subject to the prefix-substitution and colluding attacks. To remedy the above post-IME system flaws, we propose a new idea, pre-IME, which guarantees that the “Is this touch event a sensitive keystroke?” analysis will always access user touch events prior to the execution of any IME app code, and we design an innovative TrustZone-based framework named IM-Visor which has the pre-IME nature. A prototype of IM-Visor has been implemented and tested with several of the most popular IMEs. The experimental results show that IM-Visor has small runtime overheads.

Acknowledgment

We would like to thank the anonymous reviewers for their valuable comments and suggestions. Yazhe Wang’s work was supported by the National Key Research and Development Program of China NO.2017YFB0801900 and Youth Innovation Promotion Association of CAS. Peng Liu was supported by NSF CNS-1422594, NSF CNS-1505664, and NSF SBE-1422215 (social).

Authors’ contributions

CT conceived of the study and participated in the design of IM-Visor. YW and PL participated in the implementation of IM-Visor and drafted the manuscript. QZ carried out the evaluation for IM-Visor. CZ participated in drafting the manuscript. All authors read and approved the final manuscript.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Author details

State Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, People’s Republic of China.
School of Cyber Security, University of Chinese Academy of Sciences, Beijing, People’s Republic of China. College of Information Sciences and Technology, Pennsylvania State University, University Park 16802, PA, USA.

Received: 4 January 2018 Accepted: 17 April 2018

References

Aho AV, Corasick MJ (1975) Efficient string matching: an aid to bibliographic search. Commun ACM 18(6):333–340
Akhawe D, He W, Li Z, Moazzezi R, Song D (2014) Clickjacking revisited: A perceptual view of ui security. Usenix Conference on Offensive Technologies. USENIX Association
Aviv AJ, Sapp B, Blaze M, Smith JM (2012) Practicality of accelerometer side channels on smartphones. In: Proceeding ACSAC ’12 Proceedings of the 28th Annual Computer Security Applications Conference. ACM, Orlando. pp 41–50
Azab AM, Ning P, Shah J, Chen Q, Bhutkar R, Ganesh G, Ma J, Shen W (2014) Hypervision across worlds: Real-time kernel protection from the arm trustzone secure world. In: ACM Sigsac Conference on Computer and Communications Security. ACM, Scottsdale. pp 90–102
Bianchi A, Corbetta J, Invernizzi L, Fratantonio Y, Kruegel C, Vigna G (2015) What the app is that? Deception and Countermeasures in the Android User Interface. In: 2015 IEEE Symposium on Security and Privacy. IEEE, San Jose. pp 915–930
Braden WW (1969) Random common sentences. http://www.englishinuse.net/. Accessed Aug 2016
Brasser F, Kim D, Liebchen C, Ganapathy V, Iftode L, Sadeghi AR (2016) Regulating ARM TrustZone Devices in Restricted Spaces. International Conference on Mobile Systems, Applications, and Services. ACM, Singapore. pp 413–425
Chen J, Chen H, Bauman E, Lin Z, Zang B, Guan H (2015) You shouldn’t collect my secrets: thwarting sensitive keystroke leakage in mobile ime apps. In: Proceeding SEC’15 Proceedings of the 24th USENIX Conference on Security Symposium. USENIX Association, Washington, D.C. pp 675–690
Cox LP, Gilbert P, Lawler G, Pistol V, Razeen A, Wu B, Cheemalapati S (2014) Spandex: Secure password tracking for android. Usenix Conference on Security Symposium. USENIX Association
Enck W, Gilbert P, Chun BG, Cox LP, Jung J, Mcdaniel P, Sheth AN (2010) Taintdroid: an information flow tracking system for real-time privacy monitoring on smartphones. In: Usenix Conference on Operating Systems Design & Implementation. USENIX Association, Vancouver. pp 393–407
Google (2007) Number of available applications in the google play. https://www.statista.com/statistics/266210/number-of-available-applications-in-the-google-play-store/. Accessed Nov 2017
Horn J (2014) Cve-2014-7911: Privilege escalation using objectinputstream. https://www.reddit.com/r/netsec/comments/2mr9cz/cve20147911_android_50_privilege_escalation_using/.
Li W, Li H, Chen H, Xia Y (2015) Adattester: Secure online mobile advertisement attestation using trustzone. In: MobiSys ’15 Proceedings of the 13th Annual International Conference on Mobile Systems, Applications, and Services. ACM, Florence. pp 75–88
Lin CC, Li H, Zhou XY, Wang XF (2014) Screenmilker: How to Milk Your Android Screen for Secrets. Network and Distributed System Security Symposium. The Internet Society, San Diego
Liu D, Cuervo E, Pistol V, Scudellari R, Cox LP (2013) ScreenPass: secure password entry on touchscreen devices. In: Proceeding of the, International Conference on Mobile Systems, Applications, and Services. ACM, Taipei. pp 291–304
Marforio C, Karapanos N, Soriente C, Kostiainen K, Capkun S (2014) Smartphones as Practical and Secure Location Verification Tokens for Payments. Network and Distributed System Security Symposium. The Internet Society, San Diego
Nahapetian A (2016) Side-channel attacks on mobile and wearable systems. IEEE Consumer Communications & Networking Conference. IEEE, Las Vegas
Ren C, Zhang Y, Xue H, et al (2015) Towards discovering and understanding task hijacking in android. In: Usenix Conference on Security Symposium. USENIX Association, Washington, D.C. pp 945–959
Shao Y, Ott J, Chen QA, Qian Z, Mao ZM (2016) Kratos: Discovering inconsistent security policy enforcement in the android framework. In: Proc. 23rd Annual Network and Distributed System Security Symposium (NDSS’16). ISOC
Suarez-Tangil G, Tapiador JE, Peris-Lopez P, Ribagorda A (2013) Evolution, detection and analysis of malware for smart devices. IEEE Commun Surv Tutor 16(2):961–987
Sun H, Sun K, Wang Y, Jing J (2015a) TrustOTP: Transforming Smartphones into Secure One-Time Password Tokens. In: ACM Sigsac Conference on Computer and Communications Security. ACM, Denver. pp 976–988
Sun H, Sun K, Wang Y, Jing J, Jajodia S (2014) TrustDump: Reliable Memory Acquisition on Smartphones. European Symposium on Research in Computer Security. Springer, Wroclaw
Sun H, Sun K, Wang Y, Jing J, Wang H (2015b) Trustice: Hardware-assisted isolated computing environments on mobile devices. In: Ieee/ifip International Conference on Dependable Systems and Networks. IEEE Computer Society, Rio de Janeiro. pp 367–378
trustonic Trustzone, tee and trusted video path implementation. http://www.arm.com/files/event/Developer_Track_6_TrustZone_TEEs_and_Trusted_Video_Path_implementation_considerations.pdf. Accessed Nov 2016
Windowmanager. Android Developer. https://developer.android.com/reference/android/view/WindowManager.html/. Accessed Nov 2016
Zhang N, Yuan K, Naveed M, Zhou X, Wang XF (2015) Leave me alone: App-level protection against runtime information gathering on android. In: 2015 IEEE Symposium on Security and Privacy. IEEE, San Jose. pp 915–930
Zhou Y, Jiang X (2012) Dissecting android malware: Characterization and evolution. In: 2012 IEEE Symposium on Security and Privacy. IEEE, San Francisco. pp 95–109
Accessed Nov 2014 Zhou Z, Gligor VD, Newsome J, McCune JM (2012) Building verifiable trusted Huang J, Li Z, Xiao X, Wu Z, Lu K, Zhang X, Jiang G (2015a) SUPOR: Precise and path on commodity x86 computers. In: 2012 IEEE Symposium on Security scalable sensitive user input detection for android apps. Usenix and Privacy. IEEE, San Francisco. pp 616–630 Conference on Security Symposium. In: 24th USENIX Security Symposium (USENIX Security 15). USENIX Association, Washington, D.C. pp 977–992 Huang H, Zhu S, Chen K, Liu P (2015b) From system services freezing to system server shutdown in android: All you need is a loop in an app. ACM Sigsac Conference on Computer and Communications Security. ACM, Denver InputConnection. Android Developer. https://developer.android.com/ reference/android/view/inputmethod/InputConnection.html/. Accessed Nov 2016 InputMethodManager (2016) The reference of android developer. https:// developer.android.com/reference/android/view/inputmethod/ InputMethodManager.html. Accessed Nov 2016 Jang J, Kong S, Kim M, Kim D, Kang BB (2015) SeCReT: Secure Channel between Rich Execution Environment and Trusted Execution Environment. Network and Distributed System Security Symposium. The Internet Society, San Diego John LK, Eeckhout L (2005) Caffeinemark 3.0. http://www.benchmarkhq.ru/ cm30/info.html. Accessed Nov 2016 Kang MG, Mccamant S, Poosankam P, Song D (2011) DTA++: Dynamic Taint Analysis with Targeted Control-Flow Propagation. Network and Distributed System Security Symposium, NDSS. The Internet Society, San Diego http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Cybersecurity Springer Journals

Using IM-Visor to stop untrusted IME apps from stealing sensitive keystrokes

17 pages

Publisher
Springer Journals
Copyright
Copyright © 2018 by The Author(s)
Subject
Computer Science; Computer Science, general
eISSN
2523-3246
D.O.I.
10.1186/s42400-018-0007-6

Abstract

Third-party IME (Input Method Editor) apps are often the preference means of interaction for Android users’ input. In this paper, we first discuss the insecurity of IME apps, including the Potentially Harmful Apps (PHAs) and malicious IME apps, which may leak users’ sensitive keystrokes. The current defense system, such as I-BOX, is vulnerable to the prefix substitution attack and the colluding attack due to the post-IME nature. We provide a deeper understanding that all the designs with the post-IME nature are subject to the prefix-substitution and colluding attacks. To remedy the above post-IME system’s flaws, we propose a new idea, pre-IME, which guarantees that “Is this touch event a sensitive keystroke?” analysis will always access user touch events prior to the execution of any IME app code. We design an innovative TrustZone-based framework named IM-Visor which has the pre-IME nature. Specifically, IM-Visor creates the isolation environment named STIE as soon as a user intends to type on a soft keyboard, then the STIE intercepts,Android event sub translates and analyzes the user’s touch input. If the input is sensitive, the translation of keystrokes will be delivered to user apps through a trusted path. Otherwise, IM-Visor replays non-sensitive keystroke touch events for IME apps or replays non-keystroke touch events for other apps. A prototype of IM-Visor has been implemented and tested with several most popular IMEs. The experimental results show that IM-Visor has small runtime overheads. Keywords: TrustZone, Android app security, User privacy Introduction appears when a user intends to type characters in a user Nowadays, people are experiencing a booming growth app (e.g., type a location name in a map searching app). of Android smartphone apps and enjoying their con- Besides the default IME app, there are many kinds of third venience. 
According to Google Play’s statistics(Google party apps in Android market that a user can download 2007), thenumberofavailable apps in theGooglePlay from. These third party apps can provide value added fea- Store was most recently placed at 3.3 million apps in tures to a user app, such as cloud-based auto correcting, September 2017, after surpassing 1 million apps in July word association and clipboard. 2013. Most popular apps in the real world can be cat- Although an IME app provides great convenience to egorized into six groups: tools, communication, social users, they can introduce serious security problems. An interaction, efficiency, anime and sports. For example, ES attacker can use this kind of apps to steal users’ sensitive app as a tool app can help users transfer their files in keystrokes. As shown in Fig. 2, a keystroke processing in smartphone to a PC desktop. Android works as follows: When a user types a charac- An Input Method Editor (IME) app is a user-installed ter (e.g., a “K”) in a soft keyboard, the touch screen driver app that provides a soft keyboard to receive user input will receive a coordinate(x,y), then the event subsystem in mobile devices. As shown in Fig. 1, a default IME app transfers it into a touch event. Then, the input dispatcher thread will send the event to the target IME app. Finally, *Correspondence: wangyazhe@iie.ac.cn IME app will translate the event into a character (i.e., a State Key Laboratory of Information Security, Institute of Information “K”), and sends it to the target user app. After sending, Engineering, Chinese Academy of Sciences, Beijing 100093, People’s Republic of China an IME app can still revisit the buffer of that user app. 
School of Cyber Security, University of Chinese Academy of Sciences, Beijing, Here, we can see that an IME app is always the first service People’s Republic of China to receive (sensitive or non-sensitive) keystrokes prior to Full list of author information is available at the end of the article © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. Tian et al. Cybersecurity (2018) 1:5 Page 2 of 17 Fig. 1 IME apps in the real world. An IME app is capable to provide a soft keyboard for user input. Typically, there are many kinds of third party apps in Android market that a user can download with his own preference a user app. Hence, if a user is typing sensitive informa- flow of I-BOX(Chen et al. 2015), which is a well-known tion (e.g., password, bank number, etc.) in a user app, a post-IME defense. Specifically, it saves the process state malicious IME app can work as a key logger to record of an IME app periodically and analyzes the translated and translate sensitive keystrokes, then store them in local keystrokes from IME apps each time an input transaction file system or send them to a remote server. This is a happens. If sensitive ones are found, it will let the IME app typical man-in-middle attack. There are several ways to “forget” sensitive keystrokes by a process roll-back. The I- construct such malicious IME apps, and repackaging is Boxalwayscheckswhether arollbackisneeded after the a common way which has been widely used by attack- IME has already processed keystrokes. And the salient fea- ers. 
(Zhou and Jiang 2012) Besides malicious IME apps, ture of the post-IME nature is that sensitive keystrokes there are also threats posed by potentially harmful IME appear in the dynamically allocated memory of an IME apps (PHAs). Without users’ consent, they collect sensi- app at least once. tive keystrokes and send them to an ad network doing Although post-IME defenses can prevent the sensitive targeted advertising based on the keywords in user inputs. data leakage in most common cases, there are still three To get rid of the above attacks, researchers have recently security holes (discovered so far) in current defense sys- proposed post-IME defenses. Figure 3 shows the work tems, that is, prefix substitution attack, colluding attack and sandbox bypassing attack (newly discovered attack). Prefix Substitution Attack. Figure 4 is an example of prefix substitution attack to I-BOX. The policy engine in I-Box is a status machine to detect whether the output string of an IME app is sensitive. Assuming the current input is sensitive data, but IME app developers use obfus- cated code to replace the prefix of the typed string with a non-sensitive one, then the policy engine is fooled and the roll-back will not be triggered. So the sensitive data obtained by the IME will not be cleaned and can still be sent to a remote server. Colluding Attack. Figure 5 is an example of colluding attack to I-BOX. To launch a colluding attack, as a post- IME design won’t do anything until it gets some output from the IMEs, an IME app needs to send sensitive text to a colluding app before it commits any text to a user app. So it is really easy to launch the attack in the real world. Sandbox Bypassing Attack. The “revisit” threat is dis- covered by us and I-Box was not aware of it yet. It is a threat for both post-IME and pre-IME defenses. From Fig. 2 Keystrokes porcessing in Android. 
An IME app obtains the the view of the I-Box, it regards the user input process coordinate(x,y) from touch screen driver and translate it into a as a transaction, which begins when a user starts to enter character, then send it to a user app. It is always the first service to the input and ends when the input session ends. When receive (sensitive or non-sensitive) keystrokes prior to a user app a user is typing sensitive data by a third party IME, the Tian et al. Cybersecurity (2018) 1:5 Page 3 of 17 Fig. 3 The work flow of a post-IME defense namely I-BOX. It always checks if a rollback is needed after the IME has already processed keystrokes. And the salient feature of the post-IME nature is that sensitive keystrokes appear in the dynamically allocated memory of an IME app at least once current transaction will be marked as sensitive by I-Box. will always access the touch event prior to the execution During this sensitive transaction, I-Box believes that the of any IME app code. restriction of network and roll-back can prevent sensitive This work seeks to solve the above problem by design- keystroke leakage. However, the sensitive text exist not ing, implementing and evaluating the first pre-IME only in an IME app while also in the buffer of a user app. defense based on 3 key ideas. The defense should ensure The roll-back only cleans the sensitive text in the IME app that touch events are intercepted before arriving at the but remains the one in the user app. In light of the fact that system (Key idea 1). Sensitive touch events are never sent some functions like getTextBeforeCursor in BaseIn- to IME apps (Key idea 2). Insensitive touch events should putConnection can be used to revisit the buffer of a user be replayed (Key idea 3). app, an IME app can launch a sandbox bypassing attack by calling revisited APIs at the beginning of the next new Challenges. To leverage the above three key ideas, we transaction. 
If the user app has not flush the buffer yet, are facing three main challenges. First, in the existing the IME can obtain the sensitive text committed in the modern mobile devices, an IME app is the first service last transaction. As a result, the sandbox of I-Box has been to receive (sensitive or non-sensitive) keystrokes from bypassed. Figure 6 shows how does the sandbox bypassing Android event subsystem, and translates them to text. attack case work. It is worth noting that the bypass attack Distinct from a post-IME design which does a rollback is not universally true. In other words, only when the user after the IMEs translating keystrokes, in a pre-IME design, app does not flush the buffer, there exists such an attack. how can we intercept and isolate sensitive keystrokes Problem statement. How to fill the three security holes ahead of IME translation? This is called the “Isolation through providing the following security property: the ahead of IME translation issue” (Challenge 1).Second, analysis on whether a touch event is a sensitive keystroke after we succeeding in intercepting and isolating those Fig. 4 Prefix-substitution attack. I-Box uses a policy engine to search substring in the output of an IME app. Malicious IME apps can obfuscate sensitive string into non-sensitive string to fool I-Box and leak it out to a remote server Tian et al. Cybersecurity (2018) 1:5 Page 4 of 17 Fig. 5 Colluding attack. To lauch such attack, an IME app just needs to send sensitive text to a colluding app before it commits any text to a user app sensitive keystrokes, how can we build a trusted path for have a clear security advantage over post-IME user apps to access these sensitive keystrokes? We call it designs. This is a key new insight of this work. the “Trusted path issue” (Challenge 2). 
Finally, recalling We build a concrete pre-IME defense named IM-Visor which leverages TrustZone to isolate the reason why users got incentives to use IMEs in the first paragraph, an IME app does provide convenience and sensitive keystrokes before the IMEs could access extra benefits. In a pre-IME design, how can we retain the them. IM-Visor resolves three main challenges: the value added feature for user apps? We call it the “Benefits “Isolation ahead of translation issue”, the “Trusted retaining issue” (Challenge 3). path issue” and the “Benefits retaining issue”. To address Challenge 1, we leveraged Trustzone and By noticing that sensitive keystrokes can generally achieved interception ahead of IME translation. The iso- flow both way (i.e., from IME apps to user apps and lation mechanism includes detection of soft keyboards, from user apps to IME apps), we discover a new initialization of STIE (Secure Typed Isolation Environ- sandbox bypassing vulnerability of I-Box. ment, touch event processing and keystrokes translation, We perform a thorough evaluation of IM-Visor. We and sensitiveness analysis. To address Challenge 2, we test a set of popular IME apps and the related user built a trusted path for sensitive keystrokes to be trans- apps, no sensitive keystroke leakage caused by IME ferred to the user app through creating a new IPC between apps is found. The experimental results show that the commit-proxy and the user app. To address Challenge IM-Visor has small runtime overheads. 3, we proposed a keystroke replay mechanism. Our main contributions are summarized as follows. Background Android IME, Input Method Framework (IMF) and event We propose a new idea “pre-IME”, which guarantees subsystem that “Is this touch event a sensitive keystroke?” Android IMF arbitrates interaction between applications analysis will always access user touch events prior to and the current input method (InputMethodManager the execution of any IME app code. 2016). 
A user app can use the standard TextView or We provide a deeper understanding that all the its subclass to interact with an IME app. InputMethod- designs with the post-IME nature are subject to the ManagerService (IMMS) in the IMF is a global system prefix-substitution and colluding attacks. Addressing service that manages the interaction across the above the two attacks, designs with the pre-IME nature processes. When a user touches on the TextView of a Fig. 6 Sandbox bypassing attack. The pink and blue color represent two different state of an IME app. As shown in blue color, after the roll-back, an IME app can still access the user app’s data buffer for sensitive text by some revisited APIs and leak it out to a remote server at the beginning of next input transaction Tian et al. Cybersecurity (2018) 1:5 Page 5 of 17 user app, IMMS will start an IME app. What’s more, uploading whatever a user types on a soft keyboard. So some functions in IMMS such as showSoftInput, there is a risk of sensitive keystroke leakage through third hideCurrentInputLocked can control when a soft patry IMEs. A current defense with the post-IME nature keyboard will be shown up or hidden. If a user types intends to discover sensitive input by analyzing the out- on the soft keyboard, TouchInputMapper in the put of an IME app and cleans it by a roll-back. However, Android event subsystem is the first entity to handle user an IME app can fool the defense by committing a replaced touch events. After the process of TouchInputMapper, text (Prefix-substitution Attack ) or leaking out sensitive an input dispatch thread in WindowManagerService keystrokes with a colluding app before the analysis is trig- (WMS)(Windowmanager) is responsible to dispatch gered (Colluding Attack). In “Introduction”section,we keystrokes to the active IME app. 
Then the IME can trans- have pointed out that all the designs with the post-IME late keystrokes to text and commits them to a user app by nature are subject to the above two attacks. And a key BaseInputConnection (BIC)(InputConnection). BIC is the motivation of our work is that we intend to build a more connection between a user app and an IME app. BIC pro- secure defense to get rid of the above attacks. Besides, vides some functions such as getTextBeforeCursor, we discover a new data leakage path from a user app to getSelectedText for IME apps to revisit the data an IME app by some revisit APIs (Sandbox Bypassing buffer in a user app. The reason why these functions exist Attack). So the “revisit” is another threat to our security is that an IME app may need to change some character concerns. before finally committing or it just wants to verify the It is possible that a malicious user app can collude with committing. In this paper, we put hooks in some func- an IME app to steal sensitive keystrokes. However, we con- tions in the IMF and event subsystem so that the “Is this sider this out of the scope of this paper. Because a user touch event a sensitive keystroke?” analysis can be invoked app can get whatever a user types in a soft keyboard, it before the IMEs access keystrokes. is unnecessary to steal sensitive keystrokes through an hacked IME app. Besides, from an attacker’s point of view, TrustZone it is much more easier to attack a single IME app than Processor state isolation. As hardware-level security iso- attacking all kinds of user apps which often use an IME lation, TrustZone provides Secure Monitor Call (SMC) keyboard. If an IME app is hacked, all user apps are hacked instruction for the processor to enter secure world from since an IME app processes all of a user’s input in modern normal world. The SMC instruction is a privileged mobile devices. instruction which is invoked in normal world. 
Program in secure state can access resources across the system Assumptions including I/O, memory, etc. Normal program has a lower As third party IME apps may cause sensitive keystroke execution privilege. leakage, we consider all third party IMEs (i.e., malicious andPHAs)as untrusted. ThegoalofIM-Visoristopre- I/O device and memory isolation. A major feature of vent IME apps from accessing data when a user types TrustZone is that it can flexibly configure the secure sensitive keystrokes, so we assume that the user app which stateofI/Odevicesusingsoftware.Thisfunctioninvolves employs an IME app for keystroke translation is trusted. TrustZone Protection Controller (TZPC) and Trust- Although there are lots of attacks targeting at user apps Zone Address Space Controller (TZASC). TZASC allows (Zhou and Jiang 2012;Suarez-Tangil etal. 2013), such secure and non-secure area partition for the mobile device threats are not in the scope of this work. We assume that DRAM memory. In existing mobile devices, touch screen the Android System Server and the kernel are not on the and display controller are usually configured as non- target list of attackers. As IM-Visor is a security scheme secure. based on TrustZone, so we assume the device is equipped with TrustZone and the function of TrustZone has been Trustlets. An application in secure world is known as a correctly implemented on the device. Considering Trust- trustlet. It can access the normal world memory but not Zone is an isolation solution with hardware support, we vice-versa. Considering the TCB size of secure world, a assume the hardware of the device is trusted. Hardware trustlet is usually designed to provide some higher secure attacks which may prevent the normal operation of Trust- operation such as displaying trusted UI or encryption. Zone are out of the scope of this work. 
Although OS is not on the list of attack targets, consid- Threat model and assumptions ering the following facts, we still use TrustZone to imple- Threat model ment our defense. First, TrustZone is widely deployed. Data from Samsung shows that millions of modern As mentioned, in the Android IMF, due to extra benefits, devices are outfitted with TrustZone (Azab et al. 2014). We user apps got incentives to use an IME app to access a soft hypothesize that more and more devices will use the ARM keyboard. However, an IME app is capable of logging and Tian et al. Cybersecurity (2018) 1:5 Page 6 of 17 TrustZone in the future. Second, the Trusted Execution keystrokes and isolate them before an IME app could Enviroment (TEE) is already deployed, there seems less a access them. need to reinvent wheels. Comparing to adding system or One possible way is to leverage TrustZone to implement kernel code, it is really more convenient to put our critical a trusted IME app with a trusted GUI. When users intend code as a trustlet in secure world and only put some hooks to type sensitive data, let them switch to the trusted IME. in Android. Third, it ensures minimum kernel modifica- However, this approach brings two disadvantages. First, tion. In our design, only a TrustZone driver is needed to be it is a burden for users to constantly keep this switch in installed in kernel. No kernel instrumentation is needed. mind. Second, a friendly trusted GUI means a lot of extra Forth, no significant impact on system overheads by test- coding work, such as efficient graphics rendering. So we ing with most popular IMEs. Fifth, TrustZone does reduce have to look for a new approach. our attack surface. For example, using a gravity sensor In the light of the fact that keystrokes will be pre- to launch a side channel attack is possible when a user processed by the event subsystem before an IME app types on a soft keyboard. 
TrustZone can configure related could access them, we put some hooks in event sub- hardware as secure to thwart such attack. system and leverage TrustZone to achieve the pre-IME nature. Subsystem hooks make SMC calls and jump to Overview secure world. In secure world, IM-Visor provides the Figure 7 shows the system components of IM-Visor, which STIE in which the touch screen and display devices includes a Secure Typing Isolation Environment (STIE) in are only controlled by secure world. For touch input, secure world, a system service named commit-proxy,a we implement a separate touch driver in secure world. daemon thread named replay executor in the event sub- Hence, whenever a touch input interrupt arrives, IM- system and some hooks. The STIE includes two parts: Visor would be the first to access keystrokes prior to secure hardware drivers and a trustlet named pre-IME the execution of any IME app code. The pre-IME Guard guard. As mentioned in “Introduction” section, in order receives keystrokes, translates them and analyzes whether to create a defense with the pre-IME nature, there are the char string is sensitive. Concerning about the flex- three main challenges: “Isolation ahead of IME translation ibility and efficiency, the STIE will be created only issue”, “Trusted path issue”, “Benefits retaining issue”. Now when a user intends to type in a soft keyboard (see we give a high-level overview of how IM-Visor resolves “STIE initialization”section). Compared to the development of an trusted IME app, them. the STIE helps IM-Visor avoid the above two disadvan- Isolation ahead of IME Translation Issue. In existing mobile devices, an IME app is the first entity to receive tages. First, as the STIE can be initialized automatically user touch events, and then translates keystrokes to text. when a user intends to type in a soft keyboard, a user does To achieve a pre-IME design, we must recognize sensitive not have to keep the keyboard switch in mind. Second, Fig. 
7 IM-Visor consists of the STIE in secure world, a new service named commit-proxy, a daemon thread named replay executor in event subsytem and some hooks Tian et al. Cybersecurity (2018) 1:5 Page 7 of 17 because the STIE reuses the UI of a soft keyboard and isolates touch input, no trusted GUI lib is needed. Trusted path issue. As mentioned in “Introduction” section, after the isolation of sensitive keystrokes, we must buildatrustedpathfromthepre-IMEGuard toauser app (Zhou et al. 2012; trustonic). Obviously, we can- not use untrusted IME apps to commit sensitive text as this violates our security principle. So we have to find another data path isolated from IME apps. In light of the fact that TextView of a user app uses a local binder named IInputContext.Stub to receive text, we put some hooks in the IMF and create a new connection between a user app and our newly added service named commit-proxy. In other words, we create a new inter- process communication (IPC) between a user app and the commit-proxy to commit sensitive text. Benefits retaining issue. As an IME app does provide convenience and extra benefits, in a pre-IME design, we must retain the value added feature for user apps. The keyideaofIM-Visoristoreplayakeystrokeassoonas the pre-IME Guard determines it as non-sensitive and let the IMEs work for non-sensitive keystrokes. To achieve Fig. 8 Workflow and data paths under the IM-Visor protection. For this, we design replay executor running in System Server sensitive keystrokes as shown in red color, a trusted path from the secure touch screen to a user app is created by the STIE and process for replay. Specifically, the Replay Executor gets commit-proxy. For non-sensitive keystrokes, the Replay Executor touch event coordinates from the pre-IME Guard and dispatches them to the targeted IME app encapsulates them into Android touch event format, then triggers event subsystem to dispatch events to IME apps. 
Another issue related to replay is that we must also replay non-keystroke touch events for the other apps.

Design and implementation

Workflow of IM-Visor

As a pre-IME design, IM-Visor always recognizes and isolates sensitive keystrokes before the IMEs can access them. To achieve this, whenever a user intends to type on a soft keyboard, the STIE is initialized to intercept touch events and analyze whether each one is a sensitive keystroke. From the perspective of how touch events (i.e., keystrokes or non-keystrokes) are handled, Fig. 8 shows the workflow of IM-Visor after the STIE has been initialized. The red data path indicates the trusted path from the touch screen to a user app. On the other hand, as shown in green, when non-sensitive touch events (i.e., non-sensitive keystrokes or non-keystrokes) are found, the pre-IME Guard asks the Replay Executor to replay the corresponding touch event to the targeted apps (e.g., IME apps or other apps).

Address challenge 1: isolation ahead of IME translation

First, let us recall some background about the IMF and the event subsystem from the “Android IME, Input Method Framework (IMF) and event subsystem” section. A keystroke is preprocessed by the event subsystem before any IME app can access it. Specifically, TouchInputMapper in the event subsystem is the class for touch event processing. InputMethodManagerService (IMMS) in the IMF is a global system service that manages the interaction across IME apps and user apps. Anytime a user app requests a soft keyboard, IMMS asks an IME app to show a soft keyboard by calling showSoftInput.

STIE initialization

The primary technical challenge of the STIE initialization is guaranteeing that IM-Visor is always aware of when a user is typing on a soft keyboard prior to the execution of any IME app code. If IM-Visor can create the STIE as soon as a user first puts his or her finger on a soft keyboard in normal world, then the pre-IME Guard is able to intercept user keystrokes from the start of input and the pre-IME nature can be ensured. To address this challenge, the key idea is to check whether a soft keyboard has been shown each time a touch event arrives in the event subsystem. On modern mobile devices with a touch screen, we assume that a user intends to type text when he or she taps on the touch screen after a soft keyboard has been shown. The keyboard display information is maintained in secure world.

Figure 9 shows how we initialize the STIE. The first user tap on the edit box of a user app asks IMMS to start up an IME app. This process in fact invokes two hooks: sync and showSoftInput. IM-Visor ignores the touch but updates the keyboard display information in secure world. At this moment, the STIE has not been initialized yet. Then the user may tap on the soft keyboard. This behaviour of course invokes sync again. At this moment, the STIE must be initialized, because tapping on an IME soft keyboard is obviously a keystroke. We reconfigure peripherals such as the display controller and the touch screen as secure. Then the pre-IME Guard receives touch events directly through the secure touch screen.

Fig. 9 STIE initialization. Hooks in the IMF and event subsystem are invoked to notify the pre-IME Guard in secure world to initialize the STIE. In modern mobile devices with a touch screen, we assume that a user intends to type text when he or she taps on touch screen after a soft keyboard has been shown up

Touch event processing and keystroke translation

In order to intercept user keystrokes in secure world, the touch screen is reconfigured to be accessible only by secure world, and a separate touch screen driver is implemented in secure world. As a result, anytime a touch interrupt arrives, this driver is the first to receive the user's touch coordinates. In order to figure out which keystroke a user types, we need two pieces of information: the touch coordinates and the current soft keyboard layout. The touch screen driver in secure world provides a secure way to obtain touch coordinates. Now we explain how the pre-IME Guard gets the soft keyboard layout securely. The soft keyboard layout is a piece of display data in normal world that an IME app puts in the framebuffer. The framebuffer is a region of memory allocated by a Linux display driver. The display controller is a peripheral that generates the necessary control signals for data display. To obtain the soft keyboard layout on which a user is typing, the pre-IME Guard takes two steps. Step 1) The display controller is reconfigured as secure by the TrustZone TZPC so that normal world cannot change it. This is important because the display controller provides information about the start region of the framebuffer. If the display controller is not controlled by secure world, normal world software can deceive the pre-IME Guard into reading a wrong framebuffer and translating against a wrong soft keyboard layout. Step 2) After a touch event happens, the pre-IME Guard reads the framebuffer and checks the correctness of the layout. As a proof-of-concept prototype, IM-Visor preloads the layout information of popular IME apps and determines whether the layout is correct by comparing the hash of the current layout with the preloaded standard one. As future work, in step 2, instead of the “preload&check” way, we will obtain the current layout by an efficient optical character recognition (OCR).

Leaving the framebuffer in normal world is not the security concern it may seem. Suppose an untrusted IME app intends to figure out which keystroke a user types; it also needs the above two pieces of information. But sensitive touch coordinates only stay in secure world. So an IME app cannot succeed in finding sensitive keystrokes without touch coordinates.

After identifying a keystroke, the pre-IME Guard translates it into a character. Now we give an example of keystroke translation. For Latin languages, every keystroke can directly correspond to a character, but for non-Latin languages candidate words often need to be shown. Here, we only discuss Latin language translation with a qwerty keyboard. Suppose the user types “a” on the soft keyboard; the secure touch screen then gets the touch point C(x, y). The preloaded keyboard layout helps the pre-IME Guard determine whether this point falls in the geometric range of the “a” key button, which is defined by the top-left point A(x1, y1) and the bottom-right point B(x2, y2). If it does, the pre-IME Guard translates the keystroke as the character “a”.

Sensitive keystroke analysis

In order to analyze whether keystrokes are sensitive, we adopt I-BOX’s policy engine, which enforces a specific context-based policy and a specific prefix-matching policy. In the IMF, text fields in user apps have different types, such as dates and passwords. IM-Visor can leverage this information to decide whether the current input is sensitive or not. Specifically, the hook startInput in IMMS can provide information about text fields. If the current edit box works for passwords (or something sensitive like that), the pre-IME Guard knows it from the start of the soft keyboard display and treats all following keystrokes as sensitive. This is called the “Context-based Policy”. User activities such as logging in are a typical case in which IM-Visor can enforce this policy. For a general user input stream, after translating keystrokes to a string, IM-Visor leverages prefix-matching to search all possible substrings when a new char is typed (Aho and Corasick 1975). The sensitive data set used for searching is defined by users. As a user could consider large numbers of data instances as sensitive, IM-Visor uses a trie-like structure to maintain it in secure world. This is called the “Prefix-matching Policy”.

Address challenge 2: trusted path

After isolating and translating sensitive keystrokes, we should commit them to the targeted user app. Obviously, as a pre-IME defense, we cannot use untrusted IME apps to commit a sensitive string. So we have to find another data path isolated from IME apps. Our main idea here is to add an independent system service that can commit a sensitive string from secure world to a user app, forming the trusted path.

Normally, the edit box of a user app uses a local binder named IInputContext.Stub to receive char strings. The client of IInputContext.Stub is initialized in the IMMS at the start of input. In light of this fact, we add some code in the IMF to make the IMMS create an extra binder client for our newly added service, commit-proxy. The commit-proxy is then capable of committing a sensitive string to the user app. Because the new IPC and the new service are independent of any IME app, a sensitive string on this data path cannot be accessed by any IME.

Create a new IPC. Figure 10 shows how the commit-proxy creates a new IPC with a user app. When a user taps on the edit box, the user app asks IMMS to call back functions in the current active IME app. Hooks in startInput will be invoked. The pre-IME Guard creates a token for IMMS, which contains a unique id to identify the current user app. Then IMMS requests the commit-proxy to bind the user app with two parameters (token, InputConnection). When the commit-proxy receives this bind request, it makes an SMC call to check whether the id in the token is valid. If it is valid, the commit-proxy adds the new connection. Otherwise the bind request is refused. If any sensitive string is found, the pre-IME Guard sends the sensitive string to the commit-proxy through a shared memory, and then the commit-proxy commits it over the above new IPC.

Fig. 10 How a user app binds the commit-proxy. Note that a token is necessary to detect the legitimacy of a user app for the binding

Address challenge 3: benefits retaining

To retain the extra benefits of IME apps (e.g., auto correcting and word association), one possible way is to implement the value-added features in IM-Visor. However, this makes all IME apps useless and means a lot of extra coding work for IM-Visor (e.g., cloud-based word association and a local trusted GUI lib). We look for a more efficient and elegant approach.

Our key idea here is to design a replay mechanism and let IME apps work for non-sensitive keystrokes. We designed a daemon thread named replay executor, running in the System Server process, to replay touch events. If some touch events need to be replayed, the pre-IME Guard puts them in a shared memory and then the Replay Executor reads and replays them. We explain the details as follows. In the Android system, every activity or service maintains a thread loop to receive touch events or other input events through an input channel. If the Replay Executor intended to replay a touch event directly, it would need to maintain the input channels and select which activity or service will receive the touch event. The selection is based on not only touch coordinates but also window layouts and the current window focus.

The key point is that the Replay Executor only receives touch events from the pre-IME Guard and then triggers the event subsystem to complete the “maintain&selection”. As mentioned in the “Android IME, Input Method Framework (IMF) and event subsystem” section, an input dispatch thread in WindowManagerService is responsible for dispatching touch events. In most cases, the input dispatch thread sleeps on an input event queue. When a touch event needs to be dispatched, it wakes up and dequeues an event, then selects an activity or service for dispatching. If we can handle this input event queue and wake up the thread when a replay is needed, we are able to let the event subsystem do the “maintain&selection” work. This is exactly how the Replay Executor works. The context of WindowManagerService provides the event queue of the input dispatch thread. The Replay Executor encapsulates non-sensitive keystrokes in the required Android touch event format and enqueues them. Then it simply wakes up the dispatch thread to do the rest of the work for our replay. As a related issue, non-keystroke touch events can also be replayed in this way.

Minor challenge 4: buffer revisiting threat

As discussed in the “Threat model” section, we discover a new data leakage path from a user app to an IME app through some revisit APIs. To prevent this threat, we hook all revisit APIs (see Listing 1) and analyze the revisited char string again to detect and block sensitive text when a third-party IME app revisits the buffer of a user app.

Implementation

We have implemented IM-Visor on a Samsung 4412 development board equipped with ARM TrustZone. The Android and kernel versions on the board are 4.0 and 3.0.2, respectively.

Pre-IME guard and services. Specifically, the pre-IME Guard runs as a trustlet in secure world and Android runs in normal world. The commit-proxy is a system service in the Android System Server process. The Replay Executor is a daemon thread running in the System Server process. Both of them passively wait to receive data from the pre-IME Guard. When a user types in the STIE, the pre-IME Guard receives keystrokes from the touch screen and translates them into a char string. According to its sensitivity, we return it through the green path or the red path (see Fig. 8).

Hooks in the IMF and event subsystem. To minimize system overhead, we have to hook as little as possible. Specifically, only three classes in Android have been hooked: InputMethodManagerService, BaseInputConnection and TouchInputMapper. In order to jump into secure world, a TrustZone driver is installed in the Linux kernel. Hooks make SMC calls through the newly installed driver. When these hooks are invoked, the pre-IME Guard can intervene in the data flows in the Android IMF. Listing 1 shows all the hooks we put in Android.

Secure touch screen and display controller reconfiguration. When a user intends to type on an IME soft keyboard, some reconfiguration should be done for the STIE initialization. We reconfigure the Interrupt Security Register (ICDISR), the Priority Mask Register (ICCPMR) and the Enable Set Register (ICDISER) to make the touch input a secure interrupt and mask all non-secure interrupts. In the CPU Interface Control Register (ICCICR), FIQEn and EnableS are set to 1 to enable the FIQ interrupt. The FIQ bit in the Secure Configuration Register (SCR) is also set to 1 to ensure that FIQ interrupts are routed to TrustZone monitor mode. Besides, the touch screen and the display controller are set to be secure peripherals with TZPC. As a proof-of-concept prototype, we only implement single-touch in the separate touch driver and leave multi-touch as future work.

Evaluation

Security evaluation

Malicious IME apps and PHAs upload sensitive user data to remote servers, which causes harm to users. To evaluate the defense effectiveness of IM-Visor against malicious IME apps, we construct malicious IME apps by repackaging some popular ones to make them send sensitive keystrokes to a remote server. We want to see whether they can still leak sensitive keystrokes under the IM-Visor protection. To evaluate the defense effectiveness of IM-Visor against PHAs, we analyze the network packets of commonly used IME apps. As we will see, without IM-Visor these IME apps may send sensitive user keystrokes outside, and with IM-Visor sensitive user keystrokes are not found in their network packets.

Defense against malicious IME Apps

We use repackaging to build malicious IME apps, and the targets of repackaging are the popular third-party IME apps. As many IME apps have their own defenses against repackaging, the difficulty of repackaging differs from one IME app to another. Some IME apps only use simple defenses which are not difficult to crack, and some construct complex solutions which cost much time to repackage. We repackaged three IME apps: one is Sogou IME, which is the most popular third-party IME, and the other two are QQ IME and TouchPal IME, which are also very popular third-party IME apps. We can get the smali code for each app after decompiling the APK file and then add some code in several critical locations in the smali file. For example, as most IME apps use the Android IMF, which provides various classes and APIs, we can hook the API commitText to intercept all the user input. The added code is inserted at the location of commitText, and its functionality is to upload each user keystroke to a remote server by socket connections. Finally, the modified code is recompiled, signed and installed on the terminal. After installation, the repackaged IME can be set as the default IME by modifying the system settings. Now we open an app which needs a user to type a user name and password, and we find that the entered user name and password are sent to the remote server while the user is typing.

intercept the network packets of Sogou IME app. Figure 12 shows the intercepted packets. This indicates that sensitive keystrokes have been leaked out by the IME app.
To verify the defense effectiveness of IM-Visor, we repeat the above operations on the development board with IM-Visor several times. Figure 11 shows the defense effectiveness of IM-Visor against malicious IME apps. It clearly shows that IM-Visor can prevent malicious third-party IME apps from stealing sensitive data.

Fig. 11 The defense of IM-Visor on repackaged IME apps. The repackaged IME apps are capable of uploading user names and passwords to the remote server without IM-Visor. However, with the IM-Visor protection, they cannot leak out sensitive information. a Email log-in using the repackaged Sogou IME. b WeChat log-in using the repackaged QQ IME. c AliPay log-in using the repackaged TouchPal IME

Defense against PHAs

Most commercial-off-the-shelf (COTS) IME apps actually collect the user input to improve user experience by analyzing the user's input habits or to do targeted advertising. To verify this, we use Wireshark to intercept the network packets when users enter data using IME apps. After experiments on commonly used IME apps, we indeed find that a continuous sequence of packets is captured by Wireshark when the user is typing. For further verification, we need to analyze the content of the captured packets. Although some IME apps such as Baidu IME and iFly IME use encryption to prevent content analysis, there are still other IME apps which upload users’ input in plain text over HTTP. After experiments, IME apps including Sogou (v8.0), QQ (v5.4.0), Octopus (v4.2.6) and TouchPal (for pad, v5.4.5) have drawn our attention. Taking the Sogou IME app as an example, after typing the word “password” in the SMS, we use Wireshark to

Fig. 12 The analysis on Sogou IME app’s network packets. The leaked data “password” appears in one of Sogou IME app’s packets

With IM-Visor, these potentially harmful IME apps can no longer access the user input when the input is sensitive, as we don’t see Wireshark capturing any packets containing sensitive keystrokes.

Correctness evaluation

IM-Visor is a pre-IME design, so it intervenes in the communication between user apps and IME apps. As an IME app cannot trigger input by itself, it must be employed by a user app which has edit boxes. So in this section, we need to test whether user apps and IME apps can run normally with IM-Visor deployed.

First, we need to test whether user apps and IME apps can run without crashing. To do this, we first download and install the top 10 IME apps from Android Market. Then we use the Android automated testing tool MonkeyRunner to download 100 user apps from the Android Market. As the touch events triggered by MonkeyRunner are random, we restrict the screen area where touch events can happen based on a location analysis of edit boxes in many user apps. In this way, MonkeyRunner can trigger more keystrokes. For each IME app, we use MonkeyRunner to install and run these 100 user apps. After the experiments, we find that only 3 user apps crashed and none of the 10 IME apps crashed. For the 3 crashed user apps, we manually ran them on the development board without IM-Visor; however, they still crashed. So we think these 3 user apps crashed because of their poor compatibility with our development board.

Besides the crash problem, we also need to test whether IM-Visor can guarantee that a user app is able to run without any input data missing or out of order (the input data come from IM-Visor and IME apps). For each user app, we design several different use cases, and for each use case, we test with some commonly used IME apps including Sogou, QQ, TouchPal, Baidu and iFly. We have tested 10 typical user apps including the Email Client and SMS. For the Email Client we design two use cases: normal log-in and resumed log-in (i.e., the user is typing, then picks up a phone call and resumes logging in after hanging up). For the SMS, we also design two use cases: normal text-edit and resumed text-edit. After running the experiments 20 times, we manually verify and find that a user app can work normally without any data missing or out of order.

Usability evaluation

In this evaluation, we need to test how long it takes for a user app to get the user input data. This refers to the duration from the time when the first keystroke in the test string happens to the time when the full string is committed to the user app. The IME apps used for the test are Sogou IME, Baidu IME, iFly IME, QQ IME and TouchPal IME. The sensitive data set used for the test contains phone numbers, ID numbers, bank card numbers, and email addresses. We select a phone number (11 characters) and an email address (19 characters) for the sensitive data test.

Excluding irregular touches (e.g., fumbling phones) and multi-touch behaviour (e.g., zooming gestures), our test is focused on the most common case, in which a user types characters on a qwerty soft keyboard with single touches. The user app used for the test is SMS. When a user types text in SMS, the translation results of keystrokes are analyzed by IM-Visor to decide whether the keystrokes are sensitive. The conclusions can be classified into two types: sensitive keystrokes and non-sensitive keystrokes.

Sensitive keystrokes. We choose a phone number of 11 characters and an email address of 19 characters which are in the sensitive data set. Then for each IME app, we type the phone number and the email address 50 times separately and calculate the average elapsed times, respectively. The results are shown in the left half of Table 1.

Based on the results for sensitive keystrokes in Table 1, we find that the elapsed time taken for the user app to get user input data with IM-Visor is 1.84% longer than the time without IM-Visor deployed. This is mainly due to the overhead of world switches between secure world and normal world. The secure kernel we port is a Linux-like kernel; it takes about 110 ms to switch from user mode in secure world (the context of the pre-IME Guard) to user mode in normal world (the context of the Java hooks).

One additional issue is about user experience. For the whole sensitive phone number, although IM-Visor brings only 1.84% reception latency, the display latency may be user-perceptible. Recalling the policies in the “Sensitive keystroke analysis” section, IM-Visor enforces two different policies (i.e., the context-based policy and the prefix-matching policy) to analyze user input. With the context-based policy, IM-Visor treats every single number as sensitive and commits it to a user app one by one, so the display latency is non-perceptible. With the prefix-matching policy, the display latency is user-perceptible for sensitive keystrokes. For a sensitive numeric string like a phone number, the prefix-matching of IM-Visor cannot determine the sensitivity of the input until the last number has been typed. Hence, from the view of a user, no character is displayed until the last number of the whole sensitive data item has been typed. To strike a balance between user privacy and experience, long sensitive strings in the user-defined sensitive data set are maintained in the form of shorter pieces to alleviate the uncomfortable display latency. For example, a sensitive phone number “1320469299” will be automatically maintained in the form of two shorter pieces like “13204” and “69299”.

Non-sensitive keystrokes. We select some phrases from the commonly used sentences set (Braden 1969). In order

for non-sensitive keystrokes is usually longer than that for sensitive keystrokes.

For non-sensitive keystrokes, whether the display latency is user-perceptible depends on how the prefix of the non-sensitive typed string matches the prefixes in the user-defined sensitive data set (i.e., phone numbers in our case). If there is no long common prefix between the non-sensitive string and items of the sensitive data set, the display latency is non-perceptible. Otherwise, it is perceptible.

Non-keystroke touch events. The above evaluation is about keystrokes, but there are also non-keystroke touch events which are intercepted by IM-Visor. With the display information in secure world, we optimized the secure kernel to avoid trapping into user mode in secure world when a non-keystroke touch event happens; that is, when no keyboard is shown, the secure kernel returns to normal world immediately without trapping into the pre-IME Guard. The optimized world switch here takes only 27 ms, and it does not affect how the Android touch event system distinguishes user gestures, as the default timeout of a long press in Android is 500 ms.

Table 1 Elapsed time for the user app to get the data. We compare the time without/with IM-Visor (Without/With = without/with IM-Visor; all values in ms)

            Sensitive keystrokes                                Non-sensitive keystrokes
            Phone number              Email address             Phrases of 15 characters  Phrases of 25 characters
IME app     Without      With         Without      With         Without      With         Without      With
Sogou       6143         6256         11028        11132        8565         9315         14658        15972
Baidu       6090         6192         10960        11063        8543         9316         14632        15933
iFly        6302         6408         11332        11433        8890         9632         15130        16456
QQ          6085         6184         10971        11079        8507         9275         14601        15899
TouchPal    6098         6112         10948        11061        8513         9269         14613        15925
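As a worked check on the Table 1 figures, the relative slowdown for any cell pair is (with − without) / without. The sketch below is illustrative only and is not part of the paper's tooling: it copies the Sogou row of Table 1 and computes that ratio for each column pair. The function name `overhead_pct` and the dictionary layout are assumptions made for this example.

```python
def overhead_pct(without_ms: float, with_ms: float) -> float:
    """Relative slowdown in percent: (with - without) / without * 100."""
    return (with_ms - without_ms) / without_ms * 100.0


# Sogou row of Table 1: (without IM-Visor, with IM-Visor), both in ms.
sogou = {
    "phone number (sensitive)": (6143, 6256),
    "email address (sensitive)": (11028, 11132),
    "15-char phrases (non-sensitive)": (8565, 9315),
    "25-char phrases (non-sensitive)": (14658, 15972),
}

# Round to two decimals, the precision used in the text.
results = {name: round(overhead_pct(a, b), 2) for name, (a, b) in sogou.items()}

for name, pct in results.items():
    print(f"{name}: {pct}% slower with IM-Visor")
```

For this row the phone-number column gives 1.84%, the figure quoted for sensitive keystrokes, while both non-sensitive columns come out at just under 9%, several times the sensitive-keystroke overhead.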
to facilitate the average time calculation, we select 50 different phrases of 15 characters and calculate the average time to input these 50 phrases. Then we select 50 different phrases of 25 characters and calculate the average time. The sensitive data set is the same as in the above test, that is, a phone number of 11 characters and an email address of 19 characters. The results are shown in the right half of Table 1.

Based on the results for non-sensitive keystrokes in Table 1, we find that the elapsed time taken for the user app to get user input data with IM-Visor is 9.5% longer than the time without IM-Visor deployed. This is also mainly due to the overhead of world switches between secure world and normal world. Under the prefix-matching policy, the world switch for a sensitive string is needed only twice (i.e., from normal world to secure world, and back to normal world from secure world), but for a non-sensitive string this switch may happen many times, as the replay mechanism results in a switch from secure world to normal world, so the time taken by IM-Visor

Performance evaluation

In order to test the performance impact of IM-Visor on the Android system, we use the CaffeineMark benchmark and compare against original Android. CaffeineMark is a popular Android benchmarking tool that runs a series of tests and gives an assessment score (John and Eeckhout 2005). We run the benchmark 15 times, each time after a reboot to eliminate the impact of different system workloads, then calculate the average score. The results are in Fig. 13. Overall, IM-Visor performs only 1.53% worse than stock Android. This is mainly because the IME is an event-driven service, which keeps IM-Visor idle most of the time.

Fig. 13 CaffeineMark results for original Android and Android with IM-Visor

Discussion and limitation

Discussion

We leverage TrustZone to implement the first “pre-IME” defense by hooking the Android framework. Recalling that the goal of an attacker is to steal sensitive keystrokes, another option is to implement it entirely inside the OS kernel without using TrustZone. However, this would be insecure for the following reasons. First, an IME app may get the coordinates from the Linux driver interface “/dev/input/event0” directly, which may result in the leakage of sensitive keystrokes. Without the secure touch screen driver in the STIE, the “pre-IME” property cannot be guaranteed unless the Linux kernel touchscreen driver is trusted. Second, the keyboard layout is key to figuring out which keystroke the user is typing; without reconfiguring the display controller by TrustZone, we cannot ensure that the framebuffer is the right one in the current typing environment.

Limitation

Although IM-Visor has made the first “pre-IME” attempt to prevent sensitive keystroke leakage against third-party IMEs, some limitations still exist in its design and implementation.

SystemServer attacks Recently, some vulnerabilities have been discovered to attack System Server (Horn 2014; Huang et al. 2015b; Ren et al. 2015; Shao et al. 2016). But none of them can achieve control flow hijacking, so malicious code cannot modify hooks in System Server to stop IM-Visor from intercepting touch events.

GUI attacks A malicious app may mimic the user app’s UI to mount phishing or click-jacking. However, at present, there are quite a few prior systems which can detect such attacks (Bianchi et al. 2015; Akhawe et al. 2014; Huang et al. 2015a).

Side channel attacks Malicious apps may use the gravity sensor and the acceleration sensor to launch a side channel attack. IM-Visor provides the STIE for user typing, in which we can also reconfigure those related sensors as secure peripherals to thwart such a threat (Nahapetian 2016; Aviv et al. 2012).

Related work

Defense against Android third-party IME apps is a relatively new problem. I-Box (Chen et al. 2015) tries to establish a sandbox mechanism for third-party IME apps, analyzing the user keystrokes to determine whether to roll back the IME app. As a post-IME design, I-Box is vulnerable to the prefix-substitution attack and the colluding attack. In contrast, IM-Visor is a defense with the pre-IME nature and it can defend against the above attacks. Also, that solution does not notice the “Buffer revisiting threat”, so it can be cracked by a sandbox bypassing attack. With hooks in the revisiting APIs, IM-Visor can block the data leakage path from a user app to an IME app.

To implement secure password entry, ScreenPass (Liu et al. 2013) designs a trusted software keyboard to enter the password. The use of the trusted keyboard in ScreenPass is guaranteed by using Optical Character Recognition (OCR), but the OCR itself can be cracked by attackers, so the security of ScreenPass cannot be guaranteed. What’s more, using a new keyboard instead of the original keyboard will inevitably harm the user experience, and the likelihood that the user will adopt the new keyboard cannot be guaranteed. In contrast, IM-Visor adopts TrustZone to provide secure isolation, so the security of IM-Visor can be guaranteed. Also, IM-Visor reuses the original UI of an IME soft keyboard.

For password and other private data protection, researchers have also tried other solutions. Taint-tracking (Kang et al. 2011) is a commonly used method. Taint-tracking tracks the sensitive information flow in the target app and sets appropriate strategies to prevent the outflow and abuse of sensitive data. TaintDroid (Enck et al. 2010) is the first taint-tracking method used in Android and it tracks the flow of sensitive data by tagging these data. ScreenPass (Liu et al. 2013) also uses taint-tracking to monitor the password flow to prevent illegal outflow. SpanDex (Cox et al. 2014) tracks how password information flows in an app, and compared to the previous work, SpanDex focuses on the implicit information flow in apps. Although the taint-tracking method can get detailed information about sensitive data circulation, it is not very suitable for tracking sensitive keystroke leakage. IME apps usually use native code in their key functions, such as the sending of sensitive inputs, but taint-tracking cannot track the data flow in native code. Regulating ARM (Brasser et al. 2016) thwarts sensitive information leakage through misused sensors or peripherals on smart personal devices. It replaces the original peripheral drivers by a remote update when a user enters restricted spaces such as a federal building, and doesn’t cancel the enforcement of usage policies until the user checks out. App Guardian (Zhang et al. 2015) thwarts the runtime-information-gathering of malicious apps by blocking the runtime monitoring attempt. To realize this, App Guardian pauses the malicious app when a sensitive app is running. In contrast, IM-Visor does not pause the normal run of malicious IME apps, which results in little impact on the Android system. Screenmilker (Lin et al. 2014) constructs an app which exploits the malicious use of the Android ADB capabilities to monitor the screen and pick up a user’s password when he or she is typing. Then it presents a mitigation mechanism that controls the exposure of the ADB capabilities only to authorized apps. While IM-Visor and Screenmilker both aim to protect sensitive keystrokes, there are substantial differences: the threat in Screenmilker is caused by flaws of the Android permission system, whereas IM-Visor regards IME apps as the threat. The complicated construction of the attacks in Screenmilker makes the attacks difficult to apply widely, while the attacks in IM-Visor commonly exist and can be built using repackaging.

In recent years, TrustZone has attracted a lot of research and applications in many aspects. Some researchers aim to improve the security and usability of TrustZone. SecReT (Jang et al. 2015) mainly solves the establishment of secure communication between the Rich Execution Environment (REE) and the Trusted Execution Environment (TEE). ICE (Sun et al. 2015b) runs the secure code in the non-secure domain by designing an isolated secure environment to restrict the code size of the TEE environment.

Besides the above ones, more researchers aim to apply TrustZone to protect sensitive kernel operations and sensitive services. Hypervision (Azab et al. 2014) uses TrustZone to reinforce the Linux kernel by replacing sensitive instructions in the Linux kernel and controlling access to sensitive kernel data. TrustOTP (Sun et al. 2015a) uses TrustZone to protect the full process from generation to use for a one-time key. TrustDump (Sun et al. 2014) is a TrustZone-based memory acquisition mechanism to detect and prevent the newest malware, and the isolation between the OS and the memory acquisition tool is achieved by TrustZone. These solutions focus on the underlying system, especially the kernel, and they have little relation to the Android frameworks. In contrast, IM-Visor makes many modifications to the Android framework besides the kernel. AdAttester (Li et al. 2015) uses TrustZone to secure online mobile Ad attestation, leveraging the secure world of TrustZone to implement unforgeable clicks and verifiable display. (Marforio et al. 2014) uses TrustZone to ensure the trusted execution environment for the payment process. Similar to these two solutions, IM-Visor aims to protect one certain functional service in Android, but IM-Visor is more comprehensive, as the trustlet in IM-Visor needs to complete some functional operations and needs more interaction with the Android framework, while the trustlets in the other two solutions mainly complete operations such as signature and encryption.

Conclusion

In this paper, we discuss the insecurity of IME apps, including the Potentially Harmful Apps (PHAs) and malicious IME apps. We provide a deeper understanding that all the designs with the post-IME nature are subject to the prefix-substitution and colluding attacks. To remedy the above post-IME system flaws, we propose a new idea, pre-IME, which guarantees that the “Is this touch event a sensitive keystroke?” analysis will always access user touch events prior to the execution of any IME app code, and we design an innovative TrustZone-based framework named IM-Visor which has the pre-IME nature. A prototype of IM-Visor has been implemented and tested with several of the most popular IMEs. The experimental results show that IM-Visor has small runtime overheads.

Acknowledgment

We would like to thank the anonymous reviewers for their valuable comments and suggestions. Yazhe Wang’s work was supported by the National Key Research and Development Program of China NO.2017YFB0801900 and Youth Innovation Promotion Association of CAS. Peng Liu was supported by NSF CNS-1422594, NSF CNS-1505664, and NSF SBE-1422215 (social).

Authors’ contributions

CT conceived of the study and participated in the design of IM-Visor. YW and PL participated in the implementation of IM-Visor and drafted the manuscript. QZ carried out the evaluation for IM-Visor. CZ participated in drafting the manuscript. All authors read and approved the final manuscript.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Author details

State Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, People’s Republic of China.
School of Cyber Security, University of Chinese Academy of Tian et al. Cybersecurity (2018) 1:5 Page 17 of 17 Sciences, Beijing, People’s Republic of China. College of Information Sciences Li W, Li H, Chen H, Xia Y (2015) Adattester: Secure online mobile advertisement and Technology, Pennsylvania State University, University Park 16802, PA, USA. attestation using trustzone. In: MobiSys ’15 Proceedings of the 13th Annual International Conference on Mobile Systems, Applications, and Received: 4 January 2018 Accepted: 17 April 2018 Services. ACM, Florence. pp 75–88 Lin CC, Li H, Zhou XY, Wang XF (2014) Screenmilker: How to Milk Your Android Screen for Secrets. Network and Distributed System Security Symposium. The Internet Society, San Diego References Liu D, Cuervo E, Pistol V, Scudellari R, Cox LP (2013) ScreenPass: secure Aho AV, Corasick MJ (1975) Efficient string matching: an aid to bibliographic password entry on touchscreen devices. In: Proceeding of the, search. Commun ACM 18(6):333–340 International Conference on Mobile Systems, Applications, and Services. Akhawe D, He W, Li Z, Moazzezi R, Song D (2014) Clickjacking revisited: A ACM, Taipei. pp 291–304 perceptual view of ui security. Usenix Conference on Offensive Marforio C, Karapanos N, Soriente C, Kostiainen K, Capkun S (2014) Technologies. USENIX Association Smartphones as Practical and Secure Location Verification Tokens for Aviv AJ, Sapp B, Blaze M, Smith JM (2012) Practicality of accelerometer side Payments. Network and Distributed System Security Symposium. The channels on smartphones. In: Proceeding ACSAC ’12 Proceedings of the Internet Society, San Diego 28th Annual Computer Security Applications Conference. ACM, Orlando. Nahapetian A (2016) Side-channel attacks on mobile and wearable systems. pp 41–50 IEEE Consumer Communications & NETWORKING Conference. 
IEEE, Las Azab AM, Ning P, Shah J, Chen Q, Bhutkar R, Ganesh G, Ma J, Shen W (2014) Vegas Hypervision across worlds: Real-time kernel protection from the arm Ren C, Zhang Y, Xue H, et al (2015) Towards discovering and understanding trustzone secure world. In: ACM Sigsac Conference on Computer and task hijacking in android. In: Usenix Conference on Security Symposium. Communications Security. ACM, Scottsdale. pp 90–102 USENIX Association, Washington, D.C. pp 945–959 Bianchi A, Corbetta J, Invernizzi L, Fratantonio Y, Kruegel C, Vigna G (2015) Shao Y, Ott J, Chen QA, Qian Z, Mao ZM (2016) Kratos: Discovering inconsistent What the app is that? Deception and Countermeasures in the Android security policy enforcement in the android framework. In: Proc. 23rd Annual User Interface. In: 2015 IEEE Symposium on Security and Privacy. IEEE, San Network and Distributed System Security Symposium (NDSS’16). ISOC Jose. pp 915–930 Suarez-Tangil G, Tapiador JE, Peris-Lopez P, Ribagorda A (2013) Evolution, Braden WW (1969) Random common sentences. http://www.englishinuse. detection and analysis of malware for smart devices. IEEE Commun Surv net/. Accessed Aug 2016 Tutor 16(2):961–987 Brasser F, Kim D, Liebchen C, Ganapathy V, Iftode L, Sadeghi AR (2016) Sun H, Sun K, Wang Y, Jing J (2015a) TrustOTP: Transforming Smartphones into Regulating ARM TrustZone Devices in Restricted Spaces. International Secure One-Time Password Tokens. In: ACM Sigsac Conference on Conference on Mobile Systems, Applications, and Services. ACM, Computer and Communications Security. ACM, Denver. pp 976–988 Singapore. pp 413–425 Sun H, Sun K, Wang Y, Jing J, Jajodia S (2014) TrustDump: Reliable Memory Chen J, Chen H, Bauman E, Lin Z, Zang B, Guan H (2015) You shouldn’t collect Acquisition on Smartphones. European Symposium on Research in my secrets: thwarting sensitive keystroke leakage in mobile ime apps. In: Computer Security. 
Springer, Wroclaw Proceeding SEC’15 Proceedings of the 24th USENIX Conference on Sun H, Sun K, Wang Y, Jing J, Wang H (2015b) Trustice: Hardware-assisted Security Symposium. USENIX Association, Washington, D.C. pp 675–690 isolated computing environments on mobile devices. In: Ieee/ifip Cox LP, Gilbert P, Lawler G, Pistol V, Razeen A, Wu B, Cheemalapati S (2014) International Conference on Dependable Systems and Networks. IEEE Spandex: Secure password tracking for android. Usenix Conference on Computer Society, Rio de Janeiro. pp 367–378 Security Symposium. USENIX Association trustonic Trustzone, tee and trusted video path implementation. http://www. Enck W, Gilbert P, Chun BG, Cox LP, Jung J, Mcdaniel P, Sheth AN (2010) arm.com/files/event/Developer_Track_6_TrustZone_TEEs_and_Trusted_ Taintdroid: an information flow tracking system for real-time privacy Video_Path_implementation_considerations.pdf. Accessed Nov 2016 monitoring on smartphones. In: Usenix Conference on Operating Systems Windowmanager. Android Developer. https://developer.android.com/ Design & Implementation. USENIX Association, Vancouver. pp 393–407 reference/android/view/WindowManager.html/. Accessed Nov 2016 Google (2007) Number of available applications in the google play. https:// Zhang N, Yuan K, Naveed M, Zhou X, Wang XF (2015) Leave me alone: www.statista.com/statistics/266210/number-of-available-applications-in- App-level protection against runtime information gathering on android. In: the-google-play-store/. Accessed Nov 2017 2015 IEEE Symposium on Security and Privacy. IEEE, San Jose. pp 915–930 Zhou Y, Jiang X (2012) Dissecting android malware: Characterization and Horn J (2014) Cve-2014-7911: Privilege escalation using objectinputstream. evolution. In: 2012 IEEE Symposium on Security and Privacy. IEEE, San https://www.reddit.com/r/netsec/comments/2mr9cz/cve20147911_ Francisco. pp 95–109 android_50_privilege_escalation_using /. 
Accessed Nov 2014 Zhou Z, Gligor VD, Newsome J, McCune JM (2012) Building verifiable trusted Huang J, Li Z, Xiao X, Wu Z, Lu K, Zhang X, Jiang G (2015a) SUPOR: Precise and path on commodity x86 computers. In: 2012 IEEE Symposium on Security scalable sensitive user input detection for android apps. Usenix and Privacy. IEEE, San Francisco. pp 616–630 Conference on Security Symposium. In: 24th USENIX Security Symposium (USENIX Security 15). USENIX Association, Washington, D.C. pp 977–992 Huang H, Zhu S, Chen K, Liu P (2015b) From system services freezing to system server shutdown in android: All you need is a loop in an app. ACM Sigsac Conference on Computer and Communications Security. ACM, Denver InputConnection. Android Developer. https://developer.android.com/ reference/android/view/inputmethod/InputConnection.html/. Accessed Nov 2016 InputMethodManager (2016) The reference of android developer. https:// developer.android.com/reference/android/view/inputmethod/ InputMethodManager.html. Accessed Nov 2016 Jang J, Kong S, Kim M, Kim D, Kang BB (2015) SeCReT: Secure Channel between Rich Execution Environment and Trusted Execution Environment. Network and Distributed System Security Symposium. The Internet Society, San Diego John LK, Eeckhout L (2005) Caffeinemark 3.0. http://www.benchmarkhq.ru/ cm30/info.html. Accessed Nov 2016 Kang MG, Mccamant S, Poosankam P, Song D (2011) DTA++: Dynamic Taint Analysis with Targeted Control-Flow Propagation. Network and Distributed System Security Symposium, NDSS. The Internet Society, San Diego

Journal

Cybersecurity, Springer Journals

Published: Jun 5, 2018
