DroidEcho: an in-depth dissection of malicious behaviors in Android applications

Abstract

A precise representation of attacks can benefit malware detection in both accuracy and efficiency. However, describing attacks precisely on the Android platform is still far from satisfactory. In addition, new features of Android, such as its communication mechanisms, introduce new challenges and difficulties for attack detection. In this paper, we propose abstract attack models that precisely capture the semantics of various Android attacks, including their targets, the behaviors involved, and the execution dependencies among those behaviors. Meanwhile, we construct a novel graph-based model called the inter-component communication graph (ICCG) to describe the internal control flows and inter-component communications of applications. The model takes more communication channels into account while preserving as much of the program logic as possible. Guided by the attack models, we propose a static search approach to detect attacks hidden in the ICCG. To reduce the false positive rate, we introduce an additional dynamic confirmation step that checks whether the detected attacks are false alarms. Experiments show that DroidEcho can detect attacks in both benchmark and real-world applications effectively and efficiently, with a precision of 89.5%.

Keywords: Semantic attack model, Android malware detection, Inter-component communication graph, Privacy leakage

*Correspondence: mengguozhu@gmail.com
SKLOIS, Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China
Nanyang Technological University, Singapore, Singapore
Full list of author information is available at the end of the article

© The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Meng et al. Cybersecurity (2018) 1:4

Introduction

Nowadays, Android malware detection faces two critical challenges: 1) how to design a precise and efficient model to represent malware; 2) how to reduce false alarms and distinguish real malware from benign applications.

Android malware varies in many aspects, such as attack targets, attack methods, and applied obfuscation techniques. For example, Android malware may steal users' sensitive information (Grace et al. 2012; Arzt et al. 2014a), elevate its privileges (Xing et al. 2014; Gunadi and Tiu 2013), deplete device resources (Vekris et al. 2012; Pathak et al. 2012), and remotely control users' devices (Zhou and Jiang 2012). Malware may accomplish attack missions either individually or collaboratively (Octeau et al. 2013; Bosu et al. 2017), perform attacks only once or periodically (Zhou and Jiang 2012), and be triggered by installation or by a broadcast message. In addition, malware may adopt several mechanisms to evade detection by security analysts and antivirus software, such as ProGuard (ProGuard 2017) and reflection (Zhou and Jiang 2011). All of these make it challenging for existing detection approaches to reach desirable precision and scalability simultaneously.

On the other hand, it is challenging to separate greyware from malware (Symantec Inc. 2017), especially when applications request privileged permissions to accomplish specific functionalities. For instance, WeChat, one of the top-ranked applications in Google Play, requests the permissions for reading SMS messages and accessing the network simultaneously. This may concern security analysts, since it looks like a potentially malicious behavior that sends SMS messages out to the network. In fact, it only reads the SMS messages sent by its remote server for two-factor authentication. Similar cases are pervasive on Android: weather applications show the current weather and the forecast, and therefore need to read and send out users' location information; social applications may ask for users' contacts to find friends quickly; fitness applications sometimes access sensors to measure users' exercise. Therefore, detection based on an imprecise and coarse-grained model of malicious behavior would lead to a high false positive rate.

Even with a precise model of malicious behaviors, searching for malware in applications with static approaches is not easy. The new execution paradigm, system libraries and rich communication features provided by Android have facilitated the development of feature-rich applications. However, they also make static analysis of applications more complicated and difficult, for the reasons summarized below.

• Implicit Execution Sequence. The Android framework provides a variety of program execution environments, callbacks and control frameworks for each Android component, known as its lifecycle. For example, after an activity is started by the system, it executes the methods onCreate(), onStart() and onResume() in that order, which cannot be observed from the application code;
• Various Triggers for an Application. There are many ways for an application to interact with the external environment. The application can be triggered or affected by users' GUI operations (e.g., clicking a button). It can register a broadcast receiver to respond once a broadcast message arrives. In addition, local sensors can drive the application to run in a pre-defined way. An application can also be started and driven via remote messages, such as Google Cloud Messaging (GCM), an HTTP response, and an incoming SMS or phone call;
• Complicated Communication Mechanisms. Although each application runs in a separate sandbox, Android provides applications with various ways to communicate with each other. For instance, the Intent model (Octeau et al. 2013) is the most prominent method for component communication. Additionally, applications can define bound services, for example through an AIDL (Android Interface Definition Language) interface, and implement a Binder or a Messenger to communicate even between different processes or applications.

To overcome the above challenges, we propose an integrated framework called DroidEcho to analyze Android applications. First, we summarize the features of attacks on the Android platform and propose a novel attack model. The model describes a variety of attack types at an abstract, platform-independent level. In particular, an attack is composed of: assets, which are the targets of attacks; actions, the operations performed on assets; and triggers, each of which is an entrance to the app that leads to the attack behaviors. Then we specialize the attack model into attack instances which are close to the Android platform, and can be used to guide our detection of attacks in a precise way.

Meanwhile, we transform Android applications into a comprehensive graph that incorporates the call graph between methods and the control flow graph of each method. We conduct an in-depth static analysis of the graph under the guidance of the attack model, and generate a full path with the trigger and the predicates that guarantee the occurrence of these behaviors. The detected malicious behaviors are filtered by two conditions. First, if a seemingly malicious behavior is triggered by the user, it is likely user-intended, and we regard it as harmless. Second, the presence of suspicious behaviors does not mean there is a real attack, because some applications do need to carry out seemingly "malicious" behaviors to fulfill their tasks for legitimate purposes. Which behaviors are legitimate is learnt by investigating a group of applications that are similar or belong to the same category. We use this mined social knowledge to filter out such harmless behaviors with a high level of confidence, i.e., behaviors that are likely a necessary part of the applications. This not only improves the efficiency of detection, but also reduces false positives in practice.

After the identification of malicious behaviors, we propose an approach to confirm the detected attacks through dynamic execution. Our dynamic analysis is driven by the attack traces generated previously, and provides a satisfying condition that guarantees the program proceeds along the trace. The dynamic execution reproduces the occurrence of attacks, and makes the attack detection more precise.

Different from existing research on static analysis based approaches (Arzt et al. 2014a; Arzt and Bodden 2016; Xu et al. 2016; Wei et al. 2014), our work starts from the comprehension of Android malware by constructing semantic models. To reduce the false positive rate, we propose an approach to confirm attacks by following the identified execution traces. To sum up, we make the following contributions:

Attack model. We propose a novel representation to characterize malicious behaviors. An attack in the model consists of target assets, execution actions, triggers, execution flows and apps' disclaimers. It facilitates the understanding of the essential features of attacks and the detection of malware.

Accurate attack detection approach. We propose a richly descriptive representation, named ICCG, to depict an Android application with maximal preservation of information. Based on the ICCG, we design a synthetic approach to identify a malicious application by considering both the engineering aspect and the social aspect. A reduced but sufficient static analysis establishes the presence of suspicious behaviors, which are then confirmed with the help of the learnt social knowledge.

Attack confirmation. After the identification of malicious behaviors, we conduct a confirmation process to prove the existence of a real attack through dynamic execution. The dynamic execution is fed with the traces of malicious behaviors generated by DroidEcho, and further identifies the satisfiable conditions. Then it drives the application to execute along the traces, and thereby reproduces the attacks for confirmation.

Evaluation. We have evaluated DroidEcho on malware benchmarks (i.e., Genome and DroidBench) and 7,643 real-world applications. The results show that DroidEcho outperforms the state-of-the-art tool. Moreover, we have found 444 applications with malicious behaviors in Google Play, and achieve a precision of 89.5%, which is competitive with counterpart approaches and tools.

Organization

Section Semantic model of attack proposes abstract models for various attacks on Android. Section The inter-component communication graph describes a representation for Android applications. Section System design of DroidEcho presents our approach to malware detection. Section Evaluation gives a comprehensive evaluation of our approach. Section Discussion discusses the experiments and the limitations of our approach. Section Related work summarizes the relevant literature, and Section Conclusion concludes this work.

Semantic model of attack

In this section, we first give an in-depth discussion of the attacks on the Android platform, and then provide a formal description of these attacks.

Building blocks

An attack on the Android platform has its own features and characteristics. It has a variety of attack targets, and includes a sequence of actions that often leverage the APIs provided by Android. To depict these elements, we start by introducing the building blocks of attacks and their representative examples on Android, and then construct a general and formal definition of attacks.

Assets

Assets are the hardware, software and information on Android devices that are the targets of attacks. For example, contact information is an important asset, which attackers aim to steal and use for malicious purposes; the front light is a battery-consuming piece of hardware that some malicious applications may acquire without releasing, in order to exhaust the battery quickly. On the Android platform, all the assets we are concerned with can be accessed by invoking certain system APIs. Representative examples of these assets on Android are listed below.

Information Assets: Identity code, Contact, SMS messages, File system, Location, System setting, etc.
Software Assets: Phone service, SMS service, Package Manager, Download Manager, Broadcast service, etc.
Hardware Assets: Camera, Media, Sensor, etc.

Actions

An attack action is an operation performed on a certain asset for the purpose of acquisition, tampering or interception, e.g., fetching the IMEI code of the mobile phone.

Category. According to the type of the target assets, actions can be categorized into several classes. For example, an action can acquire, edit, or delete some information stored on the device; invoke, interrupt or stop a service provided by Android; and occupy or release a hardware resource. Therefore, the semantics of an action can be uniquely specified by the combination of the action type and the target asset. In addition, there is a distinct kind of action on Android that is used for communication (see Section The inter-component communication graph for more details). A communication must involve at least one sender and one receiver, and it can occur between an application and the external environment, or between two components in one application. As a result, we summarize four actions related to communication in the scope of an application. Table 1 shows the categories of actions covered in this paper.

Parametrization. An action is often implemented by invoking a set of system APIs, which are organized with a certain dependency relationship. For example, the action of retrieving data stored in a content provider can be described as: obtaining an instance of ContentResolver; specifying the URI of the target asset; and retrieving the data stored in this content provider. Every action that retrieves data from a content provider follows the above process. We provide more details about this in Section Action recognition.

As a functional unit in the attack model, an action usually has an input, an output or both. Let $\alpha$ be an action and $\beta$ be an asset; then $\alpha(\beta)$ denotes that the input of the action $\alpha$ is the asset $\beta$, and $\alpha^{\beta}$ denotes that the output of the action $\alpha$ is the asset $\beta$ (refer to Section Flows). A variety of concrete actions are derived by parameterizing these actions with assets. For instance, when acquiring the content of a content provider, we can specify some assets as the target, such as
ContactsContract.Contacts.CONTENT_URI and CalendarContract.Events. As a consequence, two actions are generated to fetch the contact list and the events in the calendar, respectively. Table 1 lists 9 basic kinds of actions, from which more actions can be generated by parametrization with explicit target assets.

Table 1 The category of actions on Android

| Category | Operation | Action Example | Corresponding Implementation |
|---|---|---|---|
| Information-based | acquire | get SMS message | ContentResolver.query(Inbox) |
| Information-based | insert | insert a contact | ContentResolver.insert(Contact) |
| Information-based | edit | change system setting | Wallpaper.setBitmap(Image) |
| Information-based | delete | delete local files | File.delete() |
| Software-based | invoke | call a number | startActivity(Intent{tel:num}) |
| Software-based | interrupt | block SMS messages | abortBroadcast() |
| Software-based | stop | uninstall an app | startActivity(Intent{pkg:app}) |
| Hardware-based | occupy | hold the wakelock | WakeLock.acquire() |
| Hardware-based | release | release the wakelock | WakeLock.release() |
| Communication | e_send | send data to environment | sendTextMessage(SMS) |
| Communication | e_recv | receive data from environment | getMessagesFromIntent(Intent) |
| Communication | i_send | send data to other component | startService(Intent) |
| Communication | i_recv | receive data from other component | getIntent(Intent) |

Triggers

Triggers are events which are taken as inputs to an application and lead to the occurrence of a behavior. Although triggers occur at runtime and are unpredictable for applications, an application can provide handlers to subscribe to and capture them. Once the application receives a subscribed trigger, it enters the lifecycle and executes specific methods. With respect to the awareness of users, we distinguish two sorts of triggers:

User Interaction. This kind of trigger is usually GUI-related and visible to the user operating the device. For example, when the user clicks a button drawn on the screen, the behavior is triggered and starts to execute. The user can thus learn that the behavior is caused by his/her click, which we call user-awareness. For simplicity, we assume that users can infer from the context which behaviors a user interaction causes.

Environmental Inputs. Another kind of trigger can also drive the execution of an Android application. The trigger could be the initialization of the application, a broadcast message or registered listeners to sensors. The whole process does not involve the user, which means that the user is likely unaware of the execution of the behaviors. As a consequence, we classify malicious behaviors triggered by environmental input as potential attacks for further analysis.

As suggested by Yang et al. (2013) and Chen et al. (2013), behaviors that are never executed until they are triggered by a user interaction reflect the "intention" of the user. Therefore, in this work, we assume that user interactions will not trigger any malicious behaviors, i.e., potential attacks that are triggered by user interactions are false positives. Environmental input triggers, however, can proceed stealthily, without the user knowing about them. This kind of trigger usually brings many security risks, and is our main concern in this paper.

Since triggers are external objects that cause the execution of attacks, we can instead recognize e_recv (see Table 1) to observe the arrival of triggers. Specifically, the listeners can be categorized by the type of trigger. For example, onClick(View), onDrag(View,...) and onKey(View,...) are the entry points of the program when a user interaction trigger arrives, while onCreate() and onReceive() are the entry points for the boot of applications and for a broadcast message, respectively, which are regarded as environmental inputs.

Flows

Actions are connected by flow relationships. A flow is a kind of dependency relationship that is either directional or contextual. The directional relationship indicates a certain order of execution, which is defined by the program logic for a specific task; the contextual relationship can be described as a semantic connection between two actions, for example, the input of one action is the output of the other. Generally, the contextual relationship requires the data in question to be passed from one participant to the other.

A flow can exist between the environment and an action, in which case the trigger is the data passed between them. Take an incoming SMS message as an example: if an application registers a BroadcastReceiver for SMS messages, then once an incoming SMS message arrives, the application starts to execute from the listener, and it can also get the content of the message as input. Therefore, there exists a directional and contextual relationship between the environment and the action acquire(SMS).

A flow can also exist between two actions. After an application gets an incoming SMS message, it can send the message to a remote server via the Internet. In such a case, there is a contextual flow between these two actions, which guarantees that the two actions operate on the same SMS message.

Attack models

Based on the aforementioned building blocks, we define different attacks in this section. In the remainder of this section, we use the following notation. $E$ is the set of Environmental Input triggers; $t$ is the trigger of the attack and $t \in E$; $Asset$ is the set of assets involved in the attack. Let $\alpha$ be an action or a trigger, $\beta$ be an action, and $\gamma$ be an asset. A flow is either a control flow, denoted as $\alpha \rightarrow \beta$, or a data flow, denoted as $\alpha \rightsquigarrow \beta$.

Attack taxonomy

We conduct a comprehensive investigation of existing attacks and malicious behaviors (Enck et al. 2009; Shabtai et al. 2010; Enck et al. 2011; Zhou and Jiang 2011; 2012), and propose a taxonomy of attacks in terms of these building blocks and semantic information as follows.

Privacy leakage. Privacy leakage (Enck et al. 2010; Grace et al. 2012; Zhang and Yin 2014) refers to the exposure of sensitive information on devices. As discussed in the part on actions, such information can be acquired by specifying an acquire action, which is regarded as the source in a privacy leakage attack. If there exists a data flow from the return value of the acquire action to the data sent out to the external environment by a communication action, usually called the sink, privacy leakage happens. In addition, the attack needs to happen without users' awareness, and the trigger does not need to have a data flow relationship with these two actions. As a result, the formal attack model of privacy leakage can be defined as:

$$PL = \xrightarrow{t} e\_recv \rightarrow acquire^{\gamma} \rightsquigarrow e\_send(\gamma)$$

Information interception. Mobile devices can interact with the external environment in many ways. However, malicious applications may intercept, suspend, or even break off the communication. Common attacks include blocking incoming SMS messages and phone calls. For this kind of attack, a malicious application needs to register a listener (i.e., e_recv) for the broadcast messages of incoming messages and calls, and then stops their propagation (i.e., intercept) to prevent the messages from reaching other applications or the user.

$$II = e\_recv \rightarrow intercept(\gamma)$$

Content tampering. Malicious applications may tamper with content on mobile devices, such as contacts, SMS messages, accounts, and system settings, which can cause severe damage to the user. Usually, an application with specific permissions can insert, update and delete an item in a content provider. In addition, it can change system settings such as the network connection, the wallpaper and the sleep time. We use insert, edit and delete to describe this kind of behavior. The trigger of this attack does not attract users' attention and does not have any data flow relationship with these actions. The attack is defined as follows:

$$CT = \xrightarrow{t} e\_recv \rightarrow \alpha(\gamma), \text{ where } \alpha \in \{insert, edit, delete\}.$$

Service abuse. Malicious applications may abuse the services provided by Android (Luo et al. 2013). According to our investigation, the most commonly abused services include the phone service, the SMS service, the package manager, and the download manager. For example, if an application possesses the permission to send SMS messages, it can subscribe to a premium-rate mobile service that charges the user. Let $\alpha$ be the kind of action which abuses services; the attack model can be presented as:

$$SA = \xrightarrow{t} e\_recv \rightarrow \alpha(\gamma), \text{ where } \alpha \in \{invoke, stop\}.$$

Resource depletion. For the sake of portability and simplicity, mobile devices usually carry a low-frequency CPU, RAM of limited size and a small-capacity battery. They can therefore provide only limited computation capability, storage and energy. The situation gets worse if installed applications occupy these resources immoderately, which can affect other applications and even the battery life of the device. Either intentionally or unintentionally, applications keep consuming resources (Pathak et al. 2012; Vekris et al. 2012) or carry on useless and endless work (Oliner et al. 2012; Hao et al. 2013), while never releasing or stopping them. Let occupy be the kind of action which exhausts resources, and release be the kind of action which releases resources. We use $\nrightarrow$ to denote a missing flow between these two actions. The attack model is given in the following:

$$RD = \xrightarrow{t} e\_recv \rightarrow occupy(\gamma) \nrightarrow release(\gamma)$$

Discussion

The taxonomy of attacks is based on the 102 malware families we have studied. However, some attacks are beyond the detection capability of our approach, such as phishing, adware and privilege escalation. Phishing is a kind of attack in which one application disguises itself as an authentic and legitimate application, and induces users to enter their credentials, for example for a bank account (Prince). Adware is a program that displays advertisements to its users, which is annoying rather than harmful most of the time (F-Secure Lab 2013). Some applications may exploit vulnerabilities of Android, such as Exploid, RATC/Zimperlich and Ginger Break (Xuxian and Yajin 2013), to elevate their privileges once installed on a device; Pileup (Xing et al. 2014) is a newly found flaw in the Package Management Service which can be exploited by malicious applications only while the Android OS is being upgraded. Finally, side channel attacks (Schlegel et al. 2011; Hilgers et al. 2014; Chen et al. 2014), which collect memory information or timing information, are outside our scope of attack detection.

The limitations of DroidEcho come from two aspects: 1) our static analysis is carried out on Java code, and does not go inside native code, whereas much privilege escalation malware uses native code to elevate its privileges; 2) we try to avoid making subjective judgements, and prefer to detect the objective existence of malicious behaviors. That is, phishing and adware merely deceive and bother users, respectively, and do not strictly violate the security policies of Android (Enck et al. 2009). We give the statistics of the attacks mentioned previously in Table 2, indicating that our approach can detect up to 90.4% of attacks in theory.

Disclaimers

There is an important exception when determining an attack: disclaimers. A disclaimer is a white list for an application, in which some behaviors are excluded from consideration when determining attacks. The violation of certain security properties does not necessarily imply the occurrence of an attack. Some applications may need to carry out suspicious-looking behaviors for which they have already explicitly declared the potential security violation. We assume that users who install such applications accept the introduced risks by default. Therefore, in this work, we filter out the "attacks" that are allowed by the users, and remove them from the generated attack report.

The inter-component communication graph

To represent Android applications accurately and to make attack detection convenient, this section presents the inter-component communication graph (ICCG), which captures all possible communications between components and threads inside Android applications.

Android communication medium

A medium is a special data structure used for communication. Communication can occur either between two components (i.e., activity, service, broadcast receiver and content provider), or between two isolated processes. Mediums play a critical role in the behavior of Android applications. Besides the frequently discussed Inter-Component Communication (ICC) (Orthacker et al. 2011; Schlegel et al. 2011), which is based on the Intent medium, there are three other mediums that can also be used for communication. Here we describe the different types of mediums that exist on the Android platform.

Intent. Intents are the main vehicle for communication. An intent can be either explicit or implicit. Explicit intents name a specific class to start, while implicit intents do not specify the corresponding class, and the system selects the best-suited class or application to execute. An explicit intent can only invoke a specific component, which is defined in the constructor or by calling setComponent(ComponentName) or setClass(Context, Class); an implicit intent can be received by many well-suited components.
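
The difference between the two kinds of intent can be illustrated with a small, self-contained Python sketch. It is a toy model under stated assumptions, not Android's actual resolution algorithm: the `Component` and `Intent` classes, the `resolve` function and all component and action names are hypothetical, and an implicit intent is matched on its action string only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Component:
    """A hypothetical app component and the actions its intent filter accepts."""
    name: str
    actions: frozenset = frozenset()


@dataclass
class Intent:
    """Toy intent: explicit if `target` is set, implicit if only `action` is set."""
    target: Optional[str] = None
    action: Optional[str] = None


def resolve(intent: Intent, components: List[Component]) -> List[str]:
    """Return the names of the components that could receive the intent."""
    if intent.target is not None:
        # Explicit intent: only the named component, if it exists.
        return [c.name for c in components if c.name == intent.target]
    # Implicit intent: every component whose filter lists the action.
    return [c.name for c in components if intent.action in c.actions]


components = [
    Component("MainActivity", frozenset({"android.intent.action.MAIN"})),
    Component("UploadService", frozenset({"com.example.UPLOAD"})),
    Component("BackupService", frozenset({"com.example.UPLOAD"})),
]

explicit_receivers = resolve(Intent(target="UploadService"), components)
implicit_receivers = resolve(Intent(action="com.example.UPLOAD"), components)
print(explicit_receivers)  # ['UploadService']
print(implicit_receivers)  # ['UploadService', 'BackupService']
```

For a static analysis this matters because an explicit intent yields exactly one candidate communication edge, whereas an implicit intent may yield one edge per matching receiver.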
An implicit intent designates its potential receivers by setting an action in the constructor or via setAction(String); it can additionally carry a data type to restrict its receivers (Feng et al. 2014). An intent can influence the execution order (i.e., the control flow) of the application, and also affects the data flow if it carries extras.

Table 2 The category of attacks on Android

| Attack | Percent | Supported by DroidEcho |
|---|---|---|
| Privacy Leakage | 31.4 | ✓ |
| Information Interception | 11.6 | ✓ |
| Content Tampering | 13.4 | ✓ |
| Service Abuse | 31.4 | ✓ |
| Resource Depletion | 1.8 | ✓ |
| Phishing | 1.7 | ✗ |
| Adware | 2.3 | ✗ |
| Privilege Escalation | 6.4 | ✗ |

Message. A message is a concise data structure for arbitrary data. Two isolated processes or threads can communicate with each other by transferring a message. In general, the message receiver has to create a Messenger to handle the received messages. The sender needs to obtain a reference to this Messenger, and sends its crafted message by invoking send(Message message) of the Messenger. To send a message to a daemon service, for example, the component can first bind to the service via bindService(), and then fetch the reference to the Messenger from the returned Binder object.

Binder. A Binder is used by a component to talk to a daemon service. The component which attempts to bind to a service needs to invoke bindService() and implement ServiceConnection, which establishes the connection with the service. The service needs to provide a subclass of Binder that exposes public methods to clients, or to define an AIDL interface together with its implementation. After that, the component can obtain a binder object, which is a remotable object for lightweight remote procedure calls. In addition, AIDL interfaces can be exposed to other applications for remote invocation.

Persistent storage. On Android, applications may exchange data through persistent storage. There are three types of persistent storage: File, Shared Preferences and SQLite database. They can be used by applications or components to exchange data, that is, they provide an implicit data flow from one component to another.

Inter-component communication graph

Definition 1. Let $M$ be the set of communication mediums existing on Android. An ICCG is a directed graph defined as $G = \{V, E_f, E_c\}$, where $V$ is a set of nodes; $E_f \subseteq V \times V$ is a set of flow edges; and $E_c \subseteq V \times M \times V$ is a set of communication edges.

The nodes of the graph are the methods contained in the application, which come in two levels of granularity. The coarse-grained nodes only represent the signatures of the functions, and help to express the relationships between functions at the system level. From them we can learn the method invocation relationships and the possible communications between different functions. At the fine-grained level, a node is dissected in depth and shows its internal logic, i.e., its control flow. When identifying the elements of attacks, especially behaviors, we need to go down to the code level and recognize the different patterns of behaviors.

We employ two kinds of edges to denote the relationships between nodes: the call relationship and the communication relationship. Flow edges reflect the call relationship among nodes. This is a primary concept in program analysis, and consists of explicit calls and implicit calls. Here we emphasize the implicit calls that are specific to Android, i.e., the Android lifecycle. An Android lifecycle indicates an implicit function invocation between different methods or classes. The implicit calls are either callbacks passed to a concrete method, or control frameworks specifying a call sequence. Besides the lifecycle features of standard Java, e.g., the method void start() of a thread instance implicitly calls the overridden method void run(), Android includes many libraries that support a large number of implicit calls. Each component of Android has a unique call sequence pre-defined by Android. In addition, all GUI components on Android allow developers to pass a callback that executes functionality when the corresponding event occurs.

Communication edges connect nodes and mediums. As defined previously, there are four kinds of mediums used for communication. It is worth mentioning that communications not only show the logical order of execution; some of them also carry data which can be transferred from one node to another.

We use the DroidKungFu malware as an example to explain the ICCG. As shown in Fig. 1 (a), there are an activity and a service, which communicate via an Intent medium. The activity obtains sensitive data (refer to ① in onStart), and passes the data to the service. Then the service sends the data out at ② in onCreate. Figure 1 (b) shows the ICCG constructed from the code. As discussed in the previous section, each node represents a method of the application, and contains a control flow graph. The nodes are connected by two kinds of edges: Android mediums (e.g., the Intent object) and method invocations, either implicit (e.g., lifecycle) or explicit.

Fig. 1 An example of malicious behaviors and the corresponding ICCG. a The code snippet of the malicious behavior. b The corresponding ICCG of the code

Sufficiency of ICCG

We construct the ICCG to represent the overall structure of the functions in the application, and search for any attack model hidden in the graph. As the attack model proposed in Section Semantic model of attack is general and platform-independent, we show below that the ICCG is sufficient to detect attacks.

As modeled in Section Semantic model of attack, an attack is a set of operations which the attacker performs to achieve a certain objective, and it is composed of 5 essential elements. The ICCG retains almost all program information, and we can extract a number of call sequences from it. By checking each call sequence, we can recognize the actions which are attack-related, identify their trigger, and perform data flow analysis on the call sequence. Hence, we can find a mapping from the attack model to the ICCG, which means that the ICCG contains sufficient information to detect an attack inside it.

System design of DroidEcho

DroidEcho works in four phases: disclaimer learning, ICCG construction, attack detection and attack confirmation. The first phase, disclaimer learning, receives the descriptive text of applications as input, and generates a white list of "necessary" behaviors (i.e., the disclaimer of the application) in a supervised manner. The white list is used to exclude the claimed functionality of the application from detection. Second, ICCG construction takes the class files and the manifest file of the application as input, and constructs an ICCG, which is then passed to the third phase. Attack detection finds the attacks existing in the application, if any, together with the corresponding traces which cause these attacks. Finally, attack confirmation receives the candidate attacks, and determines whether an attack candidate is a false positive by means of a trace-guided dynamic execution.

Disclaimer learning

Some Android applications may perform seemingly suspicious behaviors which are actually required to accomplish their functionality. The required functionality and the risks it may bring are usually stated in the descriptive text of the application. We regard such a behavior as benign (henceforth a disclaimer), and it is not considered an attack candidate. For example, TripAdvisor is a travel application which can suggest nearby restaurants and hotels when the user is travelling.
For ease of use, it acquires the permission FINE_LOCATION to learn System design of DroidEcho the user’s location such that it can provide the most suit- This section presents the design of DROIDECHO.As able information for the customers. Although we detect that TripAdisor has a privacy issue, which sends the user’s showninFig. 2,DROIDECHO takes as input an Android location to a remote server from time to time, we regard application, which contains the class files, the manifest file and the description of its functionality. DROIDE- this as being benign and harmless. CHO will generate an attack report which contains identi- As shown in Fig. 3, we obtain the descriptions of fied malicious behaviors and the corresponding traces of applications and perform a description-to-permission these behaviors for forensic use. DROIDECHO leverages fidelity analysis (Qu et al. 2014). The fidelity analysis the attack model which is presented in Section Seman- builds a description-to-permission relatedness model in tic model of attack as the guidance for attack detection, which one permission is associated with a list of noun and proceeds in four phases: disclaimer learning, ICCG phrases. For the description of a given application, we Fig. 2 The architecture of the system Meng et al. Cybersecurity (2018) 1:4 Page 9 of 17 reference at a time, and Set(values) denotes a set of possible values to which the variables can be assigned. PointerTable plays a critical role in the step of link anal- ysis and action recognition. During the step link analysis, PointerTable is used to infer the actions and classes of an Intent object, thereby DROIDECHO can identify which components are able to receive this Intent. And DROIDE- CHO needs the PointerTable to recognize the semantics of actions during the action recognition. For example, when DROIDECHO encounters an operation to query a content provider, it needs to learn the value of the argument URI, to distinguish different content providers. 
Parts of our pointer analysis are based on SPARK Fig. 3 The learning process of disclaimers (Lhoták and Hendren 2003), which is a pointer analysis framework. It can cluster the variables into several sets, i.e., Set(variables), where all variables in the same set have can leverage this model to produce a list of requested been pointed to with same reference at a time. Since we permissions. Then, we employ PScout (Au et al. 2012) have got a rough call graph and control flow graphs of all to elicit the corresponding APIs that request per- methods, we traverse the call graph and go inside con- missions. For example, the sentence “Your location: trol flow graphs to perform value inference. We evaluate These permissions are needed to obtain your location each node in a control flow graph, and infer the possi- so we can help you discover hotels, restaurants, and ble values of the variables. The value inference can handle attractions around you” in app TripAdvisor implies that it basic arithmetic and String operations. In addition, we do requests for recognizing users’ current location the per- not evaluate all types of variables, which are both compu- mission android.permission.ACCESS_COARSE_LOCATION tation expensive and useless to our attack detection. We and android.permission.ACCESS_FINE_LOCATION.There- only pay attention to the valuation of primary types (e.g., fore, 21 Android APIs (e.g., void requestLocationUp- boolean, int, double), String, ComponentName, URI/URL dates(float, LocationListener) and Location getLast- and Intent. It is worth mentioning that the values of Com- KnownLocation(String)) are regarded as being necessary ponentName and URI/URL objects can be expressed by to invoke by permission-to-api mapping. a String, while we construct a more complicated struc- The produced Android APIs serve as disclaimers to ture for Intent objects, which basically contains four fields: refine the attack model. 
During attack detection (see action, class, data and category. Section Attack detection), these APIs will not be consid- The pointer analysis used in this work is type-sensitive, ered as attack actions. however, flow-insensitive. That is, every variable in the samesetneedstosharethesamedatatypewithothers. ICCG construction In order to reduce the expense of storage and computa- The construction of ICCG takes class files and the mani- tion, we store all possible values which the set of variables fest file of the application to be checked as inputs. Primar- can be assigned to rather than only parts of them after a ily, DROIDECHO employs Soot (Vallée-Rai et al. 1999)to certain statement. generate a rough call graph of the whole application and a control flow graph for each method. Given that, DROIDE- Link analysis CHO proceeds in three steps successively: pointer analysis, Link analysis is to establish all links between methods or link analysis and graph assembling.Thefirsttwo stepscan components in an application, i.e., the edges in ICCG. provide all auxiliary information to assemble an ICCG. Primarily, the call graph generated by Soot only contains Pointer analysis the call relationship between Java methods. As introduced Pointer analysis is a static analysis to infer which vari- in Section The inter-component communication graph, ables are pointed to by pointer references or heap ref- there are implicit invocations and a variety of communi- erences. In this step, we want to identify all references cation mechanisms on Android. On the basis of the call which are pointing to variables in the application, and graph, we analyze all links between methods and build a all possible values which the variables can be assigned complete communication graph for the application. to. 
The result of this step is a PointerTable, which There are two kinds of links between two methods, contains mappings from variables to concrete values: invocation links (either explicit or implicit) and com- Set(variables) → Set(values). Set(variables) denotes munication links via Android medium (e.g., Intent and a set of variables which are pointed to with the same message). We first build call chains for the lifecycle Meng et al. Cybersecurity (2018) 1:4 Page 10 of 17 of Android components. For example, one of the call Algorithm 1: Model-based attack detection chains of Android Activity is onCreate → onStart Input: ICCG of the application Input: Attack model {(action , action ,data|control)}, where → onResume, which shows the implicit invocations after i i+1 0 ≤ i < n − 1 the start of the Activity. As a result, the above meth- Output:if ICCG contains attack 1 for action ∈ attack do ods in the call graph will be linked with an invocation 2 if !(ICCG contains action) then edge, respectively. For communication links, we recog- 3 return false; nize the mediums as well as their attributes existing in 4 for i = 0 to #actions − 2 do the methods, and identify which components or methods 5 if flow(action ,action )==data then i i+1 can receive these mediums. Take the Intent medium as an 6 data_flow = TaintAnalysis(action , action ,asset); i i+1 7 if data_flow is not satisfied then example, if we find an action which starts activities, like 8 return false; startActivity(Intent), we retrieve the attributes 9 if flow(action ,action ) == control then i i+1 (e.g., class and action) of the Intent object and identify 10 control_flow = ForwardControlFlowAnalysis(action , which activities can be triggered by this Intent object. As a action ); i+1 11 if control_flow is not satisfied then result, we add a new link between the method which sends 12 return false; out the Intent and the constructor method of the target activities. 
13 trigger := BackwardControlFlowAnalysis(action ); 14 if trigger ∈ {Environmental Input} then 15 return true; Graph assembling 16 else By far, we have obtained the control flow graph for each 17 return false; method of the application, and all links between these methods. We take the control flow graphs as nodes, the links as edges, and assemble them into an ICCG. The whether the flows are satisfied or not. At last, we get the graph depicts the execution order and communications trigger causing this attack (Line 13), and check if it is between different methods at the system level, and illus- akindof environmental input, e.g., the initialization of trates the control flow at the method level. Combined application, system broadcast message and a timer task. In with PointerTable, ICCG is passed to the attack detection the following, we will give a more detailed description for phase. Attack detection will search the graph and find out each step. any existing attack. Action recognition We use actions to describe the basic elements in an attack, Attack detection which is semantic but domain-independent. However, we To reduce the search space of attack detection, we will not analyze the program from its entry points. In con- need to define a system of notations in a specific domain verse, we first recognize attack-related actions existing in (here Android), to capture these actions and triggers in the program in a fast way, and perform a bidirectional flow ICCG. On Android, we recognize an action by the cor- analysis from behaviors, which can effectively speedup the responding constraints. Here we define three kinds of predicates to express APIs and constraints in these actions search process. we metinthe code: sig(api), type(arg), and value(arg), Algorithm 1 shows the whole process to check whether where api is an Android API, arg is a variable, and these one attack is contained by the application or not. The algo- predicates will return a comparable constant value. 
As a rithm takes ICCG of an application, and one attack model consequence, action recognition can be transformed into as the input, and outputs whether the attack model exists a satisfiability problem, in the ICCG. Line 1-3 show that it recognizes all actions existing in the ICCG. If any of actions in the attack is not action |= sig(api) (1) contained in the ICCG, DROIDECHO concludes that the application does not contain this attack. In our implemen- sig(api) |= type(arg) ∩ value(arg) (2) tation, we conduct an one-time retrieval of the ICCG for each application and store all recognized actions. By com- One action is recognized if we detect some APIs which paring the included actions in each attack, we can quickly satisfies the above constraints progressively. Equation 1 eliminate some attacks which will definitely not happen. shows the action can be recognized with an API with the If all actions in the attack model are found in ICCG, specific signature, and moreover, the arguments or the we proceed the reachability analysis and program slicing. base, if any, need to satisfy two kinds of predicates, type Since there are two kinds of flows (referred to control flow and value. As shown in Eq. 2, arg is either the base of the and data flow in program analysis, respectively) defined API (static methods do not have a base), or the arguments. in our attack model, we carry on ForwardControlFlow- Specially, arg may be another invocation of API, i.e., sig. Analysis (Line 10) and TaintAnalysis (Line 6) to determine Therefore, we will recursively solve the constraints until Meng et al. Cybersecurity (2018) 1:4 Page 11 of 17 the action is recognized. Taking the example of obtaining attack confirmation to guide the dynamic execution of the contacts, the essential code at language level of this action application; 3) identify the search space for potential taint can be described as follows: analysis. 
The forward control flow analysis aims to complete sig(api) = obj.query(uri, ∗) (3) two tasks: 1) determine the occurrence of the subsequent obtain contact actions in an attack model; 2) similar to the backward control flow analysis, identify the search space for the type(obj) = ContentResolver, taint analysis. As a result, we will not search the entire type(uri) = Uri, ICCG during the taint analysis, which is computationally value(uri) = “content : //contacts" expensive. (4) sig(api) = obj.query(uri, ∗) Taint analysis As showninEq. 3, we first need to find a pivotal func- Taint analysis can track the flow of data during detection. tion whose signature matches obj.query(uri, ), Taking privacy leakage as an example, we need to carry and the methods need to meet three constraints: the base on taintanalysistotracktheflowofdata, andifthedata of the invocation obj needs to be an object of the class is flowed to a sink action and sent out eventually. During android.content.ContentResolver,the type of the taint analysis, we get a domain set in a control-flow uri needs to be an object of android.net.Uri,and order SearchDomain = D → D , ... → Dn,and the 1 2 itsvalue needstobe content://contacts as shown source action is located at D after the above steps. Then sr in Eq. 4. The code statements, which together form a we perform a forward data flow analysis on the domain behavior, might have dependency relationship or follow an set SearchDomain.Figure 4 illustrates the ways how the execution order in between. We deal with it as a constraint data can be tainted cross domains. First of all, data in the satisfaction problem, and recognize a behavior with rea- domain D can influence the data in its previous domain soning. 
The benefits are that we do not need to care about by three methods: return the data at the call site in the theexecution orderofcodeinabehavior,andhenceour previous domains, referring to  1 ;thedata flow  2 shows approach is more general so as to identify more variations. how the data in the latter domain influences the data in its previous domains; and we can assign the data to one com- Reachability analysis & slicing monly shared variable between the domain D as shown If the ICCG contains all necessary elements for one attack, in  3 . There are three possible ways for the data in domain we start to do program slicing from these elements. The D to influence the data in the successive domains: enclose slicing consists of backward and forward control flow s communication medium with data and pass it to the next analysis. The backward control flow analysis aims to com- domainsasshown bythedataflow  4 ;passthedataas plete three tasks: 1) find the root cause that lead to such action, i.e., its entry points. Based on the entry points, we an argument to its successive domains, which are used in can infer the type of the triggers. Then we know whether these domains, referring to  5 ;assign thedatatoacom- the attack is triggered by a user interaction or environ- monly shared variable in between as shown by the data mental inputs; 2) obtain all conditions in a trace from flow . In addition, we take a coarse-grained aliasing the entry points to the action. The conditions are used in analysis in this paper, i.e., if for example a string variable Fig. 4 Taint analysis across multiple domains Meng et al. Cybersecurity (2018) 1:4 Page 12 of 17 is passed to a function, and this function will encrypt the ten with 17,038 lines of Java, and 163 lines of scripts string and return a new encrypted value with a crypto- (Python and Shell). The dynamic confirmation is imple- graphic scheme. Although we do not know how to convert mented based on TaintDroid (Enck et al. 
2010)and Intel- the original string to the encrypted one (we do not infer liDroid (Wong and Lie 2016). TaintDroid enables us to the meaning of cryptographic schemes), we can definitely track the information flows of applications. In addition, ensure the operation is reversible, and the returned data is we customized TaintDroid for two purposes. First, we also of sensitive information. intercept the APIs in our Action set to monitor whether they are invoked by the tested applications. Second, we Dynamic attack confirmation intercept the APIs providing the applications with envi- As discussed before, DROIDECHO’s ICCG construction ronmental inputs, such as location and time information, and attack detection are based on static program analysis, where we can return the applications values that would which is less precise than dynamic analysis. As a result, the activate the target behaviors. During the confirmation, we attacks reported by DROIDECHO may be false positives. employ IntelliDroid to generate the call paths for specific Therefore, we introduce a confirmation step to reduce Android APIs as well as conditions that enable the paths. false positives, and the attack confirmation is based on the Then the driver script takes them as input to automatically technique of dynamic testing. drive the execution of the suspicious applications. To esti- An attack candidate, which is passed from the attack mate the overall performance of DROIDECHO,weconduct detection phase to the attack confirmation phase, con- the experiments from three aspects: evaluation on mal- tains an attack trace and the conditions that guarantee ware benchmark, evaluation on real apps and evaluation the occurrence of attacks. Given that, we simulate the on performance. inputs to drive the dynamic execution of the applica- tion and check whether the attack trace can occur in Evaluation on Malware Benchmark the real execution. 
In order to activate the attack candi- To evaluate the performance of our approach on the infa- date and capture malicious behaviors, we first instrument mous malware, we conduct an experiment on 1260 sam- Android OS by hooking specific Android APIs which are ples of malware of the collection (Zhou and Jiang 2011). included in our attack model, and then generate the trig- According to the types of malware, we filter out 108 of gers which are used to activate the contained malicious them (e.g., Asroot, DroidCoupon and DroidDeluxe) which behaviors. onlyusenativecodetolaunchattacks.Atlast,wesuc- cessfully detect 940 (89.5%) samples, and also show the • Instrumentation. Since the actions in attack model attack type. There are mainly two reasons for the missing are recognized as the invocations of specific Android malware: 1) some malware use reflection to dynamically APIs, we instrument Android OS to monitor the invoke malicious code. For example, AnserverBot loads invocation behaviors. In this paper, we leverage an executable file in its asset folder, retrieves the included TaintDroid (Enck et al. 2010) to determine whether classes and runs the code. 2) some of them leverage com- these APIs are invoked. plicated obfuscation and encryption to confuse AV tools. Triggers. We leverage IntelliDroid (Wong and Lie For example, Geinimi leverages several cryptographic 2016) to generate all triggers leading to specific schemes (e.g., DES) to encrypt the communication and malicious behaviors, and subsequently schedule these strings. triggers to drive the execution of the application. We In addition, we conduct an experiment to compare simply feeds the application with all possible trigger DROIDECHO’s capability of attack detection with Flow- sequences, and in order to eliminate the impossible Droid (Arzt et al. 2014b), which is a static tool in detecting sequences (which never occur during the real privacy leakage. 
The subjects of this experiments include executions), we exploit the “happen-before” relations a set of open-source Android applications named Droid- among these triggers to generate sequences. 5 Bench , of which the applications may contain the attacks of privacy leakage. Obtaining these inputs, DROIDECHO is able to exe- DROIDECHO successfully detects 34 samples of mal- cute the suspicious applications to determine if the attack ware, while fails to find 8 malicious samples. We provide is reachable. In order to make the exploration faster, Table 3 to illustrate the comparison results, actually only DROIDECHO prunes the paths which rarely lead to the the different results, with FlowDroid. As shown in Table 3, attack trace, which can significantly reduce the search DROIDECHO has an edge in detecting the first six kinds spaceofthe program. of privacy leakage, but cannot detect the last three kind Evaluation of privacy leakage. PrivateDataLeak-1&2 are two appli- cations which steal the text in a password field of an We implement an automatic platform DROIDECHO to facilitate the detection, accordingly. DROIDECHO is writ- Android GUI view. Since the data on GUI components Meng et al. Cybersecurity (2018) 1:4 Page 13 of 17 Table 3 Comparison with FlowDroid measurement for the usage of applications, e.g., Flurry and Crittercism, diagnose the crash of applications, e.g., App DroidEcho FlowDroid Crashlytics, or advertise, e.g, Umeng and Google Ads. ArrayAccess-1&2 TP FP Table 4 shows third-party libraries that are contained in HashMapAccess1 TP FP the applications. ListAccess1 TP FP Ordering1 TP FP False positive analysis To evaluate DROIDECHO’s accu- Unregister1 TP FP racy, we randomly selected 50 samples, and manually Exception-1&4 TP FP identified 4 false positives. Two false positives are because DROIDECHO cannot well handle collection objects such PrivateDataLeak-1&2 FN TP as array, list, and map. 
If any element in a collection ImplicitFlow-1&2&3&4 FN FN is tainted, DROIDECHO determines the whole collection Reflection-3&4 FN FN object is tainted. One false positive is due to the ignorance of execution conditions of flows. The execution condition may not be satisfied during runtime leading the malicious are hard to be determined to be sensitive, in addition, behaviors cannot be practically triggered. The last false applications which need authentication have to send cre- positive is attributed to the insufficient modelling of per- dentials, such as user input from keyboard, to the remote sistent storage. As an alternative communication channel, server for authentication. As a result, DROIDECHO does persistent storage (e.g., file, database) might contains mul- not track the flow of the data on GUI components. And tiple dimensional data. It is non-trivial to track the flow of last, DROIDECHO and FlowDroid both cannot cope with data in the persistent storage, which will be further studies the last two kinds of applications, where ImplicitFlows are in future. samples which leverage obfuscation techniques to confuse the analysis, and Reflections are two samples which use Evaluation on performance reflection to dynamically invoke methods or fetch fields to In order to evaluate the efficiency and scalability of complete the process of privacy leakage. DROIDECHO, we measured runtime parameters in the previous experiments. The runtime parameters consist Evaluation on real Apps of the complexity of applications and runtime for each We have collected 7643 applications from Google Play, phase of DROIDECHO. And the experiments are con- which are hot and free application in their respective cat- ducted on a Linux Ubuntu 14.04 machine, carrying 12 egories. By running DROIDECHO, we find out 444 appli- cores of Intel Xeon(R) CPU E5-16500, and 16G Mem- cations which have malicious behaviors. In addition, we ory. 
We depict the complexity of applications from four have done a statistics of behaviors which are user-awared aspects: thefilesizeofapplication, thenumberofnodes, or already claimed by the description of applications. We edges and mediums of the ICCG. We have measured the compare DROIDECHO with other anti-virus (AV) tools, by runtime for pointer analysis, link analysis, action recogni- uploading apk files into VirusTotal (www.virustotal.com). tion and attack detection, respectively. The detailed data Although AV tools have detected 1541 (20.2%) samples of can be found in Table 5. As shown in Column Runtime(ms) malware, most of them are Adware, of which the num- of DroidEcho,DROIDECHO is very effective in detecting ber is up to 1217 (79.0%). Due to the restriction of our approach, we do not provide a detection for Adware. By Table 4 Privacy leakage via 3rd-libraries filtering these applications of Adware, we can also find 149 Library Description Num Behaviors more applications which have malicious behaviors. We investigate the 149 applications which contain mali- Adobe Measurement of Usage 1 Identity Code, etc. cious behaviors, of which 131 applications have privacy Flurry Measurement of Usage 20 Identity Code, Location, etc. leakages, while the remaining applications have other four Conversant Measurement of Usage 1 Identity Code, Location, etc. kinds of malicious behaviors. In particular, 10 applica- Crashlytics Diagnosis of Crash 8 Identity Code, Sys. Info, etc. tions contain service abuse attacks, i.e., sending SMS Map Service Map Service 5 Location, etc. messages without users’ consent; 6 applications contain Crittercism Optimization Tool 1 Identify Code, etc. content tampering attacks, i.e., deleting SMS messages from the inbox; 2 applications are depleting battery by Umeng Advertisement 4 Identity Code, Location, etc. holding Screen lock for a long time. By investigating the Google Ads Advertisement 3 Identity Code, Location, etc. 
code of these applications, we find that many of them are employing a third-party library which has exposed sensitive information. The third-party libraries may do a

Amazon Ads        Advertisement   1   Identity Code, Location, etc.
Millennialmedia   Advertisement   2   Identity Code, Location, etc.

Table 5 Evaluation on performance of DroidEcho

                        ICCG                    Runtime (ms) of DroidEcho                                   Runtime (ms)
            Size (K)    #N      #E      #M      Pointer  Link  Assembling  Recognition  Detection  Total    of Soot
DroidBench  186         15      1       0       46       3     14          11           40         114      24,702
Malware     893         1,327   6,070   5       4,818    108   55          747          2,358      8,086    65,241
Real Apps   5,392       3,900   75,117  10      17,114   611   453         3,742        13,301     35,221   135,763

attacks, with the average time of about 35s to complete the analysis of a real application. In addition, since we leverage Soot to generate the rough call graph and the control flow graphs for each method of applications, the runtime of Soot should also be considered to complete the whole detection. Soot performs the heavy work of reverse engineering, i.e., converting Android .dex code into Java bytecode; the time spent on that is hence much larger than the runtime of DROIDECHO.

Discussion

Our attack detection is guided by the semantic attack models, which describe the essential attack elements combined in a logical order. In this way, our approach is general such that we can detect several types of attacks as well as their variations on Android. Although it is hard to include exhaustive attack types, considering that zero-day attacks occur from time to time, each augmentation of the attack model can significantly enhance the ability to detect attacks. On the other hand, we have improved the conventional static analysis on Java by taking into account the new features provided by the Android platform. It helps to produce a more complete and comprehensive communication graph for Android applications, and thereby makes the attack detection more accurate and effective. However, considering the flaws of static analysis and the experiment results we got, DROIDECHO still has some shortcomings in detecting attacks:

Transformation attacks

These are attacks against anti-malware tools and approaches that transform a malware into different forms while preserving the original logic (Rastogi et al. 2013). Our approach has a sufficient resistance against trivial transformation attacks and transformation attacks detectable by static analysis (DSA). However, transformation attacks non-detectable by static analysis (NSA), e.g., reflection and bytecode encryption, can paralyze our approach, which is also a common issue in static analysis.

Vulnerability exploits

We pay more attention to the attacks which invoke Android APIs. There exists a kind of attack which exploits the vulnerabilities of Android, and triggers the vulnerabilities by crafting a special input or executing some code in a certain order. It is more difficult when the exploits are written in native code. To date, our work only accepts Java bytecode as the analysis object.

The limitations of our approach can be largely ascribed to the expressive ability of the attack model. Since the detection is based on static analysis, the attack model proposed in this paper only contains static features of attacks. As a result, we can detect more attacks by enriching and enhancing the attack model, for example, taking into account dynamic features of attacks.

Related work

Attack representation

Chen et al. (Chen et al. 2013) present permission event graphs (PEG) to depict API- and permission-related behaviors occurring on Android. In addition, to express the sequence of occurrence of events, they add the temporal order and leverage the LTL to depict a policy specification. Combining static analysis, model checking and runtime monitoring, they are able to detect the violation of contextual policies of Android applications. Gunadi and Tiu (Gunadi and Tiu 2013) propose a security policy specification language to describe privilege escalation on Android. The language is based on metric linear-time temporal logic (MTL) plus an extension of recursive definitions. It can help to figure out the context-sensitive privilege for one application. By monitoring the chain of privilege in runtime, they manage to find out the elevated privilege and detect collusion attacks. Aiming at privacy issues on Android, Arzt et al. (Arzt et al. 2014b) reduce them into an IFDS (Reps et al. 1995) problem, and construct a flow- and context-sensitive graph to present the entire behavioral system by static analysis. Graph reachability and value evaluation are performed to figure out whether the messages being sent out are tainted as sensitive information. Yang et al. (Yang et al. 2014) propose a two-level behavioral graph (Component Dependency Graph and Component Behavior Graph) to express the program logic. At first, they leverage an unsupervised mining approach to mine the program logic in malware automatically. Based on the mined graphs, they search applications crawled from marketplaces to check whether they contain any malicious behaviors. Mariconti et al. (Mariconti et al. 2016) propose to use Markov chains to represent malicious behaviors in Android malware, and employ static analysis to identify malicious behaviors. AppContext (Yang et al. 2015) proposes two heuristics (i.e., activating and guarding conditions) to identify malware, while not classifying malware in terms of attack targets.

A handful of works are devoted to identifying user-intended behaviors. In a PEG, Chen et al. (Chen et al. 2013) define pre-conditions either with or without users' consents. Although it only focuses on GUI operations, it provides a new perspective of learning the essential characteristics of malware. AppIntent, proposed by Yang et al. (Yang et al. 2013), is another work to extract a sequence of GUI operations which causes data transmission. They first reduce the search space by static analysis to avoid time-consuming, but useless, searching; then the event sequence is generated after running symbolic execution guided by the reduced space.

Our attack model combines the program-level behaviors and external inputs (i.e., triggers) to model attacks. First, on the program level, we consider the combination of assets, actions and flows to model a complete attack behavior. In addition, our model is not on an abstract level as in most of the previous studies. It thus can be directly mapped to the real implementation of the tested applications, without missing critical details of the attacks. Second, the triggers are taken into our consideration, which can effectively differentiate the benign behaviors from malicious ones.

Attack detection

Attack detection via program analysis can be roughly divided into two categories: dynamic analysis and static analysis. TaintDroid (Enck et al. 2010) tracks the propagation of sensitive information on a customized Android, and determines whether there exists any attack of privacy leakage. DroidScope (Yan and Yin 2012) and VetDroid (Zhang et al. 2013) both reconstruct malicious behaviors by collecting information during the dynamic analysis. However, the difficulty of deploying the monitor system restricts the scale of attack detection; and exhaustive test inputs are nearly impossible, which means attacks may not be triggered and detected sometimes due to insufficient inputs.

As a result, more researchers focus on detecting attacks via static analysis. FlowDroid (Arzt et al. 2014b) performs static analysis, specifically dataflow analysis, on the code of applications to check if they contain behaviors of privacy leakage. IccTA (Li et al. 2015) incorporates ICC analysis to achieve a more complete and accurate detection of privacy leakage. However, these two approaches only focus on the attack detection of privacy leakage. DroidSIFT (Zhang et al. 2014) analyzes the code of applications and constructs behavior graphs to denote the program logic. Taking the behavior graphs as signatures, DroidSIFT builds a classification system to distinguish benign applications from malware. Apposcopy (Feng et al. 2014) takes into account the inter-component communication on Android, and constructs an inter-component call graph to link up all components of the application to detect malware of privacy leakage with crafted signatures. Different from these two approaches, our approach proposes to detect attacks based on semantic models of attacks and uses dynamic analysis to confirm their maliciousness.

Our approach combines two approaches, static and dynamic analysis, and achieves the advantages of both. We first employ static analysis based on semantic models of attacks to quickly find out the potential malicious applications, with the trigger and the predicates which cause the occurrence of attacks. Then we leverage dynamic analysis to confirm the attacks to reduce false positives. As a result, our approach is effective on large-scale tests and reduces the false positive rate via dynamic attack confirmation.

Conclusion

In this paper, we introduce a novel attack model to depict the essential characteristics and features. In addition, we build a transformation from an Android application to a directed graph, called the inter-component communication graph. ICCG captures all structure information of an application, including call relationships and communication between different methods, and it contains all control flow information for each method. Then we propose an effective algorithm to search attacks in ICCG. The approach is shown to be feasible and effective in the experiments. In future, we expect to extend our detection algorithm to handle more complicated obfuscation or encryption techniques, and will continue enriching the attack model in order to handle more variants or new attacks.

Acknowledgments

Kai Chen was supported in part by National Key R&D Program of China (No. 2016QY04W0805), NSFC U1536106, 61728209, National Top-notch Youth Talents Program of China, Youth Innovation Promotion Association CAS, Beijing Nova Program and a research grant from Ant Financial. This work is also partly supported by International Cooperation Program on CyberSecurity, administered by SKLOIS, Institute of Information Engineering, Chinese Academy of Sciences, China (No. SNSBBH-2017111036).

Authors' contributions

All authors read and approved the final manuscript.

Competing interests

The authors declare that they have no competing interests.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Author details

1 SKLOIS, Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China. 2 Nanyang Technological University, Singapore, Singapore. 3 Singapore Institute of Technology, Singapore, Singapore. 4 School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China.

Received: 4 January 2018 Accepted: 17 April 2018

References

Arzt S, Bodden E (2016) StubDroid: Automatic Inference of Precise Data-flow Summaries for the Android Framework. In: Proceedings of the 38th International Conference on Software Engineering. pp 725–735

Arzt S, Rasthofer S, Fritz C, Bodden E, Bartel A, Klein J, Le Traon Y, Octeau D, McDaniel P (2014) FlowDroid: Precise Context, Flow, Field, Object-sensitive and Lifecycle-aware Taint Analysis for Android Apps. In: Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, Edinburgh. pp 259–269

Arzt S, Rasthofer S, Fritz C, Bodden E, Bartel A, Klein J, Le Traon Y, Octeau D, McDaniel P (2014) Flowdroid: Precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for android apps. In: Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '14. ACM, New York. pp 259–269

Au KWY, Zhou Y, Huang Z, Lie D (2012) PScout: Analyzing the Android Permission Specification. In: Proceedings of the 2012 ACM Conference on Computer and Communications Security, CCS '12. ACM, New York. pp 217–228

Bosu A, Liu F, Yao DD, Wang G (2017) Collusive Data Leak and More: Large-scale Threat Analysis of Inter-app Communications. In: Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, Abu Dhabi. pp 71–85

Chen KZ, Johnson NM, D'Silva V, Dai S, MacNamara K, Magrino TR, Wu EX, Rinard M, Song DX (2013) Contextual Policy Enforcement in Android Applications with Permission Event Graphs. In: 20th Annual Network and Distributed System Security Symposium, NDSS '13, San Diego. http://internetsociety.org/doc/contextual-policy-enforcement-android-applications-permission-event-graphs

Chen QA, Qian Z, Mao ZM (2014) Peeking into Your App without Actually Seeing It: UI State Inference and Novel Android Attacks. In: Proceedings of the 23rd USENIX Conference on Security Symposium, SEC'14. USENIX Association, Berkeley. pp 1037–1052

Enck W, Gilbert P, Chun B-G, Cox LP, Jung J, McDaniel P, Sheth AN (2010) TaintDroid: An Information-flow Tracking System for Realtime Privacy Monitoring on Smartphones. In: Proceedings of the 9th USENIX Conference on Operating Systems Design and Implementation, OSDI'10. USENIX Association, Berkeley. pp 393–407

Enck W, Octeau D, McDaniel P, Chaudhuri S (2011) A Study of Android Application Security. In: Proceedings of the 20th USENIX Conference on Security, SEC'11. USENIX Association, Berkeley. pp 21–21

Enck W, Ongtang M, McDaniel PD (2009) Understanding Android Security. IEEE Secur Priv 7(1):50–57

F-Secure Lab (2013) Mobile Threat Report, January - March 2013. Technical report

Feng Y, Anand S, Dillig I, Aiken A (2014) Apposcopy: Semantics-Based Detection of Android Malware Through Static Analysis. ACM, New York. https://doi.org/10.1145/2635868.2635869

Grace MC, Zhou Y, Wang Z, Jiang X (2012) Systematic Detection of Capability Leaks in Stock Android Smartphones. In: 19th Annual Network & Distributed System Security Symposium. http://dblp.uni-trier.de/rec/bib/conf/ndss/GraceZWJ12

Gunadi H, Tiu A (2013) Efficient runtime monitoring with metric temporal logic: A case study in the android operating system. CoRR abs/1311.2362. http://arxiv.org/abs/1311.2362

Hao S, Li D, Halfond WGJ, Govindan R (2013) Estimating Mobile Application Energy Consumption Using Program Analysis. In: Proceedings of the 2013 International Conference on Software Engineering, ICSE '13. IEEE Press, Piscataway. pp 92–101

Hilgers C, Macht H, Müller T, Spreitzenbarth M (2014) Post-Mortem Memory Analysis of Cold-Booted Android Devices. In: Proceedings of the 2014 Eighth International Conference on IT Security Incident Management & IT Forensics, IMF '14. IEEE Computer Society, Washington. pp 62–75

Lhoták O, Hendren L (2003) Scaling Java Points-to Analysis Using SPARK. In: Proceedings of the 12th International Conference on Compiler Construction, CC'03. Springer-Verlag, Berlin. pp 153–169

Li L, Bartel A, Bissyandé TF, Klein J, Traon YL, Arzt S, Rasthofer S, Bodden E, Octeau D, McDaniel PD (2015) IccTA: Detecting Inter-Component Privacy Leaks in Android Apps. In: 37th IEEE/ACM International Conference on Software Engineering, ICSE 2015, Florence, Italy, May 16-24, 2015, Volume 1. pp 280–291

Luo W, Xu S, Jiang X (2013) Real-time Detection and Prevention of Android SMS Permission Abuses. In: Proceedings of the First International Workshop on Security in Embedded Systems and Smartphones, SESP '13. ACM, New York

Mariconti E, Onwuzurike L, Andriotis P, Cristofaro ED, Ross GJ, Stringhini G (2016) Mamadroid: Detecting android malware by building markov chains of behavioral models. CoRR abs/1612.04433

Octeau D, McDaniel P, Jha S, Bartel A, Bodden E, Klein J, Traon YL (2013) Effective Inter-Component Communication Mapping in Android: An Essential Step Towards Holistic Security Analysis. In: Proceedings of the 22nd USENIX Conference on Security, SEC'13. USENIX Association, Berkeley. pp 543–558

Oliner AJ, Iyer A, Lagerspetz E, Tarkoma S (2012) Collaborative Energy Debugging for Mobile Devices. In: the 8th Workshop on Hot Topics in System Dependability. USENIX, Berkeley

Orthacker C, Teufl P, Kraxberger S, Lackner G, Gissing M, Marsalek A, Leibetseder J, Prevenhueber O (2011) Android Security Permissions - Can We Trust Them? In: Security and Privacy in Mobile Information and Communication Systems. Springer Berlin Heidelberg, Berlin. pp 40–51

Pathak A, Hu YC, Zhang M Bootstrapping Energy Debugging on Smartphones: A First Look at Energy Bugs in Mobile Devices. In: Proceedings of the 10th ACM Workshop on Hot Topics in Networks, HotNets-X. ACM, New York. pp 5:1–5:6. https://doi.org/10.1145/2070562.2070567

Pathak A, Hu YC, Zhang M (2012) Where is the energy spent inside my app? Fine-grained Energy Accounting on Smartphones with Eprof. In: Proceedings of the 7th ACM European Conference on Computer Systems, EuroSys '12. ACM, New York. pp 29–42. https://doi.org/10.1145/2168836.

Prince B New Android Malware Targets Banking Apps, Phone Information: Fireeye. http://www.securityweek.com/new-android-malware-targets-banking-apps-phone-information-fireeye. Accessed 05 Oct 2017

ProGuard (2017). http://developer.android.com/tools/help/proguard.html. Accessed 03 Dec 2017

Qu Z, Rastogi V, Zhang X, Chen Y, Zhu T, Chen Z (2014) AutoCog: Measuring the Description-to-permission Fidelity in Android Applications. In: Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security. pp 1354–1365

Rastogi V, Chen Y, Jiang X (2013) DroidChameleon: Evaluating Android Anti-malware Against Transformation Attacks. In: Proceedings of the 8th ACM SIGSAC Symposium on Information, Computer and Communications Security, ASIA CCS '13. ACM, New York. pp 329–334

Reps TW, Horwitz S, Sagiv S (1995) Precise Interprocedural Dataflow Analysis via Graph Reachability. In: Conference Record of POPL'95: 22nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, San Francisco. https://doi.org/10.1145/199448.199462

Schlegel R, Zhang K, Zhou X, Intwala M, Kapadia A, Wang X (2011) Soundcomber: A Stealthy and Context-Aware Sound Trojan for Smartphones. In: 18th Annual Network and Distributed System Security Symposium

Shabtai A, Fledel Y, Kanonov U, Elovici Y, Dolev S, Glezer C (2010) Google Android: A Comprehensive Security Assessment. IEEE Secur Priv 8(2):35–44

Symantec Inc. (2017) Internet Security Threat Report. Technical report

Vallée-Rai R, Co P, Gagnon E, Hendren L, Lam P, Sundaresan V (1999) Soot - a Java Bytecode Optimization Framework. In: Proceedings of the 1999 Conference of the Centre for Advanced Studies on Collaborative Research, CASCON '99. IBM Press. p 13. http://dl.acm.org/citation.cfm?id=781995.782008

Vekris P, Jhala R, Lerner S, Agarwal Y (2012) Towards Verifying Android Apps for the Absence of No-Sleep Energy Bugs. In: Proceedings of the 2012 USENIX Conference on Power-Aware Computing and Systems, HotPower'12. USENIX Association, Berkeley. pp 3–3

Wei F, Roy S, Ou X, Robby (2014) Amandroid: A Precise and General Inter-component Data Flow Analysis Framework for Security Vetting of Android Apps. In: Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security. pp 1329–1341

Wong MY, Lie D (2016) IntelliDroid: A Targeted Input Generator for the Dynamic Analysis of Android Malware. In: 23rd Annual Network & Distributed System Security Symposium

Xing L, Pan X, Wang R, Yuan K, Wang X (2014) Upgrading Your Android, Elevating My Malware: Privilege Escalation Through Mobile OS Updating. In: IEEE Security & Privacy

Xu K, Li Y, Deng RH (2016) ICCDetector: ICC-Based Malware Detection on Android. IEEE Trans Inf Forensics Secur 11(6):1252–1264

Xuxian J, Yajin Z (2013) Android Malware. SpringerBriefs in Computer Science

Yan LK, Yin H (2012) DroidScope: Seamlessly Reconstructing the OS and Dalvik Semantic Views for Dynamic Android Malware Analysis. In: USENIX Security. USENIX Association, Berkeley. pp 29–29

Yang C, Xu Z, Gu G, Yegneswaran V, Porras PA (2014) DroidMiner: Automated Mining and Characterization of Fine-grained Malicious Behaviors in Android Applications. In: 19th European Symposium on Research in Computer Security. Springer International Publishing. pp 163–182

Yang W, Xiao X, Andow B, Li S, Xie T, Enck W (2015) AppContext: Differentiating Malicious and Benign Mobile App Behaviors Using Context. In: Proceedings of the 37th International Conference on Software Engineering. pp 303–313

Yang Z, Yang M, Zhang Y, Gu G, Ning P, Wang XS (2013) AppIntent: Analyzing Sensitive Data Transmission in Android for Privacy Leakage Detection. In: Proceedings of the 2013 ACM SIGSAC conference on Computer and Communications Security, CCS '13. ACM, New York. pp 1043–1054

Zhang M, Duan Y, Yin H, Zhao Z (2014) Semantics-Aware Android Malware Classification Using Weighted Contextual API Dependency Graphs. In: Proceedings of the 21st ACM Conference on Computer and Communications Security, CCS '14, Scottsdale

Zhang M, Yin H (2014) Efficient, Context-aware Privacy Leakage Confinement for Android Applications Without Firmware Modding. In: Proceedings of the 9th ACM Symposium on Information, Computer and Communications Security (ASIACCS'14), Kyoto

Zhang Y, Yang M, Xu B, Yang Z, Gu G, Ning P, Wang XS, Zang B (2013) Vetting Undesirable Behaviors in Android Apps with Permission Use Analysis. In: Proceedings of the 2013 ACM SIGSAC Conference on Computer and Communications Security, CCS '13. ACM, New York. pp 611–622. https://doi.org/10.1145/2508859.2516689

Zhou Y, Jiang X (2011) An Analysis of the AnserverBot Trojan. Technical report. http://www.csc.ncsu.edu/faculty/jiang/pubs/AnserverBot_Analysis.pdf

Zhou Y, Jiang X (2012) Dissecting Android Malware: Characterization and Evolution. In: Proceedings of the 2012 IEEE Symposium on Security and Privacy, SP '12. IEEE Computer Society, Washington. pp 95–109

Cybersecurity, Springer Journals

DroidEcho: an in-depth dissection of malicious behaviors in Android applications

17 pages
Publisher: Springer Singapore
Copyright: Copyright © 2018 by The Author(s)
Subject: Computer Science; Computer Science, general
eISSN: 2523-3246
DOI: 10.1186/s42400-018-0006-7

Abstract

A precise representation for attacks can benefit the detection of malware in both accuracy and efficiency. However, it is still far from expectation to describe attacks precisely on the Android platform. In addition, new features on Android, such as communication mechanisms, introduce new challenges and difficulties for attack detection. In this paper, we propose abstract attack models to precisely capture the semantics of various Android attacks, which include the corresponding targets, involved behaviors as well as their execution dependency. Meanwhile, we construct a novel graph-based model called the inter-component communication graph (ICCG) to describe the internal control flows and inter-component communications of applications. The models take into account more communication channel with a maximized preservation of their program logics. With the guidance of the attack models, we propose a static searching approach to detect attacks hidden in ICCG. To reduce false positive rate, we introduce an additional dynamic confirmation step to check whether the detected attacks are false alarms. Experiments show that DROIDECHO can detect attacks in both benchmark and real-world applications effectively and efficiently with a precision of 89.5%. Keywords: Semantic attack model, Android malware detection, Inter-component communication graph, Privacy leakage Introduction ion of security analysts and antivirus software, such as Nowadays, Android malware detection is facing two crit- PROGUARD (ProGuard 2017) and reflection (Zhou and ical challenges: 1) how to design a precise and efficient Jiang 2011). All of these raised challenges for the existing model to represent malware; 2) how to reduce false alarms detection approaches to reach a desirable precision and and distinguish real malware from benign applications. scalability simultaneously. 
Android malware varies in many aspects such as attack On the other hand, it is challenging to eliminate targets, attack methods, and applied obfuscation tech- greyware from malware (Symantec Inc. 2017), espe- niques. For example, Android malware may steal users’ cially when they are requesting privileged permissions sensitive information (Grace et al. 2012;Arztetal. 2014a), for accomplishing specific functionalities. For instance, elevate their privilege (Xing et al. 2014;Gunadiand Tiu WECHAT, one of the top-ranked applications in Google 2013), deplete device resources (Vekris et al. 2012;Pathak Play, requests permissions of reading SMS messages and et al. 2012), and remote control users’ devices (Zhou accessing network simultaneously. It may raise the con- and Jiang 2012). Malware may accomplish attack mis- cern of security analysts since it is speculated as a poten- sions either individually or collaboratively (Octeau et al. tially malicious behavior which sends SMS messages out 2013;Bosuetal. 2017), perform attacks only once or to the network. However, the fact is that it only reads periodically (Zhou and Jiang 2012), and be triggered by the SMS messages from its remote server for the two- the installation or a broadcast message. In addition, mal- factor authentication use. Similar cases are pervasive on ware may adopt several mechanisms to bypass the detect Android: weather applications show the weather situa- tion and forecast to users, and thereby, need to read and *Correspondence: mengguozhu@gmail.com send out users’ location information; social applications SKLOIS, Institute of Information Engineering, Chinese Academy of Sciences, may ask for users’ contacts to find friends quickly; fit- Beijing, China Nanyang Technological University, Singapore, Singapore ness applications sometimes access the sensors in order Full list of author information is available at the end of the article to measure users’ exercise. Therefore, the detection based © The Author(s). 
2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. Meng et al. Cybersecurity (2018) 1:4 Page 2 of 17 on an imprecise and coarse-grained malicious behavior can be utilized to guide our detection of attacks in a model would lead to a high false positive rate. precise way. Even with a precise model of malicious behaviors, mal- Meanwhile, we transform Android applications into a ware searching in applications with static approaches is comprehensive graph, incorporating call graphs between not easy. New execution paradigm, system libraries and methods, and control flow graphs as per method. We rich communication features provided by Android have conduct an in-depth static analysis through the graph facilitated the development of rich-functionality applica- with the guidance of attack model, and generate a full tions. On the other hand, however, they also make static path with the trigger and the predicates that guarantee analysis of application more complicated and difficult, the occurrence of these behaviors. The detected malicious which are summarized below. behaviors will be filtered by two conditions: if a seemingly malicious behavior is triggered by the user, it is likely that Implicit Execution Sequence. Android framework the behavior is user-intended, which we regard it as being provides a variety of program execution harmless; presence of suspicious behaviors does not mean environments, callbacks and control frameworks for there is a real attack. It happens because some applica- each Android component .Itisknownas lifecycle. 
tions indeed need to carry out several seeming “malicious” For example, after an activity is started by the system, behaviors to fulfill their tasks with good purposes. This is it will execute the methods onCreate(), learnt and induced by investigating a group of applications onStart() and onResume() in proper order, under the same category or being similar. We make use which cannot be observed from the application code; of the mined social knowledge to filter out these harmless • behaviors with a high level of confidence, i.e., these behav- Various Triggers for an Application. There are iors are likely a necessary part for applications. It does not many ways for an application to interact with the only facilitate the efficiency of detection, but also reduce external environment. The application can be false positive in practice. triggered or impacted by users’ GUI operations (e.g., After the identification of malicious behaviors, we pro- clicking a button). It can register a broadcast receiver pose an approach to confirm the detected attacks with the to respond once a broadcast message arrives. In dynamic execution. Our dynamic analysis is driven by the addition, local sensors can drive the application to attack traces generated previously, and provides a satisfied run in a pre-defined way. On the other hand, an condition to guarantee the program to proceed along the application can be started and driven via remote trace. The dynamic execution reproduces the occurrence messages, such as Google Cloud Messaging (GCM), of attacks, and makes the attack detection more precise. HTTP response, and an incoming SMS or phone call; Different from the existing research on static analysis Complicated Communication Mechanisms. based approaches (Arzt et al. 2014a; Arzt and Bodden Although each application is running in a separated 2016;Xuetal. 2016; Wei et al. 
2014), our work starts from sandbox, Android provides them various ways to the comprehension of Android malware by constructing communicate with each other. For instance, the semantic models. To reduce the false positive rate, we Intent model (Octeau et al. 2013)isthemost propose an approach to confirm attacks complying with compelling method for component communication. the identified executed traces. To sum up, we make the Additionally, applications can define bound services, following contributions: for example, an AIDL (Android Interface Definition Language) interface, and implement a Binder or a Attack model We propose a novel representation, to Messenger to accomplish the communication even characterize malicious behaviors. An attack in the between different processes or applications. model is constituted of target assets, execution To overcome the above challenges, we propose an inte- actions, triggers, execution flows and apps’ grated framework called DROIDECHO to analyze Android declaimers. It can facilitate the understanding of the applications. First, we summarize the features of attacks essential features of attacks, and the detection of happening on the Android platform, and propose a novel malware. attack model. The model illustrates a variety of attack Accurate attack detection approach We propose a types at an abstract level, which is platform-independent. richly descriptive representation, named ICCG, to In particular, an attack is composed of: assets, which depict an Android application, with a maximal preservation of information. Based on ICCG, we are the targets of attacks; actions, the execution oper- design a synthetic approach to identify a malicious ations performed on assets, and triggers, of which one entrance to the app that leads to the attack behav- application by considering both the engineering iors. Then we specialize the attack model into attack aspect and the social aspect. 
A reduced but sufficient instances which are close to the Android platform, and

static analysis is to prove the presence of suspicious behaviors, then confirmed with the help of the learnt social knowledge.

Attack Confirmation. After the identification of malicious behaviors, we conduct a confirmation process to prove the existence of a real attack with dynamic execution. The dynamic execution is fed with the traces of malicious behaviors generated by DROIDECHO, and further identifies the satisfiable conditions. It then drives the application to execute along the traces, and thereby reproduces the attacks for confirmation.

Evaluation. We have evaluated DROIDECHO on the malware benchmarks (i.e., GENOME and DROIDBENCH) and on 7,643 real-world applications. The results show that DROIDECHO outperforms the state-of-the-art tool. Moreover, we have found 444 applications with malicious behaviors in Google Play, with a precision of 89.5%, which gives DROIDECHO a competitive edge over counterpart approaches and tools.

Organization
Section Semantic model of attack proposes abstract models for various attacks on Android. Section The inter-component communication graph describes a representation for Android applications. Section System design of DroidEcho presents our approach to malware detection. Section Evaluation gives a comprehensive evaluation of our approach. Section Discussion discusses the experiments and limitations of our approach. Section Related work summarizes the relevant literature, and Section Conclusion concludes this work.

Semantic model of attack
In this section, we first give an in-depth discussion of the attacks happening on the Android platform, and then provide a formal description of these attacks.

Building blocks
An attack on the Android platform has its unique features and characteristics. It has a variety of attack targets, and includes a sequence of actions that often leverage the APIs provided by Android. To depict these elements, we start by introducing the building elements of attacks and their representative examples on Android, in order to construct a general and formal definition of attacks.

Assets
Assets refer to the hardware, software and information on Android devices that are the targets of attacks. For example, contact information is an important asset, which attackers aim to steal and use for malicious purposes; the front light is battery-consuming hardware, which some malicious applications may acquire without releasing in order to exhaust the battery quickly. On the Android platform, all the assets we are concerned with can be accessed by invoking certain system APIs. We list representative examples of these assets on Android as follows.

Information Assets: Identity code, Contact, SMS messages, File system, Location, System setting, etc.
Software Assets: Phone service, SMS service, Package Manager, Download Manager, Broadcast service, etc.
Hardware Assets: Camera, Media, Sensor, etc.

Actions
An attack action is an operation performed on a certain asset for the purpose of acquisition, tampering or interception, e.g., fetching the IMEI code of the mobile phone.

Category. According to the type of the target assets, actions can be categorized into several classes. For example, an action can acquire, edit, or delete some information stored on the device; invoke, interrupt or stop a service provided by Android; and occupy or release a hardware resource. Therefore, the semantics of an action can be uniquely specified by the association of the action type and the target asset. In addition, there is a unique kind of action on Android which is used for communication (see Section The inter-component communication graph for more details). Within a communication, there must be at least one sender and one receiver, and the communication can occur between an application and the external environment, or between two components in one application. As a result, we summarize four actions related to communication in the scope of an application. Table 1 shows the categories of actions covered in this paper.

Parametrization. An action is often implemented by invoking a set of system APIs. These APIs are organized with a certain dependency relationship. For example, the action of retrieving data stored in a content provider can be described as: obtaining an instance of ContentResolver; specifying the URI of the target asset; and retrieving the data stored in this content provider. Every action that retrieves data from a content provider follows these steps. We provide more details in Section Action recognition.

As a functional unit in the attack model, an action usually has an input, an output or both. Let $\alpha$ be an action and $\beta$ be an asset; then $\alpha(\beta)$ denotes that the input of the action $\alpha$ is the asset $\beta$, and $\alpha^{\beta}$ denotes that the output of the action $\alpha$ is the asset $\beta$ (refer to Section Flows). A variety of concrete actions are derived by parameterizing these actions with assets. For instance, when acquiring the content of a content provider, we can specify some assets as the target, such as ContactsContract.Contacts.CONTENT_URI and CalendarContract.Events.
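The parametrization of actions with assets can be illustrated with a short Python sketch. It is only an illustration under our own assumptions: the `Action` class, the `ACTION_KINDS` table and the `parametrize` helper are hypothetical names that do not come from DROIDECHO, the operation names follow Table 1, and the assets are plain strings standing in for content-provider URIs.

```python
from dataclasses import dataclass
from typing import List

# Operations grouped by asset category, following Table 1 (assumed encoding).
ACTION_KINDS = {
    "information": {"acquire", "insert", "edit", "delete"},
    "software": {"invoke", "interrupt", "stop"},
    "hardware": {"occupy", "release"},
}


@dataclass(frozen=True)
class Action:
    """A concrete action: one operation applied to one target asset."""
    kind: str
    asset: str

    def __str__(self) -> str:
        return f"{self.kind}({self.asset})"


def parametrize(kind: str, assets: List[str]) -> List[Action]:
    """Derive one concrete action per asset from an abstract operation."""
    if not any(kind in kinds for kinds in ACTION_KINDS.values()):
        raise ValueError(f"unknown operation: {kind}")
    return [Action(kind, asset) for asset in assets]


# Hypothetical asset labels standing in for the contact and calendar providers.
actions = parametrize("acquire", ["contacts", "calendar_events"])
print([str(a) for a in actions])
```

Keeping the operation and the asset as separate fields mirrors the statement that the semantics of an action is fixed by the pair of action type and target asset.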
With ContactsContract.Contacts.CONTENT_URI and CalendarContract.Events as targets, two actions are generated to fetch the contact list and the events in the calendar, respectively. Table 1 lists 9 basic kinds of actions, from which more actions can be generated by parametrization with explicit target assets.

Table 1 The category of actions on Android

| Category | Operation | Action example | Corresponding implementation |
|---|---|---|---|
| Information-based | acquire | get SMS message | ContentResolver.query(Inbox) |
| Information-based | insert | insert a contact | ContentResolver.insert(Contact) |
| Information-based | edit | change system setting | Wallpaper.setBitmap(Image) |
| Information-based | delete | delete local files | File.delete() |
| Software-based | invoke | call a number | startActivity(Intent{tel:num}) |
| Software-based | interrupt | block SMS messages | abortBroadcast() |
| Software-based | stop | uninstall an app | startActivity(Intent{pkg:app}) |
| Hardware-based | occupy | hold the wakelock | WakeLock.acquire() |
| Hardware-based | release | release the wakelock | WakeLock.release() |
| Communication | e_send | send data to environment | sendTextMessage(SMS) |
| Communication | e_recv | receive data from environment | getMessagesFromIntent(Intent) |
| Communication | i_send | send data to other component | startService(Intent) |
| Communication | i_recv | receive data from other component | getIntent(Intent) |

Triggers
Triggers are events which are taken as inputs to an application and lead to the occurrence of a behavior. Although triggers, which occur at runtime, are unpredictable for applications, an application can provide handlers to subscribe to and capture them. Once the application receives a subscribed trigger, it enters the life cycle and executes specific methods. In light of user awareness, we distinguish two sorts of triggers:

User Interaction. Triggers of this kind are usually GUI-related and visible to the user. For example, when the user clicks a button drawn on the screen, the behavior is triggered and starts to execute. From this, the user can learn that the behavior is caused by his/her click operation, and we call it user-awareness. For simplicity, we assume that users can know the behaviors from the context which the user interaction causes.

Environmental Inputs. There is another kind of trigger which can drive the execution of an Android application. The trigger could be the initialization of the application, a broadcast message or a registered listener to sensors. The whole process is free from the involvement of the user, which means that the user is likely unaware of the execution of behaviors. As a consequence, we classify malicious behaviors triggered by environmental input as potential attacks for further analysis.

As suggested by (Yang et al. 2013; Chen et al. 2013), behaviors that are never executed until they are triggered by user interaction reflect the "intention" of the user. Therefore, in this work, we assume that user interactions will not trigger any malicious behaviors, i.e., potential attacks that are triggered by user interactions are false positives. However, environmental input triggers can proceed stealthily, preventing users from knowing about them. This kind of trigger usually brings many security risks, which are our main concern in this paper.

Since triggers are external objects that cause the execution of attacks, we can instead recognize e_recv (see Table 1) to observe the arrival of triggers. Specifically, the listeners can be categorized in terms of the types of triggers. For example, onClick(View), onDrag(View,...) and onKey(View,...) are the entry points of the program when a user interaction trigger arrives. In contrast, onCreate() and onReceive() are the entry points for the start-up of an application and for a broadcast message, respectively, which are regarded as environmental inputs.

Flows
Actions have a flow relationship in between. It is a kind of dependency relationship which is either directional or contextual. The directional relationship indicates a certain order of execution, which has been defined in the program logic for a specific task; the contextual relationship can be described as a semantic connection between two actions, for example, the input of one action is the output of the other. Generally, the contextual relationship needs a transition of the negotiated data from one participant to the other.

A flow can exist between the environment and an action, with the trigger as the data negotiated between them. Take an incoming SMS message for example: if an application registers a BroadcastReceiver for SMS messages, once an incoming SMS message arrives, the application will start to execute from the listener, and it can also get the content of the message as input. Therefore, there exists a directional and contextual relationship between the environment and the action acquire(SMS), i.e.,

A flow can also exist between two actions. After an application gets an incoming SMS message, it can send the message to a remote server via the Internet. In such a case, there is a contextual flow between these two actions. The flow guarantees that the two actions operate on the same SMS message. Therefore, we present the flow as:

Attack models
Based on the aforementioned building blocks for an attack, we define different attacks in this section. In the remainder of this section, we use the following notations. $E$ is the set of Environmental Input triggers; $t$ is the trigger of the attack and $t \in E$; $Asset$ is the set of assets involved in the attack; $\alpha$ is an action or a trigger, $\beta$ is an action, and $\gamma$ is an asset. A flow is either a control flow, denoted as $\alpha \rightarrow \beta$, or a data flow, denoted as $\alpha \rightsquigarrow \beta$.

Attack taxonomy
We conduct a comprehensive investigation of existing attacks of malicious behaviors (Enck et al. 2009; Shabtai et al. 2010; Enck et al. 2011; Zhou and Jiang 2011; 2012), and propose a taxonomy of attacks in terms of these building blocks and semantic information as follows.

Privacy leakage. Privacy leakage (Enck et al. 2010; Grace et al. 2012; Zhang and Yin 2014) refers to the exposure of sensitive information on devices. As discussed in the action part, such information can be acquired by specifying an acquire action, which is regarded as the source in the attack of privacy leakage. If there exists a data flow from the return value of the acquire action to the data sent out to the external environment by a communication action, usually called the sink, privacy leakage happens. In addition, the attack needs to happen without users' awareness, and it is not necessary for the trigger to have a data flow relationship with these two actions. As a result, the formal attack model of privacy leakage can be defined as:

\[ PL = \rightarrow e\_recv \rightarrow acquire^{\gamma} \rightsquigarrow e\_send(\gamma) \]

Information interception. Mobile devices can interact with the external environment in many ways. However, malicious applications may intercept, suspend, or even break off the communication. Common attacks include blocking incoming SMS messages and phone calls. For this kind of attack, a malicious application needs to register a listener (i.e., e_recv) for the broadcast messages of incoming messages and calls, and then stops their propagation (i.e., intercept) to prevent the messages from reaching other applications or the user.

\[ II = e\_recv \rightarrow intercept(\gamma) \]

Content tampering. Malicious applications may tamper with content on mobile devices, such as contacts, SMS, accounts, and system settings, which can cause severe damage to the user. Usually, an application can insert, update and delete an item in a content provider with specific permissions. In addition, it can change system settings such as network connection, wallpaper and sleep time. We use insert, edit and delete to describe such behaviors. The trigger of this attack does not draw users' attention and does not have any data flow relationship with these actions. The attack is defined as follows:

\[ CT = \rightarrow e\_recv \rightarrow \alpha(\gamma), \quad \text{where } \alpha \in \{insert, edit, delete\} \]

Service abuse. Malicious applications may abuse the services provided by Android (Luo et al. 2013). According to our investigation, the most commonly abused services include the phone service, SMS service, package manager, and download manager. For example, if an application possesses the permission to send SMS messages, it can subscribe to a premium-rate mobile service, which incurs charges for the user. Let $\alpha$ be the kind of action which abuses services; the attack model can be presented as:

\[ SA = \rightarrow e\_recv \rightarrow \alpha(\gamma), \quad \text{where } \alpha \in \{invoke, stop\} \]

Resource depletion. Due to portability and simplicity, mobile devices usually carry a low-frequency CPU, limited RAM and a small-capacity battery. Mobile devices thereby can only provide limited computation capability, storage and energy. The situation becomes worse if an installed application occupies these resources immoderately, which can affect other applications and even the battery life of the device. Either intentionally or unintentionally, such applications keep consuming resources (Pathak et al.; 2012; Vekris et al. 2012) or carry on useless and endless work (Oliner et al. 2012; Hao et al. 2013), and never release or stop them. Let occupy be the kind of action which exhausts resources, and release be the kind of action which releases resources. We use $\nrightarrow$ to show a missing flow between these two actions. The attack model is given in the following:

\[ RD = \rightarrow e\_recv \rightarrow occupy(\gamma) \nrightarrow release(\gamma) \]

Discussion
The taxonomy of attacks is based on the 102 malware families we have studied. However, some attacks are beyond the detection of our approach, such as phishing, adware and privilege escalation. Phishing is a kind of attack in which one application disguises itself as an authentic and legitimate application, and induces users to enter their credentials for, e.g., a bank account (Prince). Adware is a program that displays advertisements to its users, which is annoying rather than harmful most of the time (F-Secure Lab 2013). Some applications may exploit the vulnerabilities of Android, such as Exploid, RATC/Zimperlich and Ginger Break (Xuxian and Yajin 2013), to elevate their privilege once installed on the device; Pileup (Xing et al. 2014) is a newly found flaw in the Package Management Service which can be exploited by malicious applications only while the Android OS is being upgraded. Lastly, side channel attacks (Schlegel et al. 2011; Hilgers et al. 2014; Chen et al. 2014), which collect memory information or timing information, are outside our scope of attack detection.

The insufficiency of DROIDECHO comes from two aspects: 1) our static analysis is carried out on Java code and does not go inside native code, whereas much privilege escalation malware utilizes native code to elevate its privilege; 2) we try to avoid making subjective judgements, and prefer to detect the objective existence of malicious behaviors. That is, phishing and adware merely deceive and bother users, respectively, and do not strictly violate the security policies (Enck et al. 2009) of Android. We give the statistics of the attacks mentioned previously in Table 2, indicating that our approach can detect up to 90.4% of attacks in theory.

Disclaimers
There is a significant exception when determining an attack: disclaimers. A disclaimer is a white list for an application in which some behaviors are excluded from consideration for the determination of attacks. The violation of certain security properties does not necessarily imply the occurrence of attacks. Some applications may need to carry out suspicious-looking behaviors for which they have already explicitly declared the potential security violation. We conclude that the users who install these applications are willing to accept the introduced risks by default. Therefore, in this work, we filter out the "attacks" that are allowed by the users, and remove them from the generated attack report.

The inter-component communication graph
For an accurate representation of Android applications and the convenience of attack detection, this section presents the proposed inter-component communication graph (ICCG), which captures all possible communications between components and threads inside Android applications.

Android communication medium
A medium is a special data structure used for communications. The communications can occur between either two components (i.e., activity, service, broadcast receiver and content provider), or two isolated processes. Mediums play a critical role in the behavior of Android applications. Besides the frequently discussed Inter-Component Communication (ICC) (Orthacker et al. 2011; Schlegel et al. 2011), which is based on the Intent medium, there are three other mediums which can also be used for communication. Here we describe the different types of mediums existing on the Android platform.

Intent. Intents are the main vehicle for communication. An intent can be either explicit or implicit. Explicit intents have a specific class to start, while implicit intents do not specify the corresponding class, and the system will select the most well-suited class or application to execute. An explicit intent can only invoke a specific component, which is defined in the constructor, or by calling setComponent(ComponentName) or setClass(Context, Class); an implicit intent can be received by many well-suited components.
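The difference between explicit and implicit intents, as seen by a static analysis, can be pictured with a small Python model. This is a simplified sketch and not Android's actual resolution logic: the `Intent` and `Component` classes, the `resolve` function and the `com.example` component names are hypothetical, and matching is reduced to the target class and the action string. Every component that could match an implicit intent is treated as a possible receiver.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Component:
    """A component plus the actions accepted by its intent filter (toy model)."""
    name: str
    actions: Set[str] = field(default_factory=set)


@dataclass
class Intent:
    """Toy intent: an explicit intent names a class, an implicit one an action."""
    target_class: Optional[str] = None
    action: Optional[str] = None


def resolve(intent: Intent, components: List[Component]) -> List[str]:
    """Return the names of all components that may receive the intent."""
    if intent.target_class is not None:
        # Explicit intent: only the named component can be invoked.
        return [c.name for c in components if c.name == intent.target_class]
    # Implicit intent: any component whose filter declares the action may match.
    return [c.name for c in components if intent.action in c.actions]


components = [
    Component("com.example.MainActivity", {"android.intent.action.VIEW"}),
    Component("com.example.ShareActivity", {"android.intent.action.SEND"}),
    Component("com.example.UploadService", {"android.intent.action.SEND"}),
]

explicit = resolve(Intent(target_class="com.example.UploadService"), components)
implicit = resolve(Intent(action="android.intent.action.SEND"), components)
print(explicit)
print(implicit)
```

Returning every candidate for an implicit intent is a deliberately conservative choice for this sketch, since a static analysis cannot know which receiver the system will pick at runtime.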
An implicit intent appoints potential receivers by setting an action in the constructor or via setAction(String) (meanwhile, it can be instrumented with a data type to restrict its receivers) (Feng et al. 2014). An intent can influence the execution order (a.k.a. control flow) of the application, and also affects the data flow if it carries extras.

Table 2 The category of attacks on Android

| Attack | Percent | Supported by DroidEcho |
|---|---|---|
| Privacy Leakage | 31.4 | ✓ |
| Information Interception | 11.6 | ✓ |
| Content Tampering | 13.4 | ✓ |
| Service Abuse | 31.4 | ✓ |
| Resource Depletion | 1.8 | ✓ |
| Phishing | 1.7 | ✗ |
| Adware | 2.3 | ✗ |
| Privilege Escalation | 6.4 | ✗ |

Message. A message is a concise data structure for arbitrary data. Two isolated processes or threads can communicate with each other by transferring a message. In general, the message receiver has to create a Messenger to handle the received messages. The sender needs to obtain the reference to this Messenger, and sends its crafted message by invoking send(Message message) of the Messenger. In order to send a message to, for example, a daemon service, the component can first bind to this service via bindService(), and then fetch the reference to the Messenger from the returned Binder object.

Binder. A binder is used for a component to talk to a daemon service. The component which attempts to bind to a service needs to invoke bindService() and implement ServiceConnection, which establishes the connection with the service. The service needs to provide a subclass of Binder that exposes public methods to its clients, or design an AIDL interface together with its implementation. After that, the component can obtain a binder object, which is a remotable object for a lightweight remote procedure call. In addition, AIDL can be exposed to other applications for remote invocations.

Persistent storage. On Android, applications may exchange data through persistent storage. There are three types of persistent storage: File, Shared Preferences and SQLite database. They can be used by applications or components to exchange data, that is, they provide an implicit data flow from one component to another.

Inter-component communication graph
Definition 1. Let $M$ be the set of communication mediums existing on Android. An ICCG is a directed graph defined as $G = \{V, E_f, E_c\}$, where $V$ is a set of nodes; $E_f : V \times V$ is a set of flow edges; and $E_c : V \times M \times V$ is a set of communication edges.

The nodes of the graph are the methods contained in the application, which come with two levels of granularity. The coarse-grained nodes only represent the signatures of the functions, and help to express the relationship between functions at the system level. From them we can learn the method invocation relationship and the possible communications between different functions. At the fine-grained level, a node is dissected in depth and shows its internal logic, i.e., its control flow. When we identify the elements of attacks, especially behaviors, we need to go deep to the code level and recognize the different patterns of behaviors.

We employ two different kinds of edges to denote the relationship between nodes: call relationship and communication relationship. Flow edges reflect the call relationship among nodes. This is the primary concept in program analysis, and consists of explicit calls and implicit calls. Here we emphasize the implicit calls unique to Android, i.e., the Android lifecycle. An Android lifecycle indicates an implicit function invocation between different methods or classes. The implicit calls are either callbacks passed to a concrete method, or control frameworks specifying a call sequence. Besides the lifecycle features of standard Java, e.g., the method void start() of a thread instance implicitly calls the overridden method void run(), Android includes many libraries that support a large number of implicit calls. Each component of Android has a unique call sequence pre-defined by Android. In addition, all GUI components on Android allow developers to pass a callback that executes functionality when the corresponding event occurs.

The communication edges connect nodes and mediums. As defined previously, there are four kinds of mediums used for communication. It is worth mentioning that communications not only show the logical order of execution; some of them also enclose data which can be transferred from one node to another.

We use the DroidKungFu malware as an example to explain the ICCG. As shown in Fig. 1(a), there are an activity and a service, which communicate via an Intent medium. The activity obtains sensitive data (refer to ① in onStart), and passes the data to the service. Then the service sends the data out at ② in onCreate. Figure 1(b) shows the constructed ICCG based on the code. As discussed in the previous section, each node represents a method of the application, and contains a control flow graph. The nodes are connected by two kinds of edges: Android mediums (e.g., the Intent object) and method invocations, either implicit (e.g., lifecycle) or explicit.

Fig. 1 An example of malicious behaviors and the corresponding ICCG. a The snippet code of malicious behavior b The corresponding ICCG of the code

Sufficiency of ICCG
We construct the ICCG to represent the overall structure of functions in the application, and search for any attack model hidden in the graph. As the attack model proposed in Section Semantic model of attack is general and platform-independent, we show the sufficiency of the ICCG for detecting attacks below.

As modeled in Section Semantic model of attack, an attack is a set of operations which the attacker performs to achieve a certain objective, and it is composed of 5 essential elements. The ICCG retains almost all program information, and we can extract a number of call sequences from it. By checking each call sequence, we can recognize the actions which are attack related, identify their trigger, and perform data flow analysis on the call sequence. Hence, we can find a mapping from the attack model to the ICCG, which means that the ICCG contains sufficient information to detect an attack inside.

System design of DroidEcho
This section presents the design of DROIDECHO. As shown in Fig. 2, DROIDECHO takes as input an Android application, which contains the class files, the manifest file and the description of its functionality. DROIDECHO generates an attack report which contains the identified malicious behaviors and the corresponding traces of these behaviors for forensic use. DROIDECHO leverages the attack model presented in Section Semantic model of attack as the guidance for attack detection, and proceeds in four phases: disclaimer learning, ICCG construction, attack detection and attack confirmation. The first phase, disclaimer learning, receives the descriptive text of applications as input, and generates a white list of "necessary" behaviors (a.k.a. the disclaimer of the application) in a supervised manner. The white list is used to exclude the claimed functionality of the application from detection. Second, ICCG construction takes the class files and the manifest file of the application as input, and constructs an ICCG, which is then passed to the third phase. Attack detection finds existing attacks, if any, and the corresponding traces which cause these attacks in the application. At last, attack confirmation receives the candidate attacks, and determines whether an attack candidate is a false positive or not by a trace-guided dynamic execution.

Fig. 2 The architecture of the system

Disclaimer learning
Some Android applications may perform seemingly suspicious behaviors which are actually required to accomplish their functionality. The required functionality and the risks it may bring are usually claimed in their descriptive text. We regard this as a benign behavior (henceforth disclaimer), and it will not be considered as an attack candidate. For example, TripAdvisor is a travel application which can suggest nearby restaurants and hotels when the user is travelling. For ease of use, it acquires the permission FINE_LOCATION to learn the user's location so that it can provide the most suitable information to its customers. Although we detect that TripAdvisor has a privacy issue, in that it sends the user's location to a remote server from time to time, we regard this as benign and harmless.

As shown in Fig. 3, we obtain the descriptions of applications and perform a description-to-permission fidelity analysis (Qu et al. 2014). The fidelity analysis builds a description-to-permission relatedness model in which one permission is associated with a list of noun phrases. For the description of a given application, we can leverage this model to produce a list of requested permissions. Then, we employ PScout (Au et al. 2012) to elicit the corresponding APIs that request these permissions. For example, the sentence "Your location: These permissions are needed to obtain your location so we can help you discover hotels, restaurants, and attractions around you" in the app TripAdvisor implies that the app requests the permissions android.permission.ACCESS_COARSE_LOCATION and android.permission.ACCESS_FINE_LOCATION in order to recognize the user's current location. Therefore, 21 Android APIs (e.g., void requestLocationUpdates(float, LocationListener) and Location getLastKnownLocation(String)) are regarded as necessary to invoke by the permission-to-API mapping.

Fig. 3 The learning process of disclaimers

The produced Android APIs serve as disclaimers to refine the attack model. During attack detection (see Section Attack detection), these APIs will not be considered as attack actions.

ICCG construction
The construction of the ICCG takes the class files and the manifest file of the application to be checked as inputs. First, DROIDECHO employs Soot (Vallée-Rai et al. 1999) to generate a rough call graph of the whole application and a control flow graph for each method. Given that, DROIDECHO proceeds in three successive steps: pointer analysis, link analysis and graph assembling. The first two steps provide all the auxiliary information needed to assemble an ICCG.

Pointer analysis
Pointer analysis is a static analysis to infer which variables are pointed to by pointer references or heap references. In this step, we want to identify all references which point to variables in the application, and all possible values which the variables can be assigned.

Set(values) denotes a set of possible values to which the variables can be assigned. PointerTable plays a critical role in the steps of link analysis and action recognition. During link analysis, PointerTable is used to infer the actions and classes of an Intent object, so that DROIDECHO can identify which components are able to receive this Intent. DROIDECHO also needs the PointerTable to recognize the semantics of actions during action recognition. For example, when DROIDECHO encounters an operation that queries a content provider, it needs to learn the value of the URI argument to distinguish different content providers.

Parts of our pointer analysis are based on SPARK (Lhoták and Hendren 2003), which is a pointer analysis framework. It can cluster the variables into several sets, i.e., Set(variables), where all variables in the same set are pointed to by the same reference at a time. Since we have a rough call graph and the control flow graphs of all methods, we traverse the call graph and go inside the control flow graphs to perform value inference. We evaluate each node in a control flow graph, and infer the possible values of the variables. The value inference can handle basic arithmetic and String operations. In addition, we do not evaluate all types of variables, which would be both computationally expensive and useless for our attack detection. We only pay attention to the valuation of primitive types (e.g., boolean, int, double), String, ComponentName, URI/URL and Intent. It is worth mentioning that the values of ComponentName and URI/URL objects can be expressed by a String, while we construct a more complicated structure for Intent objects, which basically contains four fields: action, class, data and category.

The pointer analysis used in this work is type-sensitive but flow-insensitive. That is, every variable in the same set needs to share the same data type with the others. In order to reduce the expense of storage and computation, we store all possible values which the set of variables can be assigned, rather than only the subset that holds after a certain statement.

Link analysis
Link analysis establishes all links between methods or components in an application, i.e., the edges in the ICCG. Initially, the call graph generated by Soot only contains the call relationship between Java methods. As introduced in Section The inter-component communication graph, there are implicit invocations and a variety of communication mechanisms on Android. On the basis of the call graph, we analyze all links between methods and build a complete communication graph for the application.
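To make Definition 1 and the goal of link analysis concrete, the following Python sketch models an ICCG as a set of method nodes with flow edges and communication edges. It is a minimal illustration under our own assumptions, not DROIDECHO's implementation: the `ICCG` class, the `reachable` helper and the method names are hypothetical, and the fine-grained control flow graph inside each node is omitted. The example graph loosely follows the activity-to-service scenario of Fig. 1.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

# The four communication mediums described in the text.
MEDIUMS = {"Intent", "Message", "Binder", "PersistentStorage"}


@dataclass
class ICCG:
    """G = {V, E_f, E_c}: method nodes, flow edges and communication edges."""
    nodes: Set[str] = field(default_factory=set)
    flow_edges: Set[Tuple[str, str]] = field(default_factory=set)
    comm_edges: Set[Tuple[str, str, str]] = field(default_factory=set)

    def add_flow(self, caller: str, callee: str) -> None:
        """Add an explicit or implicit (e.g. lifecycle) call edge."""
        self.nodes.update((caller, callee))
        self.flow_edges.add((caller, callee))

    def add_comm(self, sender: str, medium: str, receiver: str) -> None:
        """Add a communication edge that passes through a medium."""
        if medium not in MEDIUMS:
            raise ValueError(f"unknown medium: {medium}")
        self.nodes.update((sender, receiver))
        self.comm_edges.add((sender, medium, receiver))

    def successors(self, node: str) -> List[str]:
        """Nodes reachable from `node` in one step over either kind of edge."""
        via_flow = {dst for src, dst in self.flow_edges if src == node}
        via_comm = {dst for src, _, dst in self.comm_edges if src == node}
        return sorted(via_flow | via_comm)


def reachable(graph: ICCG, start: str) -> Set[str]:
    """Depth-first search over both edge kinds, starting from `start`."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.successors(node))
    return seen


graph = ICCG()
# Implicit lifecycle call inside the (hypothetical) activity.
graph.add_flow("MainActivity.onCreate", "MainActivity.onStart")
# The activity hands data to the (hypothetical) service through an Intent.
graph.add_comm("MainActivity.onStart", "Intent", "UploadService.onCreate")
print(sorted(reachable(graph, "MainActivity.onCreate")))
```

Keeping flow edges and communication edges in separate sets preserves the medium on each communication edge, which a later data flow analysis would need in order to decide whether data can travel along it.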
The result of the pointer analysis step is a PointerTable, which contains mappings from variables to concrete values: Set(variables) → Set(values). Set(variables) denotes a set of variables which are pointed to with the same reference at a time.

There are two kinds of links between two methods: invocation links (either explicit or implicit) and communication links via an Android medium (e.g., Intent and Message). We first build call chains for the lifecycle of Android components. For example, one of the call chains of an Android Activity is onCreate → onStart → onResume, which shows the implicit invocations after the start of the Activity. As a result, the above methods in the call graph are linked with invocation edges, respectively. For communication links, we recognize the mediums as well as their attributes existing in the methods, and identify which components or methods can receive these mediums. Take the Intent medium as an example: if we find an action which starts activities, like startActivity(Intent), we retrieve the attributes (e.g., class and action) of the Intent object and identify which activities can be triggered by this Intent object. As a result, we add a new link between the method which sends out the Intent and the constructor method of the target activities.

Graph assembling
So far, we have obtained the control flow graph of each method of the application, and all links between these methods. We take the control flow graphs as nodes and the links as edges, and assemble them into an ICCG. The graph depicts the execution order and communications between different methods at the system level, and illustrates the control flow at the method level. Combined with the PointerTable, the ICCG is passed to the attack detection phase. Attack detection searches the graph and finds any existing attack.

Attack detection
To reduce the search space of attack detection, we do not analyze the program from its entry points. Instead, we first quickly recognize the attack-related actions existing in the program, and then perform a bidirectional flow analysis from these behaviors, which effectively speeds up the search process.

Algorithm 1: Model-based attack detection
Input: ICCG of the application
Input: Attack model {(action_i, action_{i+1}, data|control)}, where 0 ≤ i < n − 1
Output: whether the ICCG contains the attack
 1 for action ∈ attack do
 2     if !(ICCG contains action) then
 3         return false;
 4 for i = 0 to #actions − 2 do
 5     if flow(action_i, action_{i+1}) == data then
 6         data_flow = TaintAnalysis(action_i, action_{i+1}, asset);
 7         if data_flow is not satisfied then
 8             return false;
 9     if flow(action_i, action_{i+1}) == control then
10         control_flow = ForwardControlFlowAnalysis(action_i, action_{i+1});
11         if control_flow is not satisfied then
12             return false;
13 trigger := BackwardControlFlowAnalysis(action_0);
14 if trigger ∈ {Environmental Input} then
15     return true;
16 else
17     return false;

Algorithm 1 shows the whole process of checking whether an attack is contained in the application or not. The algorithm takes the ICCG of an application and one attack model as the input, and outputs whether the attack model exists in the ICCG. Lines 1-3 show that it recognizes all actions existing in the ICCG. If any of the actions in the attack is not contained in the ICCG, DROIDECHO concludes that the application does not contain this attack. In our implementation, we conduct a one-time retrieval of the ICCG for each application and store all recognized actions. By comparing the actions included in each attack, we can quickly eliminate the attacks which will definitely not happen.

If all actions in the attack model are found in the ICCG, we proceed to the reachability analysis and program slicing. Since there are two kinds of flows (referred to as control flow and data flow in program analysis, respectively) defined in our attack model, we carry out ForwardControlFlowAnalysis (Line 10) and TaintAnalysis (Line 6) to determine whether the flows are satisfied or not. At last, we get the trigger causing this attack (Line 13), and check if it is a kind of environmental input, e.g., the initialization of the application, a system broadcast message or a timer task. In the following, we give a more detailed description of each step.

Action recognition
We use actions to describe the basic elements in an attack, which is semantic but domain-independent. However, we need to define a system of notations in a specific domain (here Android) to capture these actions and triggers in the ICCG. On Android, we recognize an action by the corresponding constraints. Here we define three kinds of predicates to express the APIs and constraints of the actions we meet in the code: sig(api), type(arg), and value(arg), where api is an Android API, arg is a variable, and each predicate returns a comparable constant value. As a consequence, action recognition can be transformed into a satisfiability problem,

\[ action \models sig(api) \tag{1} \]

\[ sig(api) \models type(arg) \cap value(arg) \tag{2} \]

An action is recognized if we detect APIs that satisfy the above constraints progressively. Equation 1 shows that the action can be recognized by an API with the specific signature; moreover, the arguments or the base, if any, need to satisfy two kinds of predicates, type and value. As shown in Eq. 2, arg is either the base of the API (static methods do not have a base) or one of its arguments. In particular, arg may be another invocation of an API, i.e., sig. Therefore, we recursively solve the constraints until the action is recognized. Taking the example of obtaining contacts, the essential code at language level of this action can be described as follows:

\[ \underbrace{sig(api) = obj.query(uri, *)}_{\text{obtain contact}} \tag{3} \]

\[ \underbrace{type(obj) = \text{ContentResolver},\; type(uri) = \text{Uri},\; value(uri) = \text{"content://contacts"}}_{sig(api) \,=\, obj.query(uri, *)} \tag{4} \]

As shown in Eq. 3, we first need to find a pivotal function whose signature matches obj.query(uri, *), and the method needs to meet three constraints: the base of the invocation, obj, needs to be an object of the class android.content.ContentResolver; uri needs to be an object of android.net.Uri; and its value needs to be content://contacts, as shown in Eq. 4. The code statements that together form a behavior might have a dependency relationship or follow an execution order. We treat this as a constraint satisfaction problem, and recognize a behavior by reasoning.

attack confirmation to guide the dynamic execution of the application; 3) identify the search space for potential taint analysis.

The forward control flow analysis aims to complete two tasks: 1) determine the occurrence of the subsequent actions in an attack model; 2) similar to the backward control flow analysis, identify the search space for the taint analysis. As a result, we do not search the entire ICCG during the taint analysis, which would be computationally expensive.

Taint analysis
Taint analysis tracks the flow of data during detection. Taking privacy leakage as an example, we need to carry out taint analysis to track the flow of data, and to check whether the data flows to a sink action and is eventually sent out. During the taint analysis, we get a domain set in control-flow order, $SearchDomain = D_1 \rightarrow D_2 \rightarrow \ldots \rightarrow D_n$, and the source action is located at $D_{sr}$ after the above steps. Then we perform a forward data flow analysis on the domain set SearchDomain. Figure 4 illustrates the ways in which data can be tainted across domains. First of all, data in the domain D can influence the data in its previous domain
The benefits are that we do not need to care about by three methods: return the data at the call site in the theexecution orderofcodeinabehavior,andhenceour previous domains, referring to  1 ;thedata flow  2 shows approach is more general so as to identify more variations. how the data in the latter domain influences the data in its previous domains; and we can assign the data to one com- Reachability analysis & slicing monly shared variable between the domain D as shown If the ICCG contains all necessary elements for one attack, in  3 . There are three possible ways for the data in domain we start to do program slicing from these elements. The D to influence the data in the successive domains: enclose slicing consists of backward and forward control flow s communication medium with data and pass it to the next analysis. The backward control flow analysis aims to com- domainsasshown bythedataflow  4 ;passthedataas plete three tasks: 1) find the root cause that lead to such action, i.e., its entry points. Based on the entry points, we an argument to its successive domains, which are used in can infer the type of the triggers. Then we know whether these domains, referring to  5 ;assign thedatatoacom- the attack is triggered by a user interaction or environ- monly shared variable in between as shown by the data mental inputs; 2) obtain all conditions in a trace from flow . In addition, we take a coarse-grained aliasing the entry points to the action. The conditions are used in analysis in this paper, i.e., if for example a string variable Fig. 4 Taint analysis across multiple domains Meng et al. Cybersecurity (2018) 1:4 Page 12 of 17 is passed to a function, and this function will encrypt the ten with 17,038 lines of Java, and 163 lines of scripts string and return a new encrypted value with a crypto- (Python and Shell). The dynamic confirmation is imple- graphic scheme. Although we do not know how to convert mented based on TaintDroid (Enck et al. 
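The control structure of Algorithm 1 can be illustrated with a short Python sketch. This is only a sketch under stated assumptions, not DROIDECHO's implementation: the ICCG is reduced to the set of action names recognized in it, the three analyses are passed in as plain callables, and all names (AttackModel, detect_attack, the action and trigger strings) are hypothetical. The sketch also assumes that the backward analysis starts from the first action of the model, which the algorithm listing does not state explicitly.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set, Tuple

# One flow step of an attack model: (action_i, action_{i+1}, "data" | "control").
Step = Tuple[str, str, str]


@dataclass
class AttackModel:
    """Hypothetical container for one attack model (names are illustrative)."""
    steps: List[Step]
    asset: str

    @property
    def actions(self) -> List[str]:
        # Collect every action once, in the order it first appears in the steps.
        ordered: List[str] = []
        for src, dst, _ in self.steps:
            for action in (src, dst):
                if action not in ordered:
                    ordered.append(action)
        return ordered


def detect_attack(
    recognized_actions: Set[str],
    model: AttackModel,
    taint_analysis: Callable[[str, str, str], bool],
    forward_control_flow: Callable[[str, str], bool],
    backward_trigger: Callable[[str], str],
    environmental_inputs: Iterable[str],
) -> bool:
    """Return True if the attack model is found, following Algorithm 1."""
    if not model.steps:
        return False
    # Lines 1-3: every action of the attack must be present in the ICCG.
    if any(action not in recognized_actions for action in model.actions):
        return False
    # Lines 4-12: each flow between consecutive actions must be satisfied.
    for src, dst, kind in model.steps:
        if kind == "data" and not taint_analysis(src, dst, model.asset):
            return False
        if kind == "control" and not forward_control_flow(src, dst):
            return False
    # Lines 13-17: report the attack only if its trigger is environmental.
    # Assumption: the backward analysis starts from the first action.
    trigger = backward_trigger(model.steps[0][0])
    return trigger in set(environmental_inputs)
```

With stub analyses that always succeed, a one-step model such as AttackModel(steps=[("obtain_contact", "send_out", "data")], asset="contacts") is reported only when both actions were recognized and the returned trigger belongs to the environmental inputs. Passing the analyses as callables keeps the sketch independent of how taint analysis and slicing are actually realized.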
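The predicate-based recognition of Eqs. 1-4 can likewise be sketched in a few lines of Python. The CallSite and ActionPattern classes below are hypothetical simplifications introduced only for illustration: a call site is assumed to already carry the resolved type of its base, and the types and values of its arguments (e.g., taken from the PointerTable). The recursive case, where an argument is itself another API invocation, is deliberately left out.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CallSite:
    """Simplified, hypothetical view of one API invocation found in the ICCG."""
    signature: str               # method name, e.g. "query"
    base_type: Optional[str]     # None for static methods, which have no base
    arg_types: Dict[str, str]    # argument name -> resolved type
    arg_values: Dict[str, str]   # argument name -> resolved constant value


@dataclass
class ActionPattern:
    """Constraints of one action: sig(api), type(arg) and value(arg)."""
    signature: str
    base_type: Optional[str]
    arg_types: Dict[str, str]
    arg_values: Dict[str, str]

    def matches(self, call: CallSite) -> bool:
        # Eq. 1 / Eq. 3: the API signature has to match first.
        if call.signature != self.signature:
            return False
        # Eq. 2 / Eq. 4: type predicate on the base of the invocation, if any.
        if self.base_type is not None and call.base_type != self.base_type:
            return False
        # Eq. 2 / Eq. 4: type and value predicates on the arguments.
        types_ok = all(call.arg_types.get(name) == expected
                       for name, expected in self.arg_types.items())
        values_ok = all(call.arg_values.get(name) == expected
                        for name, expected in self.arg_values.items())
        return types_ok and values_ok


# The "obtain contact" action of Eqs. 3 and 4.
OBTAIN_CONTACT = ActionPattern(
    signature="query",
    base_type="android.content.ContentResolver",
    arg_types={"uri": "android.net.Uri"},
    arg_values={"uri": "content://contacts"},
)
```

Because the pattern only lists constraints and does not prescribe an order of statements, a call site matches regardless of where the Uri object was created, which reflects the order-independence discussed for action recognition.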
Dynamic attack confirmation

As discussed before, DROIDECHO's ICCG construction and attack detection are based on static program analysis, which is less precise than dynamic analysis. As a result, the attacks reported by DROIDECHO may be false positives. Therefore, we introduce a confirmation step, based on dynamic testing, to reduce false positives.

An attack candidate, which is passed from the attack detection phase to the attack confirmation phase, contains an attack trace and the conditions that guarantee the occurrence of the attack. Given that, we simulate the inputs to drive the dynamic execution of the application and check whether the attack trace can occur in a real execution. In order to activate the attack candidate and capture malicious behaviors, we first instrument the Android OS by hooking the specific Android APIs included in our attack model, and then generate the triggers that activate the contained malicious behaviors.

• Instrumentation. Since the actions in an attack model are recognized as invocations of specific Android APIs, we instrument the Android OS to monitor these invocations. In this paper, we leverage TaintDroid (Enck et al. 2010) to determine whether these APIs are invoked.
• Triggers. We leverage IntelliDroid (Wong and Lie 2016) to generate all triggers leading to specific malicious behaviors, and subsequently schedule these triggers to drive the execution of the application. We simply feed the application with all possible trigger sequences; to eliminate the impossible sequences (which never occur during real executions), we exploit the "happen-before" relations among these triggers when generating sequences.

With these inputs, DROIDECHO is able to execute the suspicious applications to determine whether the attack is reachable. To make the exploration faster, DROIDECHO prunes the paths that rarely lead to the attack trace, which significantly reduces the search space of the program.

Evaluation

We implement an automatic platform, DROIDECHO, to facilitate the detection. DROIDECHO is written in 17,038 lines of Java and 163 lines of scripts (Python and Shell). The dynamic confirmation is implemented on top of TaintDroid (Enck et al. 2010) and IntelliDroid (Wong and Lie 2016). TaintDroid enables us to track the information flows of applications. In addition, we customized TaintDroid for two purposes. First, we intercept the APIs in our Action set to monitor whether they are invoked by the tested applications. Second, we intercept the APIs that provide applications with environmental inputs, such as location and time information, so that we can return values that would activate the target behaviors. During the confirmation, we employ IntelliDroid to generate the call paths to specific Android APIs as well as the conditions that enable these paths. The driver script then takes them as input to automatically drive the execution of the suspicious applications. To estimate the overall performance of DROIDECHO, we conduct experiments from three aspects: evaluation on a malware benchmark, evaluation on real apps and evaluation on performance.

Evaluation on Malware Benchmark

To evaluate our approach on well-known malware, we conduct an experiment on the 1260 malware samples of the collection by Zhou and Jiang (2011). According to the types of malware, we filter out 108 of them (e.g., Asroot, DroidCoupon and DroidDeluxe), which only use native code to launch attacks. In the end, we successfully detect 940 (89.5%) samples and also report the attack type. There are mainly two reasons for the missed malware: 1) some malware uses reflection to dynamically invoke malicious code. For example, AnserverBot loads an executable file in its asset folder, retrieves the included classes and runs the code; 2) some malware leverages complicated obfuscation and encryption to confuse AV tools. For example, Geinimi leverages several cryptographic schemes (e.g., DES) to encrypt its communication and strings.

In addition, we conduct an experiment to compare DROIDECHO's capability of attack detection with FlowDroid (Arzt et al. 2014b), a static tool for detecting privacy leakage. The subjects of this experiment are a set of open-source Android applications named DroidBench⁵, whose applications may contain privacy leakage attacks.

DROIDECHO successfully detects 34 samples of malware, while it fails to find 8 malicious samples. Table 3 lists the comparison with FlowDroid, restricted to the cases where the results differ. As shown in Table 3, DROIDECHO has an edge in detecting the first six kinds of privacy leakage, but cannot detect the last three kinds. PrivateDataLeak-1&2 are two applications that steal the text in a password field of an Android GUI view. It is hard to determine whether the data on GUI components is sensitive; in addition, applications that need authentication have to send credentials, such as user input from the keyboard, to a remote server. As a result, DROIDECHO does not track the flow of data on GUI components. Lastly, neither DROIDECHO nor FlowDroid can cope with the last two kinds of applications: the ImplicitFlow samples leverage obfuscation techniques to confuse the analysis, and the Reflection samples use reflection to dynamically invoke methods or fetch fields to complete the process of privacy leakage.

Table 3 Comparison with FlowDroid
App | DroidEcho | FlowDroid
ArrayAccess-1&2 | TP | FP
HashMapAccess1 | TP | FP
ListAccess1 | TP | FP
Ordering1 | TP | FP
Unregister1 | TP | FP
Exception-1&4 | TP | FP
PrivateDataLeak-1&2 | FN | TP
ImplicitFlow-1&2&3&4 | FN | FN
Reflection-3&4 | FN | FN

Evaluation on real Apps

We have collected 7643 applications from Google Play, which are popular free applications in their respective categories. By running DROIDECHO, we find 444 applications that have malicious behaviors. In addition, we have compiled statistics on the behaviors that users are aware of or that are already claimed in the description of the applications. We compare DROIDECHO with other anti-virus (AV) tools by uploading the apk files to VirusTotal (www.virustotal.com). Although the AV tools have detected 1541 (20.2%) samples of malware, most of them, up to 1217 (79.0%), are Adware. Due to the restriction of our approach, we do not provide detection for Adware. After filtering out these Adware applications, we can also find 149 more applications that have malicious behaviors.

We investigate the 149 applications that contain malicious behaviors, of which 131 applications have privacy leakages, while the remaining applications have other four kinds of malicious behaviors. In particular, 10 applications contain service abuse attacks, i.e., sending SMS messages without users' consent; 6 applications contain content tampering attacks, i.e., deleting SMS messages from the inbox; and 2 applications deplete the battery by holding the screen lock for a long time. By investigating the code of these applications, we find that many of them employ a third-party library that exposes sensitive information. The third-party libraries may measure the usage of applications (e.g., Flurry and Crittercism), diagnose application crashes (e.g., Crashlytics), or serve advertisements (e.g., Umeng and Google Ads). Table 4 shows the third-party libraries that are contained in the applications.

Table 4 Privacy leakage via 3rd-libraries
Library | Description | Num | Behaviors
Adobe | Measurement of Usage | 1 | Identity Code, etc.
Flurry | Measurement of Usage | 20 | Identity Code, Location, etc.
Conversant | Measurement of Usage | 1 | Identity Code, Location, etc.
Crashlytics | Diagnosis of Crash | 8 | Identity Code, Sys. Info, etc.
Map Service | Map Service | 5 | Location, etc.
Crittercism | Optimization Tool | 1 | Identity Code, etc.
Umeng | Advertisement | 4 | Identity Code, Location, etc.
Google Ads | Advertisement | 3 | Identity Code, Location, etc.
Amazon Ads | Advertisement | 1 | Identity Code, Location, etc.
Millennialmedia | Advertisement | 2 | Identity Code, Location, etc.

False positive analysis. To evaluate DROIDECHO's accuracy, we randomly selected 50 samples and manually identified 4 false positives. Two false positives arise because DROIDECHO cannot handle collection objects such as arrays, lists and maps well: if any element in a collection is tainted, DROIDECHO considers the whole collection object tainted. One false positive is due to ignoring the execution conditions of flows: an execution condition may not be satisfied at runtime, so the malicious behavior cannot be triggered in practice. The last false positive is attributed to the insufficient modelling of persistent storage. As an alternative communication channel, persistent storage (e.g., file, database) might contain multi-dimensional data. It is non-trivial to track the flow of data in persistent storage, which we leave for future study.

Evaluation on performance

In order to evaluate the efficiency and scalability of DROIDECHO, we measured runtime parameters in the previous experiments. The runtime parameters consist of the complexity of the applications and the runtime of each phase of DROIDECHO. The experiments are conducted on a Linux Ubuntu 14.04 machine with 12 cores of Intel Xeon(R) CPU E5-16500 and 16G memory. We depict the complexity of applications from four aspects: the file size of the application, and the number of nodes, edges and mediums of the ICCG. We have measured the runtime of pointer analysis, link analysis, action recognition and attack detection, respectively. The detailed data can be found in Table 5. As shown in the columns for the runtime of DroidEcho, DROIDECHO is very efficient in detecting attacks, taking about 35 s on average to complete the analysis of a real application. In addition, since we leverage Soot to generate the rough call graph and the control flow graph of each method, the runtime of Soot should also be counted toward the whole detection. Soot performs the heavy work of reverse engineering, i.e., converting Android .dex code into Java bytecode; the time spent on that is hence much larger than the runtime of DROIDECHO.

Table 5 Evaluation on performance of DroidEcho (#N, #E, #M: nodes, edges and mediums of the ICCG; Pointer to Total: runtime of DroidEcho in ms; Soot: runtime of Soot in ms)
Dataset | Size (K) | #N | #E | #M | Pointer | Link | Assembling | Recognition | Detection | Total | Soot
DroidBench | 186 | 15 | 1 | 0 | 46 | 3 | 14 | 11 | 40 | 114 | 24,702
Malware | 893 | 1,327 | 6,070 | 5 | 4,818 | 108 | 55 | 747 | 2,358 | 8,086 | 65,241
Real Apps | 5,392 | 3,900 | 75,117 | 10 | 17,114 | 611 | 453 | 3,742 | 13,301 | 35,221 | 135,763

Discussion

Our attack detection is guided by the semantic attack models, which describe the essential attack elements combined in a logical order. In this way, our approach is general enough to detect several types of attacks as well as their variations on Android. Although it is hard to cover attack types exhaustively, considering that zero-day attacks occur from time to time, each augmentation of the attack model can significantly enhance the ability to detect attacks. On the other hand, we have improved conventional static analysis on Java by taking into account the new features provided by the Android platform. This helps to produce a more complete and comprehensive communication graph for Android applications, and thereby makes the attack detection more accurate and effective. However, considering the flaws of static analysis and the experimental results we obtained, DROIDECHO still has some shortcomings in detecting attacks:

Transformation attacks

These are attacks against anti-malware tools and approaches that transform a piece of malware into different forms while preserving its original logic (Rastogi et al. 2013). Our approach has sufficient resistance against trivial transformation attacks and transformation attacks detectable by static analysis (DSA). However, transformation attacks non-detectable by static analysis (NSA), e.g., reflection and bytecode encryption, can paralyze our approach, which is also a common issue in static analysis.

Vulnerability exploits

We pay more attention to the attacks that invoke Android APIs. There exists a kind of attack that exploits the vulnerabilities of Android and triggers them by crafting a special input or executing some code in a certain order. Detection is more difficult when the exploits are written in native code. To date, our work only accepts Java bytecode as the analysis object.

The limitations of our approach can be largely ascribed to the expressive ability of the attack model. Since the detection is based on static analysis, the attack model proposed in this paper only contains static features of attacks. As a result, we can detect more attacks by enriching and enhancing the attack model, for example, by taking into account dynamic features of attacks.

Related work

Attack representation

Chen et al. (2013) present permission event graphs (PEG) to depict API- and permission-related behaviors occurring on Android. In addition, to express the order in which events occur, they add the temporal order and leverage LTL to depict a policy specification. Combining static analysis, model checking and runtime monitoring, they are able to detect violations of contextual policies in Android applications. Gunadi and Tiu (2013) propose a security policy specification language to describe privilege escalation on Android. The language is based on metric linear-time temporal logic (MTL) plus an extension of recursive definitions. It can help to figure out the context-sensitive privilege of an application. By monitoring the chain of privilege at runtime, they manage to find elevated privileges and detect collusion attacks. Aiming at privacy issues on Android, Arzt et al. (2014b) reduce them to an IFDS (Reps et al. 1995) problem, and construct a flow- and context-sensitive graph to present the entire behavioral system by static analysis. Graph reachability and value evaluation are performed to figure out whether the messages being sent out are tainted as sensitive information. Yang et al. (2014) propose a two-level behavioral graph (Component Dependency Graph and Component Behavior Graph) to express the program logic. They first leverage an unsupervised mining approach to mine the program logic of malware automatically. Based on the mined graphs, they check whether applications crawled from marketplaces contain any of the malicious behaviors. Mariconti et al. (2016) propose to use Markov chains to represent malicious behaviors in Android malware, and employ static analysis to identify malicious behaviors. AppContext (Yang et al. 2015) proposes two heuristics
(i.e., activating and guarding conditions) to identify malware, while not classifying malware in terms of attack targets.

A handful of works are devoted to identifying user-intended behaviors. In a PEG, Chen et al. (2013) define pre-conditions either with or without users' consent. Although this work only focuses on GUI operations, it provides a new perspective on learning the essential characteristics of malware. AppIntent, proposed by Yang et al. (2013), is another work that extracts a sequence of GUI operations causing data transmission. They first reduce the search space by static analysis to avoid time-consuming but useless searching; the event sequence is then generated by symbolic execution guided by the reduced space.

Our attack model combines program-level behaviors and external inputs (i.e., triggers) to model attacks. First, at the program level, we consider the combination of assets, actions and flows to model a complete attack behavior. In addition, our model is not at an abstract level, unlike most of the previous studies. It can thus be directly mapped to the real implementation of the tested applications without missing critical details of the attacks. Second, triggers are taken into consideration, which effectively differentiates benign behaviors from malicious ones.

Attack detection

Attack detection via program analysis can be roughly divided into two categories: dynamic analysis and static analysis. TaintDroid (Enck et al. 2010) tracks the propagation of sensitive information on a customized Android, and determines whether there exists any privacy leakage attack. DroidScope (Yan and Yin 2012) and VetDroid (Zhang et al. 2013) both reconstruct malicious behaviors by collecting information during dynamic analysis. However, the difficulty of deploying the monitoring system restricts the scale of attack detection, and exhaustive test inputs are nearly impossible to obtain, which means that attacks may sometimes not be triggered and detected due to insufficient inputs.

As a result, more researchers focus on detecting attacks via static analysis. FlowDroid (Arzt et al. 2014b) performs static analysis, specifically dataflow analysis, on the code of applications to check whether they contain privacy leakage behaviors. IccTA (Li et al. 2015) incorporates ICC analysis to achieve a more complete and accurate detection of privacy leakage. However, these two approaches only focus on the detection of privacy leakage. DroidSIFT (Zhang et al. 2014) analyzes the code of applications and constructs behavior graphs to denote the program logic. Taking the behavior graphs as signatures, DroidSIFT builds a classification system to distinguish benign applications from malware. Apposcopy (Feng et al. 2014) takes into account the inter-component communication on Android, and constructs an inter-component call graph that links up all components of the application to detect privacy leakage malware with crafted signatures. Different from these two approaches, our approach detects attacks based on semantic models of attacks and uses dynamic analysis to confirm their maliciousness.

Our approach combines static and dynamic analysis and gains the advantages of both. We first employ static analysis based on semantic models of attacks to quickly find the potentially malicious applications, together with the trigger and the predicates that cause the occurrence of attacks. We then leverage dynamic analysis to confirm the attacks and reduce false positives. As a result, our approach is effective on large-scale tests and reduces the false positive rate via dynamic attack confirmation.

Conclusion

In this paper, we introduce a novel attack model to depict the essential characteristics and features of attacks. In addition, we build a transformation from an Android application to a directed graph, called the inter-component communication graph. The ICCG captures all structural information of the application, including call relationships and communications between different methods, and it contains all control flow information of each method. We then propose an effective algorithm to search for attacks in the ICCG. The experiments show that the approach is feasible and effective. In the future, we expect to extend our detection algorithm to handle more complicated obfuscation or encryption techniques, and will continue enriching the attack model in order to handle more variants or new attacks.

Acknowledgments
Kai Chen was supported in part by National Key R&D Program of China (No. 2016QY04W0805), NSFC U1536106, 61728209, National Top-notch Youth Talents Program of China, Youth Innovation Promotion Association CAS, Beijing Nova Program and a research grant from Ant Financial. This work is also partly supported by International Cooperation Program on CyberSecurity, administered by SKLOIS, Institute of Information Engineering, Chinese Academy of Sciences, China (No. SNSBBH-2017111036).

Authors' contributions
All authors read and approved the final manuscript.

Competing interests
The authors declare that they have no competing interests.

Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Author details
1 SKLOIS, Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China. 2 Nanyang Technological University, Singapore, Singapore. 3 Singapore Institute of Technology, Singapore, Singapore. 4 School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China.

Received: 4 January 2018 Accepted: 17 April 2018

References
Arzt S, Bodden E (2016) StubDroid: Automatic Inference of Precise Data-flow Summaries for the Android Framework. In: Proceedings of the 38th International Conference on Software Engineering. pp 725–735
Arzt S, Rasthofer S, Fritz C, Bodden E, Bartel A, Klein J, Le Traon Y, Octeau D, McDaniel P (2014a) FlowDroid: Precise Context, Flow, Field, Object-sensitive and Lifecycle-aware Taint Analysis for Android Apps. In: Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, Edinburgh. pp 259–269
Arzt S, Rasthofer S, Fritz C, Bodden E, Bartel A, Klein J, Le Traon Y, Octeau D, McDaniel P (2014b) Flowdroid: Precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for android apps. In: Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’14. ACM, New York. pp 259–269
Au KWY, Zhou Y, Huang Z, Lie D (2012) PScout: Analyzing the Android Permission Specification. In: Proceedings of the 2012 ACM Conference on Computer and Communications Security, CCS ’12. ACM, New York. pp 217–228
Bosu A, Liu F, Yao DD, Wang G (2017) Collusive Data Leak and More: Large-scale Threat Analysis of Inter-app Communications. In: Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, Abu Dhabi. pp 71–85
Chen KZ, Johnson NM, D’Silva V, Dai S, MacNamara K, Magrino TR, Wu EX, Rinard M, Song DX (2013) Contextual Policy Enforcement in Android Applications with Permission Event Graphs. In: 20th Annual Network and Distributed System Security Symposium, NDSS ’13, San Diego. http://internetsociety.org/doc/contextual-policy-enforcement-android-applications-permission-event-graphs
Chen QA, Qian Z, Mao ZM (2014) Peeking into Your App without Actually Seeing It: UI State Inference and Novel Android Attacks. In: Proceedings of the 23rd USENIX Conference on Security Symposium, SEC’14. USENIX Association, Berkeley. pp 1037–1052
Enck W, Gilbert P, Chun B-G, Cox LP, Jung J, McDaniel P, Sheth AN (2010) TaintDroid: An Information-flow Tracking System for Realtime Privacy Monitoring on Smartphones. In: Proceedings of the 9th USENIX Conference on Operating Systems Design and Implementation, OSDI’10. USENIX Association, Berkeley. pp 393–407
Enck W, Octeau D, McDaniel P, Chaudhuri S (2011) A Study of Android Application Security. In: Proceedings of the 20th USENIX Conference on Security, SEC’11. USENIX Association, Berkeley. pp 21–21
Enck W, Ongtang M, McDaniel PD (2009) Understanding Android Security. IEEE Secur Priv 7(1):50–57
F-Secure Lab (2013) Mobile Threat Report, January - March 2013. Technical report
Feng Y, Anand S, Dillig I, Aiken A (2014) Apposcopy: Semantics-Based Detection of Android Malware Through Static Analysis. ACM, New York. https://doi.org/10.1145/2635868.2635869
Grace MC, Zhou Y, Wang Z, Jiang X (2012) Systematic Detection of Capability Leaks in Stock Android Smartphones. In: 19th Annual Network & Distributed System Security Symposium. http://dblp.uni-trier.de/rec/bib/conf/ndss/GraceZWJ12
Gunadi H, Tiu A (2013) Efficient runtime monitoring with metric temporal logic: A case study in the android operating system. CoRR abs/1311.2362. http://arxiv.org/abs/1311.2362
Hao S, Li D, Halfond WGJ, Govindan R (2013) Estimating Mobile Application Energy Consumption Using Program Analysis. In: Proceedings of the 2013 International Conference on Software Engineering, ICSE ’13. IEEE Press, Piscataway. pp 92–101
Hilgers C, Macht H, Müller T, Spreitzenbarth M (2014) Post-Mortem Memory Analysis of Cold-Booted Android Devices. In: Proceedings of the 2014 Eighth International Conference on IT Security Incident Management & IT Forensics, IMF ’14. IEEE Computer Society, Washington. pp 62–75
Lhoták O, Hendren L (2003) Scaling Java Points-to Analysis Using SPARK. In: Proceedings of the 12th International Conference on Compiler Construction, CC’03. Springer-Verlag, Berlin. pp 153–169
Li L, Bartel A, Bissyandé TF, Klein J, Traon YL, Arzt S, Rasthofer S, Bodden E, Octeau D, McDaniel PD (2015) IccTA: Detecting Inter-Component Privacy Leaks in Android Apps. In: 37th IEEE/ACM International Conference on Software Engineering, ICSE 2015, Florence, Italy, May 16-24, 2015, Volume 1. pp 280–291
Luo W, Xu S, Jiang X (2013) Real-time Detection and Prevention of Android SMS Permission Abuses. In: Proceedings of the First International Workshop on Security in Embedded Systems and Smartphones, SESP ’13. ACM, New York
Mariconti E, Onwuzurike L, Andriotis P, Cristofaro ED, Ross GJ, Stringhini G (2016) Mamadroid: Detecting android malware by building markov chains of behavioral models. CoRR abs/1612.04433
Octeau D, McDaniel P, Jha S, Bartel A, Bodden E, Klein J, Traon YL (2013) Effective Inter-Component Communication Mapping in Android: An Essential Step Towards Holistic Security Analysis. In: Proceedings of the 22nd USENIX Conference on Security, SEC’13. USENIX Association, Berkeley. pp 543–558
Oliner AJ, Iyer A, Lagerspetz E, Tarkoma S (2012) Collaborative Energy Debugging for Mobile Devices. In: the 8th Workshop on Hot Topics in System Dependability. USENIX, Berkeley
Orthacker C, Teufl P, Kraxberger S, Lackner G, Gissing M, Marsalek A, Leibetseder J, Prevenhueber O (2011) Android Security Permissions - Can We Trust Them? In: Security and Privacy in Mobile Information and Communication Systems. Springer Berlin Heidelberg, Berlin. pp 40–51
Pathak A, Hu YC, Zhang M Bootstrapping Energy Debugging on Smartphones: A First Look at Energy Bugs in Mobile Devices. In: Proceedings of the 10th ACM Workshop on Hot Topics in Networks, HotNets-X. ACM, New York. pp 5:1–5:6. https://doi.org/10.1145/2070562.2070567
Pathak A, Hu YC, Zhang M (2012) Where is the energy spent inside my app? Fine-grained Energy Accounting on Smartphones with Eprof. In: Proceedings of the 7th ACM European Conference on Computer Systems, EuroSys ’12. ACM, New York. pp 29–42. https://doi.org/10.1145/2168836.
Prince B New Android Malware Targets Banking Apps, Phone Information: Fireeye. http://www.securityweek.com/new-android-malware-targets-banking-apps-phone-information-fireeye. Accessed 05 Oct 2017
ProGuard (2017). http://developer.android.com/tools/help/proguard.html. Accessed 03 Dec 2017
Qu Z, Rastogi V, Zhang X, Chen Y, Zhu T, Chen Z (2014) AutoCog: Measuring the Description-to-permission Fidelity in Android Applications. In: Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security. pp 1354–1365
Rastogi V, Chen Y, Jiang X (2013) DroidChameleon: Evaluating Android Anti-malware Against Transformation Attacks. In: Proceedings of the 8th ACM SIGSAC Symposium on Information, Computer and Communications Security, ASIA CCS ’13. ACM, New York. pp 329–334
Reps TW, Horwitz S, Sagiv S (1995) Precise Interprocedural Dataflow Analysis via Graph Reachability. In: Conference Record of POPL’95: 22nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, San Francisco. https://doi.org/10.1145/199448.199462
Schlegel R, Zhang K, Zhou X, Intwala M, Kapadia A, Wang X (2011) Soundcomber: A Stealthy and Context-Aware Sound Trojan for Smartphones. In: 18th Annual Network and Distributed System Security Symposium
Shabtai A, Fledel Y, Kanonov U, Elovici Y, Dolev S, Glezer C (2010) Google Android: A Comprehensive Security Assessment. IEEE Secur Priv 8(2):35–44
Symantec Inc. (2017) Internet Security Threat Report. Technical report
Vallée-Rai R, Co P, Gagnon E, Hendren L, Lam P, Sundaresan V (1999) Soot - a Java Bytecode Optimization Framework. In: Proceedings of the 1999 Conference of the Centre for Advanced Studies on Collaborative Research, CASCON ’99. IBM Press. p 13. http://dl.acm.org/citation.cfm?id=781995.782008
Vekris P, Jhala R, Lerner S, Agarwal Y (2012) Towards Verifying Android Apps for the Absence of No-Sleep Energy Bugs. In: Proceedings of the 2012 USENIX Conference on Power-Aware Computing and Systems, HotPower’12. USENIX Association, Berkeley. pp 3–3
Wei F, Roy S, Ou X, Robby (2014) Amandroid: A Precise and General Inter-component Data Flow Analysis Framework for Security Vetting of Android Apps. In: Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security. pp 1329–1341
Wong MY, Lie D (2016) IntelliDroid: A Targeted Input Generator for the Dynamic Analysis of Android Malware. In: 23rd Annual Network & Distributed System Security Symposium
Xing L, Pan X, Wang R, Yuan K, Wang X (2014) Upgrading Your Android, Elevating My Malware: Privilege Escalation Through Mobile OS Updating. In: IEEE Security & Privacy
Xu K, Li Y, Deng RH (2016) ICCDetector: ICC-Based Malware Detection on Android. IEEE Trans Inf Forensics Secur 11(6):1252–1264
Xuxian J, Yajin Z (2013) Android Malware. SpringerBriefs in Computer Science
Yan LK, Yin H (2012) DroidScope: Seamlessly Reconstructing the OS and Dalvik Semantic Views for Dynamic Android Malware Analysis. In: USENIX Security. USENIX Association, Berkeley. pp 29–29
Yang C, Xu Z, Gu G, Yegneswaran V, Porras PA (2014) DroidMiner: Automated Mining and Characterization of Fine-grained Malicious Behaviors in Android Applications. In: 19th European Symposium on Research in Computer Security. Springer International Publishing. pp 163–182
Yang W, Xiao X, Andow B, Li S, Xie T, Enck W (2015) AppContext: Differentiating Malicious and Benign Mobile App Behaviors Using Context. Proceedings of the 37th International Conference on Software Engineering. pp.
303–313 Yang Z, Yang M, Zhang Y, Gu G, Ning P, Wang XS (2013) AppIntent: Analyzing Sensitive Data Transmission in Android for Privacy Leakage Detection. In: Proceedings of the 2013 ACM SIGSAC conference on Computer and Communications Security, CCS ’13. ACM, New York. pp 1043–1054 Zhang M, Duan Y, Yin H, Zhao Z (2014) Semantics-Aware Android Malware Classification Using Weighted Contextual API Dependency Graphs. In: Proceedings of the 21th ACM Conference on Computer and Communications Security, CCS ’14, Scottsdale Zhang M, Yin H (2014) Efficient, Context-aware Privacy Leakage Confinement for Android Applications Without Firmware Modding. In: Proceedings of the 9th ACM Symposium on Information, Computer and Communications Security (ASIACCS’14), Kyoto Zhang Y, Yang M, Xu B, Yang Z, Gu G, Ning P, Wang XS, Zang B (2013) Vetting Undesirable Behaviors in Android Apps with Permission Use Analysis. In: Proceedings of the 2013 ACM SIGSAC Conference on Computer and Communications Security, CCS ’13. ACM, New York. pp 611–622. https:// doi.org/10.1145/2508859.2516689 Zhou Y, Jiang X (2011) An Analysis of the AnserverBot Trojan. Technical report. http://www.csc.ncsu.edu/faculty/jiang/pubs/AnserverBot_Analysis.pdf Zhou Y, Jiang X (2012) Dissecting Android Malware: Characterization and Evolution. In: Proceedings of the 2012 IEEE Symposium on Security and Privacy, SP ’12. IEEE Computer Society, Washington. pp 95–109

Journal

Cybersecurity, Springer Journals

Published: Jun 5, 2018
