To position our contribution, this section reviews related work in three areas: machine learning-based Android malware detection, static analysis-based Android app-collusion detection, and machine learning-based Android app-collusion detection.
Machine learning-based Android malware detection
Many studies have explored the use of machine learning for detecting Android malware. Typically, these detection methods involve extracting features from the Android application package20. These features are obtained through static analysis, dynamic analysis, or a combination of both, referred to as hybrid analysis.
In static analysis, features are extracted from application components without executing the application21. This process typically involves decompressing the APK file to access various objects for analysis. Key objects include the AndroidManifest.xml file, which declares the package name, requested permissions, referenced libraries, and application components such as activities, services, and broadcast receivers, together with their intent filters. Another crucial object is the classes.dex file, which contains all the compiled Android classes and is the source of API-call features20,22. App permissions are among the most commonly used features for detecting Android malware. In 2014, Tchakounte et al.23 introduced a method for characterizing and detecting Android malware based on permissions. Nida et al.24 used machine learning algorithms to classify Android applications as either malware or benign based on permissions and API-based features. Chrysikos et al.25 developed a machine learning framework to analyze and classify malicious applications into families based on their permissions.
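As an illustration of permission-based static feature extraction, the following minimal sketch reads the requested permissions from a manifest and turns them into a binary feature vector. It assumes the manifest has already been decoded to plain-text XML (inside an APK it is stored in a binary format); the sample manifest, the function names, and the permission vocabulary are hypothetical and are not taken from any of the cited works.

```python
import xml.etree.ElementTree as ET

# XML namespace used for android:* attributes in manifest files.
ANDROID_NS = "http://schemas.android.com/apk/res/android"

# Hypothetical, already-decoded manifest used only for illustration.
SAMPLE_MANIFEST = """<manifest xmlns:android="http://schemas.android.com/apk/res/android"
          package="com.example.demo">
    <uses-permission android:name="android.permission.INTERNET"/>
    <uses-permission android:name="android.permission.READ_SMS"/>
    <uses-permission android:name="android.permission.INTERNET"/>
    <application android:label="Demo"/>
</manifest>"""


def extract_permissions(manifest_xml):
    """Return the sorted, de-duplicated permission names requested in a manifest."""
    root = ET.fromstring(manifest_xml)
    names = set()
    for node in root.iter("uses-permission"):
        name = node.get("{%s}name" % ANDROID_NS)
        if name:
            names.add(name)
    return sorted(names)


def to_feature_vector(permissions, vocabulary):
    """Encode a permission list as a 0/1 vector over a fixed vocabulary."""
    requested = set(permissions)
    return [1 if p in requested else 0 for p in vocabulary]


if __name__ == "__main__":
    vocab = [
        "android.permission.READ_SMS",
        "android.permission.CAMERA",
        "android.permission.INTERNET",
    ]
    perms = extract_permissions(SAMPLE_MANIFEST)
    print(perms)
    print(to_feature_vector(perms, vocab))
```

A vector of this kind is the typical input to the permission-based classifiers discussed in this section.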
Dynamic analysis involves executing an application in either a real or virtual environment to gather behavioral features21. These features can include network traffic, battery consumption, CPU usage, IP addresses, and opcodes, among others. For instance, DATDroid26 utilized an emulator to collect runtime data such as system calls, CPU and memory usage, and network packets from Android applications1. This data was then analyzed using a Random Forest algorithm to differentiate between malicious and benign apps.
Deep learning models can adapt to the evolving landscape of cyber threats through feature representation learning. Millar et al.27 extracted input features from raw data, including low-level opcodes, app permissions, and proprietary Android API package usage. They employed deep learning models to select, rank, and refine these input features. The three sets of derived features were then combined and fed into a multilayer perceptron (MLP) to predict malicious software. Arvind et al.9 developed a technique for selecting features that are then used for detecting Android malware. Sana et al.10 presented a fast, scalable, and accurate mechanism for detecting obfuscated Android malware, based on deep learning and evaluated on both real devices and emulators.
Recent research has explored using Convolutional Neural Networks (CNNs) for Android malware detection by converting app binaries into images. In 2023, Tchakounte et al.22 extracted opcode sequences from DEX files, split them using n-grams, and encoded them into m-bit vectors with SimHash. These vectors were converted into grayscale images and analyzed using Singular Value Decomposition (SVD) to create feature vectors for malware detection with CNNs. In 2024, Benedict et al.28 used Hilbert space-filling curves to transform bytecode extracted from Dalvik Executable (DEX) files into grayscale images, achieving high accuracy. However, using entire DEX files for image generation can introduce significant noise.
Although machine learning approaches have proven accurate and efficient at detecting Android malware, they do not account for colluding Android applications and will therefore classify them as benign.
Static analysis Android app-collusion detection techniques
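The n-gram and SimHash encoding step described above can be sketched as follows. This is a generic SimHash over opcode n-grams, not the implementation of the cited work: the choice of SHA-256 as the underlying hash, the values of n and m, the opcode names, and the bit-to-pixel mapping are all assumptions made for illustration, and the SVD and CNN stages are omitted.

```python
import hashlib


def ngrams(opcodes, n):
    """Return all contiguous n-grams of an opcode sequence as tuples."""
    return [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]


def simhash(opcodes, n=3, m=64):
    """Return an m-bit SimHash fingerprint (list of 0/1) of an opcode sequence."""
    if m > 256:
        raise ValueError("m must not exceed the 256 bits provided by SHA-256")
    counts = [0] * m
    for gram in ngrams(opcodes, n):
        digest = hashlib.sha256(" ".join(gram).encode("utf-8")).digest()
        value = int.from_bytes(digest, "big")
        # Each n-gram votes +1 or -1 on every bit position.
        for bit in range(m):
            counts[bit] += 1 if (value >> bit) & 1 else -1
    # A bit is set when the positive votes outnumber the negative ones.
    return [1 if c > 0 else 0 for c in counts]


def to_grayscale_rows(bits, width):
    """Map bits to pixel intensities (0 or 255) and split them into rows."""
    pixels = [255 * b for b in bits]
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


if __name__ == "__main__":
    # Hypothetical opcode sequence used only for illustration.
    ops = ["const", "invoke-virtual", "move-result", "if-eqz",
           "return-void", "const", "invoke-static"]
    fingerprint = simhash(ops, n=3, m=64)
    for row in to_grayscale_rows(fingerprint, 8):
        print(row)
```

Because similar opcode sequences share most of their n-grams, their fingerprints differ in few bits, which is the property that makes the resulting images comparable.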
Many approaches in the literature use static analysis for Android app-collusion detection. Static analysis involves the examination of code without its execution29. In the context of Android apps, this analysis is conducted by inspecting the source code without running the application21. Using static analysis techniques, it becomes possible to perform behavioral analysis on an application, thereby detecting whether it is benign or malicious.
Bugiel et al.30 proposed XManDroid, the first approach developed for detecting collusion attacks on the Android platform. XManDroid specifically focuses on detecting privilege escalation in scenarios involving pending intents and transmission channels between dynamically constructed components, such as broadcast receivers. FUSE31 is a tool that extends single-app static analysis to multi-app analysis. It first analyzes individual apps and stores the relevant information, then combines this information to detect collusion using a restricted policy engine. Liu et al.6 introduced MR-Droid, a framework designed to identify inter-app communication threats, including intent hijacking, intent spoofing, and collusion. MR-Droid uses a scalable approach based on the MapReduce paradigm to support compositional app analysis at large scale. IccTA32 is a static taint analyzer designed to identify privacy leaks that occur between components within Android apps. Bhandar et al.33 introduced an automaton framework for detecting intent-based collusion among apps. The framework includes a static inter-app analysis tool that can analyze multiple apps simultaneously and detect potentially colluding apps. Casolare et al.34,35 introduced a method that relies on model checking. This method represents an Android application as an automaton and uses a set of logical properties to minimize the number of comparisons. A second set of automatically generated properties is then used to identify colluding applications.
Static analysis is fast because it does not require code execution, and it allows all code paths and components of an app to be inspected. However, it struggles with code segments that execute only under certain conditions or inputs, with dynamic code loading20,36, and with obfuscation methods20,37. In addition, static analysis relies on predefined rules and patterns, which makes it difficult to detect novel patterns that were not anticipated when the rules were created. Machine learning algorithms, by contrast, can discover and adapt to new patterns based on the data they are trained on.
Machine learning-based Android app-collusion detection
A range of studies have explored machine learning techniques for permission-based Android app-collusion detection.
Asavoae et al.38 noted that collusion can cause information theft, money theft, or service misuse. They defined collusion between apps as a set of actions executed by the apps that can lead to a threat. They proposed two approaches to identify candidates for collusion: a statistical approach using machine learning, which estimates the likelihood of collusion within a set of apps, and a rule-based approach developed in Prolog.
Kalutarage et al.8 were the first to describe an ML-based technique for identifying Android app-collusion. The procedure has two components. The first uses a simple naïve Bayes classifier with an informative beta prior to estimate the collusion threat. The second ascertains whether two or more apps are communicating with one another.
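To make the idea of a naïve Bayes threat estimate with a beta prior concrete, the sketch below fits a Bernoulli naïve Bayes model in which each per-permission probability is the posterior mean under a Beta(alpha, beta) prior, that is, (k + alpha) / (n + alpha + beta) for k occurrences in n samples. This is one standard formulation and is an assumption on our part; the exact model, prior parameters, and features used in the cited work are not reproduced here, and the toy data are hypothetical.

```python
import math


def beta_posterior_mean(successes, trials, alpha=1.0, beta=1.0):
    """Posterior mean of a Bernoulli rate under a Beta(alpha, beta) prior."""
    return (successes + alpha) / (trials + alpha + beta)


def fit(samples, labels, alpha=1.0, beta=1.0):
    """Estimate P(feature=1 | class) for classes 0 and 1 and the prior P(class=1)."""
    n_features = len(samples[0])
    model = {}
    for cls in (0, 1):
        rows = [x for x, y in zip(samples, labels) if y == cls]
        model[cls] = [
            beta_posterior_mean(sum(r[j] for r in rows), len(rows), alpha, beta)
            for j in range(n_features)
        ]
    model["prior"] = beta_posterior_mean(sum(labels), len(labels), alpha, beta)
    return model


def threat_probability(model, x):
    """Return P(class=1 | x) under the naive (feature-independence) assumption."""
    log_odds = math.log(model["prior"]) - math.log(1 - model["prior"])
    for j, present in enumerate(x):
        p1, p0 = model[1][j], model[0][j]
        if present:
            log_odds += math.log(p1) - math.log(p0)
        else:
            log_odds += math.log(1 - p1) - math.log(1 - p0)
    return 1 / (1 + math.exp(-log_odds))


if __name__ == "__main__":
    # Hypothetical toy data: columns are [READ_SMS, INTERNET]; label 1 = threat.
    X = [[1, 1], [1, 1], [1, 0], [0, 1], [0, 0], [0, 0]]
    y = [1, 1, 1, 0, 0, 0]
    model = fit(X, y)
    print(threat_probability(model, [1, 1]))
    print(threat_probability(model, [0, 0]))
```

The prior keeps every estimated probability strictly between 0 and 1, so a permission that never appears in one class of the training data does not force the score to an extreme.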
A technique for identifying app collusion using audio signals is presented by Casolare et al.12. Using audio signal processing techniques, Casolare’s method entails turning an executable application into an audio file and extracting a set of numerical attributes from each sample. Using this data, they develop various machine learning models and assess how well they detect app collusion.
Some studies have explored two-stage classifiers for detecting app-collusion on Android smartphones. Faiz et al.39 introduced a detection approach for colluding app-pairs that combines the naïve Bayes algorithm with the likelihood method. The authors also proposed an alternative method that uses a perceptron and logistic regression to detect Android app-collusion40. In that work40, Faiz et al. used 13 critical permissions frequently requested by both Android malware and colluding app-pairs. In the first stage, they trained the model on a dataset of 5000 benign and 3000 malicious applications and tested it on 2000 benign and 207 malicious applications. The resulting model was further tested on three sets of malicious applications of sizes 1260, 247, and 154 obtained from41. In the second stage, for testing on app-pairs, they used two sets of 120 colluding app-pairs obtained from42. They then detected application collusion using the parameter vector and a basic judgment algorithm.
Faiz et al.43 proposed a system for detecting Android malware using a hybrid classification approach involving K-means clustering and Support Vector Machine (SVM). Two datasets42,44 were used in the first stage, resulting in Data1 (13,176 training apps and 1,860 test apps) and Data2 (12,028 training apps and 3,008 test apps). In the second stage, a dataset of 120 colluding app-pairs was considered, as the researchers believed these pairs could pose risks similar to those of Android malware. Application collusion was identified using a parameter vector and a basic judgment algorithm.
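The general shape of such a two-stage scheme can be sketched as follows: a linear model is first trained on the permission vectors of single apps, and the learned parameter vector is then applied to the combined permissions of an app-pair. The sketch uses a plain perceptron only; the pair-combination rule (element-wise union of permissions) and the judgment rule (the pair is flagged while neither app is flagged alone) are our own assumptions for illustration and are not taken from the cited work, and the toy data are hypothetical.

```python
def score(w, b, x):
    """Linear score w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(w, b, x):
    """Return 1 (malicious) if the linear score is positive, else 0."""
    return 1 if score(w, b, x) > 0 else 0


def train_perceptron(samples, labels, epochs=20, lr=1.0):
    """Stage 1: learn a parameter vector from single-app permission vectors."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            err = y - predict(w, b, x)
            if err:
                w = [wi + lr * err * xi for wi, xi in zip(w, x)]
                b += lr * err
    return w, b


def pair_vector(app_a, app_b):
    """Assumed combination rule: union of the permissions held by either app."""
    return [max(xa, xb) for xa, xb in zip(app_a, app_b)]


def is_collusion_candidate(w, b, app_a, app_b):
    """Stage 2 (assumed rule): the pair is flagged although neither app is."""
    flagged_alone = predict(w, b, app_a) or predict(w, b, app_b)
    return bool(predict(w, b, pair_vector(app_a, app_b)) and not flagged_alone)


if __name__ == "__main__":
    # Hypothetical toy data: columns are [READ_SMS, INTERNET]; label 1 = malicious.
    X = [[0, 0], [1, 0], [0, 1], [1, 1]]
    y = [0, 0, 0, 1]
    w, b = train_perceptron(X, y)
    print(is_collusion_candidate(w, b, [1, 0], [0, 1]))
```

In this toy setting an app that can read SMS and an app that can reach the network are each classified as benign, while their combined capabilities match the learned malicious pattern.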
Other approaches involve monitoring system parameters such as memory consumption and CPU clock speed to detect anomalies indicative of collusion attacks (Khokhlov et al.46). Various machine learning techniques, including feed-forward neural networks and long short-term memory models, have been explored for this purpose (Khokhlov et al.46).
Table 1 summarizes existing approaches that use machine learning for Android app-collusion detection. Many previous techniques rely on permission-based feature sets, which are commonly employed because of their efficiency and high detection accuracy. Because permissions can be analyzed before an app is installed, colluding applications can be identified quickly and harm to the device can be prevented. However, there is a need to use only the essential permissions to enhance detection accuracy; removing ineffective permission features also decreases computational complexity. In addition, existing techniques do not account for generic malware (single-app malware) and will therefore classify such applications as benign.
Among these studies, our research is distinctive in that it employs a comprehensive set of classifiers, including ANN, DNN, and ensemble classifiers, and attempts to distinguish between generic malware (single-app malware) and colluding Android applications. It achieves high performance, including an accuracy of 96.91%, while using fewer permission-based features. Furthermore, our study underscores the effectiveness of a multi-classifier approach, validated through performance metrics and comparison with other studies.
Summary of literature review
The literature review section encompasses three main areas: Machine Learning-based Android Malware Detection, Static Analysis Android App-collusion Detection Techniques, and Machine Learning-based Android App-collusion Detection.
Firstly, in the domain of Machine Learning-based Android Malware Detection, various methodologies have been explored, highlighting the advantages of machine learning in accurately identifying malware. However, existing techniques do not account for colluding Android applications and will therefore classify them as benign.
Secondly, the review of Static Analysis Android App-collusion Detection Techniques reveals that while static analysis provides a computationally efficient way to detect app collusion, it often fails against sophisticated obfuscation techniques and cannot dynamically analyze app behavior.
Lastly, Machine Learning-based Android App-collusion Detection shows promise with advanced algorithms, but it should focus on essential permissions to improve accuracy and reduce computational complexity. In addition, current techniques often overlook generic malware (single-app malware) and will therefore classify it as benign.
To address these issues, we propose a scheme that leverages the power of permissions to enable a more computationally efficient and rapid detection approach.