R2-D2


TonTon Huang*, and Hung-Yu Kao, R2-D2: ColoR-inspired Convolutional NeuRal Network (CNN)-based AndroiD Malware Detections, arXiv:1705.04448v1

This work would not have been possible without the valuable dataset offered by Leopard Mobile, Cheetah Mobile and Security Master (a.k.a. CM Security, an Android application). Special thanks to Dr. Chia-Mu Yu for his support of this research.

This research was accepted at OWASP AppSec USA 2017, September 19th-22nd, 2017, Orlando, FL | September 22, 10:30am~11:15am (GMT -4)

R2-D2: ColoR-inspired Convolutional NeuRal Network (CNN)-based AndroiD Malware Detections

Machine learning (ML) has proven particularly useful in malware detection. However, because malware evolves very quickly, the stability of the features extracted from malware is a critical issue in malware detection. The recent success of deep learning in image recognition, natural language processing, and machine translation indicates a potential solution for stabilizing malware detection effectiveness. We present a color-inspired convolutional neural network-based Android malware detection system, R2-D2, which can detect malware without extracting pre-selected features (e.g., the control flow of opcodes, classes, methods of functions, and the timing with which they are invoked) from Android apps. In particular, we develop a color representation for translating Android apps into RGB color codes and transforming them into a fixed-size encoded image. The encoded image is then fed to a convolutional neural network for automatic feature extraction and learning, reducing the need for expert intervention. We have run our system over 800k malware samples and 800k benign samples through our back-end (60 million monthly active users and 10k new malware samples per day), showing that R2-D2 can effectively detect malware. Furthermore, we will post any updates to our research results at http://R2D2.TWMAN.ORG.

The latest version of this research was accepted at RuxCon 2017, October 21-22, Melbourne, Australia | 2017/10/22, 15:00 (GMT +11)

Look Ransomware is There: Large Scale Ransomware Detection with Naked Eye

Ransomware such as WannaCrypt and Petya has caused significant financial loss and has even endangered human life (e.g., the ransomware attack on UK hospitals). Ransomware on desktops has gained much attention from academia and industry. However, the amount of ransomware on Android phones keeps increasing steadily while receiving much less attention. As Android is the most popular smartphone OS and a substantial number of credentials are kept only on smartphones, data loss causes serious inconvenience and damage. Here, we present our deep learning-based ransomware detection system, coloR-inspired convolutional neuRal network-based androiD ransomware Detection (R2D2). R2D2 was originally developed to sweep for malware, but we found it particularly useful in detecting ransomware. A unique feature is its end-to-end training, without human intervention. Such end-to-end training points in a direction where we no longer need a tedious search for robust ransomware features for detection. Most importantly, based on R2D2, we develop techniques to encode ransomware as a so-called ransomware image, such that ransomware from the same family exhibits the same pattern and even non-experts can detect ransomware, and even determine its family, with the naked eye.


Traditional Chinese: http://www.cmcm.com/blog/tw/security/2017-05-31/1062.html
Simplified Chinese: http://www.cmcm.com/blog/cn/security/2017-05-31/1063.html


Android Color images (Sample Datasets): Malware / Benign



R2-D2: ColoR-inspired Convolutional NeuRal Network (CNN)-based AndroiD Malware Detections



TonTon Huang*, and Hung-Yu Kao**

Leopard Mobile Inc.*
Department of Computer Science and Information Engineering, National Cheng Kung University, Taiwan* **

TonTon (at) TWMAN.ORG*

Abstract

Machine learning (ML) has proven particularly useful in malware detection. However, because malware evolves very quickly, the stability of the features extracted from malware is a critical issue in malware detection. The recent success of deep learning in image recognition, natural language processing, and machine translation indicates a potential solution for stabilizing malware detection effectiveness. We present a color-inspired convolutional neural network-based Android malware detection system, R2-D2, which can detect malware without extracting pre-selected features (e.g., the control flow of opcodes, classes, methods of functions, and the timing with which they are invoked) from Android apps. In particular, we develop a color representation for translating Android apps into RGB color codes and transforming them into a fixed-size encoded image. The encoded image is then fed to a convolutional neural network for automatic feature extraction and learning, reducing the need for expert intervention. We have run our system over 1 million malware samples and 1 million benign samples through our back-end (600 million monthly active users and 10k new malware samples per day), showing that R2-D2 can effectively detect malware.


Smartphones are now widespread worldwide. Among them, Android is the most commonly used operating system (OS), and it is still expanding its territory. According to a report by International Data Corporation (IDC), Android's market share grew from 84.3% in 2015Q2 to 86.8% in 2016Q3 (see Fig. 1) [1]. Android is characterized by its openness; users can download apps from Google Play and from third-party marketplaces. However, due to this popularity and openness, Android has attracted attackers' attention. In particular, malicious software (malware) can easily spread and infect benign devices. The Security Report of the AV-TEST Institute reports that while the number of malware samples increased from 17 million in 2005 to over 600 million in 2016, the percentage of Android malware rose significantly from 3.19% in 2015 to 7.48% in 2016Q2. Among them, Trojans that aim to steal user data account for 97.49%.

We also found that Android malware accounts for 99.87% of all malware on mobile platforms [2]. Fig. 2 shows the statistics collected from our back-end system in January 2017; in countries such as the US, the UK, and France, more than 50,000 users were infected daily. Moreover, in our experience, the number of Android malware samples increased sharply from 1 million in 2012 to 2.8 million in 2014. Because of the serious security problem caused by Android malware, we propose the color-inspired convolutional neural network (CNN)-based Android malware detection system, R2-D2. R2-D2 differs from existing solutions in its end-to-end learning process. More specifically, in contrast to prior solutions that require manual feature selection and parameter configuration, R2-D2 can reduce the amount of manpower and time required. Fig. 3 shows the trend of different malware families, particularly in China and India. Fig. 4 shows the market share of different cellphone models. Based on the above statistics, we confirm that even the same malware family will exhibit different behaviours in different geographic regions. Our dataset can handle this problem.

 
Figure 1 / Figure 2

 
Figure 3 / Figure 4

The majority of malware detection still relies on static analysis of source code and on dynamic analysis that monitors the execution of malware. However, these approaches are known to achieve better detection accuracy only within the same malware family. In reality, Android malware has grown dramatically in number, mutates quickly, and employs various anti-analysis techniques. All of these characteristics make accurate detection extremely difficult. Thus, we use deep learning to uncover the hidden relationship between the program execution logic and the order of function calls behind the malware, in order to accurately detect known and even unknown malware.

In practice, a huge amount of human labor goes into feature engineering and modeling before a detection model can be built. To ease model training, we adopt deep learning to construct an end-to-end learning-based Android malware detection system, termed R2-D2 (R2-D2: ColoR-inspired Convolutional NeuRal Network (CNN)-based AndroiD Malware Detections). Each app is ultimately converted into a color image (shown in Fig. 5 and 6), and the images are fed to a CNN to train a model that detects Android malware. Fig. 7 shows our system architecture.

Our proposed R2-D2 possesses the following advantages:

R2-D2 translates classes.dex, the core of the execution logic of Android apps, into RGB color images, without modifying the original Android apps and without extracting features from them. In our experiments, 0.4 seconds suffice to translate an app into a color image. The translation also preserves more of the information in an app in a color image than a grayscale image would.

Though the fully connected layers of a DNN can be used to handle fast mutation, the CNN in R2-D2 is more suitable for capturing malware, because features such as local receptive fields and shared weights not only significantly reduce the number of model parameters but also represent the complex structure of Android malware.

The CNN does not require features to be extracted from the image beforehand for pattern recognition. Instead, the raw pixels are represented as multi-dimensional matrices and passed through the filters, pooling, and non-linear activation functions of the CNN.
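The translation of classes.dex into an RGB image described in the first advantage above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that each run of three consecutive bytes becomes one (R, G, B) pixel, that the image is a fixed square whose side defaults to 299 (the usual Inception-v3 input size), and that shorter inputs are zero-padded while longer ones are truncated. The function names bytes_to_rgb_image and read_classes_dex are hypothetical.

```python
import zipfile
from typing import List, Tuple

Pixel = Tuple[int, int, int]


def read_classes_dex(apk_path: str) -> bytes:
    """Read classes.dex from an APK (an APK is a ZIP archive); the APK is not modified."""
    with zipfile.ZipFile(apk_path) as apk:
        return apk.read("classes.dex")


def bytes_to_rgb_image(data: bytes, side: int = 299) -> List[List[Pixel]]:
    """Map raw bytes to a side x side grid of (R, G, B) pixels.

    Assumption for illustration: three consecutive bytes form one pixel.
    Inputs longer than side*side*3 bytes are truncated, shorter ones zero-padded.
    """
    needed = side * side * 3
    buf = data[:needed].ljust(needed, b"\x00")
    # Slice the buffer into 3-byte groups, one group per pixel.
    pixels = [tuple(buf[i:i + 3]) for i in range(0, needed, 3)]
    # Cut the flat pixel list into rows of `side` pixels each.
    return [pixels[r * side:(r + 1) * side] for r in range(side)]
```

A call such as bytes_to_rgb_image(read_classes_dex("app.apk")) would then yield a pixel grid that an imaging library could save as a picture for the CNN; the file name here is a placeholder.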

 
Figure 5 / Figure 6


Figure 7

Based on our collected data, we evaluated the detection accuracy and performance with different model optimization techniques. Note that the learning rate is fixed at 0.01. The model optimization techniques are stochastic gradient descent (SGD), Nesterov Accelerated Gradient (NAG), AdaDelta and AdaGrad. In our experiments, we found that Inception-v3 (shown in Fig. 9) is almost always better than AlexNet (shown in Fig. 10). Given this observation, we further compared Inception-v3 with AdaDelta and Inception-v3 with AdaGrad, and found that SGD is best suited to our use. In particular, it results in the sharpest increase in accuracy and the sharpest decrease in loss. In the end, we reached 98.4225% and 97.7081% accuracy.
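To make the difference between the four optimization techniques concrete, the following sketch applies their textbook update rules to a toy one-dimensional objective, f(w) = (w - 3)^2. It is only an illustration under stated assumptions: the objective, the momentum of 0.9, and the decay rate of 0.95 are not from our experiments, and AdaDelta is shown in its original form, which has no learning rate. The other three use the fixed learning rate of 0.01.

```python
import math


def grad(w: float) -> float:
    """Gradient of the toy objective f(w) = (w - 3)^2 (an assumption for illustration)."""
    return 2.0 * (w - 3.0)


def sgd(w: float, steps: int, lr: float = 0.01) -> float:
    for _ in range(steps):
        w -= lr * grad(w)
    return w


def nag(w: float, steps: int, lr: float = 0.01, momentum: float = 0.9) -> float:
    v = 0.0
    for _ in range(steps):
        # Nesterov: evaluate the gradient at the look-ahead point w + momentum * v.
        v = momentum * v - lr * grad(w + momentum * v)
        w += v
    return w


def adagrad(w: float, steps: int, lr: float = 0.01, eps: float = 1e-8) -> float:
    g2_sum = 0.0
    for _ in range(steps):
        g = grad(w)
        g2_sum += g * g  # accumulated squared gradients shrink the step over time
        w -= lr * g / (math.sqrt(g2_sum) + eps)
    return w


def adadelta(w: float, steps: int, rho: float = 0.95, eps: float = 1e-6) -> float:
    avg_g2 = 0.0   # running average of squared gradients
    avg_dx2 = 0.0  # running average of squared updates
    for _ in range(steps):
        g = grad(w)
        avg_g2 = rho * avg_g2 + (1.0 - rho) * g * g
        dx = -math.sqrt(avg_dx2 + eps) / math.sqrt(avg_g2 + eps) * g
        avg_dx2 = rho * avg_dx2 + (1.0 - rho) * dx * dx
        w += dx
    return w


if __name__ == "__main__":
    for name, opt in [("SGD", sgd), ("NAG", nag), ("AdaGrad", adagrad), ("AdaDelta", adadelta)]:
        print(name, opt(0.0, 500))
```

On this toy problem the plain and Nesterov updates approach the minimum at w = 3 within a few hundred steps, while AdaGrad's shrinking step size makes it much slower at the same learning rate. This does not reproduce the comparison on Inception-v3; it only shows how the update rules differ.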


 
Figure 9 / Figure 10

 Google Play Sample Analysis Results | VirusTotal Benign Sample Analysis Results | CM Security Benign Sample Analysis Results

 Google Play Sample Analysis Results | VirusTotal Malware Sample Analysis Results | CM Security Malware Sample Analysis Results

  


 
Figure 11 / Figure 12

The evaluation metrics in our experiment include True Positive (TP), False Positive (FP), False Negative (FN), True Negative (TN), Accuracy (Acc), Precision (Prec), Recall (Detection Rate, DR), False Positive Rate (FPR) and F1-score (F-measure). The evaluation results are shown in Figure 13.
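The derived metrics follow from the four confusion-matrix counts by their standard definitions. The sketch below computes them; the function name detection_metrics and the example counts are hypothetical and are not results from our experiments.

```python
from typing import Dict


def detection_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Compute Acc, Prec, Recall (DR), FPR and F1 from confusion-matrix counts."""

    def ratio(num: float, den: float) -> float:
        # Return 0.0 instead of dividing by zero when a class is empty.
        return num / den if den else 0.0

    acc = ratio(tp + tn, tp + fp + fn + tn)
    prec = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # also called the detection rate (DR)
    fpr = ratio(fp, fp + tn)
    f1 = ratio(2.0 * prec * recall, prec + recall)
    return {"acc": acc, "prec": prec, "recall": recall, "fpr": fpr, "f1": f1}


if __name__ == "__main__":
    # Made-up counts, purely for illustration.
    print(detection_metrics(tp=90, fp=10, fn=10, tn=90))
```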


Figure 13

This research adopts deep learning to construct an end-to-end learning-based Android malware detection system and proposes a color-inspired convolutional neural network (CNN)-based Android malware detection system, named R2-D2. The proposed proof-of-concept system has been tested in our internal environment. The results show that our detection system works well in detecting known and even unknown Android malware. We have also deployed the system in our core product to provide convenient usage scenarios for end users and enterprises. Future work is to reduce the complexity of the task and to train for higher performance against Android malware while avoiding a heavy computational burden. The experimental materials and research results, along with any updates, are available at http://R2D2.TWMAN.ORG.