This research project was presented at the 2016 CSA Taiwan Summit and published in:
TonTon H.-D. Huang and C.-M. Yu, "Poster: Adaptive Data-Driven and Region-Aware Detection for Deceptive Advertising", IEEE Symposium on Security and Privacy 2016, San Jose, CA, May 23-25, 2016.
TonTon H.-D. Huang, C.-M. Yu, and H.-Y. Kao, "Data-Driven and Deep Learning Methodology for Deceptive Advertising and Phone Scams Detection", 2017 Conference on Technologies and Applications of Artificial Intelligence (TAAI 2017), Taipei, Taiwan, Dec. 1-3, 2017.

Data-Driven and Deep Learning Methodology for Deceptive Advertising and Phone Scams Detection

TonTon H.-D. Huang*, Chia-Mu Yu**, and Hung-Yu Kao***

Leopard Mobile (Cheetah Mobile Taiwan Agency)*
Department of Computer Science and Engineering, National Chung Hsing University, Taiwan**
Department of Computer Science and Information Engineering, National Cheng Kung University, Taiwan* ***
TonTon (at) TWMAN.ORG*

Advances in mobile devices and network technology have increased the demand for mobile marketing and mobile advertising. With approximately 3 billion installations of Cheetah Mobile's featured apps and roughly 700 million active users, Cheetah Mobile Security Research Lab has found that deceptive advertising (deceptive ads), which uses false or misleading statements in advertising text and images, varies with region and time zone. Deceptive ads trick users into installing unnecessary apps and damage the reputation of advertisers. Detecting such ads is challenging, however: deceptive ads exhibit fast-flux behavior and are therefore difficult to catch. AlphaGo has demonstrated the value of deep learning in pattern recognition and text analysis. Building on our customized feature extraction and a fine-tuned model, Cheetah Mobile Security Research Lab will share in this talk our experience in developing an effective mechanism for detecting deceptive ads. Our proposed system has been deployed in our testbed and featured products for intensive analysis, and this hybrid approach yields acceptable results on our massive real-world dataset. We will also share how to increase the number of active users by running mobile advertisements the right way.

The following figure summarizes our collection of URLs over 10 days. Taking day 1 as an example, there were 178,255 unique URLs, of which 175,926 yielded successful screenshots. The numbers of failed, unique, and repeated screenshots were 2,329, 170,512, and 5,414, respectively. We typically collect about 150 thousand URLs and screenshots per day.
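The day-1 figures can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the function name `screenshot_stats` and the derived ratios (success rate, repeated share) are our own additions and are not part of the deployed system; the input numbers are the ones quoted in the text.

```python
def screenshot_stats(unique_urls, successful, failed, unique_shots, repeated_shots):
    """Check that the reported counts are consistent and derive two ratios.

    Hypothetical helper for illustration only.
    """
    # Every URL either produced a screenshot or failed.
    assert successful + failed == unique_urls
    # Every successful screenshot is either unique or a repeat.
    assert unique_shots + repeated_shots == successful
    return {
        "success_rate": successful / unique_urls,
        "repeated_share": repeated_shots / successful,
    }


# Day-1 counts as quoted in the text.
day1 = screenshot_stats(
    unique_urls=178_255,
    successful=175_926,
    failed=2_329,
    unique_shots=170_512,
    repeated_shots=5_414,
)
print(f"screenshot success rate: {day1['success_rate']:.2%}")
print(f"share of repeated screenshots: {day1['repeated_share']:.2%}")
```

Both consistency checks pass for the day-1 counts, so the five reported numbers agree with one another.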


The figure shows our experimental results, which compare Logistic Regression, Decision Tree, Random Forest, and SVM classifiers with Inception-v3. In our experiments, the arithmetic mean for Inception-v3 reached 90%, whereas Logistic Regression, Decision Tree, and Random Forest reached only 70% to 85%, and SVM reached only 50%. We also use the standard deviation to evaluate the stability of each learning method. Inception-v3 has a low standard deviation, which indicates that its data points tend to lie close to the mean. Considering both the long-term defense against deceptive ads and the associated system maintenance, as well as the detection accuracy of Inception-v3, we are confident that the adapted deep learning approach is better than conventional machine learning approaches. Furthermore, our field test shows that Kaspersky, AVG, Avast, ESET, and Chrome failed to detect the deceptive ads.
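The mean and standard deviation comparison described above can be sketched as follows. This is a minimal illustration, not the evaluation code used in the study: the helper `summarize`, the model labels, and the per-run accuracy values are all hypothetical placeholders chosen only to show the computation, and the use of the sample standard deviation is our assumption.

```python
from statistics import mean, stdev


def summarize(accuracies):
    """Return (arithmetic mean, sample standard deviation) of per-run accuracies."""
    return mean(accuracies), stdev(accuracies)


# HYPOTHETICAL per-run accuracies, invented for illustration only.
# They are NOT the measurements reported in the text.
runs = {
    "model_a": [0.90, 0.91, 0.89, 0.90],
    "model_b": [0.70, 0.85, 0.75, 0.80],
}

for name, accs in runs.items():
    avg, spread = summarize(accs)
    # A smaller spread means the runs cluster more tightly around the mean.
    print(f"{name}: mean={avg:.3f} std={spread:.3f}")
```

A method with a higher mean and a lower standard deviation is both more accurate on average and more stable across runs, which is the criterion applied to Inception-v3 above.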

We have released this system in our core product to provide convenient usage scenarios for end users and enterprises (shown below).