TonTon H.-D. Huang, C.-M. Yu, "Poster: Adaptive Data-Driven and Region-Aware Detection for Deceptive Advertising", IEEE Symposium on Security and Privacy 2016, San Jose, CA, May 23-25, 2016. 

TonTon H.-D. Huang, C.-M. Yu, and H.-Y. Kao, "Data-Driven and Deep Learning Methodology for Deceptive Advertising and Phone Scams Detection", 2017 Conference on Technologies and Applications of Artificial Intelligence (TAAI 2017), Taipei, Taiwan, Dec. 1-3, 2017.

This research project has been presented at HadoopCon 2016 
and published on

Data-Driven and Deep Learning Methodology for Deceptive Advertising and Phone Scams Detection

TonTon H.-D. Huang*, Chia-Mu Yu**, and Hung-Yu Kao***

Leopard Mobile (Cheetah Mobile Taiwan Agency)*
Department of Computer Science and Engineering, National Chung Hsing University, Taiwan**
Department of Computer Science and Information Engineering, National Cheng Kung University, Taiwan* ***
TonTon (at) TWMAN.ORG*

The Federal Trade Commission (FTC) has indicated that phone scams are the most common type of scam in the United States, with more than 150 million disputes. In 2014 alone, the monetary loss from phone scams exceeded 1.7 billion. 46% of victims can clearly indicate how they were tricked, but 54% of victims claimed to have been tricked by a phone scam. At the same time, the FTC's crackdown on robocalls is supported by major companies such as AT&T and Google. There are also cases in China in which college students were tricked into giving away their tuition fees. According to the report, approximately 1.6 million people are engaged in the scam business, whose revenue exceeds 110 billion RMB. There are similar cases in Japan, where the monetary loss exceeds 50 billion yen.

As shown in Fig. 1, for non-repeated calling numbers, phone scams are concentrated on weekdays. More specifically, compared with the heavy volume of phone scams on weekdays, only approximately one third of phone scams occur during the weekend in the US, India, and Taiwan. For repeated calling numbers, the pattern is the same: only a small portion of phone scams occurs during the weekend.
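
The weekday/weekend comparison can be illustrated with a minimal sketch. It assumes one possible reading of the comparison, namely the average number of reported calls per weekend day divided by the average per weekday. The function name `weekend_to_weekday_ratio` and the sample timestamps are hypothetical and are not taken from the study's data.

```python
from collections import Counter
from datetime import datetime
from statistics import mean


def weekend_to_weekday_ratio(timestamps):
    """Average calls per weekend day divided by average calls per weekday.

    Days with no calls at all are not counted, which is a simplification.
    """
    calls_per_date = Counter(ts.date() for ts in timestamps)
    weekday_counts = [n for day, n in calls_per_date.items() if day.weekday() < 5]
    weekend_counts = [n for day, n in calls_per_date.items() if day.weekday() >= 5]
    if not weekday_counts or not weekend_counts:
        raise ValueError("need at least one weekday and one weekend day with calls")
    return mean(weekend_counts) / mean(weekday_counts)


# Hypothetical sample: 3 reported calls on each weekday, 1 on each weekend day.
sample = []
for day in range(23, 28):  # Monday 2016-05-23 through Friday 2016-05-27
    sample += [datetime(2016, 5, day, 9 + i) for i in range(3)]
for day in (28, 29):  # Saturday and Sunday
    sample.append(datetime(2016, 5, day, 10))

ratio = weekend_to_weekday_ratio(sample)
print(f"weekend/weekday daily volume: {ratio:.2f}")
```

A ratio well below 1 corresponds to the weekday concentration described above.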

The FTC has published a list of calling numbers for reference; a call from one of those numbers is likely to be a harassing call. The reality is far more complicated, however; for example, the FTC list does not cover calls originating outside the US. Phone scams can be categorized as follows:
1) Free Vacations and Prizes
2) Loan Scams
3) Phony Debt Collectors
4) Fake Charities
5) Medical Alert/Scams
6) Targeting Seniors
7) Warrant Threats
8) IRS Calls

Our core product has 80 million daily active users, of whom 23 million have received phone calls from strangers; about 10% of these calls are phone scams. Fig. 13 and Fig. 14 show the average number of phone numbers collected in one day. We found that during working hours, so-called malicious calls, such as harassment, telemarketing, scams, and insurance sales, make up a large portion. We also found that normal calls, for example from express delivery or service centers, account for less than 10%. Based on these data, we built and adapted our DNN model.
Fig. 13 / Fig. 14

Fig. 15 shows the results of our small-scale experiment, which compares Logistic Regression, Decision Tree, Random Forest, and SVM with the DNN. We found that the accuracy and precision of the DNN are more stable. We therefore included more countries in our experiment. The results show that the arithmetic mean for the deep neural network reached 85%, whereas Logistic Regression, Decision Tree, and Random Forest reached only between 70% and 80%, and SVM reached only 75% (as shown in Fig. 16).
Fig. 15 | Fig. 16
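
For reference, accuracy and precision for a binary scam/not-scam classifier can be computed from predictions as in the sketch below. It assumes labels of 1 for scam and 0 for normal calls; the function name `accuracy_and_precision` and the toy labels are hypothetical and do not reproduce the experiment's data.

```python
def accuracy_and_precision(y_true, y_pred):
    """Return (accuracy, precision) for binary labels, where 1 means scam."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # scams correctly flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal calls left alone
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # normal calls flagged
    accuracy = (tp + tn) / len(pairs)
    # Precision is undefined when nothing was flagged; report 0.0 in that case.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return accuracy, precision


# Hypothetical toy labels, for illustration only.
y_true = [1, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 0, 0]
acc, prec = accuracy_and_precision(y_true, y_pred)
print(f"accuracy={acc:.2f} precision={prec:.2f}")
```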

Moreover, the standard deviation is used to evaluate the stability of each learning method. Our DNN model has a low standard deviation, which indicates that the data points tend to be close to the mean. Considering long-term defense against phone scams and system maintenance, as well as the detection accuracy of the deep neural network, we are confident that the adapted deep learning approach is better than conventional machine learning approaches.
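
The stability comparison can be sketched as follows: for each method, take its per-country accuracies and compute the arithmetic mean and the standard deviation, where a smaller standard deviation means more stable results. The model names and scores below are placeholders chosen for illustration, not the figures from the experiment, and the use of the population standard deviation is an assumption.

```python
from statistics import mean, pstdev


def summarize(scores_by_model):
    """Map each model name to (mean accuracy, population standard deviation)."""
    return {
        name: (mean(scores), pstdev(scores))
        for name, scores in scores_by_model.items()
    }


def most_stable(summary):
    """Return the model whose accuracy varies least across countries."""
    return min(summary, key=lambda name: summary[name][1])


# Placeholder per-country accuracies (hypothetical values, illustration only).
scores_by_model = {
    "model_a": [0.84, 0.85, 0.86],
    "model_b": [0.70, 0.75, 0.80],
}

summary = summarize(scores_by_model)
for name, (avg, std) in summary.items():
    print(f"{name}: mean={avg:.3f} std={std:.4f}")
print("most stable:", most_stable(summary))
```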

Future work is to improve our deep learning model, reduce the complexity of the task, and train a high-performance model that can handle phone scams without a huge computational burden.