Malicious Code Detection Based on Random Forest

Published: 2019-10-23 01:13:00

Dai Yihui, Yin Xudong
(School of Computer Science and Engineering, Changshu Institute of Technology, Changshu, Jiangsu 215500)

Abstract: Malicious code is growing rapidly and appears in many variants. To address this, this paper uses a malicious code classification method based on random forest. The IDA disassembler first generates an ASM file for each sample. Two kinds of features are then extracted from these files. Texture features come from a grayscale image of the ASM file, computed with the gray-level co-occurrence matrix (GLCM). 3-gram features come from the file's OpCode sequence. A random forest classifies the samples using these features.

The method was evaluated on samples from nine malicious code families. The resulting confusion matrix was used to analyze random forest's classification performance, and random forest was compared with Naive Bayes and k-nearest neighbor (KNN) classifiers. The results show three things. First, random forest is a highly effective algorithm for classifying and detecting malicious code. Second, both feature extraction methods detect malicious code effectively. Third, combining the random forests built on the two kinds of features gives better classification.

Key words: random forest; malicious code detection; multiple features; machine learning
CLC number: TP391  Document code: A

1 Introduction

With the development of technology, the variety, number, and scale of malicious code keep increasing. According to monitoring data from the National Computer Network Emergency Response Technical Team/Coordination Center of China (CNCERT/CC) [1], in April 2017 nearly 1.44 million terminals in China were infected with network viruses, 12.7% more than in the previous month. Among the newly captured network viruses, 10 new ones were counted by name and 3 new ones by malicious code family, the latter a 50.0% increase over the previous month. More than 89.78 million users in China were infected with mobile Internet malware, which spread more than 750,000 times in total. Although the scale of malicious code keeps growing, most
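The abstract names two feature extractors. Both can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation.

- Image construction: each byte of an ASM file becomes one gray pixel, in rows of a fixed width (64 here). Pixels are quantized to 16 gray levels.
- Texture features: the GLCM uses a single horizontal offset of distance 1. Four common texture statistics are read from it: contrast, homogeneity, energy, and correlation.
- OpCode features: the OpCode sequence is assumed to be already extracted from the IDA listing as a list of mnemonics. A vocabulary of the k most frequent 3-grams turns each sample into a count vector.

All function names and parameter values are hypothetical. The text above does not specify them.

```python
from collections import Counter

import numpy as np


def bytes_to_grayscale(data: bytes, width: int = 64) -> np.ndarray:
    """Map each byte of an ASM file to one 8-bit gray pixel, row by row.

    The width of 64 is an assumption; the last row is zero-padded.
    """
    buf = np.frombuffer(data, dtype=np.uint8)
    rows = -(-len(buf) // width)  # ceiling division
    padded = np.zeros(rows * width, dtype=np.uint8)
    padded[: len(buf)] = buf
    return padded.reshape(rows, width)


def glcm_features(img: np.ndarray, levels: int = 16) -> np.ndarray:
    """Contrast, homogeneity, energy and correlation of a symmetric,
    normalized GLCM for one horizontal offset (distance 1, angle 0).

    The 16 gray levels and the single offset are assumptions; needs width >= 2.
    """
    q = img.astype(np.int64) * levels // 256  # quantize 0..255 to 0..levels-1
    P = np.zeros((levels, levels), dtype=np.float64)
    # Count (pixel, right-hand neighbour) gray-level pairs.
    np.add.at(P, (q[:, :-1].ravel(), q[:, 1:].ravel()), 1.0)
    P = P + P.T  # make the co-occurrence matrix symmetric
    P /= P.sum()  # normalize counts to joint probabilities
    i, j = np.indices((levels, levels))
    contrast = np.sum(P * (i - j) ** 2)
    homogeneity = np.sum(P / (1.0 + (i - j) ** 2))
    energy = np.sqrt(np.sum(P ** 2))
    mu_i, mu_j = np.sum(i * P), np.sum(j * P)
    sd_i = np.sqrt(np.sum(P * (i - mu_i) ** 2))
    sd_j = np.sqrt(np.sum(P * (j - mu_j) ** 2))
    if sd_i > 0 and sd_j > 0:
        correlation = np.sum(P * (i - mu_i) * (j - mu_j)) / (sd_i * sd_j)
    else:
        correlation = 1.0  # constant image: set to 1 by convention here
    return np.array([contrast, homogeneity, energy, correlation])


def opcode_3grams(opcodes: list[str]) -> Counter:
    """Count consecutive opcode triples, e.g. ('push', 'mov', 'call')."""
    return Counter(zip(opcodes, opcodes[1:], opcodes[2:]))


def build_vocab(corpus: list[list[str]], k: int = 500) -> list[tuple]:
    """Keep the k most frequent 3-grams over a training corpus (k is an assumption)."""
    total = Counter()
    for ops in corpus:
        total.update(opcode_3grams(ops))
    return [gram for gram, _ in total.most_common(k)]


def ngram_vector(opcodes: list[str], vocab: list[tuple]) -> np.ndarray:
    """Turn one sample's 3-gram counts into a fixed-length vector over vocab."""
    counts = opcode_3grams(opcodes)
    return np.array([counts[g] for g in vocab], dtype=np.float64)


def combined_features(asm_bytes: bytes, opcodes: list[str], vocab: list[tuple],
                      width: int = 64, levels: int = 16) -> np.ndarray:
    """Concatenate texture and 3-gram features (an assumed way to combine them)."""
    texture = glcm_features(bytes_to_grayscale(asm_bytes, width), levels)
    return np.concatenate([texture, ngram_vector(opcodes, vocab)])
```

Stacking one such vector per sample gives a matrix `X` with family labels `y`. From there, one could train scikit-learn's `RandomForestClassifier` and compare it with the `GaussianNB` and `KNeighborsClassifier` baselines, reading per-family errors from `sklearn.metrics.confusion_matrix`.

The abstract does not say which implementation or hyperparameters the authors used. It also describes the combination as joining the random forests of the two feature types. That may mean merging two per-feature forests rather than concatenating the feature vectors as this sketch does.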
