Tools for Feature Extraction from Binary Data of Images

Problem Description

I am working on a project in which I have image files that have been malformed (fuzzed, i.e. their image data has been altered). When these files are rendered on various platforms, the platform reports a warning, a crash, or a pass.

I am trying to build a shield using unsupervised machine learning that will help me identify/classify these images as malicious or not. I have the binary data of these files, but I have no idea what feature sets or patterns I could extract from it, because visually these images could be anything. (I need to be able to find a feature set from the binary data.)

I need some advice on tools/methods I could use for automatic feature extraction from this binary data, i.e. feature sets I can use with unsupervised learning algorithms such as Kohonen's SOM.

I am new to this; any help would be great!

Recommended Answer

I don't think this is feasible.

The problem is that these are old exploits, and training on them will not tell you much about future exploits. This is an extremely unbalanced problem: no two exploits abuse the same thing. So even if you generate many files of the same type, you will most likely end up with effectively a single relevant training example per exploit.

Nevertheless, what you need to do is extract features from the file metadata. That is where the exploits are, not in the actual image. Consequently, parsing the files is already the part where the problem lies, and your detection tool may itself become vulnerable to exactly such an exploit.
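
To make "features from the file metadata" concrete, here is a minimal sketch that assumes the files are PNGs (the question does not say which formats are involved). The helper `png_chunk_features`, the test helper `make_chunk`, the `MAX_CHUNKS` cap and the particular features chosen are all hypothetical illustrations, not something the answer prescribes. It walks the PNG chunk structure (4-byte length, 4-byte type, data, CRC-32 over type and data) and records structural anomalies: CRC mismatches, a declared length running past the end of the file, unknown chunk types, and bytes trailing `IEND`. Because of the risk raised above, every read is bounds-checked, the chunk count is capped, and pixel data is never decompressed.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# Chunk types defined by the PNG specification (illustrative, not exhaustive).
KNOWN_CHUNKS = {
    b"IHDR", b"PLTE", b"IDAT", b"IEND", b"tRNS", b"cHRM", b"gAMA", b"iCCP",
    b"sBIT", b"sRGB", b"tEXt", b"zTXt", b"iTXt", b"bKGD", b"hIST", b"pHYs",
    b"sPLT", b"tIME",
}
# Hypothetical cap so a hostile file cannot keep the parser busy indefinitely.
MAX_CHUNKS = 10_000


def png_chunk_features(data: bytes) -> dict:
    """Hypothetical helper: structural (metadata) features of a PNG, parsed defensively.

    Only chunk headers, CRCs and the IHDR fields are read; pixel data is never decoded.
    """
    feats = {
        "valid_signature": data[:8] == PNG_SIGNATURE,
        "num_chunks": 0,
        "crc_mismatches": 0,
        "unknown_chunks": 0,
        "overlong_chunk": False,  # a declared length runs past the end of the file
        "has_iend": False,
        "trailing_bytes": 0,      # bytes left over after the IEND chunk
        "width": 0,
        "height": 0,
    }
    if not feats["valid_signature"]:
        return feats

    pos = 8
    # Each chunk: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
    while pos + 8 <= len(data) and feats["num_chunks"] < MAX_CHUNKS:
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        end = pos + 8 + length + 4
        if end > len(data):
            feats["overlong_chunk"] = True
            break
        body = data[pos + 8:pos + 8 + length]
        (stored_crc,) = struct.unpack(">I", data[end - 4:end])
        feats["num_chunks"] += 1
        # The PNG CRC-32 covers the chunk type and data, not the length field.
        if zlib.crc32(ctype + body) != stored_crc:
            feats["crc_mismatches"] += 1
        if ctype not in KNOWN_CHUNKS:
            feats["unknown_chunks"] += 1
        if ctype == b"IHDR" and length == 13:
            feats["width"], feats["height"] = struct.unpack(">II", body[:8])
        pos = end
        if ctype == b"IEND":
            feats["has_iend"] = True
            feats["trailing_bytes"] = len(data) - pos
            break
    return feats


def make_chunk(ctype: bytes, body: bytes) -> bytes:
    """Hypothetical test helper: build one well-formed PNG chunk."""
    return (struct.pack(">I", len(body)) + ctype + body
            + struct.pack(">I", zlib.crc32(ctype + body)))


# Usage: a tiny valid 1x1 8-bit grayscale PNG, and a copy with one fuzzed byte.
ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
good = (PNG_SIGNATURE + make_chunk(b"IHDR", ihdr)
        + make_chunk(b"IDAT", zlib.compress(b"\x00\x00"))
        + make_chunk(b"IEND", b""))
fuzzed = bytearray(good)
fuzzed[16] ^= 0xFF  # flip the first byte of the IHDR width field
print(png_chunk_features(good))
print(png_chunk_features(bytes(fuzzed)))
```

Each dictionary could be flattened into a numeric vector (for example `[int(v) for v in feats.values()]`, relying on dicts keeping insertion order) and scaled before being fed to a clustering method such as a SOM. Whether such features actually separate malicious from benign files is exactly what the answer doubts; the sketch only shows where metadata features would come from.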

As the data may be compressed, naive features computed over the raw bytes will not work either.
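
The effect of compression on raw-byte features can be seen with a small experiment: compute the Shannon entropy (in bits per byte) of some structured bytes before and after `zlib.compress`. The payload below is a made-up stand-in, not real image data, and `byte_entropy` is a hypothetical helper. The relevant fact is that, for example, PNG pixel data (`IDAT`) is zlib-compressed, so most of such a file's bytes behave like the compressed case.

```python
import math
import zlib
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0 to 8)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


# Made-up structured bytes standing in for uncompressed content.
raw = "".join(f"row {i}: value={i * i % 97}\n" for i in range(2000)).encode()
packed = zlib.compress(raw)

print(f"raw:        {len(raw):6d} bytes, {byte_entropy(raw):.2f} bits/byte")
print(f"compressed: {len(packed):6d} bytes, {byte_entropy(packed):.2f} bits/byte")
```

The compressed stream's byte statistics should come out much closer to the 8 bits/byte maximum, so histogram- or entropy-style features over such regions mostly report "this is compressed" rather than anything about the content or about which bytes a fuzzer altered.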
