File types / Extensions using a file signature


Problem description

Hi Guys,

I am currently coding a search tool that searches for files stored in a directory. Now, I am trying to make the tool as accurate as possible. If you were to use GetFileExtension, it does return the file extension, but if a user has manually changed the extension to something else, it doesn't work.

So I want to try and use file signatures to get a more accurate result from the search. Even if the user has changed the extension at the end of the file name, the tool will still detect whether it's a .jpg or not, for example.

I have searched high and low on Google and various other forums, but I can't seem to find any kind of information that may help me. I am not the greatest at programming.

Does anyone have any ideas or sample code, or maybe any web links that I may have missed, that may point me in the right direction?

I have been told to use a byte stream? I have never used one before... I am more knowledgeable about file signatures and how they work than about the programming side of things.

Many thanks in advance, and I hope someone may be able to help me.

Many thanks
Hyp

P.S.:

Hi,

The reason I wish to use file signatures rather than GetFileExtension is that the tool is intended to locate files that a user may be trying to hide. For example, a user may be storing movies or music on their system but changing the extension to .txt. I intend to use file signatures so that these files still show up.

I'm just trying to work out how I can get a program to read the file signature, store it in a temporary array, and then compare it to an existing list of file signatures I have saved in... an XML document? Or an array... I'm thinking of an XML document because then, if there are other file signatures at a later time, I can add them easily.

Does that make sense? I kind of know it from the design point of view, but when it comes to coding it, I'm like at a locked door with no key. Can't go anywhere.
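
To make the XML idea above a bit more concrete, here is a minimal sketch of loading a signature list from an XML document. It is written in Python only for brevity (the thread itself is about .NET), and the XML layout, the `type` and `hex` attribute names, and the `load_signatures` helper are all assumptions made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML layout: element and attribute names are invented for this sketch.
SIGNATURES_XML = """<signatures>
  <signature type="jpg" hex="FF D8 FF"/>
  <signature type="png" hex="89 50 4E 47 0D 0A 1A 0A"/>
  <signature type="exe" hex="4D 5A"/>
</signatures>
"""


def load_signatures(xml_text):
    """Parse the XML text into a list of (type_name, magic_bytes) pairs."""
    root = ET.fromstring(xml_text)
    signatures = []
    for node in root.findall("signature"):
        # bytes.fromhex skips the spaces between the byte pairs
        signatures.append((node.get("type"), bytes.fromhex(node.get("hex"))))
    return signatures
```

Adding a new signature later then only means adding one more `<signature>` element to the document; the loading code stays the same.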

Many thanks for your fast reply :)

Regards,
Hyp

Recommended answer

First thing to understand: from the file system's standpoint, there are no extensions (surprise?).
Old systems used them, but that was a long time ago. Extensions survive only at the Shell level, as a peculiarity of the "file type", which is an association between an "extension" and a software application. An extension here is just a part of the file name (like in Unix).

By the way, you should also support POSIX in your search and take into account hard links and soft links (reparse points). Not taking soft links into account may lead to an infinite circular search (a POSIX file system with soft links is not a tree anymore, surprise!).
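
As a rough illustration of the circular-search point, the sketch below walks a directory tree while remembering the resolved path of every directory it has already visited, so a soft link pointing back up the tree is skipped instead of being followed forever. It is in Python purely for brevity, `walk_files` is a hypothetical helper name, and it only guards against directory cycles; a file reachable through several hard links would still be listed once per name.

```python
import os


def walk_files(root):
    """Yield file paths under root, visiting each real directory at most once."""
    visited = set()
    stack = [root]
    while stack:
        directory = stack.pop()
        real = os.path.realpath(directory)  # resolve soft links to the real location
        if real in visited:
            continue  # already reached through another path: skip to avoid a cycle
        visited.add(real)
        try:
            entries = list(os.scandir(directory))
        except OSError:
            continue  # unreadable directory or broken link
        for entry in entries:
            if entry.is_dir(follow_symlinks=True):
                stack.append(entry.path)
            elif entry.is_file(follow_symlinks=True):
                yield entry.path
```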

As file search has nothing to do with the Shell, you should never use the notion of an extension. You should use general-purpose masks. For example, you should be able to use the mask "my*.jp*". For a good sample of the right functionality, please see Total Commander: http://en.wikipedia.org/wiki/Total_Commander.
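
A mask such as "my*.jp*" can be tried out with very little code. The sketch below uses Python's standard `fnmatch` module rather than anything from .NET; `matches_mask` is a hypothetical helper, and lowercasing both sides is an assumption that the search should ignore case.

```python
import fnmatch


def matches_mask(file_name, mask):
    """Case-insensitive match of a file name against a mask using * and ?."""
    # fnmatchcase compares exactly as given, so both sides are lowercased first.
    return fnmatch.fnmatchcase(file_name.lower(), mask.lower())
```

With this helper, `matches_mask("MyScan.JPEG", "my*.jp*")` is true while `matches_mask("photo.jpg", "my*.jp*")` is not. Note that `fnmatch` also understands `[seq]` character sets, which goes slightly beyond the plain `*` and `?` masks discussed here.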

Here is a big warning, though: be extra careful! System.IO.Directory.GetFiles does not work as you would expect! For an explanation of the problem, see "Directory.Get.Files search pattern problem".

Finally, you can use file signatures in your software, but there is no single commonly accepted set of file signatures, nor are they recognized by the file system. You can only store a signature separately or use it as an additional search criterion. Also, see how it works in Total Commander (there is no search by signature, though). With .NET you can use the available hash functions (http://en.wikipedia.org/wiki/Cryptographic_hash_function): MD5 (not recommended for security purposes, by the way; see http://en.wikipedia.org/wiki/MD5) or the SHA family (http://en.wikipedia.org/wiki/SHA-2). See System.Security.Cryptography.MD5, System.Security.Cryptography.SHA1 and System.Security.Cryptography.SHA256.
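
For the hash-function route, a minimal sketch of computing a digest of a file's content might look like the following. It uses Python's `hashlib` instead of the .NET classes named above, and `hash_file`, its default algorithm and the chunk size are choices made for this illustration only.

```python
import hashlib


def hash_file(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of a file's content, reading it in chunks."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"" at EOF
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()
```

Keep in mind that a digest like this identifies one exact file content (handy for finding copies of a known file, whatever they are named); it says nothing about the file's format.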


Remember, you should support not just the "*" wildcard but also "?".


On the follow-up question:

If you mean to recognize some signature to detect the type of the file, it cannot be used for that. The file system is designed to treat all files with all valid names as equal. If you try to classify them based on the name or part of the content, you will always get false positives and false negatives.

For example, there is a signature for executables: "MZ" or, rarely, "ZM". A Unicode text file can contain a BOM (http://unicode.org/faq/utf_bom.html). There are also a variety of signatures for all those media containers (sound, image and video), which are already much less reliable. None of these provides a 100% reliable signature, by definition. You say you want "a more accurate result from the search". An attempt to do any classification based on file content can only reduce your accuracy.
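
For reference, checking the leading bytes of a file against a few known signatures could be sketched as below. This is Python rather than .NET, the `KNOWN_SIGNATURES` table and the `guess_type` helper are made up for this illustration, and a match is only a hint: as said above, a plain text file that happens to begin with "MZ" would be reported as an executable.

```python
# Illustrative table only; real-world signature lists are far longer and less clear-cut.
KNOWN_SIGNATURES = [
    ("jpg", b"\xff\xd8\xff"),
    ("png", b"\x89PNG\r\n\x1a\n"),
    ("exe", b"MZ"),
    ("utf8-bom", b"\xef\xbb\xbf"),
]


def guess_type(path, signatures=KNOWN_SIGNATURES):
    """Return the first type whose magic bytes start the file, or None."""
    longest = max(len(magic) for _, magic in signatures)
    with open(path, "rb") as handle:
        header = handle.read(longest)  # only the first few bytes are needed
    for type_name, magic in signatures:
        if header.startswith(magic):
            return type_name
    return None  # no known signature: the type stays unknown
```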

See also my comments below.

—SA


I saw a discussion about that on MSDN:
Detect file type

But I'm not sure that this is necessary in a search tool. Even if you somehow write this functionality for your tool, who will use it? I mean, what great ability will I gain from this functionality?
Sorry if I don't understand your idea...

