Reading a file signature and telling the difference between a zip file and a docx file

Question

I have an upload routine that reads the first few bytes of a file into an array and converts them to a hex string to get the file signature.

I had been reading the first 4 bytes into the array, and everything seemed fine until I ran into a problem with a .zip file and a .docx file. Both have the same signature in the first 4 bytes: "50-4b-03-04".

So I looked at the next byte. For the .docx it is "14", but some .zip files have that value as well. I looked up this file signature and found that the sequence is shared by many file types, including JAR, ZIP, DOCX, XLSX, and Open Office documents.
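For context on that fifth byte: in the ZIP format, the four signature bytes are followed by a two-byte little-endian "version needed to extract" field, then the general-purpose flags and the compression method. A value of 14 hex is 20 decimal, meaning ZIP specification version 2.0, the minimum for Deflate compression, so it shows up in ordinary archives and in ZIP-based documents alike. The sketch below decodes those fields; `ZipHeaderPeek` and its `Read` method are hypothetical names, not part of the routine described here.

```csharp
using System;
using System.IO;

// Hypothetical helper: decodes the fixed fields at the start of a
// ZIP local file header (signature, version needed, compression method).
public static class ZipHeaderPeek
{
    public static (bool IsZip, int VersionNeeded, int Method) Read(Stream stream)
    {
        var b = new byte[10];
        int total = 0, n;
        // A single Read call may return fewer bytes than requested, so loop until full or EOF.
        while (total < b.Length && (n = stream.Read(b, total, b.Length - total)) > 0)
            total += n;
        if (total < b.Length) return (false, 0, 0);

        // Bytes 0-3: local file header signature "PK\x03\x04" (50-4B-03-04).
        bool isZip = b[0] == 0x50 && b[1] == 0x4B && b[2] == 0x03 && b[3] == 0x04;
        // Bytes 4-5: version needed to extract, little-endian (0x0014 = 20 = version 2.0).
        int version = b[4] | (b[5] << 8);
        // Bytes 6-7 are the general-purpose flags; bytes 8-9 are the compression method (8 = Deflate).
        int method = b[8] | (b[9] << 8);
        return (isZip, version, method);
    }
}
```

None of these fields describes what the archive contains, which is why the extra byte does not separate a .docx from a .zip.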

Does anyone know of a good way to read the file signature and determine the file type accurately? How does Windows know the difference? It has to use more than just the first 4 bytes. I want to read the signatures of uploaded files to ensure that only approved file types can be uploaded.

Recommended answer

What I did was put the file signatures into a database, storing each signature together with its length and the file extension it belongs to. If a file doesn't have an extension, it isn't uploaded. If the file's extension does not match its signature, the routine rejects the file. Here is the code in the routine that pulls the signatures and does the comparison:

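// Needs: using System; using System.Linq; using Microsoft.EntityFrameworkCore;
// Load the signatures registered for the uploaded file's extension in one query.
var signatures = await _context.FileSignatures
    .Select(f => new { f.FileSignature, f.AllowedFileType.FileExtension, f.SignatureLength })
    .Where(x => x.FileExtension == fileType)
    .ToListAsync();

// Read up to the longest of those signatures; a read may return fewer bytes,
// so only the bytes actually read are converted below.
using var fileStream = file.OpenReadStream();
byte[] bytes = new byte[signatures.Max(x => x.SignatureLength)];
int read = await fileStream.ReadAsync(bytes, 0, bytes.Length);

// BitConverter.ToString yields hyphen-separated upper-case hex, e.g. "25-50-44-46".
string hexData = BitConverter.ToString(bytes, 0, read);
// Prefix match, so signatures shorter than the buffer still match.
// First throws InvalidOperationException when no stored signature matches.
var foundFile = signatures.First(x => hexData.StartsWith(x.FileSignature, StringComparison.OrdinalIgnoreCase));

return foundFile.FileExtension;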

File signatures are stored in the table like this:

FileExtension            FileSignature        SignatureLength
.PDF                     25-50-44-46          4
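
A minimal in-memory sketch of how rows in this layout can drive the check, assuming the table has already been loaded into a list. `SignatureRow` and `SignatureMatcher` are hypothetical names for illustration; the database-backed routine may differ in detail.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical in-memory stand-in for one row of the signature table.
public sealed record SignatureRow(string FileExtension, string FileSignature, int SignatureLength);

public static class SignatureMatcher
{
    // Returns the stored extension when the content starts with a signature
    // registered for the claimed extension; otherwise returns null.
    public static string? Match(Stream content, string claimedExtension, IReadOnlyCollection<SignatureRow> rows)
    {
        var candidates = rows
            .Where(r => string.Equals(r.FileExtension, claimedExtension, StringComparison.OrdinalIgnoreCase))
            .ToList();
        if (candidates.Count == 0) return null; // extension is not on the approved list

        // Read as many bytes as the longest candidate signature needs.
        var buffer = new byte[candidates.Max(r => r.SignatureLength)];
        int total = 0, n;
        while (total < buffer.Length && (n = content.Read(buffer, total, buffer.Length - total)) > 0)
            total += n;

        // Hyphen-separated hex, the same format the table stores.
        string hex = BitConverter.ToString(buffer, 0, total);
        return candidates
            .FirstOrDefault(r => hex.StartsWith(r.FileSignature, StringComparison.OrdinalIgnoreCase))
            ?.FileExtension;
    }
}
```

With only the `.PDF` row above loaded, a stream that starts with `%PDF` and a claimed extension of `.pdf` yields `.PDF`, while a ZIP archive renamed to `.pdf` yields `null`.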

This way I can make sure I read the maximum number of bytes needed for the signature and get the extension back. If I want to allow more file types, I just add them to the database.
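
Note that a signature table cannot tell a .docx from a .zip by content alone, because a .docx is itself a ZIP archive (an Open Packaging Conventions package); the check above only confirms that the content is consistent with the claimed extension. One possible extension, not part of the original answer, is to open the archive and look for well-known entries. The sketch below is a heuristic that assumes the conventional part names `word/document.xml` and `xl/workbook.xml`; `ZipContainerSniffer` is a hypothetical name.

```csharp
#nullable enable
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;

public static class ZipContainerSniffer
{
    // Heuristic: returns ".docx", ".xlsx", ".jar" or ".zip" based on entry names,
    // or null when the stream cannot be read as a ZIP archive.
    public static string? Classify(Stream stream)
    {
        try
        {
            // leaveOpen: true so the caller keeps ownership of the upload stream.
            using var archive = new ZipArchive(stream, ZipArchiveMode.Read, leaveOpen: true);

            bool Has(string name) =>
                archive.Entries.Any(e => string.Equals(e.FullName, name, StringComparison.OrdinalIgnoreCase));

            // Office Open XML packages carry a content-types part at the root.
            if (Has("[Content_Types].xml"))
            {
                if (Has("word/document.xml")) return ".docx";
                if (Has("xl/workbook.xml")) return ".xlsx";
            }
            // JAR files usually contain a manifest at this path.
            if (Has("META-INF/MANIFEST.MF")) return ".jar";
            return ".zip";
        }
        catch (InvalidDataException)
        {
            // Thrown by ZipArchive when the data is not a valid ZIP archive.
            return null;
        }
    }
}
```

The result can then be compared with the uploaded file's extension in the same way as the signature lookup. Entry names are only a convention, so a stricter check would also parse `[Content_Types].xml`.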
