C# binary files: finding strings


Question

Hello All,

I have some data files (~2 MB in size) which contain data encoded with mixed little- and big-endian byte ordering. Data chunks are delimited by strings (UTF-8) which act as labels for data fields. Since .NET's BinaryReader doesn't support such mixed data streams, I have tried to implement a custom reader. To find the offset of a particular data field I have tried something like the following:

byte[] byteBuffer = File.ReadAllBytes("SomeFilePath");
string byteBufferAsString = System.Text.Encoding.UTF8.GetString(byteBuffer);
Int32 offset1 = byteBufferAsString.IndexOf("StringToFind");




However, this seems to give variable results. Sometimes the offset value points exactly to the start position of the StringToFind text in the buffer, and other times it points two bytes in front of the actual start position, i.e. to an Int16 which indicates the byte length of the string immediately following.

Has anyone had a similar experience? Otherwise, does anyone have any advice for working with binary files and searching for string positions?

cheers
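
For the mixed byte order described in the question, one possible approach (a sketch, not necessarily what the asker's files need) is to read each field with an explicit byte order via `System.Buffers.Binary.BinaryPrimitives`, available in .NET Core 2.1 and later. The chunk layout below is an assumption made up for illustration: a big-endian Int16 byte length, the UTF-8 label, then a little-endian Int32 value.

```csharp
using System;
using System.Buffers.Binary;
using System.Text;

// Hypothetical layout (assumed, not taken from the real files):
// big-endian Int16 byte length, the UTF-8 label, then a little-endian Int32.
byte[] chunk = { 0x00, 0x03, 0x41, 0x42, 0x43, 0x2A, 0x00, 0x00, 0x00 };

int pos = 0;

// Read the 2-byte length prefix as big-endian.
short labelLength = BinaryPrimitives.ReadInt16BigEndian(chunk.AsSpan(pos, 2));
pos += 2;

// Decode only the label bytes, not the whole buffer.
string label = Encoding.UTF8.GetString(chunk, pos, labelLength);
pos += labelLength;

// Read the 4-byte value as little-endian.
int value = BinaryPrimitives.ReadInt32LittleEndian(chunk.AsSpan(pos, 4));

Console.WriteLine($"{label} = {value}"); // prints "ABC = 42"
```

Because every read names its byte order, the same buffer can mix both orderings without a custom reader class.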

Recommended answer

I think this step

Quote:
string byteBufferAsString = System.Text.Encoding.UTF8.GetString(byteBuffer)

is weak.

You should instead do the opposite: get the array of bytes representing the search string and search for it inside the data buffer.
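
A likely reason the decode-first approach is fragile: `IndexOf` on the decoded string returns a char index, not a byte offset, and the two drift apart as soon as earlier bytes form a multi-byte UTF-8 sequence. The small sketch below uses a made-up 7-byte buffer to show the mismatch. `StringComparison.Ordinal` is passed explicitly because the single-argument `IndexOf(string)` is culture-sensitive by default, which can add further surprises on data containing control characters.

```csharp
using System;
using System.Text;

// Hypothetical buffer: "é" as UTF-8 (2 bytes), a big-endian Int16
// length prefix with value 3 (2 bytes), then the ASCII label "ABC".
byte[] buffer = { 0xC3, 0xA9, 0x00, 0x03, 0x41, 0x42, 0x43 };

// Decoding collapses the two bytes of "é" into a single char.
string decoded = Encoding.UTF8.GetString(buffer);

// Ordinal comparison keeps culture rules out of the picture.
int charIndex = decoded.IndexOf("ABC", StringComparison.Ordinal);

// By construction, the label starts at byte offset 4 in the buffer.
const int byteOffset = 4;

// prints "char index = 3, byte offset = 4"
Console.WriteLine($"char index = {charIndex}, byte offset = {byteOffset}");
```

Here the char index is one less than the byte offset. In a real file the difference depends on everything that precedes the label, which would be consistent with the variable results described in the question.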


You need to search for the UTF-8 string as raw bytes. Something like this (not tested):

using System;
using System.IO;
using System.Text;

byte[] ByteBuffer = File.ReadAllBytes("SomeFilePath");
byte[] StringBytes = Encoding.UTF8.GetBytes("StringToFind");
for (int i = 0; i <= (ByteBuffer.Length - StringBytes.Length); i++)
{
    // Cheap first-byte check before comparing the rest of the pattern.
    if (ByteBuffer[i] == StringBytes[0])
    {
        int j; // declared outside the inner loop so it can be tested afterwards
        for (j = 1; j < StringBytes.Length && ByteBuffer[i + j] == StringBytes[j]; j++) ;
        if (j == StringBytes.Length)
            Console.WriteLine("String was found at offset {0}", i);
    }
}



Please note that this is a case-sensitive search!
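
As an alternative to the hand-written loop, the same byte-level search can be sketched with the span `IndexOf` extension from `System.MemoryExtensions` (assuming .NET Core 2.1 or later, or the System.Memory package). `FindAll` is a hypothetical helper name, and the sample bytes are made up. Like the loop above, it compares bytes exactly, so it is also case-sensitive.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Returns every byte offset at which the UTF-8 encoding of `label` occurs.
// FindAll is a hypothetical helper, not a framework method.
static List<int> FindAll(byte[] data, string label)
{
    ReadOnlySpan<byte> pattern = Encoding.UTF8.GetBytes(label);
    var offsets = new List<int>();
    if (pattern.Length == 0) return offsets;

    int start = 0;
    while (start <= data.Length - pattern.Length)
    {
        // IndexOf returns a position relative to the slice, or -1.
        int hit = data.AsSpan(start).IndexOf(pattern);
        if (hit < 0) break;
        offsets.Add(start + hit);
        start += hit + 1; // step one byte so overlapping matches are reported
    }
    return offsets;
}

// Made-up sample: the label "ABC" appears at byte offsets 2 and 6.
byte[] sample = { 0x00, 0x03, 0x41, 0x42, 0x43, 0xFF, 0x41, 0x42, 0x43 };
List<int> found = FindAll(sample, "ABC");
Console.WriteLine(string.Join(", ", found)); // prints "2, 6"
```

The returned values are true byte offsets into the buffer, so they can be used directly as positions for reading the fields that follow each label.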


Well, I don't have such experience, simply because I thoroughly avoid dealing with extreme stupidities, so my only advice would be: give up; throw out all software dealing with data structured in this weird way and make sure you prevent such fallacies in future; write brand new software which uses some reasonable persistence; and save a huge amount of time and nerves. If this advice does not seem suitable for you, you are very welcome to ram the "problem" on your own.

I seriously doubt you can get better advice from anyone who "had similar experience". Some experiences are not really helpful.

—SA

