Can't get the proper Index to return

Problem description

Alright, so firstly I want to thank everyone for helping me so much in the last couple weeks, here's another one!!!

I have a file and I'm using Regex to find how many times the term "TamedName" comes up. That's the easy part :)

Originally I had it set up like this:

        StreamReader ff = new StreamReader(fileName);
        String D = ff.ReadToEnd();
        Regex rx = new Regex("TamedName");
        foreach (Match Dino in rx.Matches(D))
        {
            if (richTextBox2.Text == "")
                richTextBox2.Text += string.Format("{0} - {1:X} - {2}", Dino.Value, Dino.Index, ReadString(fileName, (uint)Dino.Index));
            else
                richTextBox2.Text += string.Format("\n{0} - {1:X} - {2}", Dino.Value, Dino.Index, ReadString(fileName, (uint)Dino.Index));
        }

and it was returning completely incorrect index points, as pictured here

I'm fairly confident I know why it's doing this, probably because converting everything from a binary file to string, obviously not all the characters are going to translate, so that throws off the actual index count, so trying to relate that back doesn't work at all... The problem, I have NO clue how to use Regex with a binary file and have it translate properly :(

I'm using Regex vs a simple search function because the difference between each occurrence of "TamedName" is WAY too vast to code into a function.

Really hope you guys can help me with this one :( I'm running out of ideas!!

Recommended answer

The problem is that you are reading a binary file, and StreamReader decodes the bytes as text when it reads them into a Unicode string, so character positions in the string no longer match byte offsets in the file. The file needs to be dealt with as bytes.
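To make the drift concrete, here is a minimal sketch (not part of the original answer) using a made-up byte sequence. The class and method names are hypothetical. The two leading bytes form a single character when decoded as UTF-8, so the marker sits at character index 1 in the string even though its byte offset is 2. In a real binary file, every such multi-byte or invalid sequence shifts the count a little further.

```csharp
using System;
using System.Text;

static class IndexDriftDemo
{
    // Position of the marker in the decoded string, counted in chars (not bytes).
    public static int CharIndex(byte[] data, string marker) =>
        Encoding.UTF8.GetString(data).IndexOf(marker, StringComparison.Ordinal);

    static void Main()
    {
        // Hypothetical sample: 0xC3 0xA9 decodes to ONE char in UTF-8,
        // followed by the ASCII bytes of "TamedName" starting at byte offset 2.
        byte[] data =
        {
            0xC3, 0xA9,
            0x54, 0x61, 0x6D, 0x65, 0x64, 0x4E, 0x61, 0x6D, 0x65
        };

        int charIndex = CharIndex(data, "TamedName");
        int byteOffset = Array.IndexOf(data, (byte)0x54); // first 'T' in the raw bytes

        Console.WriteLine($"char index: {charIndex}, byte offset: {byteOffset}");
    }
}
```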

My code is below. (Just as an FYI, you will need to enable unsafe compilation to compile the code; this is to allow a fast search of the binary array.)

Just for proper attribution, I borrowed the byte version of IndexOf from this SO answer by Dylan Nicholson

using System.Collections.Generic; // List<T>
using System.IO;                  // FileStream, MemoryStream

namespace ArkIndex
{
    class Program
    {
        static void Main(string[] args)
        {
            string fileName = "TheIsland.ark";
            string searchString = "TamedName";
            byte[] bytes = LoadBytesFromFile(fileName);
            // Encoding.ASCII is explicit; ASCIIEncoding.Default actually resolves to the platform default encoding.
            byte[] searchBytes = System.Text.Encoding.ASCII.GetBytes(searchString);

            List<long> allNeedles = FindAllBytes(bytes, searchBytes);    
        }

        static byte[] LoadBytesFromFile(string fileName)
        {
            // Read the raw bytes read-only; no text decoding is applied.
            return System.IO.File.ReadAllBytes(fileName);
        }

        public static List<long> FindAllBytes(byte[] haystack, byte[] needle)
        {
            long currentOffset = 0;
            long offsetStep = needle.Length;
            long index = 0;
            List<long> allNeedleOffsets = new List<long>();
            while((index = IndexOf(haystack,needle,currentOffset)) != -1L)
            {
                allNeedleOffsets.Add(index);
                currentOffset = index + offsetStep;
            }
            return allNeedleOffsets;
        }

        public static unsafe long IndexOf(byte[] haystack, byte[] needle, long startOffset = 0)
        {
            fixed (byte* h = haystack) fixed (byte* n = needle)
            {
                for (byte* hNext = h + startOffset, hEnd = h + haystack.LongLength + 1 - needle.LongLength, nEnd = n + needle.LongLength; hNext < hEnd; hNext++)
                    for (byte* hInc = hNext, nInc = n; *nInc == *hInc; hInc++)
                        if (++nInc == nEnd)
                            return hNext - h;
                return -1;
            }
        }    
    }
}
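If Regex is still preferred, as in the question, one possible alternative (a sketch, not the answerer's method) is to decode the bytes with ISO-8859-1. That encoding maps every byte value 0x00 to 0xFF to exactly one char, so `Match.Index` equals the byte offset in the file. The class name `BinaryRegex`, the method `FindOffsets`, and the sample bytes are assumptions for illustration; for a real file, the array would come from `File.ReadAllBytes`.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

static class BinaryRegex
{
    // ISO-8859-1 (code page 28591): one byte in, one char out, nothing dropped or merged.
    static readonly Encoding Latin1 = Encoding.GetEncoding(28591);

    // Returns the byte offsets at which the pattern matches.
    public static List<int> FindOffsets(byte[] data, string pattern)
    {
        string text = Latin1.GetString(data); // text.Length == data.Length
        var offsets = new List<int>();
        foreach (Match m in Regex.Matches(text, pattern))
            offsets.Add(m.Index);             // char index == byte offset here
        return offsets;
    }

    static void Main()
    {
        // Hypothetical sample standing in for the real file:
        // four non-text bytes, then the ASCII bytes of "TamedName".
        byte[] data =
        {
            0xFF, 0x00, 0xC3, 0xA9,
            0x54, 0x61, 0x6D, 0x65, 0x64, 0x4E, 0x61, 0x6D, 0x65
        };

        foreach (int offset in FindOffsets(data, "TamedName"))
            Console.WriteLine($"{offset:X}");
    }
}
```

The trade-off is memory: the decoded string takes two bytes per input byte on top of the original array, so for very large files the byte-level search above may be the better fit.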
