How can C#'s string.IndexOf perform so fast, 10 times faster than an ordinary for-loop search?

Question

I have a very long string (60 MB in size) in which I need to count how many pairs of '<' and '>' there are.

I first tried my own way:

        char pre = '!';
        int match1 = 0;
        for (int j = 0; j < html.Length; j++)
        {
            char c = html[j];
            if (pre == '<' && c == '>') //find a match
            {
                pre = '!';
                match1++;
            }
            else if (pre == '!' && c == '<')
                pre = '<';
        }

The above code takes roughly 1000 ms on my string.

Then I tried using string.IndexOf:

        int match2 = 0;
        int index = -1;
        do
        {
            index = html.IndexOf('<', index + 1);
            if (index != -1) // find a match
            {
                index = html.IndexOf('>', index + 1);
                if (index != -1)
                   match2++;
            }
        } while (index != -1);

The above code runs in only 150 ms.
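To compare the two approaches on equal terms, it can help to wrap each loop in a method and confirm that both return the same count before timing anything. The following is a minimal sketch; the class and method names (`PairCounting`, `CountPairsLoop`, `CountPairsIndexOf`) are hypothetical, and the method bodies follow the logic of the two loops above.

```csharp
using System;

// Hypothetical helper class wrapping the two pair-counting loops from the question.
public static class PairCounting
{
    // State-flag version: visits every character once, remembering
    // whether a '<' has been seen and a '>' is now expected.
    public static int CountPairsLoop(string html)
    {
        char pre = '!';
        int matches = 0;
        for (int j = 0; j < html.Length; j++)
        {
            char c = html[j];
            if (pre == '<' && c == '>')
            {
                pre = '!';   // pair closed, look for the next '<'
                matches++;
            }
            else if (pre == '!' && c == '<')
            {
                pre = '<';   // pair opened, look for a '>'
            }
        }
        return matches;
    }

    // IndexOf version: alternately jumps to the next '<', then to the next '>'.
    public static int CountPairsIndexOf(string html)
    {
        int matches = 0;
        int index = -1;
        do
        {
            index = html.IndexOf('<', index + 1);
            if (index != -1)
            {
                index = html.IndexOf('>', index + 1);
                if (index != -1)
                    matches++;
            }
        } while (index != -1);
        return matches;
    }
}
```

For example, both methods return 2 for `"<a> x <b>"`, and both return 1 for `"<<>>"`, because a second '<' seen before the closing '>' is ignored by each of them.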

I am wondering what magic makes string.IndexOf run so fast.

Can anyone enlighten me?

Edit

OK, following @BrokenGlass's answer, I modified my code so that neither version checks the pairing; instead, both just count how many '<' characters are in the string.

        for (int j = 0; j < html.Length; j++)
        {
            char c = html[j];
            if (c == '<') // count every '<', same as the IndexOf version
            {
                match1++;
            }
        }

The above code runs in about 760 ms.

Using IndexOf:

        int index = -1;
        do
        {
            index = html.IndexOf('<', index + 1);
            if (index != -1)
            {
                match2++;
            }
        } while (index != -1);

The above code runs in about 132 ms. Still very, very fast.
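The single-character versions can be wrapped the same way. The following is a minimal sketch with hypothetical names (`CharCounting`, `CountLoop`, `CountIndexOf`). Note that `string.IndexOf(char, int)` accepts a start index equal to the string length (it simply returns -1), which is why `index + 1` is safe even after a match on the last character.

```csharp
using System;

// Hypothetical helper class: two ways of counting occurrences of one character.
public static class CharCounting
{
    // Plain loop: compares every character against the target.
    public static int CountLoop(string text, char target)
    {
        int count = 0;
        for (int j = 0; j < text.Length; j++)
        {
            if (text[j] == target)
                count++;
        }
        return count;
    }

    // IndexOf loop: jumps directly from one occurrence to the next.
    public static int CountIndexOf(string text, char target)
    {
        int count = 0;
        int index = -1;
        while ((index = text.IndexOf(target, index + 1)) != -1)
            count++;
        return count;
    }
}
```

For instance, `CharCounting.CountIndexOf("<a><b>", '<')` and `CharCounting.CountLoop("<a><b>", '<')` both give 2.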

Edit 2

After reading @Jeffrey Sax's comment, I realized that I was running in Visual Studio in Debug mode.

Then I built and ran in Release mode. IndexOf is still faster, but not by nearly as much.

Here are the results (my loop vs. IndexOf):

For the pair count: 207 ms vs. 144 ms

For the single-character count: 141 ms vs. 111 ms

The performance of my own code really did improve.

Lesson learned: when you benchmark, do it in Release mode!
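A minimal timing sketch in that spirit, using `System.Diagnostics.Stopwatch`: it runs the measured action once as a warm-up (so JIT compilation is not part of the timing), then reports the best of several runs, and prints a warning when it sees the `DEBUG` compilation symbol (defined by default in Debug configurations) or an attached debugger. The `Bench.Measure` name and the default run count are assumptions for illustration, not anything from the original post.

```csharp
using System;
using System.Diagnostics;

// Hypothetical timing helper; not part of the original question or answer.
public static class Bench
{
    // Runs `action` once to warm up, then returns the fastest of `runs` timed runs, in milliseconds.
    public static double Measure(Action action, int runs = 5)
    {
        if (runs < 1)
            throw new ArgumentOutOfRangeException(nameof(runs));

#if DEBUG
        Console.WriteLine("Warning: DEBUG build; timings will not reflect Release performance.");
#endif
        if (Debugger.IsAttached)
            Console.WriteLine("Warning: debugger attached; JIT optimizations may be disabled.");

        action(); // warm-up run, so JIT compilation is not included in the timings

        double best = double.MaxValue;
        for (int i = 0; i < runs; i++)
        {
            var sw = Stopwatch.StartNew();
            action();
            sw.Stop();
            best = Math.Min(best, sw.Elapsed.TotalMilliseconds);
        }
        return best;
    }
}
```

Usage might look like `double ms = Bench.Measure(() => html.IndexOf('<'));`. Taking the minimum rather than the average is one reasonable choice for filtering out noise from other processes; a dedicated library such as BenchmarkDotNet handles these details more thoroughly.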

Recommended answer

Are you running your timings from within Visual Studio? If so, your code would run significantly slower for that reason alone, because a Debug build, or a process started under the debugger, runs with most JIT optimizations disabled.
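One way to check from code whether an assembly was compiled with JIT optimizations disabled is to read its `System.Diagnostics.DebuggableAttribute`: Debug builds normally carry this attribute with the "disable optimizations" flag set, while Release builds do not set that flag. The following is a minimal sketch; the `BuildInfo` class and its method name are hypothetical.

```csharp
using System;
using System.Diagnostics;
using System.Reflection;

// Hypothetical helper: reports whether an assembly was compiled with JIT optimizations disabled.
public static class BuildInfo
{
    public static bool IsJitOptimizerDisabled(Assembly assembly)
    {
        // Debug builds usually emit a DebuggableAttribute whose flags turn off the JIT optimizer;
        // if the attribute is missing, optimizations are not disabled by it.
        var attr = assembly.GetCustomAttribute<DebuggableAttribute>();
        return attr != null && attr.IsJITOptimizerDisabled;
    }
}
```

This only reflects how the assembly was compiled. Launching even a Release build under the Visual Studio debugger can still suppress JIT optimizations (the "Suppress JIT optimization on module load" option), so checking `Debugger.IsAttached` is worthwhile as well.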

Aside from that, you are, to some degree, comparing apples and oranges. The two algorithms work differently.

The IndexOf version alternates between looking only for an opening bracket and looking only for a closing bracket. Your code goes through the whole string and keeps a status flag that indicates whether it is looking for an opening or a closing bracket. That takes more work per character, so it is expected to be slower.

Here's some code that does the search the same way as your IndexOf method:

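int match3 = 0;
for (int j = 0; j < html.Length; j++) {
    if (html[j] == '<') {
        // Look for the matching '>' after this '<', then resume the outer search after it.
        for (j++; j < html.Length; j++) {
            if (html[j] == '>') {
                match3++;
                break;
            }
        }
    }
}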

In my tests this is actually about 3 times faster than the IndexOf method. The reason? Strings are not quite as simple as sequences of individual characters: there are combining marks, accents, etc. String.IndexOf handles all that extra complexity properly, but it comes at a cost. (Strictly speaking, that culture-sensitive handling applies to the string overloads such as IndexOf(string); the IndexOf(char) overload used here performs an ordinal, character-by-character comparison.)
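When searching for a substring rather than a single char, the culture-sensitive default of `IndexOf(string)` can be avoided by passing `StringComparison.Ordinal`, which compares UTF-16 code units directly. The following is a minimal sketch that counts non-overlapping occurrences this way; the `OrdinalSearch.CountOccurrences` name is hypothetical.

```csharp
using System;

// Hypothetical helper: counts non-overlapping occurrences of `needle` using an ordinal search.
public static class OrdinalSearch
{
    public static int CountOccurrences(string haystack, string needle)
    {
        if (string.IsNullOrEmpty(needle))
            throw new ArgumentException("needle must be non-empty", nameof(needle));

        int count = 0;
        int index = 0;
        // StringComparison.Ordinal: plain code-unit comparison, no culture rules applied.
        while ((index = haystack.IndexOf(needle, index, StringComparison.Ordinal)) != -1)
        {
            count++;
            index += needle.Length; // skip past this match so occurrences do not overlap
        }
        return count;
    }
}
```

For example, `OrdinalSearch.CountOccurrences("aaaa", "aa")` counts 2, not 3, because matches are not allowed to overlap. The char overload used in the question is already ordinal, so this only matters for string searches.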
