Fast custom string splitting


Question

I am writing a custom string split. It should split on a dot (.) that is not preceded by an odd number of backslashes (\).

«string» -> «IEnumerable<string>»
"hello.world" -> "hello", "world"
"abc\.123" -> "abc\.123"
"aoeui\\.dhtns" -> "aoeui\\","dhtns"



I would like to know whether there is a Substring that reuses the original string's storage (for speed), or whether there is an existing split that can do this fast.

This is what I have, but it is 2-3 times slower than input.Split('.'), where input is a string. (I know it is a slightly more complex problem, but not by that much.)

    public IEnumerable<string> HandMadeSplit(string input)
    {
        var Result = new LinkedList<string>();
        var word = new StringBuilder();
        foreach (var ch in input)
        {
            if (ch == '.')
            {
                Result.AddLast(word.ToString());
                word.Length = 0;
            }
            else
            {
                word.Append(ch);
            }
        }
        Result.AddLast(word.ToString());
        return Result;
    }






It now uses a List instead of a LinkedList, records the beginning and end of each substring, and uses string.Substring to create the new strings. This helps a lot and is nearly as fast as string.Split, but I have added my adjustments. (will add code)

Recommended answer

The loop that you show is the right approach if you need performance. (A Regex would not be.)

Switch to an index-based for-loop. Remember the index where the current item starts. Don't append individual chars. Instead, remember the range of characters to copy out and do that with a single Substring call per item.
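A minimal sketch of what that loop might look like, including the escape rule from the question (a dot preceded by an odd number of backslashes does not split). The class and method names here are made up for illustration and are not from the original post; returning a List<string> is one of several reasonable choices.

```csharp
using System;
using System.Collections.Generic;

public static class EscapedSplitter
{
    // Hypothetical helper: splits on '.' unless the dot is preceded by an
    // odd number of backslashes. Backslashes are kept in the output as-is.
    public static List<string> SplitOnUnescapedDots(string input)
    {
        var result = new List<string>();
        int start = 0;        // index where the current item begins
        int backslashes = 0;  // length of the backslash run just before index i

        for (int i = 0; i < input.Length; i++)
        {
            char ch = input[i];
            if (ch == '\\')
            {
                backslashes++;
                continue;
            }
            if (ch == '.' && (backslashes & 1) == 0)
            {
                // One Substring call per item instead of per-char appends.
                result.Add(input.Substring(start, i - start));
                start = i + 1;
            }
            backslashes = 0;  // any non-backslash char ends the run
        }

        // The tail after the last split point (possibly empty).
        result.Add(input.Substring(start));
        return result;
    }
}
```

With this sketch, an empty input yields a single empty string, and a trailing dot yields a trailing empty item, which matches what string.Split does by default.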

Also, don't use a LinkedList. It is slower than a List in almost all cases; the main exception is inserting or removing items in the middle of the sequence when you already hold a reference to the node.

You might also switch from a List to a plain array that you grow with Array.Resize. This results in slightly tedious code (because you have inlined a part of the List class into your method), but it cuts out some small overheads.
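The array variant could be sketched roughly as follows. The names, the initial capacity of 4, and the doubling growth policy are assumptions chosen for illustration; whether this actually beats a List depends on the input and should be measured.

```csharp
using System;

public static class EscapedSplitterArray
{
    // Hypothetical helper: same splitting rule as in the question, but the
    // result buffer is a plain array that is grown with Array.Resize.
    public static string[] SplitToArray(string input)
    {
        var items = new string[4];  // assumed starting capacity
        int count = 0;              // number of slots actually filled
        int start = 0;              // index where the current item begins
        int backslashes = 0;        // length of the backslash run just before i

        // Run one step past the end so the final item uses the same code path.
        for (int i = 0; i <= input.Length; i++)
        {
            bool atEnd = i == input.Length;
            if (!atEnd && input[i] == '\\')
            {
                backslashes++;
                continue;
            }
            if (atEnd || (input[i] == '.' && (backslashes & 1) == 0))
            {
                if (count == items.Length)
                {
                    // Double the capacity, as List<T> does internally.
                    Array.Resize(ref items, items.Length * 2);
                }
                items[count++] = input.Substring(start, i - start);
                start = i + 1;
            }
            backslashes = 0;
        }

        // Trim the unused tail so callers see exactly `count` items.
        Array.Resize(ref items, count);
        return items;
    }
}
```

The final Array.Resize costs one extra copy; a caller that can work with an oversized buffer plus a count could skip it.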

Next, don't return an IEnumerable, because that forces the caller through interface indirection when accessing the items. Return a List or an array.
