(C#) Improving speed of custom getBetweenAll

Question

I've written a custom extension method in C# that improves on the extension method string[] getBetweenAll(string source, string startstring, string endstring).

Originally this extension method found all substrings between two strings, for example:

string source = "<1><2><3><4>";
source.getBetweenAll("<", ">");
//output: string[] {"1", "2", "3", "4"}

But if there was another occurrence of < at the beginning, it would return everything between that first < and the last >, i.e. almost the whole string:

string source = "<<1><2><3><4>";
source.getBetweenAll("<", ">");
//output: string[] {"<1><2><3><4"}

So I rewrote it to be more exact: it now searches backwards from each ">" to find the nearest preceding "<".

It works now, but it is far too slow, because for each occurrence the search method steps backward through the whole string one character at a time. Do you know how I could speed this function up, or is that not possible?

Here is the entire code so far: http://pastebin.com/JEZmyfSG. I've added comments where the code needs a speed improvement.

public static List<int> IndexOfAll(this string main, string searchString)
{
    List<int> ret = new List<int>();
    int len = searchString.Length;
    int start = -len;
    while (true)
    {
        start = main.IndexOf(searchString, start + len);
        if (start == -1)
        {
            break;
        }
        else
        {
            ret.Add(start);
        }
    }
    return ret;
}

public static string[] getBetweenAll(this string main, string strstart, string strend, bool preserve = false)
{
    List<string> results = new List<string>();
    List<int> ends = main.IndexOfAll(strend);
    foreach (int end in ends)
    {
        int start = main.previousIndexOf(strstart, end);  //This is where it has to search the whole source string every time
        results.Add(main.Substring(start, end - start) + (preserve ? strend : string.Empty));
    }
    return results.ToArray();
}

//This is the slow function (depends on main.Length)
public static int previousIndexOf(this string main, string find, int offset)
{
    int wtf = main.Length ;
    int x = main.LastIndexOf(find, wtf);
    while (x > offset)
    {
        x = main.LastIndexOf(find, wtf);
        wtf -= 1;
    }
    return x;
}

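I think another way of implementing PreviousIndexOf(string, int searchfrom) would improve the speed: something like IndexOf(), except that it searches backwards from a supplied start offset.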
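
The behaviour described here is close to what the built-in string.LastIndexOf(string, int, StringComparison) overload already provides: it scans backward from a given index and stops at the first match. The following is only a minimal sketch under that assumption. The class name StringSearchSketch and the PascalCase helper PreviousIndexOf are hypothetical names chosen for illustration, and ordinal comparison is assumed.

```csharp
using System;

public static class StringSearchSketch
{
    // Hypothetical helper: returns the index of the last occurrence of `find`
    // that lies entirely before position `offset`, or -1 if there is none.
    public static int PreviousIndexOf(this string main, string find, int offset)
    {
        // Nothing can precede position 0, and an offset past the end is invalid.
        if (string.IsNullOrEmpty(find) || offset <= 0 || offset > main.Length)
        {
            return -1;
        }

        // LastIndexOf scans backward starting at startIndex; a match must lie
        // entirely at or before that position, so we pass offset - 1.
        return main.LastIndexOf(find, offset - 1, StringComparison.Ordinal);
    }
}
```

With the source "<<1><2>", the first ">" is at index 3, and "<<1><2>".PreviousIndexOf("<", 3) returns 1, the nearest "<" before it. Each call scans back only as far as the nearest start string instead of rescanning from the end of the source.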

Recommended answer

As with the original getBetweenAll, we can use a regular expression. To match only the shortest "inner" occurrences of the enclosing strings, we have to use a negative lookahead on the start string and a non-greedy quantifier for the content.

// Requires: using System.Collections.Generic; using System.Text.RegularExpressions;
public static string[] getBetweenAll(this string main, 
    string strstart, string strend, bool preserve = false)
{
    List<string> results = new List<string>();

    string regularExpressionString = string.Format("{0}(((?!{0}).)+?){1}", 
        Regex.Escape(strstart), Regex.Escape(strend));
    Regex regularExpression = new Regex(regularExpressionString, RegexOptions.IgnoreCase);

    var matches = regularExpression.Matches(main);

    foreach (Match match in matches)
    {
        if (preserve)
        {
            results.Add(match.Value);
        }
        else
        {
            results.Add(match.Groups[1].Value);
        }
    }

    return results.ToArray();
}
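
To make the pattern itself easier to inspect, here is a small self-contained sketch that builds the same format string and collects the first capture group. The names PatternSketch, BuildPattern and InnerMatches are hypothetical and used only for illustration. Unlike the method above, this sketch does not pass RegexOptions.IgnoreCase.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class PatternSketch
{
    // Builds: start, then one or more characters that do not begin another
    // start string (negative lookahead), matched lazily, then end.
    public static string BuildPattern(string start, string end)
    {
        return string.Format("{0}(((?!{0}).)+?){1}",
            Regex.Escape(start), Regex.Escape(end));
    }

    // Returns the content of capture group 1 for every match in source.
    public static string[] InnerMatches(string source, string start, string end)
    {
        return Regex.Matches(source, BuildPattern(start, end))
            .Cast<Match>()
            .Select(m => m.Groups[1].Value)
            .ToArray();
    }
}
```

For "<" and ">" the pattern comes out as <(((?!<).)+?)>, and PatternSketch.InnerMatches("<<1><2><3><4>", "<", ">") yields "1", "2", "3", "4": the leading "<" cannot start a match because the character after it is another "<". Two things are worth checking against your own data: "." does not match a newline unless RegexOptions.Singleline is set, and the "+" quantifier means empty content such as "<>" is not matched.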
