如何保持字符串数据,以通过StartsWith和EndsWith查询通过LINQ选择字符串列表时具有最佳性能 [英] How to keep string data for having best performance when selecting string list via LINQ by StartsWith and EndsWith queries

查看:696
本文介绍了如何保持字符串数据,以通过StartsWith和EndsWith查询通过LINQ选择字符串列表时具有最佳性能的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

现在的问题是很难。我有一个像下面这样的linq查询

Now the question is pretty hard. I have a linq queries like the way below

var lstSimilars = from similarWords in lstAllWords

where similarWords.StartsWith(srWordLocal)

select similarWords;

foreach (string srVar in lstSimilars)
{
string srTempWord = srVar.Replace(srWordLocal, "");
if (dtWords.ContainsKey(srTempWord) == true)
{
csWords.updateSplitWord(srWordLocal + ";" + srTempWord, dtWords[srVar]);
}
}

lstSimilars = from similarWords in lstAllWords

where similarWords.EndsWith(srWordLocal)

select similarWords;

foreach (string srVar in lstSimilars)
{
string srTempWord = srVar.Replace(srWordLocal, "");
if (dtWords.ContainsKey(srTempWord) == true)
{
csWords.updateSplitWord(srWordLocal + ";" + srTempWord, dtWords[srVar]);
}
}

现在 lstAllWords 是一个像下面生成的字符串列表变量

Now lstAllWords is a string list variable generated like the way below

        List<string> lstAllWords = new List<string>();


        for (int i = 0; i < dsWordsSplit.Tables[0].Rows.Count; i++)
        {
            lstAllWords.Add(dsWordsSplit.Tables[0].Rows[i]["Word"].ToString());
        }

我的问题是我应该如何保持Words数据以获得最佳的LINQ选择性能中。我的意思是目前我把它作为一个字符串列表。但是我可以保持不同的方式,并有更好的性能吗?

My question is how should i keep that Words data for having best LINQ selection performance. I mean currently i am keeping it as a string list. But can i keep it different way and have better performance ?

dtWords 是一个字典对象

C#C#-4.0 LINQ

C# C#-4.0 LINQ

推荐答案

/ em>查找以给定子字符串开头或结尾的单词,使用 SortedSet 将帮助您在 O(log(N))时间。

If all you want is efficiently finding words that start or end with given substring, employing the SortedSet will help you do that in O(log(N)) time.

这个想法是将文字放在两个 SortedSet s:

The idea is to put words in two SortedSets:


  • 一个用于原始单词和


玩具实施:

class WordSet {

    public WordSet(IEnumerable<string> words) {
        m_OriginalWords = new SortedSet<string>(words);
        m_ReverseWords = new SortedSet<string>(words.Select(ReverseString));
    }

    /// <summary>
    ///     Finds all words that start with given prefix.
    /// </summary>
    public IEnumerable<string> FindPrefix(string prefix) {
        return FindImp(m_OriginalWords, prefix);
    }

    /// <summary>
    ///     Finds all words that end with the given suffix.
    /// </summary>
    public IEnumerable<string> FindSuffix(string suffix) {
        return FindImp(m_ReverseWords, ReverseString(suffix)).Select(ReverseString);
    }

    static IEnumerable<string> FindImp(SortedSet<string> word_set, string s) {
        if (s.CompareTo(word_set.Max) <= 0) {
            foreach (string word in word_set.GetViewBetween(s, word_set.Max)) {
                if (!word.StartsWith(s))
                    break;
                yield return word;
            }
        }
    }

    static string ReverseString(string src) {
        return new string(src.Reverse().ToArray());
    }

    readonly SortedSet<string> m_OriginalWords;
    readonly SortedSet<string> m_ReverseWords;

}

class Program {

    static void TestImp(string s, IEnumerable<string> words) {
        Console.Write(s);
        foreach (var word in words) {
            Console.Write('\t');
            Console.Write(word);
        }
        Console.WriteLine();
    }

    static void TestPrefix(WordSet word_set, string prefix) {
        TestImp(prefix, word_set.FindPrefix(prefix));
    }

    static void TestSuffix(WordSet word_set, string suffix) {
        TestImp(suffix, word_set.FindSuffix(suffix));
    }

    static void Main(string[] args) {

        var word_set = new WordSet(
            new[] {
                "a",
                "b",
                "ba",
                "baa",
                "bab",
                "bba",
                "bbb",
                "bbc",
            }
        );

        Console.WriteLine("Prefixes:");
        TestPrefix(word_set, "a");
        TestPrefix(word_set, "b");
        TestPrefix(word_set, "ba");
        TestPrefix(word_set, "bb");
        TestPrefix(word_set, "bc");

        Console.WriteLine("\nSuffixes:");
        TestSuffix(word_set, "a");
        TestSuffix(word_set, "b");
        TestSuffix(word_set, "ba");
        TestSuffix(word_set, "bb");
        TestSuffix(word_set, "bc");

    }

}

Prefixes:
a       a
b       b       ba      baa     bab     bba     bbb     bbc
ba      ba      baa     bab
bb      bba     bbb     bbc
bc

Suffixes:
a       a       baa     ba      bba
b       b       bab     bbb
ba      ba      bba
bb      bbb
bc      bbc

如果您还必须搜索中缀,需要一个后缀树或数组,但这不是一个能够有效地正确实现的野餐。

If you have to search for infixes as well, then the above is not enough - you'll need a suffix tree or array, but this is no picnic implementing correctly and efficiently.

BTW,如果数据恰好在数据库中,你可以让DBMS做同样的事情:

BTW, If the data happens to be in the database, you can let the DBMS do essentially the same thing by:

  • creating a computed column or virtual column that is reverse of the original word column,
  • indexing both original word column and the reversed word column (or, alternatively, using a function-based index if that's what your DBMS supports).
  • querying for prefix as: ORIGINAL_WORD_COLUMN LIKE 'pefix%'
  • and for suffix as: REVERSED_WORD_COLUMN LIKE 'reversed_suffix%'.

这篇关于如何保持字符串数据,以通过StartsWith和EndsWith查询通过LINQ选择字符串列表时具有最佳性能的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆