How do I (quickly) find the longest matching string in C#/.NET


Problem description

I need to do some look-up operations against a collection of items.

First I need to see if there is a direct match. This is quite simple, as I have the entries in a Dictionary<String,MyObjectType>, so I can just look up dictionary["valuetofind"].
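
A minimal sketch of the direct-match step, assuming the records are plain strings rather than MyObjectType (the class and method names here are made up for illustration). It uses TryGetValue because the indexer throws KeyNotFoundException when the key is missing, and a miss is exactly the case that has to fall through to the starts-with search:

```csharp
using System;
using System.Collections.Generic;

public static class DirectLookup
{
    // Returns the record stored under exactly this key, or null when there is none.
    // Hypothetical helper: string stands in for MyObjectType.
    public static string FindExact(IReadOnlyDictionary<string, string> records, string query)
    {
        string record;
        return records.TryGetValue(query, out record) ? record : null;
    }
}
```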

If, however, there is no direct match, then I need to do a starts-with match, and it has to be the longest match that is returned:

Example records:

String   Record
0        A
01       B
012      D
02       B
03       C

Example queries:

Query         Result 
0             A    - Because 0   is the longest match
01            B    - Because 01  is the longest match
023456        B    - Because 02  is the longest match
012           D    - Because 012 is the longest match
0123456       D    - Because 012 is the longest match
03456         C    - Because 03  is the longest match
04            A    - Because 0   is the longest match
0456          A    - Because 0   is the longest match
1             Null - No Match

Are there classes in the framework that use hashes or tree structures in their underlying implementation for doing something like this, or do I need to write something myself?

EDIT: What I have so far is the list sorted by length of the pattern string; I then go over the entries one by one to see whether the query starts with the record's pattern. This works OK for most situations, as we do not have large lists (yet), but it is expensive when there is no match, because every entry has to be checked.
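
For reference, the sorted-list approach described in the edit might look roughly like this. It is a sketch under assumptions: the list is sorted longest pattern first, records are plain strings, and the names LinearPrefixScan, SortByLengthDescending and FindLongestMatch are invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LinearPrefixScan
{
    // Sort the patterns longest-first so that the first hit is the longest match.
    public static List<KeyValuePair<string, string>> SortByLengthDescending(
        IEnumerable<KeyValuePair<string, string>> records)
    {
        return records.OrderByDescending(r => r.Key.Length).ToList();
    }

    // Walk the sorted list and return the record of the first pattern that
    // the query starts with. A miss has to inspect every entry.
    public static string FindLongestMatch(
        IReadOnlyList<KeyValuePair<string, string>> sortedRecords, string query)
    {
        foreach (var record in sortedRecords)
        {
            if (query.StartsWith(record.Key, StringComparison.Ordinal))
                return record.Value;
        }
        return null; // no pattern is a prefix of the query
    }
}
```

An exact match is just the special case where the pattern equals the whole query, so this scan also covers the direct-match step, at the cost of a full pass over the list whenever nothing matches.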

I lack the vocabulary to get Google to give me pages that are not about hash sets, lists and dictionaries. All the research I found points at tree-based structures, but none of it says whether there is already an implementation in the .NET Framework.

Recommended answer

Leppie and Spender are correct; the data structure you want to implement to solve this problem efficiently if the data set becomes large is a "trie", or, if you're really buff, a DAWG (a directed acyclic word graph). A DAWG has better memory performance if the strings have many common suffixes, but DAWGs are more expensive and difficult to build and update, so start with a trie.

Your simple case would make a trie that looks like this:

           ROOT
            |
           0|
            |
            A
          / | \
         /  |  \
       1/  2|  3\
       /    |    \
      /     |     \
     B      B      C
     |
    2|
     |
     D



So to look up 023456, you start at the root and go down the branch labelled 0 to find A, then go down branch 2 to find B. There is no branch 3 at that point, so you're done, and B is the result.
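
The walk just described can be sketched as a small mutable trie. This is only an illustration under assumptions, not the immutable trie mentioned in this answer: the names PrefixTrie, Add and FindLongestMatch are invented, children are kept in a Dictionary<char, Node>, and a null value marks a node where no record ends.

```csharp
using System;
using System.Collections.Generic;

public sealed class PrefixTrie<TValue> where TValue : class
{
    private sealed class Node
    {
        public readonly Dictionary<char, Node> Children = new Dictionary<char, Node>();
        public TValue Value; // null when no record ends at this node
    }

    private readonly Node _root = new Node();

    // Insert a pattern, creating one node per character along the way.
    public void Add(string key, TValue value)
    {
        var node = _root;
        foreach (char c in key)
        {
            Node child;
            if (!node.Children.TryGetValue(c, out child))
            {
                child = new Node();
                node.Children.Add(c, child);
            }
            node = child;
        }
        node.Value = value;
    }

    // Follow the query one character at a time, remembering the last record
    // passed. Stops as soon as there is no branch for the next character.
    public TValue FindLongestMatch(string query)
    {
        var node = _root;
        TValue best = _root.Value; // only non-null if the empty pattern was added
        foreach (char c in query)
        {
            if (!node.Children.TryGetValue(c, out node))
                break;
            if (node.Value != null)
                best = node.Value;
        }
        return best; // null means no pattern is a prefix of the query
    }
}
```

With the five example records added, FindLongestMatch("023456") follows 0 and then 2, finds no branch for 3, and returns the record stored at 02. The work done depends on the length of the query rather than on the number of records, which is what removes the expensive no-match case of a list scan.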

Incidentally, this is also the data structure you'd use to find the longest Scrabble word given a dictionary and a set of letters; it's essentially the same problem.

There's no trie data structure built into the .NET Framework, but it is not a difficult data structure to build. I've got an immutable trie lying around here somewhere that I've been meaning to blog about; if I ever do, I'll post a link here.
