Regex Style Pattern Matching in .NET Object Lists
Question
I've got a collection of .NET objects with various properties. Let's say it's a chain of chromosomes in a genetic code, although the objects' data is a little more complex than that. I want to search the list for predefined sequences of objects. Each object can be classified as one of a finite number of types of interest (R, B, D), and in a massive list I want to find certain sequences of those types:
A massively simplified version would be:
public class Chromosome {
public ChromosomeType CromosomeType {
get {
// Some logic that works out and returns the correct chromosome type
}
}
}
public enum ChromosomeType {
R, B, D
}
So, given a large collection of these objects, I want to match certain sequences, e.g. "R+B{3}D+".
With the "regex" above, the following subsequence in a list would be matched: RRRBBBDD
I need to be able to return all matches from a very long list of objects.
Clearly regex is perfect for this, but I don't actually have strings; I have collections of objects.
What's the best way to search a collection of objects for predefined sequences?
Update
Colin's solution is the one I went with in the end. It works great. I updated it to handle multiple matches and to use arrays in order to be as fast as possible.
Here is the final solution:
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class ChromosomesExtensions
{
public static IEnumerable<Chromosome[]> FindBySequence(this Chromosome[] chromosomes, string patternRegex)
{
var sequenceString
= String.Join(
String.Empty, //no separator
(
from c in chromosomes
select c.CromosomeType.ToString()
)
);
MatchCollection matches = Regex.Matches(sequenceString, patternRegex);
foreach (Match match in matches)
{
Chromosome[] subset = new Chromosome[match.Value.Length];
var j = 0;
for (var i = match.Index; i < match.Index + match.Length; i++)
{
subset[j++] = chromosomes[i];
}
yield return subset;
}
}
}
[TestFixture]
public class TestClass
{
[Test]
public void TestMethod()
{
var chromosomes =
new[]
{
new Chromosome(){ CromosomeType = ChromosomeType.D, Id = 1},
new Chromosome(){ CromosomeType = ChromosomeType.R, Id = 2 },
new Chromosome(){ CromosomeType = ChromosomeType.R, Id = 3 },
new Chromosome(){ CromosomeType = ChromosomeType.B, Id = 4 },
new Chromosome(){ CromosomeType = ChromosomeType.B, Id = 5 },
new Chromosome(){ CromosomeType = ChromosomeType.B, Id = 6 },
new Chromosome(){ CromosomeType = ChromosomeType.D, Id = 7 },
new Chromosome(){ CromosomeType = ChromosomeType.D, Id = 8 },
new Chromosome(){ CromosomeType = ChromosomeType.B, Id = 9 },
new Chromosome(){ CromosomeType = ChromosomeType.R, Id = 10 },
new Chromosome(){ CromosomeType = ChromosomeType.R, Id = 11 },
new Chromosome(){ CromosomeType = ChromosomeType.B, Id = 12 },
new Chromosome(){ CromosomeType = ChromosomeType.B, Id = 13 },
new Chromosome(){ CromosomeType = ChromosomeType.B, Id = 14 },
new Chromosome(){ CromosomeType = ChromosomeType.D, Id = 15 },
new Chromosome(){ CromosomeType = ChromosomeType.D, Id = 16 },
new Chromosome(){ CromosomeType = ChromosomeType.R, Id = 17 },
new Chromosome(){ CromosomeType = ChromosomeType.R, Id = 18 },
new Chromosome(){ CromosomeType = ChromosomeType.B, Id = 19 },
new Chromosome(){ CromosomeType = ChromosomeType.B, Id = 20 },
new Chromosome(){ CromosomeType = ChromosomeType.B, Id = 21 },
new Chromosome(){ CromosomeType = ChromosomeType.D, Id = 22 },
new Chromosome(){ CromosomeType = ChromosomeType.D, Id = 23 },
};
var matchIndex = 0;
foreach (Chromosome[] match in chromosomes.FindBySequence("R+B{3}D+"))
{
Console.WriteLine($"Match {++matchIndex}");
var result = string.Concat(match.Select(x => $"id: {x.Id} Type: {x.CromosomeType}\n"));
Console.WriteLine(result);
}
}
}
Output:
Match 1
id: 2 Type: R
id: 3 Type: R
id: 4 Type: B
id: 5 Type: B
id: 6 Type: B
id: 7 Type: D
id: 8 Type: D
Match 2
id: 10 Type: R
id: 11 Type: R
id: 12 Type: B
id: 13 Type: B
id: 14 Type: B
id: 15 Type: D
id: 16 Type: D
Match 3
id: 17 Type: R
id: 18 Type: R
id: 19 Type: B
id: 20 Type: B
id: 21 Type: B
id: 22 Type: D
id: 23 Type: D
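The solution above relies on every ChromosomeType name being exactly one character long, so that index i in the joined string lines up with index i in the array. A minimal sketch of a generic variant that makes this mapping explicit is shown below. The names SequenceRegex, FindAll and toSymbol are hypothetical and not part of the original solution.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class SequenceRegex
{
    // Hypothetical helper, not from the original post.
    // toSymbol must map each item to exactly one char so that position i
    // in the encoded string corresponds to position i in the list.
    public static IEnumerable<T[]> FindAll<T>(
        this IReadOnlyList<T> items, Func<T, char> toSymbol, string pattern)
    {
        // Encode the whole list as one string, one char per item.
        var encoded = new string(items.Select(toSymbol).ToArray());

        foreach (Match match in Regex.Matches(encoded, pattern))
        {
            // Copy the matched slice of the original items.
            var subset = new T[match.Length];
            for (var i = 0; i < match.Length; i++)
            {
                subset[i] = items[match.Index + i];
            }
            yield return subset;
        }
    }
}
```

With the Chromosome array from the test, a call might look like chromosomes.FindAll(c => c.CromosomeType.ToString()[0], "R+B{3}D+"). Passing the symbol selector explicitly means enum members with longer names can still be mapped to a single character each.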
Recommended answer
A simple, clean way using extension methods (one that actually supports searching via regex).
Class:
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class ChromosomesExtensions
{
public static IEnumerable<Chromosome> FindBySequence(this IEnumerable<Chromosome> chromosomes, string patternRegex)
{
var sequenceString
= String.Join(
String.Empty, //no separator
(
from c in chromosomes
select c.CromosomeType.ToString()
)
);
var match = Regex.Match(sequenceString, patternRegex);
//returns empty if no match is found (a failed match has Index 0 and Length 0)
return chromosomes.ToList().GetRange(match.Index, match.Length);
}
}
Usage:
var chromosomes =
new[]
{
new Chromosome(){ CromosomeType = ChromosomeType.D },
new Chromosome(){ CromosomeType = ChromosomeType.R },
new Chromosome(){ CromosomeType = ChromosomeType.R },
new Chromosome(){ CromosomeType = ChromosomeType.B },
new Chromosome(){ CromosomeType = ChromosomeType.B },
new Chromosome(){ CromosomeType = ChromosomeType.B },
new Chromosome(){ CromosomeType = ChromosomeType.D },
new Chromosome(){ CromosomeType = ChromosomeType.D },
new Chromosome(){ CromosomeType = ChromosomeType.B },
};
var queryResult = chromosomes.FindBySequence("R+B{3}D+");
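The usage above sets CromosomeType through an object initializer, which only compiles if the property has a setter, whereas the class in the question shows a computed getter. The sketch below assumes a plain settable model to make the example self-contained and shows one way to inspect the first match. The Id property and the FirstMatchDemo, FirstMatch and Describe names are hypothetical, added only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public enum ChromosomeType { R, B, D }

// Assumed shape: settable properties so that object initializers compile.
public class Chromosome
{
    public ChromosomeType CromosomeType { get; set; }
    public int Id { get; set; } // hypothetical, only used to tell items apart
}

public static class FirstMatchDemo
{
    // Same idea as FindBySequence above: encode the types as a string,
    // run the regex, and map the match position back onto the list.
    public static List<Chromosome> FirstMatch(IEnumerable<Chromosome> source, string pattern)
    {
        var list = source.ToList();
        var encoded = string.Concat(list.Select(c => c.CromosomeType.ToString()));
        var match = Regex.Match(encoded, pattern);
        return match.Success
            ? list.GetRange(match.Index, match.Length)
            : new List<Chromosome>();
    }

    // Renders a match as "Id:Type" pairs separated by spaces.
    public static string Describe(IEnumerable<Chromosome> match) =>
        string.Join(" ", match.Select(c => $"{c.Id}:{c.CromosomeType}"));
}
```

Materialising the source once with ToList() avoids enumerating it twice, which matters if the input is a lazy query rather than an array.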