Parsing delimited data for specific instance of repeated line
Problem description
I have an array of strings in the following format, where each string begins with a series of three characters indicating what type of data it contains. For example:
ABC|.....
DEF|...
RHG|1........
RHG|2........
RHG|3........
XDF|......
I want to find any lines whose header repeats (RHG in this example) and mark the last of those lines with a special character:
>RHG|3........
What's the best way to do this? My current solution has a method to count the line headers and create a dictionary with the header counts.
protected Dictionary<string, int> CountHeaders(string[] lines)
{
    // Maps each three-character line header to the number of lines that start with it.
    Dictionary<string, int> headerCounts = new Dictionary<string, int>();
    for (int i = 0; i < lines.Length; i++)
    {
        string s = lines[i].Substring(0, 3);
        int value;
        if (headerCounts.TryGetValue(s, out value))
            headerCounts[s] = value + 1;
        else
            headerCounts.Add(s, 1);
    }
    return headerCounts;
}
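For comparison, here is a minimal sketch of the same header count written with LINQ (GroupBy plus ToDictionary). Like CountHeaders, it assumes every line is at least three characters long; the class name HeaderCounting is hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class HeaderCounting
{
    // Counts how many lines share each three-character header.
    // Assumes every line has at least 3 characters (Substring throws otherwise).
    public static Dictionary<string, int> CountHeaders(string[] lines)
    {
        return lines
            .GroupBy(line => line.Substring(0, 3))
            .ToDictionary(g => g.Key, g => g.Count());
    }
}
```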
In the main parsing method, I select the lines that are repeated.
var repeats = CountHeaders(lines).Where(x => x.Value > 1).Select(x => x.Key);
foreach (string s in repeats)
{
    // Get last instance of line in lines and mark it
}
This is as far as I've gotten. I think another LINQ query could do what I want, but I'm not sure. I also suspect there is a more efficient solution.
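One way to finish the loop in the question is Array.FindLastIndex, which returns the index of the last array element matching a predicate. The sketch below is an illustration, not the asker's code: it inlines the header count (instead of calling the protected CountHeaders method) so that it is self-contained, modifies the array in place, and assumes every line is at least three characters long. RepeatMarker and MarkLastRepeats are hypothetical names.

```csharp
using System;
using System.Linq;

public static class RepeatMarker
{
    // Prefixes the last line of every header that occurs more than once with '>'.
    // The array is modified in place, so line order is preserved.
    public static void MarkLastRepeats(string[] lines)
    {
        // Same idea as CountHeaders: header -> number of lines with that header.
        var counts = lines
            .GroupBy(line => line.Substring(0, 3))
            .ToDictionary(g => g.Key, g => g.Count());

        var repeats = counts.Where(x => x.Value > 1).Select(x => x.Key);
        foreach (string header in repeats)
        {
            // Search backwards for the last line that starts with this header.
            int last = Array.FindLastIndex(lines, line => line.StartsWith(header, StringComparison.Ordinal));
            lines[last] = ">" + lines[last];
        }
    }
}
```

Each FindLastIndex call scans the array, so the cost grows with the number of distinct repeated headers; for a handful of header types this is usually negligible.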
Recommended answer
You can use LINQ to achieve that.
Input string:
var input = @"ABC|.....
DEF|...
RHG|1........
RHG|2........
RHG|3........
XDF|......";
LINQ query:
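var results = input.Split(new[] { "\r\n", "\n" }, StringSplitOptions.None) // accepts both CRLF and LF line endings
    .GroupBy(x => x.Substring(0, 3))   // group lines by their three-character header
    .Select(g => g.ToList())
    .SelectMany(g => g.Count > 1
        ? g.Take(g.Count - 1).Concat(new[] { string.Format(">{0}", g[g.Count - 1]) }) // mark the last line of a repeated header
        : g)
    .ToArray();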
I used the Select(g => g.ToList()) projection so that g.Count is an O(1) operation in the later query steps, and so that the last element of each group can be read by index with g[g.Count - 1].
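Note that GroupBy collects all lines with the same header together (groups appear in the order their keys first occur). If repeated headers are not adjacent, the query above reorders the lines: ABC|1, RHG|1, DEF|1, RHG|2, ABC|2 comes out as ABC|1, >ABC|2, RHG|1, >RHG|2, DEF|1. If line order matters, a single pass that records each header's count and last index keeps every line in place. This is a minimal sketch, assuming C# 7 or later (value tuples) and lines of at least three characters; OrderPreservingMarker is a hypothetical name.

```csharp
using System.Collections.Generic;

public static class OrderPreservingMarker
{
    // Returns a copy of lines in which the last line of every repeated header
    // is prefixed with '>'. All lines keep their original positions.
    public static string[] MarkLastRepeats(string[] lines)
    {
        // header -> (number of occurrences, index of the most recent occurrence)
        var seen = new Dictionary<string, (int Count, int LastIndex)>();
        for (int i = 0; i < lines.Length; i++)
        {
            string header = lines[i].Substring(0, 3);
            seen.TryGetValue(header, out var info); // info is (0, 0) for a new header
            seen[header] = (info.Count + 1, i);
        }

        var result = (string[])lines.Clone();
        foreach (var entry in seen.Values)
        {
            if (entry.Count > 1)
                result[entry.LastIndex] = ">" + result[entry.LastIndex];
        }
        return result;
    }
}
```

This walks the lines once and then the (usually small) set of distinct headers once.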
You can join the result array into a single string with the String.Join method:
var output = String.Join(Environment.NewLine, results);
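Putting the answer's steps together, a self-contained sketch that takes the raw text and returns the marked text might look like this. The split on both "\r\n" and "\n" is an assumption to cope with either line-ending style (String.Split matches the separators in array order, so "\r\n" is tried first); DelimitedMarker is a hypothetical name.

```csharp
using System;
using System.Linq;

public static class DelimitedMarker
{
    // Splits the input into lines, prefixes the last line of each repeated
    // three-character header with '>', and joins the lines back together.
    public static string MarkLastRepeats(string input)
    {
        // Accept both Windows ("\r\n") and Unix ("\n") line endings.
        string[] lines = input.Split(new[] { "\r\n", "\n" }, StringSplitOptions.None);

        string[] results = lines
            .GroupBy(x => x.Substring(0, 3))
            .Select(g => g.ToList())
            .SelectMany(g => g.Count > 1
                ? g.Take(g.Count - 1).Concat(new[] { ">" + g[g.Count - 1] })
                : g)
            .ToArray();

        return String.Join(Environment.NewLine, results);
    }
}
```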