C# Regex.Split: Removing empty results
Problem description
I am working on an application that imports thousands of lines, where every line has a format like this:
|* 9070183020 |04.02.2011 |107222 |M/S SUNNY MEDICOS |GHAZIABAD | 32,768.00 |
I am using the following Regex to split each line into the data I need:
Regex lineSplitter = new Regex(@"(?:^\|\*|\|)\s*(.*?)\s+(?=\|)");
string[] columns = lineSplitter.Split(data);
foreach (string c in columns)
Console.Write("[" + c + "] ");
This gives me the following result:
[] [9070183020] [] [04.02.2011] [] [107222] [] [M/S SUNNY MEDICOS] [] [GHAZIABAD] [] [32,768.00] [|]
Now I have two questions.
1. How do I remove the empty results? I know I can use:
string[] columns = lineSplitter.Split(data).Where(s => !string.IsNullOrEmpty(s)).ToArray();
but is there any built-in method to remove the empty results?
2. How can I remove the last pipe?
Thanks for any help.
Regards,
Yogesh.
I think my question was a little misunderstood. It was never about how I can do it. It was only about how I can do it by changing the Regex in the above code.
I know that I can do it in many ways. I have already done it with the code mentioned above, using a Where clause, and with an alternative approach that is also more than twice as fast:
Regex regex = new Regex(@"(^\|\*\s*)|(\s*\|\s*)");
data = regex.Replace(data, "|");
string[] columns = data.Split(new[] { '|' }, StringSplitOptions.RemoveEmptyEntries);
Secondly, as a test case, my system can parse 92k+ such lines in less than 1.5 seconds with the original method and in less than 700 milliseconds with the second method. I will never see more than a couple of thousand lines in real cases, so I don't think I need to worry about speed here. In my opinion, thinking about speed in this case is premature optimization.
I have found the answer to my first question: it cannot be done with Split, as there is no such option built in.
Still looking for an answer to my second question.
Recommended answer
Regex lineSplitter = new Regex(@"[\s*\*]*\|[\s*\*]*");
var columns = lineSplitter.Split(data).Where(s => s != String.Empty);
Or you can simply do:
string[] columns = data.Split(new char[] {'|'}, StringSplitOptions.RemoveEmptyEntries);
foreach (string c in columns) this.textBox1.Text += "[" + c.Trim(' ', '*') + "] " + Environment.NewLine;
And no, there is no option to remove empty entries for Regex.Split as there is for String.Split.
You can also use matches (Regex.Matches) instead of splitting.
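The matching approach was not spelled out in the thread, so the following is only a minimal sketch of what it might look like. The class name `LineParser`, the method `ParseColumns`, and the pattern itself are assumptions made for illustration. The idea is to match each field directly, so neither empty entries nor the trailing pipe ever show up in the result:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class LineParser
{
    // Assumed pattern: a field starts and ends with a character that is
    // not a pipe, whitespace, or asterisk; anything except a pipe may
    // appear in between. Padding and the leading '*' are never matched.
    private static readonly Regex Field =
        new Regex(@"[^|\s*](?:[^|]*[^|\s*])?");

    public static string[] ParseColumns(string line)
    {
        return Field.Matches(line)
                    .Cast<Match>()          // MatchCollection -> IEnumerable<Match>
                    .Select(m => m.Value)   // keep only the matched text
                    .ToArray();
    }
}
```

Calling `LineParser.ParseColumns(data)` on the sample line from the question should yield the six fields `9070183020`, `04.02.2011`, `107222`, `M/S SUNNY MEDICOS`, `GHAZIABAD`, and `32,768.00`. One caveat of this sketch: a completely blank column is skipped rather than returned as an empty string, so column positions shift if a field can be empty.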