Remove text in-between delimiters in a string (using a regex?)
Question
Consider the requirement to find a matched pair of delimiter characters and remove everything between them, along with the delimiters themselves.
Here are the sets of delimiters:
[] square brackets
() parentheses
"" double quotes
'' single quotes
Here are some examples of strings that should match:
Given:                          Results In:
-------------------------------------------
Hello "some" World              Hello World
Give [Me Some] Purple           Give Purple
Have Fifteen (Lunch Today)      Have Fifteen
Have 'a good'day                Have day
And some examples of strings that should not match:
Does Not Match:
------------------
Hello "world
Brown]co[w
Cheese'factory
If the given string doesn't contain a matching set of delimiters, it isn't modified. The input string may have many matching pairs of delimiters. If two sets of delimiters overlap (e.g. he[llo "worl]d"), that is an edge case we can ignore here.
The algorithm would look something like this:
string myInput = "Give [Me Some] Purple (And More) Elephants";
string pattern; //some pattern
string output = Regex.Replace(myInput, pattern, string.Empty);
Question: How would you achieve this with C#? I am leaning towards a regex.
Bonus: Is there an easy way to keep the start and end delimiters in constants or in a list of some kind? I am looking for a solution that makes it easy to change the delimiters in case the business analysts come up with new sets.
Recommended answer
A simple regex would be:
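using System.Text.RegularExpressions;
string input = "Give [Me Some] Purple (And More) Elephants";
// Verbatim string (@"..."): backslashes are literal and "" is an escaped double quote.
// Lazy .*? stops at the nearest closing delimiter, so repeated pairs are handled separately.
string regex = @"(\[.*?\])|("".*?"")|('.*?')|(\(.*?\))";
string output = Regex.Replace(input, regex, "");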
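One thing worth checking with any pattern of this shape is whether the quantifier between the delimiters is greedy or lazy, since the question says the input may contain many pairs. The sketch below compares the two on a string with two bracketed sections. The class and method names are made up for illustration; only `Regex.Replace` is a real .NET API.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyVsLazyDemo
{
    // Greedy: .* runs to the LAST closing bracket on the line.
    public static string StripGreedy(string s) => Regex.Replace(s, @"\[.*\]", "");

    // Lazy: .*? stops at the FIRST closing bracket after each opener.
    public static string StripLazy(string s) => Regex.Replace(s, @"\[.*?\]", "");

    public static void Main()
    {
        string input = "a [b] c [d] e";
        Console.WriteLine(StripGreedy(input)); // "a  e"     (the " c " in the middle is lost)
        Console.WriteLine(StripLazy(input));   // "a  c  e"  (only the bracketed parts are removed)
    }
}
```

Note that the surrounding spaces are left in place, so the results contain double spaces; collapsing whitespace would be a separate step.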
If you want to build the regex up from a custom set of delimiters, you just need to build one part per delimiter pair:
('.*?') // example of the single quote check
Then concatenate the individual parts with an OR (the | in regex), as in my original example. Once the regex string is built, run it once. The key is to get everything into a single regex: running many separate regex matches against each item, and then iterating over a lot of items, will probably cost you noticeably in performance.
In my first example, the built-up regex would take the place of the regex line:
string input = "Give [Me Some] Purple (And More) Elephants";
string regex = "Your built up regex here";
string sOutput = Regex.Replace(input, regex, "");
I am sure someone will post a cool LINQ expression that builds the regex from an array of delimiter objects, or something along those lines.
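For what it's worth, here is a minimal sketch of what such a LINQ-based builder might look like. It is not from the original answer: `DelimiterStripper`, `Pairs`, `BuildPattern` and `Strip` are hypothetical names, and the lazy `.*?` body is an assumption about the desired matching behaviour. `Regex.Escape`, `string.Join` and `Enumerable.Select` are standard .NET APIs.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class DelimiterStripper
{
    // Hypothetical delimiter table: edit this list when new delimiter sets come up.
    private static readonly (string Open, string Close)[] Pairs =
    {
        ("[", "]"),
        ("(", ")"),
        ("\"", "\""),
        ("'", "'"),
    };

    // One alternative per pair: escaped opener, lazy body, escaped closer, joined with |.
    public static string BuildPattern((string Open, string Close)[] pairs) =>
        string.Join("|", pairs.Select(p =>
            Regex.Escape(p.Open) + ".*?" + Regex.Escape(p.Close)));

    // Built once and reused, so each input is scanned by a single regex.
    private static readonly Regex Stripper = new Regex(BuildPattern(Pairs));

    public static string Strip(string input) => Stripper.Replace(input, string.Empty);

    public static void Main()
    {
        Console.WriteLine(Strip("Give [Me Some] Purple (And More) Elephants"));
        Console.WriteLine(Strip("Hello \"world")); // no closing quote, so the string is unchanged
    }
}
```

Escaping each delimiter with `Regex.Escape` means the table can hold plain strings, including multi-character delimiters, without anyone having to remember which characters are special in a regex.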