Regex Split String at particular word pattern
I am trying to split a string that could look like this:
International Bank for Reconstruction & Development (NAICS: 928120; SIC: 6081) World Bank (NAICS: 928120; SIC: 6081)
into this:
International Bank for Reconstruction & Development
World Bank
or any of these:
International Bank for Reconstruction & Development
International Bank for Reconstruction & Development (SIC: 6081)
International Bank for Reconstruction & Development (NAICS: 928120)
into this:
International Bank for Reconstruction & Development
There could be any number of matches.
I've tried a few things; using a negated character class doesn't work:
[^\(NAICS: (\d+);\)]+
I'm using C# Regex.
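A note on why the character-class attempt above falls short: everything inside [^...] is treated as a set of individual characters, not as a sequence, so the pattern excludes the letters N, A, I, C and S, digits, spaces and some punctuation wherever they occur. A minimal Python sketch shows the effect (the question targets C#, but this pattern uses only syntax that Python's re module also accepts; the sample string is the one from the question):

```python
import re

# The attempted pattern from the question, unchanged.
# Inside [^...] every item is a single excluded character, not a word.
ATTEMPT = re.compile(r"[^\(NAICS: (\d+);\)]+")

sample = ("International Bank for Reconstruction & Development "
          "(NAICS: 928120; SIC: 6081) World Bank (NAICS: 928120; SIC: 6081)")

# Each match is a run of characters that are all outside the excluded set.
pieces = ATTEMPT.findall(sample)
print(pieces)
```

The result is a list of fragments such as "nternational" (the capital I is in the excluded set) and single words, because spaces are excluded too.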
If you just want a regex to split on, this might work:
\([^)]*(?:(?:SIC|NAICS):[^)]*)+\)
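As a rough illustration of the split approach, here is a minimal Python sketch. The helper name split_names is hypothetical, and trimming whitespace and dropping empty pieces are assumptions added here, not part of the answer. In C#, the analogous call would be Regex.Split with the same pattern.

```python
import re

# Delimiter pattern from the answer: a parenthesised group that
# contains "SIC:" or "NAICS:" somewhere before the closing ")".
DELIM = re.compile(r"\([^)]*(?:(?:SIC|NAICS):[^)]*)+\)")

def split_names(text):
    """Split on each delimiter, trim whitespace, drop empty pieces."""
    return [part.strip() for part in DELIM.split(text) if part.strip()]

sample = ("International Bank for Reconstruction & Development "
          "(NAICS: 928120; SIC: 6081) World Bank (NAICS: 928120; SIC: 6081)")
print(split_names(sample))
```

The filtering step matters because a delimiter at the very end of the input leaves an empty trailing piece after the split.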
You could also do it without split. I would take a find_all approach (Regex.Matches in C#), where capture group 1 holds each title:
(?!\s*$)(.*?)(?:\([^)]*(?:(?:SIC|NAICS):[^)]*)+\)|$)
Modifiers: s (dot matches newline) and g (global). In C#, that corresponds to RegexOptions.Singleline and calling Regex.Matches to get every match.
Be warned: this lets parenthesised text that does not contain 'SIC:' or 'NAICS:' remain in the title.
But those aren't delimiters, right?
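The find-all pattern above and its caveat can be sketched in Python as follows. The function name find_names and the "Acme (Holdings) Ltd" input are made up for illustration, and re.DOTALL stands in for the s modifier. Stripping whitespace from each captured title is an assumption added here.

```python
import re

# Find-all pattern from the answer. Group 1 lazily captures a title up to
# the next SIC/NAICS parenthetical or the end of the input.
FIND = re.compile(
    r"(?!\s*$)(.*?)(?:\([^)]*(?:(?:SIC|NAICS):[^)]*)+\)|$)",
    re.DOTALL,  # the "s" modifier: dot also matches newlines
)

def find_names(text):
    """Return every non-empty captured title, trimmed of whitespace."""
    titles = (m.group(1).strip() for m in FIND.finditer(text))
    return [t for t in titles if t]

sample = ("International Bank for Reconstruction & Development "
          "(NAICS: 928120; SIC: 6081) World Bank (NAICS: 928120; SIC: 6081)")
print(find_names(sample))

# Parentheses without SIC:/NAICS: are not delimiters, so they stay in the title.
print(find_names("Acme (Holdings) Ltd (SIC: 1234)"))
```

The leading (?!\s*$) lookahead stops the pattern from producing a final empty match once only whitespace remains.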
Edit
My apologies. Those two regexes can be shortened to
\([^)]*(?:SIC|NAICS):[^)]*\)
and
(?!\s*$)(.*?)(?:\([^)]*(?:SIC|NAICS):[^)]*\)|$)
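One way to gain some confidence that the shortened forms behave like the longer ones is to compare their matches on the sample inputs. This is only a spot check on a few strings taken from the question, not a proof, and the helper names are hypothetical:

```python
import re

# Longer and shortened delimiter patterns from the answer.
LONG_DELIM = re.compile(r"\([^)]*(?:(?:SIC|NAICS):[^)]*)+\)")
SHORT_DELIM = re.compile(r"\([^)]*(?:SIC|NAICS):[^)]*\)")

# Longer and shortened find-all patterns from the answer.
LONG_FIND = re.compile(
    r"(?!\s*$)(.*?)(?:\([^)]*(?:(?:SIC|NAICS):[^)]*)+\)|$)", re.DOTALL)
SHORT_FIND = re.compile(
    r"(?!\s*$)(.*?)(?:\([^)]*(?:SIC|NAICS):[^)]*\)|$)", re.DOTALL)

def spans(pattern, text):
    """Start/end offsets of every match of the pattern."""
    return [m.span() for m in pattern.finditer(text)]

def titles(pattern, text):
    """Group 1 of every match of a find-all pattern."""
    return [m.group(1) for m in pattern.finditer(text)]

samples = [
    "International Bank for Reconstruction & Development "
    "(NAICS: 928120; SIC: 6081) World Bank (NAICS: 928120; SIC: 6081)",
    "International Bank for Reconstruction & Development (SIC: 6081)",
    "International Bank for Reconstruction & Development",
]

# True when both pairs of patterns agree on every sample string.
agree = all(
    spans(LONG_DELIM, s) == spans(SHORT_DELIM, s)
    and titles(LONG_FIND, s) == titles(SHORT_FIND, s)
    for s in samples
)
print(agree)
```

The shortening works because the first [^)]* can already absorb any earlier "SIC:" or "NAICS:" text, so repeating the inner group adds nothing.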