Regex Split String at particular word pattern

Problem description

I am trying to split a string that could look like this:

International Bank for Reconstruction & Development (NAICS: 928120; SIC: 6081) World Bank (NAICS: 928120; SIC: 6081)

into this:

International Bank for Reconstruction & Development
World Bank

or any of these:

International Bank for Reconstruction & Development
International Bank for Reconstruction & Development (SIC: 6081)
International Bank for Reconstruction & Development (NAICS: 928120)

into this:

International Bank for Reconstruction & Development

There could be any number of matches.

I've tried a few things; using a negated character class doesn't work:

[^\(NAICS: (\d+);\)]+

I'm using C# Regex.

Solution

If you just want a regex to split on, this might work:

\([^)]*(?:(?:SIC|NAICS):[^)]*)+\)
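As a rough illustration of the split approach, here is a minimal Python sketch using the pattern above. Python is used only for demonstration; the pattern itself should carry over to .NET's Regex.Split. The helper name `split_names` and the trimming and empty-piece filtering are assumptions added here, not part of the answer.

```python
import re

# Delimiter from the answer: a parenthesized group that contains "SIC:" or "NAICS:".
DELIM = re.compile(r"\([^)]*(?:(?:SIC|NAICS):[^)]*)+\)")

def split_names(text):
    """Split on the code groups, trim whitespace, and drop empty pieces.

    Hypothetical helper: the answer only supplies the pattern.
    """
    return [part.strip() for part in DELIM.split(text) if part.strip()]

sample = ("International Bank for Reconstruction & Development "
          "(NAICS: 928120; SIC: 6081) World Bank (NAICS: 928120; SIC: 6081)")

print(split_names(sample))
# ['International Bank for Reconstruction & Development', 'World Bank']
```

Splitting leaves an empty trailing piece whenever the input ends with a code group, which is why the sketch filters out blank parts.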

You could also do it without split. I would take a find-all regex approach:

(?!\s*$)(.*?)(?:\([^)]*(?:(?:SIC|NAICS):[^)]*)+\)|$)
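Modifiers: s (dot matches newline) and g (global). In C#, these correspond to RegexOptions.Singleline and iterating over Regex.Matches.

Be warned: this allows parenthesized text other than the '(SIC: ...)' / '(NAICS: ...)' groups to remain in the title.
But those aren't the delimiter, right?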
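The find-all approach might look like the following Python sketch, where `re.S` plays the role of the s modifier and `re.finditer` supplies the global iteration. The function name `find_names` and the "Acme (Holdings) Ltd" input are hypothetical, chosen only to show the caveat about other parenthesized text staying in the title.

```python
import re

# Find-all pattern from the answer: a lazy title (group 1) followed by
# either a SIC/NAICS code group or the end of the input.
TITLE = re.compile(
    r"(?!\s*$)(.*?)(?:\([^)]*(?:(?:SIC|NAICS):[^)]*)+\)|$)",
    re.S,  # the "s" modifier: dot also matches newlines
)

def find_names(text):
    """Collect the trimmed, non-empty titles (hypothetical helper)."""
    names = (m.group(1).strip() for m in TITLE.finditer(text))
    return [n for n in names if n]

sample = ("International Bank for Reconstruction & Development "
          "(NAICS: 928120; SIC: 6081) World Bank (NAICS: 928120; SIC: 6081)")
print(find_names(sample))

# Made-up input: "(Holdings)" has no SIC:/NAICS: inside, so it is not
# treated as a delimiter and stays part of the title.
print(find_names("Acme (Holdings) Ltd (SIC: 1234)"))
```

The leading `(?!\s*$)` lookahead stops the pattern from producing a final empty match once only whitespace remains.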

Edit

My apologies. Those two regexes can be shortened to

\([^)]*(?:SIC|NAICS):[^)]*\)

and

(?!\s*$)(.*?)(?:\([^)]*(?:SIC|NAICS):[^)]*\)|$)
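The shortening works because `[^)]*` already absorbs any further `SIC:` or `NAICS:` text inside the same parentheses, so the outer `+` repetition adds nothing. A small Python sketch, assuming the inputs from the question, can check that the long and short forms agree; the helper `names` is hypothetical.

```python
import re

# Original and shortened delimiter patterns from the answer.
LONG_DELIM = r"\([^)]*(?:(?:SIC|NAICS):[^)]*)+\)"
SHORT_DELIM = r"\([^)]*(?:SIC|NAICS):[^)]*\)"

def names(delim, text):
    """Find-all form: lazy title, then a code group or the end of input."""
    pattern = r"(?!\s*$)(.*?)(?:" + delim + r"|$)"
    found = (m.group(1).strip() for m in re.finditer(pattern, text, re.S))
    return [n for n in found if n]

NAME = "International Bank for Reconstruction & Development"
samples = [
    NAME,
    NAME + " (SIC: 6081)",
    NAME + " (NAICS: 928120)",
    NAME + " (NAICS: 928120; SIC: 6081) World Bank (NAICS: 928120; SIC: 6081)",
]

for text in samples:
    # Both forms should extract the same titles from each input.
    assert names(LONG_DELIM, text) == names(SHORT_DELIM, text)
    print(names(SHORT_DELIM, text))
```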
