Extract all words from string except words in square brackets


Question

Once again I'm completely stuck creating a regular expression.

I have a string pattern like:

str = ' wordA [] wordAB [xyz] wordABC [x] '

So there is always a word followed by brackets, either with content [ ... ] or empty []. The length of the words, the amount of leading and trailing whitespace, and the number of characters inside the brackets are all random, as is the number of times the sequence repeats.

I'd like to extract just the words outside the brackets:

output = 

    'wordA'    'wordAB'    'wordABC'

I think the problem is the square brackets, since they are special characters in regular expressions. I tried something like

output = regexp(str,'^\[.+\]$','split')

and variations of it, without success.

Any hints?

Recommended answer

We can select all words with the regex \w+, but that also matches the words inside the brackets. Words outside the brackets have whitespace before and after them, so we can add a positive lookbehind (?<=\s) to require whitespace before the word, and a positive lookahead (?=\s) to require whitespace after it. The first word may have no whitespace before it, so the lookbehind also has to accept the start of the string, which gives (?<=\s|^). The full regex is:

(?<=\s|^)\w+(?=\s)
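
As a minimal sketch, the pattern can be tried on the sample string from the question with regexp and its 'match' option. The variable names expr and words are illustrative, and the sketch assumes the regex engine accepts the \s|^ alternation inside a lookbehind, as the answer implies.

```matlab
% Sample string from the question
str = ' wordA [] wordAB [xyz] wordABC [x] ';

% One or more word characters, preceded by whitespace or the start of
% the string, and followed by whitespace
expr = '(?<=\s|^)\w+(?=\s)';

% 'match' returns the matched substrings as a cell array
words = regexp(str, expr, 'match');

disp(words)
```

For this input the intended result is a cell array holding wordA, wordAB and wordABC. The bracketed xyz and x are skipped because they are preceded by [ rather than by whitespace.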

If the string can contain wordA[] (no space between the word and the bracket), add an escaped [ to the positive lookahead:

(?<=\s|^)\w+(?=\s|\[)
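
The difference between the two patterns can be sketched on a made-up input in which the brackets follow the words directly. The string and the names strict and relaxed are assumptions for illustration only.

```matlab
% Hypothetical input with no space between a word and its bracket
str = 'wordA[] wordAB[xyz]';

% First pattern: requires whitespace after the word
strict = regexp(str, '(?<=\s|^)\w+(?=\s)', 'match');

% Extended pattern: also accepts an opening bracket after the word
relaxed = regexp(str, '(?<=\s|^)\w+(?=\s|\[)', 'match');

% Compare how many words each pattern finds
fprintf('strict: %d match(es), relaxed: %d match(es)\n', ...
        numel(strict), numel(relaxed));
```

The first pattern is expected to find nothing here, since no word is followed by whitespace, while the extended one should pick up both wordA and wordAB.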

If the string can contain wordA [ xyz ] (spaces inside the brackets), the regexes above no longer work, because xyz is then surrounded by whitespace as well. We need a different strategy: find words that are not preceded by [. We cannot simply exclude words that have a [ immediately before them, because that would still match yz in [xyz]. Instead we exclude any word preceded by a [ that is followed only by characters other than ], using a negative lookbehind. Note that this relies on variable-length lookbehind, which not every regex engine supports.

(?<!\[[^]]*)\w+
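
A different technique from the one in the answer, sketched here because it avoids lookbehind altogether: delete every bracketed group with regexprep first, then collect the remaining words with \w+. The helper name extractWords and the test strings other than the question's sample are made up for illustration, and the sketch assumes brackets are never nested.

```matlab
% Anonymous helper (hypothetical name): replace each [ ... ] group with a
% space, then match the remaining words.
% '\[.*?\]' lazily matches from each '[' to the nearest ']'.
extractWords = @(s) regexp(regexprep(s, '\[.*?\]', ' '), '\w+', 'match');

a = extractWords(' wordA [] wordAB [xyz] wordABC [x] ');  % sample from the question
b = extractWords('wordA[] wordAB[xyz]');                  % no space before brackets
c = extractWords('wordA [ xyz ] wordAB [ x y ]');         % spaces inside brackets

disp(a); disp(b); disp(c);
```

Replacing each group with a space rather than an empty string keeps neighbouring words from being merged when a bracket sits directly between them.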
