Why does this regex result in four items?
Question
I want to split a string on a space, ->, =>, or any of those surrounded by several spaces, so that each of the following strings yields exactly two items, she and he:
"she he", "she he", "she he ", "she he ", "she->he", "she ->he", "she=>he", "she=> he", " she-> he ", " she => he \n"
I tried this:
re.compile("(?<!^)((\\s*[-=]>\\s*)|[\\s+\t])(?!$\n)(?=[^\s])").split(' she -> he \n')
What I get is a list with four items: [' she', ' -> ', ' -> ', 'he \n'].
And for this:
re.compile("(?<!^)((\\s*[-=]>\\s*)|[\\s+\t])(?!$\n)(?=[^\s])").split('she he')
I get: ['she', ' ', None, 'he'].
Why are there four items? And how can I get only two without the middle two?
Recommended answer
If you can strip your input string first, then from your description all you need is to split the words on either \s+ or \s*->\s* or \s*=>\s*.
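To see why stripping comes first, here is a minimal sketch that uses the same alternation on one of the question's inputs, with and without strip(). The variable names are illustrative only and are not part of the original answer.

```python
import re

# An arrow (-> or =>) with optional surrounding whitespace,
# or a run of one or more whitespace characters.
pattern = re.compile(r'\s*[-=]>\s*|\s+')

raw = " she -> he \n"

# Without stripping, the whitespace at both ends is treated as a
# separator too, so empty strings appear at the edges of the result.
unstripped = pattern.split(raw)

# Stripping first removes those edge separators.
stripped = pattern.split(raw.strip())

print(unstripped)  # ['', 'she', 'he', '']
print(stripped)    # ['she', 'he']
```

If stripping the input is not an option, filtering out the empty strings afterwards would be another way to reach the same two items.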
So here is my solution:
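import re

# Split on an arrow (-> or =>) with optional surrounding whitespace, or on a run of whitespace.
p = re.compile(r'\s*[-=]>\s*|\s+')
input1 = "she he"
input2 = " she -> he \n".strip()  # strip first so no empty strings appear at the ends
print(p.split(input1))
print(p.split(input2))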
Your output would be just 'she' and 'he':
['she', 'he']
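As for why the original attempt returned four items: re.split includes the text of every capturing group in its result, and the question's pattern contains two of them (an outer group and a nested one). The sketch below reproduces that behaviour and shows that switching to non-capturing groups, written (?:...), removes the extra items. The pattern is the question's own, rewritten as a raw string; the variable names are illustrative.

```python
import re

# The question's pattern: two capturing groups, one nested in the other.
# Note that [\s+\t] is a character class (whitespace, a literal '+', or a tab),
# not "one or more whitespace characters".
capturing = re.compile(r"(?<!^)((\s*[-=]>\s*)|[\s+\t])(?!$\n)(?=[^\s])")

# Each split point contributes one extra item per capturing group.
arrow_result = capturing.split(' she -> he \n')
space_result = capturing.split('she he')
print(arrow_result)  # [' she', ' -> ', ' -> ', 'he \n']
print(space_result)  # ['she', ' ', None, 'he']  (inner group did not take part)

# Same pattern with non-capturing groups: only the pieces between
# separators are returned.
non_capturing = re.compile(r"(?<!^)(?:(?:\s*[-=]>\s*)|[\s+\t])(?!$\n)(?=[^\s])")
fixed_arrow = non_capturing.split(' she -> he \n')
fixed_space = non_capturing.split('she he')
print(fixed_arrow)
print(fixed_space)
```

Even with non-capturing groups, this pattern leaves the leading and trailing whitespace attached to the items, which is why the simpler pattern combined with strip() is the tidier route.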