Find 2 or more Newlines
My string looks like:
'I saw a little hermit crab\r\nHis coloring was oh so drab\r\n\r\nIt\u2019s hard to see the butterfly\r\nBecause he flies across the sky\r\n\r\nHear the honking of the goose\r\nI think he\u2019s angry at the moose\r\n\r\'
And I need to split it wherever there are two or more newlines.
I am using the re module, of course.
On this particular string, re.split(r'\r\n\r\n+', text) works, but it wouldn't catch \r\n\r\n\r\n, right?
I have tried re.split(r'(\r\n){2,}', text), which splits at every line, and re.split(r'\r\n{2,}', text), which creates a list whose len() is 1.
Shouldn't re.split(r'(\r\n){2,}', text) == re.split(r'\r\n\r\n', text) be True for a string in which there are no consecutive occurrences of more than 2 \r\n?
re.split(r'(\r\n){2,}', text) doesn't split at every line. It does exactly what you want, except it preserves one occurrence of \r\n in the result because you've enclosed it in a capturing group. Use a non-capturing group instead:
(?:\r\n){2,}
Here you can see what the difference is:
>>> re.split(r'(?:\r\n){2,}', 'foo\r\n\r\nbar')
['foo', 'bar']
>>> re.split(r'(\r\n){2,}', 'foo\r\n\r\nbar')
['foo', '\r\n', 'bar']
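As a slightly larger sketch, the three patterns from the question can be compared on a string that contains a single \r\n, a double, and a triple. The sample string and variable names here are made up for illustration and are not from the original post. It also suggests why r'\r\n{2,}' returned a one-element list: without a group, the {2,} quantifier applies only to \n, so the pattern looks for \r followed by two or more \n.

```python
import re

# Hypothetical sample: a single \r\n, then a double, then a triple.
text = 'a\r\nb\r\n\r\nc\r\n\r\n\r\nd'

# Non-capturing group: {2,} applies to the whole \r\n pair and the
# separators are left out of the result.
non_capturing = re.split(r'(?:\r\n){2,}', text)
# ['a\r\nb', 'c', 'd']

# Capturing group: same split points, but the text matched by the
# group's last repetition is inserted between the pieces.
capturing = re.split(r'(\r\n){2,}', text)
# ['a\r\nb', '\r\n', 'c', '\r\n', 'd']

# No group: {2,} binds only to \n, so this searches for \r\n\n...,
# which never occurs here, and the string comes back unsplit.
no_group = re.split(r'\r\n{2,}', text)
# ['a\r\nb\r\n\r\nc\r\n\r\n\r\nd']

print(non_capturing)
print(capturing)
print(no_group)
```

Note that the single \r\n between a and b is preserved inside the first piece in every case, since only runs of two or more are treated as separators.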