Best way of removing single newlines but keeping multiple newlines
Question
What would be the most Pythonic way of removing single newlines but keeping multiple newlines in a string?
For example,
"foo\n\nbar\none\n\rtwo\rthree\n\n\nhello"
should turn into
"foo\n\nbar one two three\n\n\nhello"
I was thinking about using splitlines(), replacing the empty lines with "\n", and then joining everything back together, but I suspect there is a better/simpler way. Maybe using regexes?
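For comparison, the splitlines() idea can be sketched roughly as follows. The function name join_single_lines is hypothetical, and the sketch assumes the text neither starts nor ends with a line break. Note that str.splitlines() treats "\n\r" as two line breaks (and also splits on characters such as \v, \f and \u2028), so the example uses "\r\n" between "one" and "two" instead; blank lines that are kept come back as plain "\n".

```python
from itertools import groupby


def join_single_lines(text: str) -> str:
    """Hypothetical helper: join lines separated by a single line break.

    Assumes the text does not start or end with a line break.
    """
    out = []
    # groupby yields alternating runs of non-empty and empty lines.
    for has_text, run in groupby(text.splitlines(), key=bool):
        run = list(run)
        if has_text:
            # Lines inside a non-empty run were separated by single breaks.
            out.append(" ".join(run))
        else:
            # k empty lines between two text lines stand for k + 1 breaks.
            out.append("\n" * (len(run) + 1))
    return "".join(out)


print(repr(join_single_lines("foo\n\nbar\none\r\ntwo\rthree\n\n\nhello")))
# 'foo\n\nbar one two three\n\n\nhello'
```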
Answer
>>> import re
>>> s = "foo\n\nbar\none\n\rtwo\rthree\n\n\nhello"
>>> re.sub(r'(?<![\r\n])(\r?\n|\n?\r)(?![\r\n])', ' ', s)
'foo\n\nbar one two three\n\n\nhello'
This looks for \r?\n or \n?\r and uses lookbehind and lookahead assertions to make sure there is no newline character (\r or \n) on either side.
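The same pattern can be written with re.VERBOSE so that each part carries its own comment. This is a sketch: the names SINGLE_BREAK and remove_single_newlines are made up for illustration, and the behavior is meant to match the re.sub call above.

```python
import re

# The answer's first pattern, spelled out with re.VERBOSE so each part is commented.
SINGLE_BREAK = re.compile(
    r"""
    (?<![\r\n])       # lookbehind: the previous character is not CR or LF
    (\r?\n | \n?\r)   # one break: LF, CR LF, CR, or the reversed LF CR
    (?![\r\n])        # lookahead: the next character is not CR or LF
    """,
    re.VERBOSE,
)


def remove_single_newlines(text: str) -> str:
    """Replace every isolated line break with a single space."""
    return SINGLE_BREAK.sub(" ", text)


s = "foo\n\nbar\none\n\rtwo\rthree\n\n\nhello"
print(repr(remove_single_newlines(s)))
# 'foo\n\nbar one two three\n\n\nhello'
```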
For what it's worth, there are three types of line endings found in the wild:
- \n on Linux, Mac OS X, and other Unices
- \r\n on Windows and in the HTTP protocol
- \r on Mac OS 9 and earlier
The first two are by far the most common. If you want to limit the possibilities to just those three, you could do:
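>>> re.sub(r'(?<![\r\n])(\r?\n|\r)(?![\r\n])', ' ', s)
'foo\n\nbar one\n\rtwo three\n\n\nhello'
Note that with this pattern the reversed \n\r between "one" and "two" is no longer a single line ending: it counts as two, so it is kept.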
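A quick way to see how this three-ending pattern behaves on Windows-style text is sketched below; the names THREE_ENDINGS and join_single_breaks are hypothetical. A single \r\n is joined, a blank line (\r\n\r\n) is kept, and the reversed \n\r is left alone.

```python
import re

# The three-ending pattern: \n, \r\n or \r, with no break on either side.
THREE_ENDINGS = re.compile(r"(?<![\r\n])(\r?\n|\r)(?![\r\n])")


def join_single_breaks(text: str) -> str:
    """Replace each isolated \\n, \\r\\n or \\r with a space."""
    return THREE_ENDINGS.sub(" ", text)


# Windows-style text: a single \r\n is joined, \r\n\r\n is kept.
print(repr(join_single_breaks("foo\r\n\r\nbar\r\none\r\ntwo")))
# 'foo\r\n\r\nbar one two'

# "\n\r" is not one of the three endings, so it is left untouched.
print(repr(join_single_breaks("one\n\rtwo")))
# 'one\n\rtwo'
```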
And of course, drop the |\r alternative if you don't care about classic Mac OS line endings, which are rare.
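If you want to accept all three endings but keep the pattern itself simple, one alternative (not from the answer above) is to normalize every line ending to \n first. The helper name normalize_then_join is hypothetical. Unlike the re.sub calls above, it also rewrites the line endings it keeps, and, like the three-ending pattern, it treats "\n\r" as two breaks.

```python
import re


def normalize_then_join(text: str) -> str:
    """Hypothetical helper: normalize line endings, then join single lines."""
    # \r\n and a lone \r both become \n, so "\n\r" turns into "\n\n".
    text = re.sub(r"\r\n?", "\n", text)
    # With only \n left, an isolated break is a \n with no \n on either side.
    return re.sub(r"(?<!\n)\n(?!\n)", " ", text)


s = "foo\n\nbar\none\n\rtwo\rthree\n\n\nhello"
print(repr(normalize_then_join(s)))
# 'foo\n\nbar one\n\ntwo three\n\n\nhello'
```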