Parsing a string in Python: how to split on newlines while ignoring newlines inside quotes
Problem description
I have a text that I need to parse in Python.
It is a string that I would like to split into a list of lines; however, if a newline (\n) is inside quotes, it should be ignored.
For example:
abcd efgh ijk\n1234 567"qqqq\n---" 890\n
This should be parsed into a list of the following lines:
abcd efgh ijk
1234 567"qqqq\n---" 890
I've tried to do it with split('\n'), but I don't know how to ignore the quotes.
Any ideas?
Thanks!
Recommended answer
Here's a much easier solution.
Match groups of (?:"[^"]*"|.)+ with re.findall. Namely, "things in quotes or things that aren't newlines": a quoted run is consumed whole, and outside quotes the . matches any character except a newline.
Example:
import re
re.findall('(?:"[^"]*"|.)+', text)
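As a quick check, the pattern can be applied to the example string from the question. This is only a minimal sketch; the variable names text and lines are chosen here for illustration.

```python
import re

# Example string from the question; the second \n sits inside double quotes.
text = 'abcd efgh ijk\n1234 567"qqqq\n---" 890\n'

# A quoted run is consumed whole (newlines included); otherwise "." matches
# any single character except a newline, so a match stops at an unquoted \n.
lines = re.findall(r'(?:"[^"]*"|.)+', text)

for line in lines:
    print(repr(line))
```

This should print two items, with the quoted newline still inside the second one.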
NOTE: This coalesces several newlines into one, as blank lines are ignored. To avoid that, give a null case as well: (?:"[^"]*"|.)+|(?!\Z). Be aware that with re.findall the empty alternative matches at every newline outside quotes, so each such newline adds an empty string to the result (for instance, 'a\nb' gives 'a', '', 'b'), not only each blank line.
The (?!\Z) is a confusing way to say "not the end of the string". The (?! ) is a negative lookahead; the \Z is the "end of the string" part.
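If blank lines need to come back as exactly one empty string each, one alternative is to scan for tokens and cut at the unquoted newlines. This is a different technique from the pattern in the answer, shown only as a sketch; TOKEN and split_lines are hypothetical names, and it assumes quotes are plain double quotes with no escaping.

```python
import re

# Hypothetical helper names; this is not the pattern from the answer.
# A token is either a complete double-quoted run or a single newline.
TOKEN = re.compile(r'"[^"]*"|\n')

def split_lines(text):
    """Split on newlines outside double quotes, keeping blank lines."""
    lines = []
    start = 0
    for match in TOKEN.finditer(text):
        if match.group() == "\n":
            # An unquoted newline ends the current line (possibly empty).
            lines.append(text[start:match.start()])
            start = match.end()
        # Quoted runs are skipped, so any newline inside them is kept.
    if start < len(text):
        lines.append(text[start:])  # trailing text without a final newline
    return lines

print(split_lines('a\n\nb'))
print(split_lines('abcd efgh ijk\n1234 567"qqqq\n---" 890\n'))
```

Here 'a\n\nb' yields ['a', '', 'b'], and the example from the question yields the same two lines as before. An unmatched quote is treated as an ordinary character, which is a design choice of this sketch rather than something the answer specifies.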
Tests:
import re
texts = (
    'text',
    '"text"',
    'text\ntext',
    '"text\ntext"',
    'text"text\ntext"text',
    'text"text\n"\ntext"text"',
    '"\n"\ntext"text"',
    '"\n"\n"\n"\n\n\n""\n"\n"'
)
line_matcher = re.compile('(?:"[^"]*"|.)+')
for text in texts:
    print("{:>27} → {}".format(
        text.replace("\n", "\\n"),
        " [LINE] ".join(line_matcher.findall(text)).replace("\n", "\\n")
    ))
#>>> text → text
#>>> "text" → "text"
#>>> text\ntext → text [LINE] text
#>>> "text\ntext" → "text\ntext"
#>>> text"text\ntext"text → text"text\ntext"text
#>>> text"text\n"\ntext"text" → text"text\n" [LINE] text"text"
#>>> "\n"\ntext"text" → "\n" [LINE] text"text"
#>>> "\n"\n"\n"\n\n\n""\n"\n" → "\n" [LINE] "\n" [LINE] "" [LINE] "\n"