python split text by quotes and spaces
Question
I have the following text:
text = 'This is "a simple" test'
And I need to split it in two ways, first by quotes and then by spaces, resulting in:
res = ['This', 'is', '"a simple"', 'test']
But with str.split() I'm only able to use either quotes or spaces as delimiters. Is there a built-in function for multiple delimiters?
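For illustration (this snippet is not part of the original question), the standard library's re.split does accept several delimiters at once through a regular-expression character class. Splitting on both characters, however, breaks the quoted phrase apart and leaves empty strings wherever a quote sits next to a space:

```python
import re

text = 'This is "a simple" test'

# A character class matches either a space or a double quote,
# so re.split treats both characters as delimiters in a single call.
parts = re.split(r'[ "]', text)
print(parts)  # ['This', 'is', '', 'a', 'simple', '', 'test']
```

This is why a quote-aware tokenizer is needed rather than a plain multi-delimiter split.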
Recommended answer
You can use shlex.split, which is handy for parsing quoted strings:
>>> import shlex
>>> text = 'This is "a simple" test'
>>> shlex.split(text, posix=False)
['This', 'is', '"a simple"', 'test']
Calling it in non-posix mode (posix=False) keeps the quote characters in the split result. By default posix is True, and the quotes are stripped:
>>> shlex.split(text)
['This', 'is', 'a simple', 'test']
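If you would rather avoid shlex (for example because it also treats single quotes as quote characters and, in POSIX mode, backslashes as escapes), a regular expression can produce the same tokens as the non-posix call above. This is a swapped-in technique, not part of the original answer, and the helper name split_keep_quotes is hypothetical. It assumes double quotes are balanced, never escaped, and that each quoted phrase is separated from its neighbours by whitespace:

```python
import re

# Either a double-quoted run (quotes kept) or a run of non-whitespace.
# Hypothetical helper; assumes balanced, unescaped double quotes.
_TOKEN_RE = re.compile(r'"[^"]*"|\S+')


def split_keep_quotes(text):
    """Split on whitespace while keeping "quoted phrases" as one token."""
    return _TOKEN_RE.findall(text)


tokens = split_keep_quotes('This is "a simple" test')
print(tokens)  # ['This', 'is', '"a simple"', 'test']
```

Because the quoted alternative is tried first at each position, a quoted phrase is matched as a whole before \S+ can claim only its first word.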
If you have multiple lines of this type of text or you're reading from a stream, you can split efficiently (excluding the quotes in the output) using csv.reader:
import io
import csv

text = 'This is "a simple" test'
s = io.StringIO(text)  # in-memory stream over the text
f = csv.reader(s, delimiter=' ', quotechar='"')
print(list(f))
# [['This', 'is', 'a simple', 'test']]
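On Python 2 the byte string must first be decoded to unicode (text.decode('utf8')) before it is wrapped in io.StringIO; on Python 3 all strings are already unicode, so no decoding is needed.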
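To illustrate the multi-line case, here is a minimal Python 3 sketch that feeds several lines through one csv.reader. The sample data is made up, and io.StringIO stands in for a real file or other stream; each input line becomes its own list of fields:

```python
import csv
import io

# Made-up sample data; in practice this could be an open text file.
data = 'This is "a simple" test\nAnother "quoted phrase" here\n'

reader = csv.reader(io.StringIO(data), delimiter=' ', quotechar='"')
rows = list(reader)  # one list of fields per input line
print(rows)
# [['This', 'is', 'a simple', 'test'], ['Another', 'quoted phrase', 'here']]
```

When reading from a real file instead, open it with newline='' as the csv module documentation recommends.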