Python split string by spaces except when in quotes, but keep the quotes
Question
I want to split the following string:
Quantity [*,'EXTRA 05',*]
The desired result is:
["Quantity", "[*,'EXTRA 05',*]"]
The closest I have found is shlex.split, but it removes the internal quotes and gives the following result:
['Quantity', '[*,EXTRA 05,*]']
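For reference, the behaviour described above can be reproduced with a short sketch. By default shlex.split follows POSIX shell rules, so quote characters are treated as syntax and stripped from the tokens.

```python
import shlex

s = "Quantity [*,'EXTRA 05',*]"

# In the default POSIX mode the quotes group 'EXTRA 05' into a single
# token and are then dropped from the result.
tokens = shlex.split(s)
print(tokens)
```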
Any suggestions would be greatly appreciated.
Multiple splits will also be required, for example:
"Quantity [*,'EXTRA 05',*] [*,'EXTRA 09',*]"
into:
["Quantity", "[*,'EXTRA 05',*]", "[*,'EXTRA 09',*]"]
Recommended answer
The basic tool for processing strings is regular expressions (the re module).
Given the information you provided (which may be incomplete), the following code does the job:
import re

# Two alternatives: text that precedes a bracket, or a bracketed group.
r = re.compile(r'(?! )[^[]+?(?= *\[)'
               r'|'
               r'\[.+?\]')

s1 = "Quantity [*,'EXTRA 05',*] [*,'EXTRA 09',*]"
print(r.findall(s1))
print('---------------')
s2 = ("'zug hug'Quantity boondoggle 'fish face monkey "
      "dung' [*,'EXTRA 05',*] [*,'EXTRA 09',*]")
print(r.findall(s2))
Result
['Quantity', "[*,'EXTRA 05',*]", "[*,'EXTRA 09',*]"]
---------------
["'zug hug'Quantity boondoggle 'fish face monkey dung'", "[*,'EXTRA 05',*]", "[*,'EXTRA 09',*]"]
The regular expression pattern should be read as follows:
'|' means OR, so the pattern combines two partial regexes:
(?! )[^[]+?(?= *\[)
and
\[.+?\]
The core is [^[]+
Square brackets define a set of characters. When the symbol ^ comes immediately after the opening bracket [, the set is negated: it matches every character except the ones listed after the ^.
Here [^[] means any character that is not an opening bracket [, and because a + follows the set, [^[]+ means a sequence of one or more characters, none of which is an opening bracket.
Next, there is a question mark after [^[]+ : it makes the + non-greedy, so the captured sequence is as short as possible and stops as soon as what follows the question mark can match.
Here, what follows the ? is (?= *\[), a lookahead assertion. The (?=...) wrapper marks it as a positive lookahead, and  *\[ is the sequence in front of which the captured sequence must stop. The lookahead itself consumes no characters.
 *\[ means: zero or more blanks followed by an opening bracket (the backslash \ is needed to strip [ of its meaning as the start of a set of characters).
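The difference the question mark makes can be checked with a small sketch. The sample string with three blanks is made up for illustration and is not from the question.

```python
import re

s = "Quantity   [x]"  # illustrative input with several blanks

# Lazy: grows one character at a time and stops at the first position
# where the lookahead succeeds, i.e. right after "Quantity".
lazy = re.match(r'[^[]+?(?= *\[)', s).group()

# Greedy: takes everything up to the bracket first, so the blanks stay
# attached to the captured text.
greedy = re.match(r'[^[]+(?= *\[)', s).group()

print(repr(lazy))
print(repr(greedy))
```

In both cases the bracket itself is left unconsumed, because a lookahead only tests what comes next.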
There is also (?! ) in front of the core. It is a negative lookahead assertion: it ensures that this partial regex only matches sequences that do not begin with a blank, which prevents it from capturing the blanks between bracketed groups. Remove the (?! ) and you will see the effect.
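A minimal sketch of that experiment, using the string from the question with and without the negative lookahead:

```python
import re

s = "Quantity [*,'EXTRA 05',*] [*,'EXTRA 09',*]"

# Original pattern: the first alternative may not start on a blank.
with_guard = re.findall(r'(?! )[^[]+?(?= *\[)|\[.+?\]', s)

# Same pattern without (?! ): each blank in front of a bracket now
# satisfies the first alternative and shows up as its own item.
without_guard = re.findall(r'[^[]+?(?= *\[)|\[.+?\]', s)

print(with_guard)
print(without_guard)
```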
\[.+?\] means: the opening bracket character [, then a sequence of characters matched by .+? (the dot matches any character except \n), which must stop in front of the closing bracket character ], the last character to be captured.
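To see why the non-greedy .+? matters here, the following sketch compares it with the greedy .+ on the string from the question:

```python
import re

s = "Quantity [*,'EXTRA 05',*] [*,'EXTRA 09',*]"

# Non-greedy: each match ends at the nearest closing bracket.
lazy = re.findall(r'\[.+?\]', s)

# Greedy: a single match runs from the first [ to the last ].
greedy = re.findall(r'\[.+\]', s)

print(lazy)
print(greedy)
```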
import re

# Alternative: split on every space that is immediately followed by "[".
string = "Quantity [*,'EXTRA 05',*] [*,'EXTRA 09',*]"
print(re.split(r' (?=\[)', string))
Result
['Quantity', "[*,'EXTRA 05',*]", "[*,'EXTRA 09',*]"]
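If the input may contain more than one blank (or a tab) before a bracket, the same idea could be wrapped in a small helper. This is only a sketch: the name split_before_brackets and the switch from a single space to \s+ are assumptions for illustration, not part of the original answer.

```python
import re


def split_before_brackets(text):
    """Hypothetical helper: split on whitespace that precedes '['."""
    # \s+ consumes the whole run of whitespace; the lookahead only checks
    # that a '[' follows, so the bracket stays in the next item.
    return re.split(r'\s+(?=\[)', text)


parts = split_before_brackets("Quantity [*,'EXTRA 05',*] [*,'EXTRA 09',*]")
print(parts)
print(split_before_brackets("Quantity   [a]\t[b]"))
```

Note that this splits at any whitespace followed by [, so it assumes a bracketed group never contains a blank directly before a nested opening bracket.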