Python split string by spaces except when in quotes, but keep the quotes


Problem description

I want to split the following string:

Quantity [*,'EXTRA 05',*]

The desired result is:

["Quantity", "[*,'EXTRA 05',*]"]

The closest I have found is using shlex.split; however, this removes the internal quotes, giving the following result:

['Quantity', '[*,EXTRA 05,*]']
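
The behaviour described above can be reproduced with a short script. This is only a sketch of the problem, not a fix; the variable names `s` and `tokens` are chosen here for illustration.

```python
import shlex

s = "Quantity [*,'EXTRA 05',*]"

# In its default POSIX mode, shlex.split treats the single quotes as shell
# quoting: they protect the space inside them but are stripped from the token.
tokens = shlex.split(s)
print(tokens)
```

The space inside the quotes is preserved, but the quote characters themselves are lost, which is why a different approach is needed here.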

Any suggestions would be greatly appreciated.

This will also require multiple splits, such as:

"Quantity [*,'EXTRA 05',*] [*,'EXTRA 09',*]"

into:

["Quantity", "[*,'EXTRA 05',*]", "[*,'EXTRA 09',*]"]

Recommended answer

To process strings, the basic tool is regular expressions (the re module).

Given the information you have provided (which may not be sufficient), the following code does the job:

import re

# Two alternatives: the text in front of a bracket (without the spaces
# around it), or a complete [...] group.
r = re.compile(r'(?! )[^[]+?(?= *\[)'
               r'|'
               r'\[.+?\]')


s1 = "Quantity [*,'EXTRA 05',*] [*,'EXTRA 09',*]"
print(r.findall(s1))
print('---------------')

s2 = "'zug hug'Quantity boondoggle 'fish face monkey "\
     "dung' [*,'EXTRA 05',*] [*,'EXTRA 09',*]"
print(r.findall(s2))

Result

['Quantity', "[*,'EXTRA 05',*]", "[*,'EXTRA 09',*]"]  
---------------
["'zug hug'Quantity boondoggle 'fish face monkey dung'", "[*,'EXTRA 05',*]", "[*,'EXTRA 09',*]"]

The regular expression pattern should be read as follows:

'|' means OR

So the regex pattern consists of two alternative sub-patterns:
(?! )[^[]+?(?= *\[)
and
\[.+?\]
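
As a reading aid, the same two alternatives can be laid out with re.VERBOSE, one piece per line. This is a sketch under two assumptions: the name `TOKEN_RE` is made up here, and each literal space is written as `[ ]` because verbose mode ignores bare whitespace in the pattern.

```python
import re

# Hypothetical name; the pattern is equivalent to the compact one in the answer.
TOKEN_RE = re.compile(r"""
    (?![ ])        # negative lookahead: the match must not start with a space
    [^\[]+?        # one or more characters other than '[', as few as possible
    (?=[ ]*\[)     # positive lookahead: optional spaces, then '[' must follow
  |                # OR
    \[ .+? \]      # '[', the shortest run of characters, then the closing ']'
""", re.VERBOSE)

s1 = "Quantity [*,'EXTRA 05',*] [*,'EXTRA 09',*]"
verbose_result = TOKEN_RE.findall(s1)
print(verbose_result)
```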

The core is [^[]+
Brackets define a set of characters. When the symbol ^ comes right after the opening bracket [, the set is negated: it matches every character except the ones listed after the ^.
Here, [^[] means any character that is not an opening bracket [, and since a + follows the set, [^[]+ means a sequence of one or more characters containing no opening bracket.
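
To see the negated set in isolation, the core can be run on its own against the sample string. A small sketch; the name `runs` is illustrative only.

```python
import re

s = "Quantity [*,'EXTRA 05',*]"

# [^[]+ grabs every maximal run of characters that contains no '['.
# The '[' itself is never part of a match, so it acts as a separator.
runs = re.findall(r"[^[]+", s)
print(runs)
```

Note that the first run still carries a trailing space, which the remaining parts of the full pattern take care of.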

Next, there is a question mark after [^[]+. It makes the + quantifier non-greedy (lazy): the matched sequence is as short as possible, so it stops at the first position where what follows the question mark can match.
Here, what follows the ? is (?= *\[), a positive lookahead assertion: (?=...) marks the lookahead, and  *\[ is the text in front of which the matched sequence must stop.  *\[ means zero or more spaces followed by an opening bracket (the backslash \ is needed so that [ is taken literally rather than as the start of a character set). Because a lookahead consumes nothing, those spaces and the bracket are not part of the match.
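
The effect of the question mark can be checked by comparing the lazy sub-pattern with a greedy variant. This is a sketch for illustration; the names `lazy` and `greedy` are not from the original answer.

```python
import re

s = "Quantity [*,'EXTRA 05',*]"

# Lazy: stops as soon as the lookahead " *\[" can match, before the space.
lazy = re.findall(r"(?! )[^[]+?(?= *\[)", s)

# Greedy: runs up to the bracket first, so the trailing space is captured
# and the lookahead then matches with zero spaces.
greedy = re.findall(r"(?! )[^[]+(?= *\[)", s)

print(lazy)
print(greedy)
```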

There is also (?! ) in front of the core. It is a negative lookahead assertion: it makes this sub-pattern match only sequences that do not begin with a space, which prevents it from matching the runs of spaces between tokens. Remove (?! ) and you will see the effect.
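
The suggested experiment can be run as follows. A minimal sketch; `with_guard` and `without_guard` are names introduced here.

```python
import re

s1 = "Quantity [*,'EXTRA 05',*] [*,'EXTRA 09',*]"

with_guard = re.findall(r"(?! )[^[]+?(?= *\[)|\[.+?\]", s1)

# Without (?! ), each space sitting in front of a '[' satisfies the first
# alternative on its own and shows up as a separate item.
without_guard = re.findall(r"[^[]+?(?= *\[)|\[.+?\]", s1)

print(with_guard)
print(without_guard)
```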

\[.+?\] means: the opening bracket character [, then a sequence of characters matched by .+? (the dot matches any character except \n) that is as short as possible, ending at the first closing bracket character ], which is the last character matched.
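
Why the lazy .+? matters here can be seen by comparing it with a greedy .+ on a string that holds two bracketed groups. A sketch with illustrative names only.

```python
import re

s = "[*,'EXTRA 05',*] [*,'EXTRA 09',*]"

# Lazy: each match ends at the first ']' it meets.
lazy_groups = re.findall(r"\[.+?\]", s)

# Greedy: the match extends to the last ']' in the string,
# merging both groups into one item.
greedy_groups = re.findall(r"\[.+\]", s)

print(lazy_groups)
print(greedy_groups)
```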

A simpler alternative is to split on each space that is followed by an opening bracket:

import re

s = "Quantity [*,'EXTRA 05',*] [*,'EXTRA 09',*]"
# Split on each space that is immediately followed by an opening bracket
print(re.split(r' (?=\[)', s))

Result

['Quantity', "[*,'EXTRA 05',*]", "[*,'EXTRA 09',*]"]
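
Both patterns above rely on the opening bracket to find the split points. If the input may also contain quoted text outside brackets, one possible alternative is a quote-aware pattern. The following is only a sketch and not part of the original answer: the names `QUOTE_AWARE` and `split_outside_quotes` are hypothetical, and it assumes single quotes that are balanced and never nested or escaped.

```python
import re

# A token is a run of pieces, where each piece is either a single character
# that is neither whitespace nor a quote, or a complete '...' quoted section.
QUOTE_AWARE = re.compile(r"(?:[^\s']|'[^']*')+")


def split_outside_quotes(text):
    """Split on whitespace outside single quotes, keeping the quotes."""
    return QUOTE_AWARE.findall(text)


print(split_outside_quotes("Quantity [*,'EXTRA 05',*] [*,'EXTRA 09',*]"))
print(split_outside_quotes("a 'b c' d"))
```

An unbalanced quote matches neither piece and is silently skipped, so input of that kind would need extra handling.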

!!
