Python: split string by a multi-character delimiter unless inside quotes

Question

In my case the delimiter string is '   ' (3 consecutive spaces, but the answer should work for any multi-character delimiter), and an edge-case text to search in could be this:

'Coord="GLOB"AL   Axis=X   Type="Y   ZR"   Color="Gray Dark"   Alt="Q   Z"qz   Loc=End'

The solution should return the following strings:

Coord="GLOB"AL
Axis=X
Type="Y   ZR"
Color="Gray Dark"
Alt="Q   Z"qz
Loc=End

I've looked for regex solutions, and also considered the inverse problem (matching a multi-character delimiter unless it is inside quotes), since re.split in Python 3.4.3 makes it easy to split text by a regex pattern. I'm not sure a regex solution exists, though, so I'm also open to (efficient) non-regex solutions.

I've seen some solutions to the inverse problem that use lookahead/lookbehind containing a regex pattern, but they did not work for me, because Python's lookbehind (unlike that of some other regex engines) requires a fixed-width pattern.
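As a quick check of that restriction, here is a minimal sketch (the two patterns are made up for illustration and are not taken from any of the solutions mentioned). In Python's re module a variable-width lookbehind is rejected at compile time, while a variable-width lookahead is accepted:

```python
import re

# A variable-width lookahead compiles without complaint.
lookahead = re.compile(r'x(?=a+b)')

# A variable-width lookbehind is rejected when the pattern is compiled.
try:
    re.compile(r'(?<=a+)x')
    lookbehind_ok = True
except re.error:
    lookbehind_ok = False

print(lookbehind_ok)
```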

This question is not a duplicate of "Regex matching spaces, but not in "strings"" or other similar questions, because:

  1. matching a single space outside quotes is different from matching a multi-character delimiter (in my example the delimiter is 3 spaces, but the question is about any multi-character delimiter);
  2. the Python regex engine is slightly different from the regex engines of C++ or other languages;
  3. matching a delimiter is side B of my question; the direct question is about splitting a string.

Recommended answer

import re
x = 'Coord="GLOB"AL   Axis=X   Type="Y   ZR"   Color="Gray Dark"   Alt="Q   Z"qz   Loc=End'
# Split on any run of whitespace that is followed by an even number of double quotes
print(re.split(r'\s+(?=(?:[^"]*"[^"]*")*[^"]*$)', x))

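You need to use a lookahead to check that the whitespace is not between double quotes: the lookahead succeeds only when an even number of " characters remains between the match and the end of the string.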

Output: ['Coord="GLOB"AL', 'Axis=X', 'Type="Y   ZR"', 'Color="Gray Dark"', 'Alt="Q   Z"qz', 'Loc=End']

For a generalized version, if you want to split on a delimiter that does not occur inside "", use

re.split(r'delimiter(?=(?:[^"]*"[^"]*")*[^"]*$)',x)
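One way to package the generalized pattern is sketched below. The helper name split_outside_quotes is hypothetical (it is not part of the original answer), and the sketch assumes that quotes are balanced and never escaped. It passes the delimiter through re.escape so that a delimiter containing regex metacharacters is matched literally:

```python
import re

def split_outside_quotes(text, delimiter):
    # Hypothetical helper: split on a literal delimiter, ignoring any
    # occurrence that sits inside double quotes.
    # Assumes balanced, unescaped double quotes in text.
    pattern = re.escape(delimiter) + r'(?=(?:[^"]*"[^"]*")*[^"]*$)'
    return re.split(pattern, text)

x = 'Coord="GLOB"AL   Axis=X   Type="Y   ZR"   Color="Gray Dark"   Alt="Q   Z"qz   Loc=End'
parts = split_outside_quotes(x, '   ')
print(parts)
```

Because the lookahead scans to the end of the string at every candidate delimiter, this approach may become slow on very long inputs.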
