How to split but ignore separators in quoted strings, in Python?

Question

I need to split a string like this on semicolons, but I don't want to split on semicolons that are inside a quoted string (' or "). I'm not parsing a file; just a simple string with no line breaks.

part 1;"this is ; part 2;";'this is ; part 3';part 4;this "is ; part" 5

Result should be:

  • part 1
  • "this is ; part 2;"
  • 'this is ; part 3'
  • part 4
  • this "is ; part" 5

I suppose this can be done with a regex, but if not, I'm open to another approach.

Solution

Most of the answers seem massively overcomplicated. You don't need back references. You don't need to depend on whether or not re.findall gives overlapping matches. Since the input cannot be parsed with the csv module, a regular expression is pretty much the only way to go, and all you need is to call re.split with a pattern that matches a field.

Note that it is much easier here to match a field than it is to match a separator:

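import re

data = """part 1;"this is ; part 2;";'this is ; part 3';part 4;this "is ; part" 5"""
# A field is a run of non-separator, non-quote characters and/or quoted substrings.
PATTERN = re.compile(r'''((?:[^;"']|"[^"]*"|'[^']*')+)''')
# Because the pattern has a capturing group, re.split returns the fields at the odd indices.
print(PATTERN.split(data)[1::2])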

and the output is:

['part 1', '"this is ; part 2;"', "'this is ; part 3'", 'part 4', 'this "is ; part" 5']
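To make the pattern easier to read, here is the same expression spelled out with re.VERBOSE and one comment per alternative. This is only a restatement of the pattern above; the name FIELD is made up for this sketch.

```python
import re

# Same pattern as above, written in verbose mode so each branch can be commented.
FIELD = re.compile(r'''
    (                 # capturing group: re.split keeps whatever it matches
      (?:
          [^;"']      # one character that is neither the separator nor a quote
        | "[^"]*"     # a double-quoted run; semicolons inside are consumed
        | '[^']*'     # a single-quoted run; semicolons inside are consumed
      )+              # a field is one or more of these pieces
    )
''', re.VERBOSE)

data = """part 1;"this is ; part 2;";'this is ; part 3';part 4;this "is ; part" 5"""

# split() alternates between the text separating matches (even indices, here the
# semicolons) and the captured fields (odd indices), so [1::2] picks the fields.
fields = FIELD.split(data)[1::2]
print(fields)
```

Because a field needs at least one character to match, two adjacent semicolons end up in a single separator chunk, which is why empty fields are lost.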

As Jean-Luc Nacif Coelho correctly points out, this won't handle empty groups correctly. Depending on the situation, that may or may not matter. If it does matter, it may be possible to handle it by, for example, replacing ';;' with ';<marker>;' before the split, where <marker> would have to be some string (without semicolons) that you know does not appear in the data. You also need to strip the marker out again afterwards:

>>> marker = ";!$%^&;"
>>> [r.replace(marker[1:-1],'') for r in PATTERN.split("aaa;;aaa;'b;;b'".replace(';;', marker))[1::2]]
['aaa', '', 'aaa', "'b;;b'"]

However, this is a kludge. Any better suggestions?
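One possible alternative, offered as a sketch rather than as part of the original answer, is to drop the regex and scan the string one character at a time, tracking whether a quote is currently open. The function name split_outside_quotes and its parameters are made up for illustration. It assumes quotes do not nest and there are no escape sequences, and it treats an unterminated quote as running to the end of the string.

```python
def split_outside_quotes(text, sep=";", quotes="\"'"):
    """Split text on sep, ignoring separators inside quoted substrings.

    Hypothetical helper: assumes no escaped quotes and no nesting.
    """
    fields = []
    current = []          # characters of the field being built
    open_quote = None     # the quote character currently open, if any
    for ch in text:
        if open_quote:
            # Inside quotes: keep everything, close on the matching quote.
            current.append(ch)
            if ch == open_quote:
                open_quote = None
        elif ch in quotes:
            open_quote = ch
            current.append(ch)
        elif ch == sep:
            # Separator outside quotes: finish the field, even if it is empty.
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))  # last field (may be empty)
    return fields


print(split_outside_quotes("aaa;;aaa;'b;;b'"))
# ['aaa', '', 'aaa', "'b;;b'"]
print(split_outside_quotes("a;;;b"))
# ['a', '', '', 'b']
```

The second call is a case the marker replacement misses: in "a;;;b", replace(';;', marker) only rewrites the first pair of semicolons, so one of the two empty fields is dropped.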
