How do I split a comma delimited string in Python except for the commas that are within quotes

Views: 2715 | Posted: 2017/2/24 17:02:24 | Tags: python regex csv

Problem description

I am trying to split a comma-delimited string in Python. The tricky part for me is that some of the fields in the data contain a comma themselves, and those fields are enclosed in quotes (" or '). The resulting split string should also have the quotes around the fields removed. Also, some fields can be empty.

Example:
hey,hello,,"hello,world",'hey,world'
needs to be split into 5 parts like below
['hey', 'hello', '', 'hello,world', 'hey,world']
Any ideas/thoughts/suggestions/help with how to go about solving the above problem in Python would be much appreciated.

Thank You,
Vish
Solution
(Edit: The original answer had trouble with empty fields on the edges due to the way re.findall works, so I refactored it a bit and added tests.)
import re

def parse_fields(text):
    r"""
    >>> list(parse_fields('hey,hello,,"hello,world",\'hey,world\''))
    ['hey', 'hello', '', 'hello,world', 'hey,world']
    >>> list(parse_fields('hey,hello,,"hello,world",\'hey,world\','))
    ['hey', 'hello', '', 'hello,world', 'hey,world', '']
    >>> list(parse_fields(',hey,hello,,"hello,world",\'hey,world\','))
    ['', 'hey', 'hello', '', 'hello,world', 'hey,world', '']
    >>> list(parse_fields(''))
    ['']
    >>> list(parse_fields(','))
    ['', '']
    >>> list(parse_fields('testing,quotes not at "the" beginning \'of\' the,string'))
    ['testing', 'quotes not at "the" beginning \'of\' the', 'string']
    >>> list(parse_fields('testing,"unterminated quotes'))
    ['testing', '"unterminated quotes']
    """
    pos = 0
    exp = re.compile(r"""(['"]?)(.*?)\1(,|$)""")
    while True:
        m = exp.search(text, pos)
        result = m.group(2)
        separator = m.group(3)

        yield result

        if not separator:
            break

        pos = m.end(0)

if __name__ == "__main__":
    import doctest
    doctest.testmod()
(['"]?) matches an optional single- or double-quote.

(.*?) matches the string itself.  This is a non-greedy match, to match as much as necessary without eating the whole string.  This is assigned to result, and it's what we actually yield as a result.

\1 is a backreference, to match the same single- or double-quote we matched earlier (if any).

(,|$) matches the comma separating each entry, or the end of the line.  This is assigned to separator.
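To see what each group captures, here is a minimal sketch that applies the same pattern once to a small sample string. The sample text and the variable names quote, field and separator are illustrative assumptions, not part of the original answer.

```python
import re

# Same pattern as in parse_fields above.
exp = re.compile(r"""(['"]?)(.*?)\1(,|$)""")

# Illustrative input: a double-quoted field followed by a plain one.
sample = '"hello,world",next'

m = exp.search(sample, 0)
quote, field, separator = m.groups()

print(repr(quote))      # group 1: the optional opening quote
print(repr(field))      # group 2: the field contents, without the quotes
print(repr(separator))  # group 3: the comma, or '' at the end of the string
print(m.end(0))         # index where the next search would start
```

Because the lazy (.*?) only stops once the backreference and a separator can both match, the comma inside the quoted field is kept as part of the field.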

If separator is false (e.g. empty), there is no separator, which means we have reached the end of the string and are done. Otherwise, we update the start position to where the regex match finished (m.end(0)) and continue the loop.
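The loop can be traced with a small variant of the function that records the start position, field and separator on each iteration. This is only a sketch for illustration: trace_fields is a hypothetical helper and the input string is made up.

```python
import re

# Same pattern as in parse_fields above.
exp = re.compile(r"""(['"]?)(.*?)\1(,|$)""")

def trace_fields(text):
    """Hypothetical helper: collect (start, field, separator) per iteration."""
    steps = []
    pos = 0
    while True:
        m = exp.search(text, pos)
        steps.append((pos, m.group(2), m.group(3)))
        if not m.group(3):
            # Empty separator: the match ended at the end of the string.
            break
        # Resume right after the comma that closed this field.
        pos = m.end(0)
    return steps

# Illustrative input with a quoted field and a trailing comma.
for step in trace_fields("a,'b,c',"):
    print(step)
```

The trailing comma leaves the position at the end of the string, so the final iteration yields one more empty field before the loop stops.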
