Split a string with custom delimiter, respect and preserve quotes (single or double)


Question

I have a string which is like this:

>>> s = '1,",2, ",,4,,,\',7, \',8,,10,'
>>> s
'1,",2, ",,4,,,\',7, \',8,,10,'

I would like to split it using different delimiters (not just white spaces) and I also want to respect and preserve quotes (single or double).

Expected results when splitting s on delimiter ,:

['1', ',2, ', '', '4', '', '', ',7, ', '8', '', '10', '']

Solution

A modified version of this (which handles only whitespace) can do the trick (quotes are stripped):

>>> import re
>>> s = '1,",2, ",,4,,,\',7, \',8,,10,'

>>> tokens = [t for t in re.split(r",?\"(.*?)\",?|,?'(.*?)',?|,", s) if t is not None ]
>>> tokens
['1', ',2, ', '', '4', '', '', ',7, ', '8', '', '10', '']

If you want to keep the quote characters:

>>> tokens = [t for t in re.split(r",?(\".*?\"),?|,?('.*?'),?|,", s) if t is not None ]
>>> tokens
['1', '",2, "', '', '4', '', '', "',7, '", '8', '', '10', '']

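If you want to use a custom delimiter, replace every occurrence of , in the regexp with your own delimiter. If the delimiter has a special meaning in regular expressions (for example | or .), escape it first, e.g. with re.escape.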
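As a rough sketch of that substitution, the helper below builds the pattern from an arbitrary delimiter. The name split_quoted and its keep_quotes parameter are made up for illustration and are not part of the original answer; re.escape is used on the assumption that the delimiter should always be matched literally.

```python
import re


def split_quoted(text, delimiter=",", keep_quotes=False):
    """Split text on delimiter, treating quoted runs as single fields.

    Hypothetical helper: same regex as the answer, with the delimiter
    substituted in instead of a hard-coded comma.
    """
    d = re.escape(delimiter)  # so that '|' or '.' are matched literally
    opt = f"(?:{d})?"         # optional delimiter next to a quoted field
    if keep_quotes:
        pattern = opt + r'(".*?")' + opt + "|" + opt + r"('.*?')" + opt + "|" + d
    else:
        pattern = opt + r'"(.*?)"' + opt + "|" + opt + r"'(.*?)'" + opt + "|" + d
    # re.split yields None for the capture group that did not take part
    # in a match, so those entries are filtered out.
    return [t for t in re.split(pattern, text) if t is not None]


print(split_quoted('a|"b|c"|d', "|"))
print(split_quoted("x::'y::z'::w", "::"))
```

The non-capturing group (?:...) around the delimiter is there so that multi-character delimiters such as :: stay optional as a unit.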

Explanation:

|   = match alternatives, e.g. ( |X) = space or X
.*  = any run of characters except a newline (.*? is the non-greedy form: as few characters as possible)
x?  = x or nothing
()  = capture the content of a matched pattern
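The regexp in this answer uses .*? rather than plain .*, and the difference matters when a line holds more than one quoted field. A small illustration (the sample string is invented for this sketch):

```python
import re

line = '"a","b"'

# Greedy: .* runs to the last closing quote, swallowing both fields.
greedy = re.findall(r'"(.*)"', line)

# Non-greedy: .*? stops at the first closing quote it can.
lazy = re.findall(r'"(.*?)"', line)

print(greedy)  # ['a","b']
print(lazy)    # ['a', 'b']
```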

We have 3 alternatives:

1 "text"    -> ".*?" -> written as \".*?\" because the pattern sits inside a double-quoted Python string
2 'text'    -> '.*?'
3 delimiter -> ,

Since we want to capture the content of the text inside the quotes, we use ():

1 \"(.*?)\"   (to keep the quotes use (\".*?\"))
2 '(.*?)'     (to keep the quotes use ('.*?'))

Finally, we don't want the split function to report an empty string when a
delimiter precedes or follows a quoted field, so each quoted alternative
also matches (and thereby consumes) that optional delimiter:

1 ,?\"(.*?)\",?
2 ,?'(.*?)',?
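To see what the optional ,? buys us, here is a small comparison on a made-up input. The variable names are only for this sketch.

```python
import re

sample = '1,"a,b",2'

# Without the optional delimiters, the commas around the quoted field
# are split on separately and leave empty strings behind.
bare = [t for t in re.split(r"\"(.*?)\"|'(.*?)'|,", sample) if t is not None]

# With them, the quoted alternative swallows its neighbouring commas.
padded = [t for t in re.split(r",?\"(.*?)\",?|,?'(.*?)',?|,", sample) if t is not None]

print(bare)    # ['1', '', 'a,b', '', '2']
print(padded)  # ['1', 'a,b', '2']
```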

Once we use the | operator to join the 3 possibilities we get this regexp:

r",?\"(.*?)\",?|,?'(.*?)',?|,"
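One detail the explanation does not spell out is why the result is filtered with "is not None". When the pattern contains capture groups, re.split also returns the groups, and a group that took no part in a match comes back as None. The sketch below, on invented inputs, shows the raw output and why a plain truthiness test would be the wrong filter:

```python
import re

PATTERN = r",?\"(.*?)\",?|,?'(.*?)',?|,"

# Raw output: the unused single-quote group shows up as None.
raw = re.split(PATTERN, 'a,"b",c')
print(raw)  # ['a', 'b', None, 'c']

# Filtering on "is not None" keeps genuinely empty fields ...
kept = [t for t in re.split(PATTERN, "a,,b") if t is not None]
print(kept)  # ['a', '', 'b']

# ... whereas "if t" would silently drop them.
dropped = [t for t in re.split(PATTERN, "a,,b") if t]
print(dropped)  # ['a', 'b']
```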
