How to split comma-separated key-value pairs with quoted commas
Question
I know there are a lot of other posts about parsing comma-separated values, but I couldn't find one that splits key-value pairs and handles quoted commas.
I have a string like this:
age=12,name=bob,hobbies="games,reading",phrase="I'm cool!"
I want to get this:
{
'age': '12',
'name': 'bob',
'hobbies': 'games,reading',
'phrase': "I'm cool!",
}
I tried using shlex like this:
import shlex

lexer = shlex.shlex('''age=12,name=bob,hobbies="games,reading",phrase="I'm cool!"''')
lexer.whitespace_split = True
lexer.whitespace = ','
props = dict(pair.split('=', 1) for pair in lexer)
The trouble is that shlex will split the hobbies entry into two tokens, i.e. hobbies="games and reading". Is there a way to make it take the double quotes into account? Or is there another module I can use?
EDIT: Fixed whitespace_split.
EDIT 2: I'm not tied to using shlex. Regex is fine too, but I didn't know how to handle the matching quotes.
Recommended answer
You just needed to use your shlex lexer in POSIX mode.
Add posix=True when creating the lexer.
(See the shlex parsing rules.)
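import shlex

# posix=True makes the lexer honor quotes that appear inside a word
lexer = shlex.shlex('''age=12,name=bob,hobbies="games,reading",phrase="I'm cool!"''', posix=True)
lexer.whitespace_split = True   # split only on the characters in lexer.whitespace
lexer.whitespace = ','          # treat the comma as the sole separator
props = dict(pair.split('=', 1) for pair in lexer)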
Output:
{'age': '12', 'phrase': "I'm cool!", 'hobbies': 'games,reading', 'name': 'bob'}
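To see what POSIX mode changes, it can help to print the raw tokens from both modes. The sketch below is only an illustration: the tokens helper and its variable names are made up for this example and are not part of shlex.

```python
import shlex

def tokens(text, posix):
    # Hypothetical helper: build a lexer that treats only ',' as a separator
    # and return every token it produces.
    lexer = shlex.shlex(text, posix=posix)
    lexer.whitespace_split = True
    lexer.whitespace = ','
    return list(lexer)

s = '''age=12,name=bob,hobbies="games,reading",phrase="I'm cool!"'''

legacy_tokens = tokens(s, posix=False)  # default, non-POSIX behavior
posix_tokens = tokens(s, posix=True)    # behavior used in the answer

print(legacy_tokens)
print(posix_tokens)
```

In the default (non-POSIX) mode, quote characters inside a word are kept as ordinary characters, so the comma between games and reading still ends the token. In POSIX mode the quotes are stripped and everything between them stays in one token.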
PS: A regular expression that simply splits on , and = won't parse these key-value pairs correctly as long as the input can contain quoted = or , characters. The pattern would have to model the quoting rules itself, which is why a lexer such as shlex is the easier tool here.
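For what it's worth, the shlex approach copes with quoted = and , characters inside values, because each token is split on its first = only. Here is a minimal sketch wrapped in a function; parse_pairs is a hypothetical name chosen for this example, and it assumes keys never contain = themselves.

```python
import shlex

def parse_pairs(text):
    """Hypothetical helper: turn 'k=v,k2="v,with=stuff"' into a dict."""
    lexer = shlex.shlex(text, posix=True)
    lexer.whitespace_split = True
    lexer.whitespace = ','
    # Split on the first '=' only, so later '=' characters stay in the value.
    return dict(token.split('=', 1) for token in lexer)

result = parse_pairs('expr="a=b,c=d",name=bob')
print(result)
```

Keep in mind that in POSIX mode shlex also processes backslash escapes and, by default, treats # as the start of a comment, so input containing those characters may need extra care.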