Converting key=value pairs back into Python dicts

Views: 592 | Published: 2020/5/3 8:31:13 | Tags: python, string, parsing, dictionary, logging

Question

There's a logfile with text in the form of space-separated key=value pairs, and each line was originally serialized from data in a Python dict, something like:

' '.join([f'{k}={v!r}' for k,v in d.items()])

The keys are always just strings. The values could be anything that ast.literal_eval can successfully parse, no more no less.
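
For reference, a minimal sketch of the writing side, wrapping the expression above in a function. The name to_line is hypothetical and is not taken from the original logging code:

```python
def to_line(d):
    """Serialize a dict to one log line of space-separated key=repr(value) pairs."""
    return ' '.join(f'{k}={v!r}' for k, v in d.items())

print(to_line({'s': '1234', 'n': 1234}))
# s='1234' n=1234
print(to_line({'k4': 'k5="hello"', 'k5': {'k6': ['potato']}}))
# k4='k5="hello"' k5={'k6': ['potato']}
```

Because each value is written with repr, string values may themselves contain spaces, quotes and = characters, which is what makes the reverse direction awkward.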

How can this logfile be processed to turn the lines back into Python dicts? Examples:

>>> to_dict("key='hello world'")
{'key': 'hello world'}

>>> to_dict("k1='v1' k2='v2'")
{'k1': 'v1', 'k2': 'v2'}

>>> to_dict("s='1234' n=1234")
{'s': '1234', 'n': 1234}

>>> to_dict("""k4='k5="hello"' k5={'k6': ['potato']}""")
{'k4': 'k5="hello"', 'k5': {'k6': ['potato']}}

Here is some extra context about the data:

Keys are valid names
Input lines are well-formed (e.g. no dangling brackets)
The data is trusted (unsafe functions such as eval, exec, yaml.load are OK to use)
Order is not important. Performance is not important. Correctness is important.

As requested in the comments, here is an MCVE with some example code that didn't work correctly:

>>> def to_dict(s):
...     s = s.replace(' ', ', ')
...     return eval(f"dict({s})")
... 
... 
>>> to_dict("k1='v1' k2='v2'")
{'k1': 'v1', 'k2': 'v2'}  # OK
>>> to_dict("s='1234' n=1234")
{'s': '1234', 'n': 1234}  # OK
>>> to_dict("key='hello world'")
{'key': 'hello, world'}  # Incorrect, the value was corrupted

Recommended answer

Your input can't be conveniently parsed by something like ast.literal_eval, but it can be tokenized as a series of Python tokens. This makes things a bit easier than they might otherwise be.

The only place = tokens can appear in your input is as key-value separators; at least for now, ast.literal_eval doesn't accept anything with = tokens in it. We can use the = tokens to determine where the key-value pairs start and end, and most of the rest of the work can be handled by ast.literal_eval. Using the tokenize module also avoids problems with = or backslash escapes in string literals.
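
To make that concrete, here is a small sketch that lists the tokens for one log line. The helper name show_tokens is made up for illustration; it only uses the standard io and tokenize modules:

```python
import io
import tokenize

def show_tokens(logstring):
    """Return (token type name, token text) pairs for a log line."""
    # tokenize.tokenize expects the readline method of a binary file-like object.
    readline = io.BytesIO(logstring.encode('utf8')).readline
    return [(tokenize.tok_name[tok.type], tok.string)
            for tok in tokenize.tokenize(readline)]

for type_name, text in show_tokens("a='=' n=1"):
    print(type_name, repr(text))
```

The = inside the quotes arrives as part of a single STRING token, so only the two separator = signs show up as OP tokens.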

import ast
import io
import tokenize

def todict(logstring):
    # tokenize.tokenize wants an argument that acts like the readline method of a binary
    # file-like object, so we have to do some work to give it that.
    input_as_file = io.BytesIO(logstring.encode('utf8'))
    tokens = list(tokenize.tokenize(input_as_file.readline))

    eqsign_locations = [i for i, token in enumerate(tokens) if token[1] == '=']

    names = [tokens[i-1][1] for i in eqsign_locations]

    # Values are harder than keys.
    val_starts = [i+1 for i in eqsign_locations]
    val_ends = [i-1 for i in eqsign_locations[1:]] + [len(tokens)]

    # tokenize.untokenize likes to add extra whitespace that ast.literal_eval
    # doesn't like. Removing the row/column information from the token records
    # seems to prevent extra leading whitespace, but the documentation doesn't
    # make enough promises for me to be comfortable with that, so we call
    # strip() as well.
    val_strings = [tokenize.untokenize(tok[:2] for tok in tokens[start:end]).strip()
                   for start, end in zip(val_starts, val_ends)]
    vals = [ast.literal_eval(val_string) for val_string in val_strings]

    return dict(zip(names, vals))

This behaves correctly on your example inputs, as well as on an example with backslashes:

>>> todict("key='hello world'")
{'key': 'hello world'}
>>> todict("k1='v1' k2='v2'")
{'k1': 'v1', 'k2': 'v2'}
>>> todict("s='1234' n=1234")
{'s': '1234', 'n': 1234}
>>> todict("""k4='k5="hello"' k5={'k6': ['potato']}""")
{'k4': 'k5="hello"', 'k5': {'k6': ['potato']}}
>>> s=input()
a='=' b='"\'' c=3
>>> todict(s)
{'a': '=', 'b': '"\'', 'c': 3}

Incidentally, we probably could look for token type NAME instead of = tokens, but that'll break if they ever add set() support to literal_eval. Looking for = could also break in the future, but it doesn't seem as likely to break as looking for NAME tokens.
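
One further caveat on the NAME-token idea, shown as a rough sketch: the tokenizer reports True, False and None as NAME tokens too, so a NAME-based split would have to filter those out. The helper name name_tokens is hypothetical, and the filtering itself is not shown:

```python
import io
import tokenize

def name_tokens(logstring):
    """Return the text of every NAME token in a log line."""
    readline = io.BytesIO(logstring.encode('utf8')).readline
    return [tok.string for tok in tokenize.tokenize(readline)
            if tok.type == tokenize.NAME]

print(name_tokens("flag=True missing=None n=1"))
# ['flag', 'True', 'missing', 'None', 'n']
```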
