Extract dict from string


Question

I'm calling a function that returns a string containing a dict. How can I extract this dict, keeping in mind that the first and last lines could also contain '{' and '}'?

This is a {testing string} example
This {is} a testing {string} example
{"website": "stackoverflow",
"type": "question",
"date": "10-09-2020"
}
This is a {testing string} example
This {is} a testing {string} example

I need to extract this value as a dict variable.

{"website": "stackoverflow",
"type": "question",
"date": "10-09-2020"
}

Recommended answer

Updated Answer


Taking on board comments from @martineau and @ekhumoro, the following edited code contains a function that searches through the string and extracts all valid dicts. This is a more robust approach than my previous answer, as the contents of a real-world dict may vary, and this logic (hopefully) accounts for that.

import json
import re

def extract_dict(s) -> list:
    """Extract all valid dicts from a string.
    
    Args:
        s (str): A string possibly containing dicts.
    
    Returns:
        A list containing all valid dicts.
    
    """
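    results = []
    # Flatten the string so the regex does not have to span newlines.
    s_ = ' '.join(s.split('\n')).strip()
    # Non-greedy match from each '{' to the nearest '}'.
    # Note: this stops at the first closing brace, so nested dicts are not matched.
    exp = re.compile(r'(\{.*?\})')
    for i in exp.findall(s_):
        try:
            results.append(json.loads(i))
        except json.JSONDecodeError:
            # Not valid JSON (e.g. '{testing string}'); skip it.
            pass
    return results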

Test String:

The OP's original string has been updated to add multiple dicts, a numeric value as a last field, and a list value.

s = """
This is a {testing string} example
This {is} a testing {string} example
{"website": "stackoverflow",
"type": "question",
"date": 5
}
{"website": "stackoverflow",
"type": "question",
"date": "2020-09-11"
}
{"website": "stackoverflow",
"type": "question",
"dates": ["2020-09-11", "2020-09-12"]
}
This is a {testing string} example
This {is} a testing {string} example
"""

Output:

As the OP states, there is generally only one dict in the string, so this would (obviously) be accessed using results[0].

>>> results = extract_dict(s)
>>> results
[{'website': 'stackoverflow', 'type': 'question', 'date': 5},
 {'website': 'stackoverflow', 'type': 'question', 'date': '2020-09-11'},
 {'website': 'stackoverflow', 'type': 'question', 'dates': ['2020-09-11', '2020-09-12']}]
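Because the pattern above matches non-greedily up to the first '}', a dict that contains another dict would be cut short and rejected. One possible way around that, offered only as a sketch and not as part of the original answer, is to let the standard library's json.JSONDecoder.raw_decode find where each object ends. The function name extract_dicts_nested and the sample string are made up for illustration.

```python
import json


def extract_dicts_nested(text: str) -> list:
    """Return every JSON object found in text, including ones with nested dicts."""
    decoder = json.JSONDecoder()
    results = []
    pos = text.find("{")
    while pos != -1:
        try:
            # raw_decode parses one JSON value starting at pos and returns
            # it together with the index just past the end of that value.
            obj, end = decoder.raw_decode(text, pos)
        except json.JSONDecodeError:
            # Not valid JSON at this brace; move on to the next one.
            pos = text.find("{", pos + 1)
            continue
        if isinstance(obj, dict):
            results.append(obj)
        # Resume after the parsed object so inner dicts are not reported twice.
        pos = text.find("{", end)
    return results


# Hypothetical sample input, not taken from the question.
sample = 'noise {not json} {"site": "stackoverflow", "meta": {"votes": 3}} tail {x}'
print(extract_dicts_nested(sample))
```

This version also works on the multi-line string directly, since the JSON parser treats newlines between tokens as ordinary whitespace.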

Original Answer:


Ignore this section. Although the code works, it fits the OP's request specifically and is not robust for other uses.

This sample uses a regex to identify the dict start {" and the dict end "}, extracts everything in between, and then converts that string to a proper dict. Because newlines get in the way and complicate the regex, I've flattened the string first.
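As an aside, flattening is not the only way to deal with the newlines. The sketch below assumes the same kind of input as the question and passes re.DOTALL so that '.' also matches newline characters; the variable names text and pattern are illustrative and not from the original answer.

```python
import json
import re

text = """
This is a {testing string} example
{"website": "stackoverflow",
"type": "question",
"date": "10-09-2020"
}
This {is} a testing {string} example
"""

# re.DOTALL makes '.' match newline characters too, so the pattern can
# span the multi-line dict without flattening the string first.
pattern = re.compile(r'\{".*"\s*\}', re.DOTALL)
match = pattern.search(text)
d = json.loads(match.group(0)) if match else None
print(d)
```

Like the flattened version, this relies on a greedy match from the first {" to the last "}, so it carries the same limitation of fitting this particular input shape.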

Per a comment from @jizhihaoSAMA, I've updated the code to use json.loads to convert the string to a dict, as it's cleaner. If you don't want the additional import, eval will work as well, but it is not recommended because eval executes whatever expression the string contains.

import json
import re

s = """
This is a {testing string} example
This {is} a testing {string} example
{"website": "stackoverflow",
"type": "question",
"date": "10-09-2020"
}
This is a {testing string} example
This {is} a testing {string} example
"""

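# Flatten the string so the regex does not have to deal with newlines.
s_ = ' '.join(s.split('\n')).strip()
# Greedy match from the first '{"' to the last '"' that is followed by
# at most one whitespace character and a closing brace.
d = json.loads(re.findall(r'(\{\".*\"\s?\})', s_)[0])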

>>> d
>>> d['website']

Output:

{'website': 'stackoverflow', 'type': 'question', 'date': '10-09-2020'}

'stackoverflow'
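On the eval remark above: if a non-JSON route is really wanted, the standard library's ast.literal_eval is a commonly suggested, more restrictive substitute. The snippet below is a minimal sketch under the assumption that the extracted text is also a valid Python literal; the variable raw is illustrative only.

```python
import ast

raw = '{"website": "stackoverflow", "type": "question", "date": "10-09-2020"}'

# literal_eval only accepts Python literals (strings, numbers, lists,
# dicts and so on), so it does not run arbitrary code the way eval can.
d = ast.literal_eval(raw)
print(d["website"])

# Anything that is not a plain literal is rejected instead of executed.
rejected = False
try:
    ast.literal_eval('__import__("os").getcwd()')
except (ValueError, SyntaxError):
    rejected = True
print("rejected:", rejected)
```

Note that JSON-only tokens such as true, false and null are not Python literals, so json.loads remains the better fit when the text is genuine JSON.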
