Extract dict from string
Problem description
I'm calling a function that returns a string containing a dict. How can I extract this dict, keeping in mind that the first and last lines could also contain '{' and '}'?
This is a {testing string} example
This {is} a testing {string} example
{"website": "stackoverflow",
"type": "question",
"date": "10-09-2020"
}
This is a {testing string} example
This {is} a testing {string} example
I need to extract this value as a dict variable.
{"website": "stackoverflow",
"type": "question",
"date": "10-09-2020"
}
Recommended answer
Updated Answer
Taking on board comments from @martineau and @ekhumoro, the following edited code contains a function which searches through the string and extracts all valid dicts. This is more robust than my previous answer, as the contents of a real-world dict may vary, and this logic attempts to account for that.
import json
import re

def extract_dict(s) -> list:
    """Extract all valid dicts from a string.

    Args:
        s (str): A string possibly containing dicts.

    Returns:
        A list containing all valid dicts.

    """
    results = []
    # Flatten the string so the regex does not have to span newlines.
    s_ = ' '.join(s.split('\n')).strip()
    # Non-greedy match of every {...} candidate.
    exp = re.compile(r'(\{.*?\})')
    for i in exp.findall(s_):
        try:
            results.append(json.loads(i))
        except json.JSONDecodeError:
            # Not valid JSON (e.g. "{testing string}"), so skip it.
            pass
    return results
Test String:
The OP's original string has been updated to add multiple dicts, a numeric value as the last field, and a list value.
s = """
This is a {testing string} example
This {is} a testing {string} example
{"website": "stackoverflow",
"type": "question",
"date": 5
}
{"website": "stackoverflow",
"type": "question",
"date": "2020-09-11"
}
{"website": "stackoverflow",
"type": "question",
"dates": ["2020-09-11", "2020-09-12"]
}
This is a {testing string} example
This {is} a testing {string} example
"""
Output:
As the OP states, there is generally only one dict in the string, so it would be accessed using results[0].
>>> results = extract_dict(s)
>>> results
[{'website': 'stackoverflow', 'type': 'question', 'date': 5},
{'website': 'stackoverflow', 'type': 'question', 'date': '2020-09-11'},
{'website': 'stackoverflow', 'type': 'question', 'dates': ['2020-09-11', '2020-09-12']}]
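One limitation worth noting: the non-greedy pattern `\{.*?\}` used in `extract_dict` stops at the first `}`, so a dict that contains a nested dict would be cut short and skipped. The following is a minimal sketch of one possible alternative, not part of the original answer, that lets the standard library's `json.JSONDecoder.raw_decode` find the end of each object instead of a regex. The function name `extract_dicts_nested` and the sample `text` are hypothetical and only for illustration.

```python
import json


def extract_dicts_nested(s: str) -> list:
    """Hypothetical helper: extract JSON objects, including nested ones."""
    decoder = json.JSONDecoder()
    results = []
    pos = s.find('{')
    while pos != -1:
        try:
            # raw_decode parses one JSON value starting at index pos and
            # returns it together with the index where parsing stopped.
            obj, end = decoder.raw_decode(s, pos)
        except json.JSONDecodeError:
            # Not valid JSON at this brace; move on to the next one.
            pos = s.find('{', pos + 1)
            continue
        if isinstance(obj, dict):
            results.append(obj)
        # Resume the search after the value that was just parsed.
        pos = s.find('{', end)
    return results


# Illustrative input: one invalid candidate, one nested dict, one more invalid.
text = 'noise {not json} {"a": {"b": 1},\n "c": [1, 2]} more {text}'
print(extract_dicts_nested(text))
```

Because the decoder tolerates whitespace inside an object, this sketch does not need the string to be flattened first.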
Original Answer:
Ignore this section. Although the code works, it fits the OP's request specifically and is not robust for other uses.
This sample uses a regex to identify the dict start {" and the dict end "}, extracts everything in between, and then converts that string to a proper dict. As newlines get in the way and complicate the regex, I've flattened the string first.
Per a comment from @jizhihaoSAMA, I've updated the code to use json.loads to convert the string to a dict, as it's cleaner. If you don't want the additional import, eval will work as well, but it is not recommended.
import json
import re
s = """
This is a {testing string} example
This {is} a testing {string} example
{"website": "stackoverflow",
"type": "question",
"date": "10-09-2020"
}
This is a {testing string} example
This {is} a testing {string} example
"""
s_ = ' '.join(s.split('\n')).strip()
d = json.loads(re.findall(r'(\{\".*\"\s?\})', s_)[0])
>>> d
{'website': 'stackoverflow', 'type': 'question', 'date': '10-09-2020'}
>>> d['website']
'stackoverflow'
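On the `eval` remark in the original answer: if `json.loads` is not an option, `ast.literal_eval` from the standard library is a commonly used safer stand-in for `eval`, because it accepts only Python literals and refuses anything else. The sketch below is an illustration rather than part of the original answer. It assumes the extracted text happens to be a valid Python literal, which holds for this example but not for JSON that contains `true`, `false`, or `null`. The variable `raw` is the matched text written out by hand.

```python
import ast

# The matched dict text from the example, written out by hand for illustration.
raw = '{"website": "stackoverflow", "type": "question", "date": "10-09-2020" }'

# literal_eval only accepts literals (dicts, lists, strings, numbers, ...),
# so unlike eval it will not run arbitrary expressions.
d = ast.literal_eval(raw)
print(d['website'])

# A function call is not a literal, so it is rejected instead of executed.
rejected = False
try:
    ast.literal_eval('__import__("os").getcwd()')
except ValueError:
    rejected = True
print(rejected)
```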