How to find a JSON object in text with Python
Question
I'm trying to parse a JSON object from text with a Python regex. I found this pattern:
'\{(?:[^{}]|(?R))*\}'
But in Python I get this error:
re.error: unknown extension ?R at position 12
Recommended answer
You found a regex that uses syntax that the Python standard library re module doesn't support.
When you look at the regex101 link, you'll see that the pattern works when using the PCRE library, and the problematic (?R) syntax that throws the error uses a feature called recursion. That feature is only supported by a subset of regex engines.
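To confirm the diagnosis, here is a minimal sketch that checks whether the standard library re module accepts a given pattern. The helper name stdlib_re_accepts is hypothetical and only serves this illustration.

```python
import re


def stdlib_re_accepts(pattern):
    """Return True if the standard library re module can compile pattern.

    Hypothetical helper, used here only to illustrate the error.
    """
    try:
        re.compile(pattern)
    except re.error as exc:
        print("re rejected the pattern:", exc)
        return False
    return True


# A plain, non-recursive pattern compiles fine.
print(stdlib_re_accepts(r'\{[^{}]*\}'))

# The recursive pattern from the question is rejected because of (?R).
print(stdlib_re_accepts(r'\{(?:[^{}]|(?R))*\}'))
```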
You could install the regex library, an alternative regex engine for Python that explicitly does support that syntax:
>>> import regex
>>> pattern = regex.compile(r'\{(?:[^{}]|(?R))*\}')
>>> pattern.findall('''\
... This is a funny text about stuff,
... look at this product {"action":"product","options":{...}}.
... More Text is to come and another JSON string
... {"action":"review","options":{...}}
... ''')
['{"action":"product","options":{...}}', '{"action":"review","options":{...}}']
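To see what the recursive pattern is looking for, the same balanced-brace matching can be sketched with a simple depth counter and no regex engine at all. This is only an illustration of the idea, not how the regex library implements recursion, and find_balanced_braces is a hypothetical name.

```python
def find_balanced_braces(text):
    """Yield each outermost balanced {...} span found in text.

    Illustrative stand-in for the recursive pattern; like that pattern,
    it knows nothing about JSON strings or JSON validity.
    """
    pos = 0
    while True:
        start = text.find('{', pos)
        if start == -1:
            return
        depth = 0
        for i in range(start, len(text)):
            ch = text[i]
            if ch == '{':
                depth += 1
            elif ch == '}':
                depth -= 1
                if depth == 0:
                    # Braces are balanced again: emit the span.
                    yield text[start:i + 1]
                    pos = i + 1
                    break
        else:
            # This opening brace is never closed; retry from the next character.
            pos = start + 1


sample = 'a {"x": {"y": 1}} b {z} { c'
print(list(find_balanced_braces(sample)))
```

Like the regex, this sketch only counts braces, so it also returns spans such as {z} that are not valid JSON, and a brace inside a JSON string literal would confuse it.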
Another option is to just try and decode any section that starts with { using the JSONDecoder.raw_decode() method; see How do I use the 'json' module to read in one JSON object at a time? for an example approach. While the recursive regex can find JSON-like text, the decoder approach would let you extract only valid JSON text.
Here is a generator function that does just that:
from json import JSONDecoder

def extract_json_objects(text, decoder=JSONDecoder()):
    """Find JSON objects in text, and yield the decoded JSON data

    Does not attempt to look for JSON arrays, text, or other JSON types outside
    of a parent JSON object.

    """
    pos = 0
    while True:
        # Locate the next candidate opening brace.
        match = text.find('{', pos)
        if match == -1:
            break
        try:
            # index is relative to the slice, so add match to get the new position.
            result, index = decoder.raw_decode(text[match:])
            yield result
            pos = match + index
        except ValueError:
            # Not valid JSON at this brace; move on to the next character.
            pos = match + 1
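The function relies on raw_decode() returning both the decoded object and the index where parsing stopped, and on it raising ValueError for text that is not valid JSON. A minimal sketch of that behaviour, using a made-up input string:

```python
from json import JSONDecoder

decoder = JSONDecoder()
text = '{"action": "product"} and then more text'

# raw_decode() parses one JSON value from the start of the string and
# returns it together with the index where parsing stopped.
obj, end = decoder.raw_decode(text)
print(obj)
print(repr(text[end:]))  # the unparsed remainder of the input

# Text that does not start with valid JSON raises JSONDecodeError,
# which is a subclass of ValueError.
try:
    decoder.raw_decode('{not json}')
except ValueError as exc:
    print('rejected:', exc)
```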
Demo:
>>> demo_text = """\
This is a funny text about stuff,
look at this product {"action":"product","options":{"foo": "bar"}}.
More Text is to come and another JSON string, neatly delimited by "{" and "}" characters:
{"action":"review","options":{"spam": ["ham", "vikings", "eggs", "spam"]}}
"""
>>> for result in extract_json_objects(demo_text):
...     print(result)
...
{'action': 'product', 'options': {'foo': 'bar'}}
{'action': 'review', 'options': {'spam': ['ham', 'vikings', 'eggs', 'spam']}}
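For long inputs, slicing the text at every candidate brace copies the rest of the string each time. One possible variation, sketched here under the assumption that passing a start index as the second argument of raw_decode() suits your case, avoids the slice. The name iter_json_objects is hypothetical, and the returned end index is then absolute rather than relative.

```python
from json import JSONDecoder


def iter_json_objects(text, decoder=JSONDecoder()):
    """Yield decoded JSON objects found in text, without slicing it.

    Hypothetical variant of the generator above.
    """
    pos = 0
    while True:
        start = text.find('{', pos)
        if start == -1:
            break
        try:
            # With an explicit start index, the returned end index is
            # an absolute position in text.
            obj, end = decoder.raw_decode(text, start)
        except ValueError:
            pos = start + 1
        else:
            yield obj
            pos = end


sample = 'x {"a": 1} "{" and "}" {"b": [1, 2]} {bad'
print(list(iter_json_objects(sample)))
```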