How to find a JSON object in text with Python


Question

I'm trying to parse a JSON object from text with a Python regex. I found this pattern:

'\{(?:[^{}]|(?R))*\}'

But in Python I get this error:

re.error: unknown extension ?R at position 12

See the regex match in this regex101 example.

Recommended answer

You found a regex that uses syntax the re module in the Python standard library doesn't support.

When you look at the regex101 link, you'll see that the pattern works when using the PCRE library. The problematic (?R) syntax that throws the error uses a feature called recursion, which only a subset of regex engines support.
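To confirm that the failure comes from the engine rather than from a typo in the pattern, a minimal check like the one below can help. The helper name compiles_with_re is made up for this illustration; it only uses re.compile and re.error from the standard library.

```python
import re

# The pattern from the question; (?R) asks the engine to recurse
# into the whole pattern, which the stdlib re module does not implement.
PATTERN = r'\{(?:[^{}]|(?R))*\}'


def compiles_with_re(pattern):
    """Hypothetical helper: report whether the stdlib re module accepts a pattern.

    Returns (True, None) on success, or (False, error message) on failure.
    """
    try:
        re.compile(pattern)
    except re.error as exc:
        return False, str(exc)
    return True, None


ok, message = compiles_with_re(PATTERN)
print(ok, message)
```

The same pattern with the recursive group removed, for example r'\{[^{}]*\}', compiles fine with re, but it can then only match objects that contain no nested braces.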

You could install the regex library, an alternative regex engine for Python that explicitly supports that syntax:

>>> import regex
>>> pattern = regex.compile(r'\{(?:[^{}]|(?R))*\}')
>>> pattern.findall('''\
... This is a funny text about stuff,
... look at this product {"action":"product","options":{...}}.
... More Text is to come and another JSON string
... {"action":"review","options":{...}}
... ''')
['{"action":"product","options":{...}}', '{"action":"review","options":{...}}']

Another option is to try to decode any section that starts with { using the JSONDecoder.raw_decode() method; see "How do I use the 'json' module to read in one JSON object at a time?" for an example approach. While the recursive regex can find JSON-like text, the decoder approach lets you extract only valid JSON text.
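As a quick illustration of what raw_decode() gives you, the sketch below decodes one object from the start of a string and shows the failure case. The sample strings and variable names are made up for the example.

```python
from json import JSONDecoder

decoder = JSONDecoder()
text = '{"action": "product"} and some trailing text'

# raw_decode() parses a single JSON document starting at the beginning of
# the string. It returns the decoded value and the index where the document
# ended, and it does not complain about trailing text the way json.loads() does.
obj, end = decoder.raw_decode(text)
print(obj)
print(repr(text[end:]))

# Text that only looks like JSON raises json.JSONDecodeError, which is a
# subclass of ValueError, so catching ValueError is enough.
try:
    decoder.raw_decode('{not json}')
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```

The returned end index is what makes scanning possible: after a successful decode you can resume the search right after the object instead of re-reading it.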

Here is a generator function that does just that:

from json import JSONDecoder

def extract_json_objects(text, decoder=JSONDecoder()):
    """Find JSON objects in text, and yield the decoded JSON data

    Does not attempt to look for JSON arrays, text, or other JSON types outside
    of a parent JSON object.

    """
    pos = 0
    while True:
        match = text.find('{', pos)
        if match == -1:
            break
        try:
            result, index = decoder.raw_decode(text[match:])
            yield result
            pos = match + index
        except ValueError:
            pos = match + 1

Demo:

>>> demo_text = """\
... This is a funny text about stuff,
... look at this product {"action":"product","options":{"foo": "bar"}}.
... More Text is to come and another JSON string, neatly delimited by "{" and "}" characters:
... {"action":"review","options":{"spam": ["ham", "vikings", "eggs", "spam"]}}
... """
>>> for result in extract_json_objects(demo_text):
...     print(result)
...
{'action': 'product', 'options': {'foo': 'bar'}}
{'action': 'review', 'options': {'spam': ['ham', 'vikings', 'eggs', 'spam']}}
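To see the "only valid JSON" behaviour on input that contains brace-delimited text that is not JSON, here is a small variation on the same idea. It is a sketch, not the answer's own code: the name iter_json_objects is made up, and it passes the start index to raw_decode() instead of slicing the text, which assumes the optional idx argument of raw_decode() and that the returned end index is then relative to the full string.

```python
from json import JSONDecoder


def iter_json_objects(text, decoder=JSONDecoder()):
    """Hypothetical variant: yield every valid JSON object found in text.

    Scans for '{' and tries to decode from there, without copying the
    remainder of the string on each attempt.
    """
    pos = 0
    while True:
        start = text.find('{', pos)
        if start == -1:
            return
        try:
            # Assumption: with an explicit start index, the returned end
            # index is an absolute position in text.
            obj, end = decoder.raw_decode(text, start)
        except ValueError:
            # Not valid JSON at this brace; move one character on.
            pos = start + 1
        else:
            yield obj
            pos = end


sample = 'The set {a, b} is not JSON, but {"a": {"b": [1, 2]}} is; so is {}.'
found = list(iter_json_objects(sample))
print(found)
```

A brace-matching regex would report {a, b} as a match here, whereas the decoder skips it and returns only the two objects that actually parse.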
