Parse multiple JSON objects that are in one line


Problem description

I'm parsing files that contain JSON objects. The problem is that some files have multiple objects on one line, e.g.:

{"data1": {"data1_inside": "bla{bl\"a"}}{"data1": {"data1_inside": "blabla["}}{"data1": {"data1_inside": "bla{bla"}}{"data1": {"data1_inside": "bla["}}

I've written a function that tries to parse a substring once there are no open brackets left, but values may themselves contain curly brackets. I tried skipping values by tracking the opening and closing quotes, but some values also contain escaped quotes. Any ideas on how to deal with this?

My attempt:

def get_lines(data):
    lines = []
    open_brackets = 0
    start = 0
    is_comment = False
    for index, c in enumerate(data):
        if c == '"':
            is_comment = not is_comment
        elif not is_comment:
            if c == '{':
                if not open_brackets:
                    start = index
                open_brackets += 1

            if c == '}':
                open_brackets -= 1
                if not open_brackets:
                    lines.append(data[start: index+1])

    return lines

Recommended answer

Simple but less robust version:

>>> import re
>>> s = r'{"data1": {"data1_inside": "bla{bl\"a"}}{"data1": {"data1_inside": "blabla["}}{"data1": {"data1_inside": "bla{bla"}}{"data1": {"data1_inside": "bla["}}'
>>> r = re.split(r'(\{.*?\})(?= *\{)', s)
>>> r
['', '{"data1": {"data1_inside": "bla{bl\\"a"}}', '', '{"data1": {"data1_inside": "blabla["}}', '', '{"data1": {"data1_inside": "bla{bla"}}', '{"data1": {"data1_inside": "bla["}}']

This will fail if }{ appears inside a string value, because the split then cuts that value in two.
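To make the failure concrete, here is a small check on a made-up input. The string bad is an illustrative assumption, not data from the question; its first object has a value containing }{.

```python
import re

# Hypothetical input: the first object's string value contains "}{".
bad = '{"a": "x}{y"}{"b": 1}'

# Same pattern as above; empty strings produced by re.split are dropped.
pieces = [p for p in re.split(r'(\{.*?\})(?= *\{)', bad) if p]

# The first object is cut in the middle of its string value,
# so the first two pieces are not valid JSON on their own.
print(pieces)  # ['{"a": "x}', '{y"}', '{"b": 1}']
```

Joining the first two pieces restores a valid document, which is why the pieces need a second pass rather than being parsed one by one.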

As others suggested, you can then try to parse each element. If an element is not valid JSON on its own, check it again joined with the next one.

Note that r is the result of the code above.

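import json

accumulator = ''
res = []
for subs in r:
    # Glue pieces together until they form a valid JSON document.
    accumulator += subs
    try:
        res.append(json.loads(accumulator))
        accumulator = ''
    except ValueError:
        # Not valid JSON yet (json.JSONDecodeError is a ValueError); keep accumulating.
        pass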
