Python: how do I parse a stream of JSON arrays with the ijson library?

Question

The incoming data resembles the following:

[{
    "foo": "bar"
}]
[{
    "bar": "baz"
}]
[{
    "baz": "foo"
}]

As you can see, it's arrays of objects strung together: JSON-ish, but not a single valid JSON document.

ijson is able to handle the first array, and then I get:

ijson.common.JSONError: Additional data

when it hits the subsequent arrays. How do I get around this?
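
The same failure can be reproduced with the standard-library json module, which is handy for experimenting without ijson installed. This is a small illustrative sketch, not the asker's code: json.loads rejects the concatenated input with an "Extra data" error, while json.JSONDecoder.raw_decode parses only the first value and reports the offset where it stopped.

```python
import json

# Two top-level arrays back to back, like the incoming data above.
data = '[{"foo": "bar"}]\n[{"bar": "baz"}]'

try:
    json.loads(data)
except json.JSONDecodeError as exc:
    # The first array parses fine; the second one counts as "extra data".
    print("json.loads failed:", exc.msg)

# raw_decode parses a single value and returns it together with its end offset.
first, end = json.JSONDecoder().raw_decode(data)
print(first, end)  # [{'foo': 'bar'}] 16
```

The end offset is what makes it possible to resume parsing at the next array instead of giving up.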

Recommended answer

Here's a first cut at the problem: a regex substitution that turns the full input string into valid JSON. It only works if you're OK with reading the entire input stream into memory before parsing it.

import json
import re
import sys

# inputStream: any iterable of text lines, e.g. an open file or sys.stdin
inputStream = sys.stdin

# read the whole stream into one string (avoids shadowing the built-in input())
raw = "".join(inputStream)
# e.g. raw == '[{"foo": "bar"}][{"bar": "baz"}][{"baz": "foo"}]'

# wrap in [] and put commas between each ][ (allowing whitespace/newlines between them)
sanitizedInput = re.sub(r"\]\s*\[", "],[", "[%s]" % raw)
# sanitizedInput == '[[{"foo": "bar"}],[{"bar": "baz"}],[{"baz": "foo"}]]'

# then parse sanitizedInput
parsed = json.loads(sanitizedInput)
print(parsed)  # => [[{'foo': 'bar'}], [{'bar': 'baz'}], [{'baz': 'foo'}]]
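
One caveat about the regex approach: the substitution rewrites every ][ in the text, including one that happens to sit inside a string value, so it can silently change the data. A small illustration with a made-up sample string:

```python
import json
import re

# A string value that happens to contain "][" (hypothetical sample data).
raw = '[{"note": "a][b"}][{"x": 1}]'

# Same substitution as above: wrap in [] and insert commas at every "][".
sanitized = re.sub(r"\]\[", "],[", "[%s]" % raw)
parsed = json.loads(sanitized)
print(parsed)
```

This still parses, but the note value comes back as a],[b instead of a][b. A decoder-based approach avoids the problem because it understands string boundaries.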

Note: since you've read the whole thing into a string, you can use json instead of ijson.
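
If reading everything up front is not acceptable, one stdlib-only alternative (swapped in here, not part of the answer above) is to keep a buffer and repeatedly call json.JSONDecoder.raw_decode on it, reading more input only when the current value is incomplete. The sketch below assumes the top-level values are arrays or objects, as in the question; the function name iter_json_values and the chunk size are illustrative. If you'd rather stay with ijson, recent releases (3.x) document a multiple_values=True option for concatenated top-level values (e.g. ijson.items(f, "item", multiple_values=True)); check the docs for the version you have installed.

```python
import io
import json
from typing import Iterator, TextIO


def iter_json_values(stream: TextIO, chunk_size: int = 65536) -> Iterator[object]:
    """Yield each top-level JSON value from a stream of concatenated values.

    Sketch only: memory is bounded by the largest single value plus one chunk.
    Assumes top-level arrays/objects; a bare number split across a chunk
    boundary (e.g. "1" | "2") could be yielded too early.
    """
    decoder = json.JSONDecoder()
    buf = ""
    eof = False
    while True:
        # Drop whitespace between values (e.g. the newlines between arrays).
        buf = buf.lstrip()
        if buf:
            try:
                value, end = decoder.raw_decode(buf)
            except json.JSONDecodeError:
                # Most likely an incomplete value: read more unless input is exhausted.
                if eof:
                    raise
            else:
                yield value
                buf = buf[end:]
                continue
        elif eof:
            return
        chunk = stream.read(chunk_size)
        if not chunk:
            eof = True
        buf += chunk


if __name__ == "__main__":
    data = '[{\n    "foo": "bar"\n}]\n[{\n    "bar": "baz"\n}]\n[{\n    "baz": "foo"\n}]\n'
    # A tiny chunk size forces values to be split across reads.
    for arr in iter_json_values(io.StringIO(data), chunk_size=8):
        print(arr)
```

Because raw_decode only succeeds once an array's closing bracket has been seen, partial arrays simply trigger another read rather than a premature result.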