python: how do I parse a stream of JSON arrays with the ijson library?
Question
The incoming data resembles the following:
[{
"foo": "bar"
}]
[{
"bar": "baz"
}]
[{
"baz": "foo"
}]
As you can see, these are arrays of objects strung together: JSON-ish, but not a single valid JSON document.
ijson is able to handle the first array, and then I get:
ijson.common.JSONError: Additional data
when it hits the subsequent arrays. How do I get around this?
Recommended answer
Here's a first cut at the problem: a regex substitution that turns the full string into valid JSON. It only works if you're OK with reading the entire input stream into memory before parsing it.
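import json
import re

# Read the whole stream into one string (inputStream is any iterable of text lines).
# Named "data" rather than "input" to avoid shadowing the built-in input().
data = ''.join(inputStream)
# e.g. data == '[{"foo": "bar"}][{"bar": "baz"}][{"baz": "foo"}]'

# Wrap everything in [] and put a comma between each "][" pair.
# \s* tolerates whitespace or newlines between the arrays, as in the sample input.
# Caveat: this also rewrites a literal "][" that appears inside a string value.
sanitizedInput = re.sub(r"\]\s*\[", "],[", "[%s]" % data)
# sanitizedInput == '[[{"foo": "bar"}],[{"bar": "baz"}],[{"baz": "foo"}]]'

# Then parse sanitizedInput as ordinary JSON.
parsed = json.loads(sanitizedInput)
print(parsed)  #=> [[{'foo': 'bar'}], [{'bar': 'baz'}], [{'baz': 'foo'}]]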
Note: since you've read the whole thing into a string anyway, you can use json instead of ijson.
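If reading the whole stream up front is not acceptable, one possible alternative is to decode the arrays one at a time with the standard library's json.JSONDecoder.raw_decode, which parses a single value from the start of a string and reports where it ended. The following is only a sketch, not ijson's own mechanism: the helper name iter_concatenated_json is made up for illustration, and it assumes every top-level value is an array or object (a bare number split across two chunks could be cut short). It also retries the parse on each new chunk, so a single very large array would be slow, and malformed input is only reported once the stream ends.

```python
import json


def iter_concatenated_json(chunks):
    """Yield each top-level JSON value from an iterable of text chunks.

    Hypothetical helper: assumes top-level values are arrays or objects.
    """
    decoder = json.JSONDecoder()
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        while True:
            # raw_decode does not skip leading whitespace, so strip it first.
            buffer = buffer.lstrip()
            if not buffer:
                break
            try:
                value, end = decoder.raw_decode(buffer)
            except json.JSONDecodeError:
                # The value is not complete yet; wait for the next chunk.
                break
            yield value
            # Drop the consumed text and look for another value.
            buffer = buffer[end:]
    if buffer.strip():
        raise ValueError("stream ended in the middle of a JSON value")


if __name__ == "__main__":
    # Sample input shaped like the question's data, split line by line.
    sample = ['[{\n', '"foo": "bar"\n', '}]\n', '[{\n', '"bar": "baz"\n', '}]\n']
    for array in iter_concatenated_json(sample):
        print(array)
```

With the question's data you would pass the stream directly, for example `for array in iter_concatenated_json(inputStream): ...`, and each array is available as soon as its closing bracket has been read.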