Issue parsing a multiline JSON file using Python
Question
I am trying to parse a multiline JSON file using the `json` library in Python 2.7. A simplified sample file is given below:
```json
{
    "observations": {
        "notice": [
            {
                "copyright": "Copyright Commonwealth of Australia 2015, Bureau of Meteorology. For more information see: http://www.bom.gov.au/other/copyright.shtml http://www.bom.gov.au/other/disclaimer.shtml",
                "copyright_url": "http://www.bom.gov.au/other/copyright.shtml",
                "disclaimer_url": "http://www.bom.gov.au/other/disclaimer.shtml",
                "feedback_url": "http://www.bom.gov.au/other/feedback"
            }
        ]
    }
}
```
My code is as follows:
```python
import json

with open('test.json', 'r') as jsonFile:
    for jf in jsonFile:
        jf = jf.replace('\n', '')
        jf = jf.strip()
        weatherData = json.loads(jf)
        print weatherData
```
However, I get the following error:
```
Traceback (most recent call last):
  File "test.py", line 8, in <module>
    weatherData = json.loads(jf)
  File "/home/usr/anaconda2/lib/python2.7/json/__init__.py", line 339, in loads
    return _default_decoder.decode(s)
  File "/home/usr/anaconda2/lib/python2.7/json/decoder.py", line 364, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/usr/anaconda2/lib/python2.7/json/decoder.py", line 380, in raw_decode
    obj, end = self.scan_once(s, idx)
ValueError: Expecting object: line 1 column 1 (char 0)
```
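The failure can be reproduced without a file. This is a minimal sketch, assuming a shortened copy of the sample held in a string (`SAMPLE`) and a hypothetical helper `parse_line_by_line` that mirrors the loop above; it should run on Python 2.7 and 3. The very first stripped line is `{`, which is not a complete JSON document on its own, so `json.loads` rejects it before any later line is reached.

```python
import json

# Hypothetical stand-in for test.json: the question's sample, shortened to one field.
SAMPLE = """{
"observations": {
"notice": [
{
"feedback_url": "http://www.bom.gov.au/other/feedback"
}
]
}
}"""


def parse_line_by_line(text):
    """Mimic the question's loop: strip each line and hand it to json.loads."""
    for line in text.splitlines():
        fragment = line.strip()
        try:
            json.loads(fragment)
        except ValueError:  # json.JSONDecodeError subclasses ValueError on Python 3
            return fragment  # first fragment that is not a complete JSON document
    return None


print(parse_line_by_line(SAMPLE))  # prints: {
```

The same text parses fine when it is passed to `json.loads` as one string, which is the point of the answer below.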
Just to do some testing, I modified the code so that, after removing the newlines and stripping the leading and trailing whitespace, it writes the contents to another file (with the `.json` extension). Surprisingly, when I read back the latter file, I do not get any error and the parsing succeeds. The modified code is as follows:
```python
import json

filewrite = open('out.json', 'w+')
with open('test.json', 'r') as jsonFile:
    for jf in jsonFile:
        jf = jf.replace('\n', '')
        jf = jf.strip()
        filewrite.write(jf)
filewrite.close()

with open('out.json', 'r') as newJsonFile:
    for line in newJsonFile:
        weatherData = json.loads(line)
        print weatherData
```
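Why this second version succeeds can be checked in memory. The sketch below uses a hypothetical list `SAMPLE_LINES` in place of reading test.json and applies the same replace/strip steps. Because every newline is removed, out.json ends up holding the whole document on a single line, so iterating over it yields exactly one line and `json.loads` is called once, on complete JSON.

```python
import json

# Hypothetical stand-in for the lines read from test.json (shortened sample).
SAMPLE_LINES = [
    '{\n',
    '"observations": {\n',
    '"notice": [\n',
    '{\n',
    '"feedback_url": "http://www.bom.gov.au/other/feedback"\n',
    '}\n',
    ']\n',
    '}\n',
    '}\n',
]

# Same transformation as the first loop: drop newlines, strip, concatenate.
flattened = "".join(line.replace("\n", "").strip() for line in SAMPLE_LINES)

# No newline survives, so reading the file back yields one "line": the full document.
print(flattened.splitlines() == [flattened])  # True
print(json.loads(flattened)["observations"]["notice"][0]["feedback_url"])
```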
The output is as follows:
```
{u'observations': {u'notice': [{u'copyright_url': u'http://www.bom.gov.au/other/copyright.shtml', u'disclaimer_url': u'http://www.bom.gov.au/other/disclaimer.shtml', u'copyright': u'Copyright Commonwealth of Australia 2015, Bureau of Meteorology. For more information see: http://www.bom.gov.au/other/copyright.shtml http://www.bom.gov.au/other/disclaimer.shtml', u'feedback_url': u'http://www.bom.gov.au/other/feedback'}]}}
```
Any idea what is going on when the newlines and whitespace are stripped before using the `json` library?
Recommended answer
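You will go crazy if you try to parse a JSON file line by line. Each line on its own is only a fragment of the document (the first line is just `{`), so `json.loads` fails on it, which is what produces the `Expecting object` error. Your second script only worked because, once the newlines were removed, out.json held the entire document on a single line, so the loop called `json.loads` exactly once, on complete JSON. The `json` module has helper methods that read a file object or a string directly: `load` and `loads`. `load` takes a file object for a file that contains JSON data (as shown below), while `loads` takes a string that contains JSON data.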
Option 1 (preferred):
```python
import json

with open('test.json', 'r') as jf:
    weatherData = json.load(jf)  # parse the whole file object in one call
print(weatherData)
```
Option 2:
```python
import json

with open('test.json', 'r') as jf:
    weatherData = json.loads(jf.read())  # read everything into one string, then parse
print(weatherData)
```
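Both options should produce the same object. A minimal check, assuming an in-memory `io.StringIO` as a stand-in for the opened test.json (the shortened content in `TEXT` is hypothetical):

```python
import io
import json

# Hypothetical stand-in for test.json, newlines included, as in the question.
TEXT = u'{\n"observations": {\n"notice": [\n{\n"feedback_url": "http://www.bom.gov.au/other/feedback"\n}\n]\n}\n}\n'

# Option 1: json.load reads the whole file object itself.
with io.StringIO(TEXT) as jf:
    via_load = json.load(jf)

# Option 2: read the file into one string first, then parse it with json.loads.
with io.StringIO(TEXT) as jf:
    via_loads = json.loads(jf.read())

print(via_load == via_loads)  # True
```

In CPython, `json.load` itself calls `fp.read()` and passes the text to `json.loads`, so the two options differ mainly in convenience; neither one splits the input into lines.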
If you are looking for higher-performance JSON parsing, check out ujson.
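If ujson is an option, one way to use it without making it a hard dependency is to fall back to the standard library when it is not installed. This is a sketch under assumptions: it relies on ujson's `load` accepting an open file object the way `json.load` does, which holds for simple documents like this one; the helper name `load_json_file` and the temporary file are illustrative only, and option or error-handling differences between the two libraries are not covered.

```python
import os
import tempfile

try:
    import ujson as json_impl  # third-party package: pip install ujson
except ImportError:
    import json as json_impl   # standard-library fallback


def load_json_file(path):
    """Hypothetical helper: parse an entire JSON file in one call, never line by line."""
    with open(path, "r") as jf:
        return json_impl.load(jf)


# Usage: write a small multiline document to a temporary file and read it back.
fd, tmp_path = tempfile.mkstemp(suffix=".json")
with os.fdopen(fd, "w") as out:
    out.write('{\n"observations": {\n"notice": []\n}\n}\n')
data = load_json_file(tmp_path)
os.remove(tmp_path)
print(data)
```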