How to read big json?


Question

I receive JSON files with data to be analyzed in R, for which I use the RJSONIO package:

library(RJSONIO)             # DOM-style parser: builds the full structure in memory
filename <- "Indata.json"
jFile <- fromJSON(filename)  # parses the entire file in one call

When the JSON files are larger than about 300 MB (uncompressed), my computer starts to use swap memory, and the parsing (fromJSON) continues for hours. A 200 MB file takes only about a minute to parse.

I use R 2.14 (64-bit) on 64-bit Ubuntu with 16 GB of RAM, so I'm surprised that swapping is already needed at about 300 MB of JSON.

What can I do to read big JSONs? Is there something in the memory settings that messes things up? I have restarted R and run only the three lines above. The JSON files contain 2-3 columns with short strings, and 10-20 columns with numbers from 0 to 1,000,000; i.e., it is the number of rows that makes the files large (more than a million rows in the parsed data).
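
A quick way to see where the memory goes (a sketch; small.json is a placeholder for a smaller sample file): the parsed R structure is usually several times larger than the raw text, since every JSON value becomes a separate R object.

jSmall <- fromJSON("small.json")          # relies on library(RJSONIO) loaded above
print(object.size(jSmall), units = "MB")  # in-memory size, typically much larger than the file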

Answer

Although your question doesn't specify this detail, you may want to make sure that loading the entire JSON into memory is actually what you want. It looks like RJSONIO is a DOM-based API.

What computation do you need to do? Can you use a streaming parser? An example of a SAX-like streaming parser for JSON is yajl.
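
For concreteness, here is a minimal sketch of the streaming idea in R itself, using the jsonlite package's stream_in(), which reads newline-delimited JSON (NDJSON) page by page instead of materializing the whole document. It assumes the data can be exported as one JSON record per line; the file name data.ndjson, the pagesize, and the column x in the handler are placeholders for illustration.

library(jsonlite)

total <- 0
con <- file("data.ndjson")          # hypothetical NDJSON file: one record per line
stream_in(con, handler = function(df) {
  # df holds one page (here up to 5000 rows) as a data frame;
  # aggregate it and discard it, so memory use stays flat
  total <<- total + sum(df$x)       # df$x is an assumed numeric column
}, pagesize = 5000)
total

When a handler is supplied, stream_in() does not accumulate the parsed pages, so only one page is resident at a time. If the producer cannot emit NDJSON, splitting the file upstream or binding to yajl directly are alternatives in the same spirit.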
