Performance problem transforming JSON data


Problem description

I've got some data in JSON format that I want to do some visualization on. The data (approximately 10MB of JSON) loads pretty fast, but reshaping it into a usable form takes a couple of minutes for just under 100,000 rows. I have something that works, but I think it can be done much better.

It's probably easiest to understand by starting with my sample data.

Assuming you run the following command in /tmp:

curl http://public.west.spy.net/so/time-series.json.gz \
    | gzip -dc - > time-series.json
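
For context, each element of `rows` in the downloaded file pairs a `key` (whose second entry is an ISO 8601 timestamp; the first entry is not used by the code below) with a `value` holding price and volume. The literal values here are made up for illustration:

```json
{"key": ["placeholder-id", "2011-01-05T20:30:00"], "value": [10.5, 100]}
```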

You should be able to see my desired output (after a while) here:

require(rjson)

trades <- fromJSON(file="/tmp/time-series.json")$rows

# Reshape the list of rows into a data frame
# (slow: builds one tiny data.frame per row, then rbinds them all)
data <- do.call(rbind,
                lapply(trades,
                       function(row)
                           data.frame(date=strptime(unlist(row$key)[2], "%FT%X"),
                                      price=unlist(row$value)[1],
                                      volume=unlist(row$value)[2])))

someColors <- colorRampPalette(c("#000099", "blue", "orange", "red"),
                               space="Lab")

# Smoothed scatter plot with monthly date labels on the x axis
days <- seq(min(data$date), max(data$date), by='month')
smoothScatter(data, colramp=someColors, xaxt="n")
axis(1, at=days,
     labels=strftime(days, "%F"),
     tick=FALSE)

Recommended answer

You can get a 40x speedup by using plyr. Here is the code and the benchmarking comparison. The conversion to date can be done once you have the data frame and hence I have removed it from the code to facilitate apples-to-apples comparison. I am sure a faster solution exists.

f_ramnath = function(n) plyr::ldply(trades[1:n], unlist)[,-c(1, 2)]
f_dustin  = function(n) do.call(rbind, lapply(trades[1:n], 
                function(row) data.frame(
                    date   = unlist(row$key)[2],
                    price  = unlist(row$value)[1],
                    volume = unlist(row$value)[2]))
                )
f_mrflick = function(n) as.data.frame(do.call(rbind, lapply(trades[1:n], 
               function(x){
                   list(date=x$key[2], price=x$value[1], volume=x$value[2])})))

f_mbq = function(n) data.frame(
          t(sapply(trades[1:n],'[[','key')),    
          t(sapply(trades[1:n],'[[','value')))

rbenchmark::benchmark(f_ramnath(100), f_dustin(100), f_mrflick(100), f_mbq(100),
    replications = 50)

test            elapsed    relative
f_ramnath(100)    0.144    3.692308
f_dustin(100)     6.244  160.102564
f_mrflick(100)    0.039    1.000000
f_mbq(100)        0.074    1.897436

EDIT. MrFlick's solution leads to an additional 3.5x speedup. I have updated my tests.
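
The date conversion that was factored out of the benchmarks can be done once, vectorized, after the data frame exists. A minimal sketch, assuming the `date` column arrives as ISO 8601 character strings (the sample rows here are made up; in the real workflow `data` comes from one of the reshaping functions above):

```r
# Hypothetical two-row stand-in for the reshaped trades data frame
data <- data.frame(date   = c("2011-01-05T20:30:00", "2011-01-06T09:15:00"),
                   price  = c(10.5, 11.2),
                   volume = c(100, 250),
                   stringsAsFactors = FALSE)

# One vectorized strptime() over the whole column is far cheaper than
# calling it once per row inside the reshaping loop
data$date <- as.POSIXct(strptime(data$date, "%FT%X"))
```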
