Performance problem transforming JSON data
Question
I've got some data in JSON format that I want to do some visualization on. The data (approximately 10MB of JSON) loads pretty fast, but reshaping it into a usable form takes a couple of minutes for just under 100,000 rows. I have something that works, but I think it can be done much better.
Assuming you run the following command in /tmp:
curl http://public.west.spy.net/so/time-series.json.gz \
| gzip -dc - > time-series.json
You should be able to see my desired output (after a while) here:
require(rjson)

trades <- fromJSON(file = "/tmp/time-series.json")$rows

data <- do.call(rbind,
                lapply(trades,
                       function(row)
                         data.frame(date = strptime(unlist(row$key)[2], "%FT%X"),
                                    price = unlist(row$value)[1],
                                    volume = unlist(row$value)[2])))
someColors <- colorRampPalette(c("#000099", "blue", "orange", "red"),
                               space = "Lab")
days <- seq(min(data$date), max(data$date), by = 'month')

smoothScatter(data, colramp = someColors, xaxt = "n")
axis(1, at = days,
     labels = strftime(days, "%F"),
     tick = FALSE)
Answer
You can get a 40x speedup by using plyr. Here is the code and the benchmarking comparison. The conversion to date can be done once you have the data frame, and hence I have removed it from the code to facilitate an apples-to-apples comparison. I am sure a faster solution exists.
f_ramnath = function(n) plyr::ldply(trades[1:n], unlist)[, -c(1, 2)]

f_dustin = function(n) do.call(rbind, lapply(trades[1:n],
    function(row) data.frame(
        date   = unlist(row$key)[2],
        price  = unlist(row$value)[1],
        volume = unlist(row$value)[2]))
)

f_mrflick = function(n) as.data.frame(do.call(rbind, lapply(trades[1:n],
    function(x) {
        list(date = x$key[2], price = x$value[1], volume = x$value[2])
    })))

f_mbq = function(n) data.frame(
    t(sapply(trades[1:n], '[[', 'key')),
    t(sapply(trades[1:n], '[[', 'value')))

rbenchmark::benchmark(f_ramnath(100), f_dustin(100), f_mrflick(100), f_mbq(100),
    replications = 50)
           test elapsed   relative
 f_ramnath(100)   0.144   3.692308
  f_dustin(100)   6.244 160.102564
 f_mrflick(100)   0.039   1.000000
     f_mbq(100)   0.074   1.897436
EDIT. MrFlick's solution leads to an additional 3.5x speedup. I have updated my tests.
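Since the date conversion was stripped out for benchmarking, it still has to happen once on the finished data frame. A minimal sketch of how that might look (not part of the original answer; note that `f_mrflick` builds its columns via `rbind` of lists, so each column is a list and needs `unlist` before conversion):

```r
# Sketch: reapply the deferred type conversions after f_mrflick().
# Assumes `trades` has already been loaded as shown in the question.
df <- f_mrflick(length(trades))

# One vectorized pass per column, instead of one strptime() call per row.
df$date   <- strptime(unlist(df$date), "%FT%X")
df$price  <- as.numeric(unlist(df$price))
df$volume <- as.numeric(unlist(df$volume))
```

Doing the conversion column-wise on the assembled frame, rather than row-by-row inside `lapply`, is a large part of why the faster variants win: `strptime` and `as.numeric` are called a handful of times on whole vectors instead of ~100,000 times on length-1 inputs.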