Go JSON decoding is very slow. What would be a better way to do it?


Problem Description



I am using Go, Revel WAF and Redis.

I have to store large JSON data in Redis (maybe 20 MB).

json.Unmarshal() takes roughly 5 seconds. What would be a better way to do it?

I tried JsonLib, encode/json, ffjson, megajson, but none of them were fast enough.

I thought about using groupcache, but the JSON is updated in real time.

This is the sample code:

package main

import (
	"log"

	"github.com/garyburd/redigo/redis"
	json "github.com/pquerna/ffjson/ffjson"
)

func main() {
	c, err := redis.Dial("tcp", ":6379")
	if err != nil {
		log.Fatal(err)
	}
	defer c.Close()

	// Fetch the stored JSON (~20 MB) as a string.
	pointTable, err := redis.String(c.Do("GET", "data"))
	if err != nil {
		log.Fatal(err)
	}

	var hashPoint map[string][]float64
	// Problem: this call takes about 5 seconds.
	if err := json.Unmarshal([]byte(pointTable), &hashPoint); err != nil {
		log.Fatal(err)
	}
}

Solution

Parsing large JSON data does seem to be slower than it should be. It would be worthwhile to pinpoint the cause and submit a patch to the Go authors.

In the meantime, if you can avoid JSON and use a binary format, you will not only avoid this issue; you will also gain back the time your code now spends parsing ASCII decimal representations of numbers into their binary IEEE 754 equivalents (and possibly introducing rounding errors while doing so).

If both your sender and receiver are written in Go, I suggest using Go's binary format: gob.
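
If you keep the data in Redis, only the stored value changes: write gob bytes instead of a JSON string. As a rough sketch (not a drop-in replacement), the read path from the question might then look like this, assuming the producer has stored gob-encoded bytes under the same "data" key:

package main

import (
	"bytes"
	"encoding/gob"
	"log"

	"github.com/garyburd/redigo/redis"
)

func main() {
	c, err := redis.Dial("tcp", ":6379")
	if err != nil {
		log.Fatal(err)
	}
	defer c.Close()

	// Assumes the writer stored gob-encoded bytes under "data"
	// instead of a JSON string.
	raw, err := redis.Bytes(c.Do("GET", "data"))
	if err != nil {
		log.Fatal(err)
	}

	var hashPoint map[string][]float64
	if err := gob.NewDecoder(bytes.NewReader(raw)).Decode(&hashPoint); err != nil {
		log.Fatal(err)
	}
	log.Printf("decoded %d keys", len(hashPoint))
}

Both sides must agree on the concrete type; for a plain map[string][]float64 no gob type registration is needed.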

Doing a quick test, generating a map with 2000 entries, each a slice with 1050 simple floats, gives me 20 MB of JSON, which takes 1.16 sec to parse on my machine.

For these quick benchmarks, I take the best of three runs, but I make sure to only measure the actual parsing time, with t0 := time.Now() before the Unmarshal call and printing time.Now().Sub(t0) after it.

Using GOB, the same map results in 18 MB of data, which takes 115 ms to parse:
one tenth the time.
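
The snippet below sketches that kind of measurement; it is not the exact benchmark above (it uses the standard encoding/json, and sizes and timings will differ per machine), but it shows the methodology: build the map once, then time only the decode step for JSON and for gob.

package main

import (
	"bytes"
	"encoding/gob"
	"encoding/json"
	"fmt"
	"time"
)

func main() {
	// A map of similar shape: 2000 entries, each a slice of 1050 floats.
	m := make(map[string][]float64, 2000)
	for i := 0; i < 2000; i++ {
		s := make([]float64, 1050)
		for j := range s {
			s[j] = float64(i*j) / 3.0
		}
		m[fmt.Sprintf("key%d", i)] = s
	}

	// JSON: time only the Unmarshal call (errors elided for brevity).
	jsonData, _ := json.Marshal(m)
	var fromJSON map[string][]float64
	t0 := time.Now()
	json.Unmarshal(jsonData, &fromJSON)
	fmt.Printf("JSON: %d bytes, decoded in %v\n", len(jsonData), time.Now().Sub(t0))

	// gob: time only the Decode call.
	var buf bytes.Buffer
	gob.NewEncoder(&buf).Encode(m)
	gobSize := buf.Len()
	var fromGob map[string][]float64
	t0 = time.Now()
	gob.NewDecoder(&buf).Decode(&fromGob)
	fmt.Printf("gob:  %d bytes, decoded in %v\n", gobSize, time.Now().Sub(t0))
}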

Your results will vary depending on how many actual floats you have there. If your floats have many significant digits, deserving their float64 representation, then 20 MB of JSON will contain far fewer than my two million floats. In that case the difference between JSON and GOB will be even starker.

BTW, this proves that the problem lies indeed in the JSON parser, not in the amount of data to parse, nor in the memory structures to create (because both tests parse ~20 MB of data and recreate the same slices of floats). Replacing all the floats with strings in the JSON gives me a parsing time of 1.02 sec, confirming that the conversion from string representation to binary floats does take some time (compared to just moving bytes around) but is not the main culprit.

If the sender and the parser are not both Go, or if you want to squeeze the performance even further than GOB, you should use your own customised binary format, either using Protocol Buffers or manually with "encoding/binary" and friends.
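
For example, a hand-rolled layout with "encoding/binary" can be as simple as a count followed by the raw little-endian float64 values. The framing below is only an illustrative choice, not a standard format:

package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// writeFloats encodes a slice as a uint32 count followed by the raw
// little-endian float64 values.
func writeFloats(buf *bytes.Buffer, vals []float64) error {
	if err := binary.Write(buf, binary.LittleEndian, uint32(len(vals))); err != nil {
		return err
	}
	return binary.Write(buf, binary.LittleEndian, vals)
}

// readFloats reverses writeFloats.
func readFloats(r *bytes.Reader) ([]float64, error) {
	var n uint32
	if err := binary.Read(r, binary.LittleEndian, &n); err != nil {
		return nil, err
	}
	out := make([]float64, n)
	if err := binary.Read(r, binary.LittleEndian, out); err != nil {
		return nil, err
	}
	return out, nil
}

func main() {
	var buf bytes.Buffer
	writeFloats(&buf, []float64{1.5, 2.25, 3.125})
	vals, err := readFloats(bytes.NewReader(buf.Bytes()))
	fmt.Println(vals, err) // [1.5 2.25 3.125] <nil>
}

Reading raw bits like this skips the decimal-to-binary conversion entirely, which is exactly the cost the JSON parser pays for every float.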
