Efficient Go serialization of struct to disk
Question
I've been tasked with porting C++ code to Go, and I'm quite new to the Go APIs. I am using gob to encode hundreds of key/value entries into disk pages, but the gob encoding carries a lot of overhead that I don't need.
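package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
	"log"
)

type Entry struct {
	Key string
	Val string
}

func main() {
	var buf bytes.Buffer
	enc := gob.NewEncoder(&buf)
	e := Entry{"k1", "v1"}
	// Encode returns an error; check it instead of discarding it.
	if err := enc.Encode(e); err != nil {
		log.Fatal("encode error: ", err)
	}
	fmt.Println(buf.Bytes())
}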
This produces a lot of bloat that I don't need:
[35 255 129 3 1 1 5 69 110 116 114 121 1 255 130 0 1 2 1 3 75 101 121 1 12 0 1 3 86 97 108 1 12 0 0 0 11 255 130 1 2 107 49 1 2 118 49 0]
I want to serialize each string's length followed by its raw bytes, like this:
[0 0 0 2 107 49 0 0 0 2 118 49]
I am saving millions of entries, so the extra bloat in the encoding increases the file size roughly tenfold.
How can I serialize to the latter format without writing the encoding by hand?
Recommended answer
Use protobuf to encode your data efficiently.
https://github.com/golang/protobuf
Your main would look like this:
package main

import (
	"fmt"
	"log"

	"github.com/golang/protobuf/proto"
)

func main() {
	// Entry is the type generated from example.proto.
	e := &Entry{
		Key: proto.String("k1"),
		Val: proto.String("v1"),
	}
	data, err := proto.Marshal(e)
	if err != nil {
		log.Fatal("marshaling error: ", err)
	}
	fmt.Println(data)
}
You create a file, example.proto, like this:
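// Without a syntax line protoc assumes proto2 and prints a warning;
// "required" fields are a proto2 feature, so state it explicitly.
// Newer protoc-gen-go releases may also require an "option go_package" line.
syntax = "proto2";
package main;

message Entry {
  required string Key = 1;
  required string Val = 2;
}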
You generate the Go code from the proto file by running:
$ protoc --go_out=. *.proto
You can examine the generated file, if you wish.
You can then run the program and see the output:
$ go run *.go
[10 2 107 49 18 2 118 49]