GoLang: Decompress bz2 in one goroutine, consume in another goroutine
I am a new-grad SWE learning Go (and loving it).
I am building a parser for Wikipedia dump files - basically a huge bzip2-compressed XML file (~50GB uncompressed).
I want to do both streaming decompression and parsing, which sounds simple enough. For decompression, I do:
inputFilePath := flag.Arg(0)
inputFile, err := os.Open(inputFilePath) // check err before using inputFile
inputReader := bzip2.NewReader(inputFile)
And then pass the reader to the XML parser:
decoder := xml.NewDecoder(inputReader)
However, since both decompressing and parsing are expensive operations, I would like to run them on separate goroutines to make use of additional cores. How would I go about doing this in Go?
The only thing I can think of is wrapping the file in a chan []byte and implementing the io.Reader interface on top of it, but I presume there might be a built-in (and cleaner) way of doing it.
Has anyone ever done something like this?
Thanks! Manuel
You can use io.Pipe, then use io.Copy to push the decompressed data into the pipe, and read it in another goroutine:
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"sync"
)

func main() {
	rawJson := []byte(`{
		"Foo": {
			"Bar": "Baz"
		}
	}`)

	bzip2Reader := bytes.NewReader(rawJson) // this stands in for the bzip2.NewReader

	var wg sync.WaitGroup
	wg.Add(2)

	r, w := io.Pipe()

	go func() {
		defer wg.Done()
		// Write everything into the pipe. Decompression happens in this goroutine.
		_, err := io.Copy(w, bzip2Reader)
		// Close the write end so the reader sees io.EOF, or the copy error if there was one.
		w.CloseWithError(err)
	}()

	decoder := json.NewDecoder(r)

	go func() {
		defer wg.Done()
		// Parsing happens in this goroutine.
		for {
			t, err := decoder.Token()
			if err != nil {
				break // io.EOF at the end of the stream, or a real error
			}
			fmt.Println(t)
		}
	}()

	wg.Wait()
}
http://play.golang.org/p/fXLnfnaWYA