GoLang: Decompress bz2 in one goroutine, consume in another goroutine


Problem Description



I am a new-grad SWE learning Go (and loving it).

I am building a parser for Wikipedia dump files - basically a huge bzip2-compressed XML file (~50GB uncompressed).

I want to do both streaming decompression and parsing, which sounds simple enough. For decompression, I do:

inputFilePath := flag.Arg(0)
inputReader := bzip2.NewReader(inputFile)

And then pass the reader to the XML parser:

decoder := xml.NewDecoder(inputReader)

However, since both decompression and parsing are expensive operations, I would like to have them run on separate goroutines to make use of additional cores. How would I go about doing this in Go?

The only thing I can think of is wrapping the file in a chan []byte and implementing the io.Reader interface, but I presume there is a built-in (and cleaner) way of doing it.
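
Roughly, the workaround I have in mind looks like the sketch below (illustrative only; the chanReader type, the chunk size, and the string source are made up for the example): a producer goroutine pushes byte chunks onto a channel, and a small type implements io.Reader by draining that channel.

package main

import (
    "fmt"
    "io"
    "strings"
)

// chanReader adapts a chan []byte to io.Reader: a producer goroutine sends
// chunks on the channel, and the consumer reads them back as a byte stream.
type chanReader struct {
    ch  chan []byte
    buf []byte // leftover bytes from the previously received chunk
}

func (c *chanReader) Read(p []byte) (int, error) {
    if len(c.buf) == 0 {
        chunk, ok := <-c.ch
        if !ok {
            return 0, io.EOF // channel closed: end of stream
        }
        c.buf = chunk
    }
    n := copy(p, c.buf)
    c.buf = c.buf[n:]
    return n, nil
}

func main() {
    // strings.NewReader stands in for the decompressed input.
    src := strings.NewReader("hello from the producer goroutine")
    cr := &chanReader{ch: make(chan []byte, 4)}

    // Producer goroutine: read chunks from the source and push them onto the channel.
    go func() {
        defer close(cr.ch)
        for {
            chunk := make([]byte, 8) // illustrative chunk size
            n, err := src.Read(chunk)
            if n > 0 {
                cr.ch <- chunk[:n]
            }
            if err != nil {
                return
            }
        }
    }()

    // Consumer: treat the channel-backed reader like any other io.Reader.
    out, _ := io.ReadAll(cr)
    fmt.Println(string(out))
}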

Has anyone ever done something like this?

Thanks! Manuel

Solution

You can use io.Pipe, then use io.Copy to push the decompressed data into the pipe, and read it in another goroutine:

package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "io"
    "sync"
)

func main() {

    rawJson := []byte(`{
            "Foo": {
                "Bar": "Baz"
            }
        }`)

    bzip2Reader := bytes.NewReader(rawJson) // this stands in for the bzip2.NewReader

    var wg sync.WaitGroup
    wg.Add(2)

    r, w := io.Pipe()

    go func() {
        // write everything into the pipe. Decompression happens in this goroutine.
        io.Copy(w, bzip2Reader)
        w.Close()
        wg.Done()
    }()

    decoder := json.NewDecoder(r)

    go func() {
        for {
            t, err := decoder.Token()
            if err != nil {
                break
            }
            fmt.Println(t)
        }
        wg.Done()
    }()

    wg.Wait()
}

http://play.golang.org/p/fXLnfnaWYA
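
For the actual use case in the question, the same pattern applies with compress/bzip2 and encoding/xml swapped in for the stand-ins. Below is a rough, untested sketch under the question's assumptions (the dump path comes from flag.Arg(0), and error handling is kept minimal):

package main

import (
    "compress/bzip2"
    "encoding/xml"
    "flag"
    "io"
    "log"
    "os"
    "sync"
)

func main() {
    flag.Parse()

    inputFile, err := os.Open(flag.Arg(0))
    if err != nil {
        log.Fatal(err)
    }
    defer inputFile.Close()

    bzip2Reader := bzip2.NewReader(inputFile)

    r, w := io.Pipe()

    var wg sync.WaitGroup
    wg.Add(2)

    go func() {
        // Decompression runs on this goroutine while copying into the pipe.
        io.Copy(w, bzip2Reader)
        w.Close()
        wg.Done()
    }()

    decoder := xml.NewDecoder(r)

    go func() {
        // Parsing runs on this goroutine, reading decompressed bytes from the pipe.
        for {
            t, err := decoder.Token()
            if err != nil {
                break
            }
            _ = t // handle each XML token here
        }
        wg.Done()
    }()

    wg.Wait()
}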
