Golang read from pipe reads tons of data



I'm trying to read an archive that's being tarred, streaming, to stdin, but I'm somehow reading far more data in the pipe than tar is sending.

I run my command like this:

tar -cf - somefolder | ./my-go-binary

The source code is like this:

package main

import (
    "bufio"
    "io"
    "log"
    "os"
)

// Read from standard input
func main() {
    reader := bufio.NewReader(os.Stdin)
    // Read all data from stdin, processing subsequent reads as chunks.
    parts := 0
    for {
        parts++
        data := make([]byte, 4<<20) // Read 4MB at a time
        _, err := reader.Read(data)
        if err == io.EOF {
            break
        } else if err != nil {
            log.Fatalf("Problems reading from input: %s", err)
        }
    }
    log.Printf("Total parts processed: %d\n", parts)
}

For a 100MB tarred folder, I'm getting 1468 chunks of 4MB (that's 6.15GB)! Further, it doesn't seem to matter how large the data []byte array is: if I set the chunk size to 40MB, I still get ~1400 chunks of 40MB data, which makes no sense at all.

Is there something I need to do to read data from os.Stdin properly with Go?

Solution

Your code is inefficient. It's allocating and initializing data each time through the loop.

for {
    data := make([]byte, 4<<20) // Read 4MB at a time
}

The way your code uses the reader as an io.Reader is wrong. For example, you ignore the number of bytes read by _, err := reader.Read(data), and you don't handle the returned error correctly.

Package io

import "io" 

type Reader

type Reader interface {
        Read(p []byte) (n int, err error)
}

Reader is the interface that wraps the basic Read method.

Read reads up to len(p) bytes into p. It returns the number of bytes read (0 <= n <= len(p)) and any error encountered. Even if Read returns n < len(p), it may use all of p as scratch space during the call. If some data is available but not len(p) bytes, Read conventionally returns what is available instead of waiting for more.

When Read encounters an error or end-of-file condition after successfully reading n > 0 bytes, it returns the number of bytes read. It may return the (non-nil) error from the same call or return the error (and n == 0) from a subsequent call. An instance of this general case is that a Reader returning a non-zero number of bytes at the end of the input stream may return either err == EOF or err == nil. The next Read should return 0, EOF regardless.

Callers should always process the n > 0 bytes returned before considering the error err. Doing so correctly handles I/O errors that happen after reading some bytes and also both of the allowed EOF behaviors.

Implementations of Read are discouraged from returning a zero byte count with a nil error, except when len(p) == 0. Callers should treat a return of 0 and nil as indicating that nothing happened; in particular it does not indicate EOF.

Implementations must not retain p.

Here's a model file read program that conforms to the io.Reader interface:

package main

import (
    "bufio"
    "io"
    "log"
    "os"
)

func main() {
    nBytes, nChunks := int64(0), int64(0)
    r := bufio.NewReader(os.Stdin)
    buf := make([]byte, 0, 4*1024)
    for {
        n, err := r.Read(buf[:cap(buf)])
        buf = buf[:n]
        if n == 0 {
            if err == nil {
                continue
            }
            if err == io.EOF {
                break
            }
            log.Fatal(err)
        }
        nChunks++
        nBytes += int64(len(buf))
        // process buf
        if err != nil && err != io.EOF {
            log.Fatal(err)
        }
    }
    log.Println("Bytes:", nBytes, "Chunks:", nChunks)
}

Output:

2014/11/29 10:00:05 Bytes: 5589891 Chunks: 1365
