Performance of FileStream's Write vs WriteByte on an IEnumerable<byte>

Question

I need to write bytes of an IEnumerable<byte> to a file.
I can convert it to an array and use the Write(byte[]) method:

using (var stream = File.Create(path))
    stream.Write(bytes.ToArray());

But since IEnumerable doesn't provide the collection's item count, using ToArray is not recommended unless it's absolutely necessary.

So I can just iterate the IEnumerable and use WriteByte(byte) in each iteration:

using (var stream = File.Create(path))
    foreach (var b in bytes)
        stream.WriteByte(b);

I wonder which one will be faster when writing lots of data.

I guess using Write(byte[]) sets the buffer according to the array size so it would be faster when it comes to arrays.

My question is: when I only have an IEnumerable<byte> holding megabytes of data, which approach is better? Converting it to an array and calling Write(byte[]), or iterating it and calling WriteByte(byte) for each byte?

Answer

Enumerating over a large stream of bytes is a process that adds tons of overhead to something that is normally cheap: Copying bytes from one buffer to the next.

Normally, LINQ-style overhead does not matter much but when it comes to processing 100 million bytes per second on a normal hard drive you will notice severe overheads. This is not premature optimization. We can foresee that this will be a performance hotspot so we should eagerly optimize.

So when copying bytes around you probably should not rely on abstractions like IEnumerable and IList at all. Pass around arrays, or ArraySegment<byte> values, which also carry an Offset and a Count. This frees you from slicing arrays too often.
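
As a rough illustration of the ArraySegment<byte> suggestion, the sketch below writes a slice of a larger array without first copying it into a new array. The SegmentWriter class and WriteSegment method are hypothetical names introduced only for this example; they are not part of the original answer or of the .NET API.

```csharp
using System;
using System.IO;

public static class SegmentWriter
{
    // Hypothetical helper (assumption, not from the original answer):
    // writes the slice described by the segment straight from its backing
    // array, using the segment's own Offset and Count.
    public static void WriteSegment(Stream destination, ArraySegment<byte> segment)
    {
        if (destination == null) throw new ArgumentNullException(nameof(destination));

        // An empty (or default) segment has nothing to write.
        if (segment.Count == 0) return;

        destination.Write(segment.Array, segment.Offset, segment.Count);
    }
}
```

A caller holding a large byte[] could pass new ArraySegment<byte>(data, offset, count) for each piece instead of allocating a sub-array per piece.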

Another deadly sin in high-throughput IO is calling a method per byte, such as reading bytewise or writing bytewise. This kills performance because those methods have to be called hundreds of millions of times per second. I have experienced that myself.

Always process entire buffers of at least 4096 bytes at a time. Depending on the medium you are doing IO with, you can use much larger buffers (64 KB, 256 KB or even megabytes).
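
If the data really does arrive only as an IEnumerable<byte>, one possible middle ground between ToArray and WriteByte is to fill a reusable buffer and write it whenever it is full. The following is a minimal sketch under that assumption; ByteSequenceWriter, WriteBuffered and the 64 KB default are illustrative choices, not something the original answer prescribes.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ByteSequenceWriter
{
    // Hypothetical helper (assumption, not from the original answer):
    // copies the sequence into a fixed-size buffer and issues one
    // Stream.Write call per full buffer instead of one call per byte.
    // Returns the total number of bytes written.
    public static long WriteBuffered(Stream destination, IEnumerable<byte> bytes, int bufferSize = 64 * 1024)
    {
        if (destination == null) throw new ArgumentNullException(nameof(destination));
        if (bytes == null) throw new ArgumentNullException(nameof(bytes));
        if (bufferSize <= 0) throw new ArgumentOutOfRangeException(nameof(bufferSize));

        var buffer = new byte[bufferSize];
        int filled = 0;   // number of valid bytes currently in the buffer
        long total = 0;

        foreach (byte b in bytes)
        {
            buffer[filled++] = b;
            if (filled == buffer.Length)
            {
                destination.Write(buffer, 0, filled);
                total += filled;
                filled = 0;
            }
        }

        // Flush whatever is left in a partially filled buffer.
        if (filled > 0)
        {
            destination.Write(buffer, 0, filled);
            total += filled;
        }

        return total;
    }
}
```

With the question's variables this could be called as ByteSequenceWriter.WriteBuffered(stream, bytes) inside the using block. Note that the sketch still pays the per-element enumeration cost described in the answer; it only avoids the extra array that ToArray builds and the per-byte write call.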
