Performance of FileStream's Write vs WriteByte on an IEnumerable<byte>
Problem description
I need to write the bytes of an IEnumerable<byte> to a file.
I can convert it to an array and use the Write(byte[]) method:
using (var stream = File.Create(path))
    stream.Write(bytes.ToArray());
But since IEnumerable doesn't provide the collection's item count, using ToArray is not recommended unless it's absolutely necessary.
So I can just iterate the IEnumerable and call WriteByte(byte) in each iteration:
using (var stream = File.Create(path))
    foreach (var b in bytes)
        stream.WriteByte(b);
I wonder which one will be faster when writing lots of data.
I guess Write(byte[]) sizes the buffer according to the array, so it would be faster when the data is already an array.
My question is: when I only have an IEnumerable<byte> that holds megabytes of data, which approach is better? Converting it to an array and calling Write(byte[]), or iterating it and calling WriteByte(byte) for each byte?
Solution
Enumerating over a large sequence of bytes adds a lot of overhead to something that is normally cheap: copying bytes from one buffer to the next.
Normally, LINQ-style overhead does not matter much, but when it comes to processing 100 million bytes per second on a normal hard drive you will notice severe overhead. This is not premature optimization: we can foresee that this will be a performance hotspot, so we should optimize eagerly.
So when copying bytes around, you probably should not rely on abstractions like IEnumerable and IList at all. Pass around arrays or ArraySegment<byte> values, which also carry an Offset and a Count. This frees you from slicing arrays too often.
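As a minimal sketch of the ArraySegment<byte> idea, the helper below writes only the slice that a segment describes, without copying it into a new array first. The class name SegmentWriter and the method WriteSegment are hypothetical names chosen for illustration; only ArraySegment<byte>, its Array, Offset and Count properties, and Stream.Write(byte[], int, int) come from the framework.

```csharp
using System;
using System.IO;

static class SegmentWriter
{
    // Hypothetical helper (not part of the framework): writes the slice
    // described by the segment directly from its backing array.
    public static void WriteSegment(Stream stream, ArraySegment<byte> segment)
    {
        // default(ArraySegment<byte>) has no backing array; nothing to write.
        if (segment.Array == null) return;

        stream.Write(segment.Array, segment.Offset, segment.Count);
    }
}
```

For example, given var data = new byte[] { 1, 2, 3, 4, 5 }, calling SegmentWriter.WriteSegment(stream, new ArraySegment<byte>(data, 1, 3)) writes the three bytes 2, 3, 4 and leaves data untouched.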
Another deadly sin in high-throughput IO is calling a method per byte, such as reading or writing bytewise. This kills performance because these methods have to be called hundreds of millions of times per second. I have experienced that myself.
Always process entire buffers of at least 4096 bytes at a time. Depending on the media you are doing IO with, you can use much larger buffers (64 KB, 256 KB or even megabytes).
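If the data really does arrive as an IEnumerable<byte>, one possible middle ground is to fill a reusable buffer and hand it to the stream one chunk at a time. The sketch below is an illustration under assumptions, not a measured recommendation: ChunkedWriter, WriteAll and the 64 KB default are hypothetical choices. It still pays the per-byte enumeration cost described above, but it avoids both a WriteByte call per byte and materializing the whole sequence with ToArray.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class ChunkedWriter
{
    // Hypothetical helper: copies the sequence into a reusable buffer and
    // writes the buffer to the stream each time it fills up.
    public static void WriteAll(Stream stream, IEnumerable<byte> bytes, int bufferSize = 64 * 1024)
    {
        if (bufferSize <= 0) throw new ArgumentOutOfRangeException(nameof(bufferSize));

        var buffer = new byte[bufferSize];
        int count = 0;

        foreach (byte b in bytes)
        {
            buffer[count++] = b;
            if (count == buffer.Length)
            {
                stream.Write(buffer, 0, count); // one call per full chunk
                count = 0;
            }
        }

        // Write whatever is left in the partially filled final chunk.
        if (count > 0) stream.Write(buffer, 0, count);
    }
}
```

With the names from the question, usage would look like using (var stream = File.Create(path)) ChunkedWriter.WriteAll(stream, bytes);. Which buffer size works best depends on the storage medium, so it is worth measuring rather than assuming.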