What is the correct way to read a binary file in chunks of a fixed size and store all of those chunks into a Vec?
Problem description
I'm having trouble with opening a file. Most examples read files into a String or read the entire file into a Vec. What I need is to read a file into chunks of a fixed size and store those chunks into an array (Vec) of chunks.
For example, I have a file called my_file of exactly 64 KB, and I want to read it in chunks of 16 KB, so I would end up with a Vec of length 4 where each element is another Vec of length 16 KB (0x4000 bytes).
After reading the docs and checking other Stack Overflow answers, I was able to come up with something like this:
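use std::io::Read; // brings read/read_exact into scope

let mut file = std::fs::File::open("my_file")?;
// ...calculate num_of_chunks (4 in this case)
let mut list_of_chunks = Vec::new();
for _ in 0..num_of_chunks {
    let mut data: [u8; 0x4000] = [0; 0x4000];
    // read_exact fills the whole buffer or returns an error;
    // a plain read() is allowed to return fewer bytes than requested.
    file.read_exact(&mut data[..])?;
    list_of_chunks.push(data.to_vec());
}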
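The "...calculate num_of_chunks" step is left out in the question. One way to fill it in, sketched here, is to take the file length from its metadata and divide by the chunk size, rounding up. The helper names num_chunks and chunks_in_file are hypothetical. If the length is not an exact multiple of the chunk size, the last chunk is shorter, so a rounded-up count only works together with code that accepts a partial final chunk.

```rust
use std::fs::File;
use std::io;

/// Number of `chunk_size`-byte chunks needed to cover `len` bytes,
/// rounding up so a trailing partial chunk is counted too.
fn num_chunks(len: u64, chunk_size: u64) -> u64 {
    assert!(chunk_size > 0, "chunk size must be non-zero");
    // Written without `len + chunk_size - 1` to avoid overflow for huge lengths.
    len / chunk_size + u64::from(len % chunk_size != 0)
}

/// Hypothetical helper: derive the chunk count from the file's metadata.
fn chunks_in_file(path: &str, chunk_size: u64) -> io::Result<u64> {
    let len = File::open(path)?.metadata()?.len();
    Ok(num_chunks(len, chunk_size))
}

fn main() {
    // 64 KB read in 16 KB (0x4000-byte) chunks -> 4 chunks.
    assert_eq!(num_chunks(64 * 1024, 0x4000), 4);
    // One extra byte needs a fifth, partial chunk.
    assert_eq!(num_chunks(64 * 1024 + 1, 0x4000), 5);
    // "my_file" is the example name from the question; it may not exist here.
    if let Ok(n) = chunks_in_file("my_file", 0x4000) {
        println!("my_file needs {n} chunks");
    }
}
```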
Although this seems to work fine, it looks a bit convoluted. As I read it, the code does this:
- For each iteration, create a new array on the stack
- Read the chunk into the array
- Copy the contents of the array into a new Vec, and then move the Vec into the list_of_chunks Vec.
I'm not sure if it's idiomatic or even possible, but I'd rather have something like this:
- Create a Vec with num_of_chunks elements, where each element is another Vec of size 16 KB.
- Read each file chunk directly into the correct Vec.

That way there is no copying, and all the memory is allocated before the file is read.
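The approach described in that list is possible. A minimal sketch, assuming the input holds at least n * 16 KB bytes: vec! allocates all inner Vecs (zero-filled) up front, and read_exact fills each one in place, so there is no temporary stack array and no extra copy. The function name read_chunks and the constant CHUNK_SIZE are illustrative, and a std::io::Cursor stands in for the file so the example runs anywhere.

```rust
use std::io::{self, Cursor, Read};

const CHUNK_SIZE: usize = 0x4000; // 16 KB, as in the question

/// Sketch: allocate `n` zeroed chunks up front, then read directly into each.
/// Assumes the input holds at least `n * CHUNK_SIZE` bytes; otherwise
/// `read_exact` returns an `UnexpectedEof` error.
fn read_chunks<R: Read>(mut reader: R, n: usize) -> io::Result<Vec<Vec<u8>>> {
    // Every inner Vec is allocated and zero-filled before any reading happens.
    let mut chunks = vec![vec![0u8; CHUNK_SIZE]; n];
    for chunk in chunks.iter_mut() {
        // Fill this chunk in place (&mut Vec<u8> derefs to &mut [u8]).
        reader.read_exact(chunk)?;
    }
    Ok(chunks)
}

fn main() -> io::Result<()> {
    // Stand-in for a 64 KB file: an in-memory buffer of 4 * 16 KB bytes.
    let data: Vec<u8> = (0..4 * CHUNK_SIZE).map(|i| (i % 251) as u8).collect();
    let chunks = read_chunks(Cursor::new(&data), 4)?;
    assert_eq!(chunks.len(), 4);
    assert!(chunks.iter().all(|c| c.len() == CHUNK_SIZE));
    // Concatenating the chunks gives back the original bytes.
    assert_eq!(chunks.concat(), data);
    Ok(())
}
```

Zero-filling is the cost of allocating before reading. If the input turns out to be shorter than expected, read_exact reports an error instead of handing back a partly filled chunk.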
Is that approach possible? Or is there a better conventional/idiomatic/correct way to do this?
I'm also wondering whether Vec is the right type for this, since I won't need the array to grow after reading the file.
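On whether Vec is the right type: since the chunks never grow, one option (a sketch, not the only answer) is to make the chunk length part of the type with [u8; CHUNK_SIZE] and keep all chunks in one Vec, which can be frozen into a boxed slice once reading is done. The names read_array_chunks and CHUNK_SIZE are hypothetical, and the input is assumed to hold at least n * CHUNK_SIZE bytes.

```rust
use std::io::{self, Cursor, Read};

const CHUNK_SIZE: usize = 0x4000; // 16 KB

/// Sketch: chunks stored as fixed-size arrays, so their length is part of
/// the type and they cannot grow. Assumes at least `n * CHUNK_SIZE` bytes
/// of input; otherwise `read_exact` fails with `UnexpectedEof`.
fn read_array_chunks<R: Read>(mut reader: R, n: usize) -> io::Result<Vec<[u8; CHUNK_SIZE]>> {
    // One contiguous heap allocation holding all n arrays.
    let mut chunks = vec![[0u8; CHUNK_SIZE]; n];
    for chunk in chunks.iter_mut() {
        // &mut [u8; CHUNK_SIZE] coerces to the &mut [u8] that read_exact expects.
        reader.read_exact(chunk)?;
    }
    Ok(chunks)
}

fn main() -> io::Result<()> {
    let data = vec![7u8; 4 * CHUNK_SIZE]; // stand-in for a 64 KB file
    let chunks = read_array_chunks(Cursor::new(data), 4)?;
    assert_eq!(chunks.len(), 4);
    assert!(chunks.iter().all(|c| c.iter().all(|&b| b == 7)));
    // The outer Vec can be frozen into a boxed slice when no more chunks are added.
    let frozen: Box<[[u8; CHUNK_SIZE]]> = chunks.into_boxed_slice();
    assert_eq!(frozen.len(), 4);
    Ok(())
}
```

The trade-off is that the chunk size becomes a compile-time constant. If it is only known at run time, Vec<u8> (or Box<[u8]> via into_boxed_slice) stays the more flexible choice.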
Recommended answer
I think the most idiomatic way would be to use an iterator. The code below (freely inspired by M-ou-se's answer):
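- uses a generic type to handle many use cases
- uses a pre-allocated vector for each chunk
- hides the side effects
- avoids copying the data twice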
use std::io::{self, Read, Seek, SeekFrom};

/// Iterator over successive chunks of at most `size` bytes read from `read`.
struct Chunks<R> {
    read: R,
    size: usize,
    hint: (usize, Option<usize>),
}

impl<R> Chunks<R> {
    pub fn new(read: R, size: usize) -> Self {
        Self {
            read,
            size,
            hint: (0, None),
        }
    }

    pub fn from_seek(mut read: R, size: usize) -> io::Result<Self>
    where
        R: Seek,
    {
        assert!(size > 0, "chunk size must be non-zero");
        let old_pos = read.seek(SeekFrom::Current(0))?;
        let len = read.seek(SeekFrom::End(0))?;
        // Seeking past EOF is allowed, so old_pos can exceed len: saturate.
        let rest = len.saturating_sub(old_pos) as usize;
        // Seeking to End moved the cursor; always go back to where we started.
        read.seek(SeekFrom::Start(old_pos))?;
        // Number of chunks, rounding up for a trailing partial chunk.
        let min = rest / size + if rest % size != 0 { 1 } else { 0 };
        Ok(Self {
            read,
            size,
            // Best-effort lower bound: assumes the file is not truncated
            // while being read. No upper bound, since the file may grow.
            hint: (min, None),
        })
    }

    /// Gives back the underlying reader, e.g. to try to recover after an error.
    pub fn into_inner(self) -> R {
        self.read
    }
}

impl<R> Iterator for Chunks<R>
where
    R: Read,
{
    type Item = io::Result<Vec<u8>>;

    fn next(&mut self) -> Option<Self::Item> {
        let mut chunk = Vec::with_capacity(self.size);
        // Limit by `size`, not `capacity()`: capacity may be larger than requested.
        match self
            .read
            .by_ref()
            .take(self.size as u64)
            .read_to_end(&mut chunk)
        {
            // Nothing left to read: end of input.
            Ok(0) => None,
            // The last chunk may be shorter than `size`.
            Ok(_) => {
                // size_hint describes the *remaining* items, so count this one off.
                self.hint.0 = self.hint.0.saturating_sub(1);
                Some(Ok(chunk))
            }
            Err(e) => Some(Err(e)),
        }
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        self.hint
    }
}

/// Extension trait so that any reader can call `.chunks(size)`.
trait ReadPlus: Read {
    fn chunks(self, size: usize) -> Chunks<Self>
    where
        Self: Sized,
    {
        Chunks::new(self, size)
    }
}

impl<T: ?Sized> ReadPlus for T where T: Read {}

fn main() -> io::Result<()> {
    let file = std::fs::File::open("src/main.rs")?;
    // 0xFF was only used for testing; use e.g. 0x4000 for 16 KB chunks.
    let iter = Chunks::from_seek(file, 0xFF)?;
    println!("{:?}", iter.size_hint());
    // On a persistent I/O error this iterator may keep returning Err forever,
    // so collect into a Result, which stops at the first Err.
    let chunks = iter.collect::<Result<Vec<_>, _>>()?;
    println!("{:?}, {:?}", chunks.len(), chunks.capacity());
    Ok(())
}
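For comparison, the same take + read_to_end technique used by the Chunks iterator can be written more compactly with std::iter::from_fn, giving up the size_hint and into_inner extras. This is a sketch: the function name chunks is illustrative, and a Cursor is used instead of a file so it runs anywhere.

```rust
use std::io::{self, Cursor, Read};
use std::iter;

/// Sketch: yields `Ok(chunk)` for each chunk of at most `size` bytes
/// (the last one may be shorter), `Err(e)` on a read error, and stops
/// when a read returns 0 bytes.
fn chunks<R: Read>(mut reader: R, size: usize) -> impl Iterator<Item = io::Result<Vec<u8>>> {
    iter::from_fn(move || {
        let mut chunk = Vec::with_capacity(size);
        // Read at most `size` bytes from the shared reader into this chunk.
        match reader.by_ref().take(size as u64).read_to_end(&mut chunk) {
            Ok(0) => None,
            Ok(_) => Some(Ok(chunk)),
            Err(e) => Some(Err(e)),
        }
    })
}

fn main() -> io::Result<()> {
    let data: Vec<u8> = (0..10).collect();
    // Collecting into io::Result stops at the first error, if any.
    let parts = chunks(Cursor::new(data), 4).collect::<io::Result<Vec<_>>>()?;
    let expected: Vec<Vec<u8>> = vec![vec![0, 1, 2, 3], vec![4, 5, 6, 7], vec![8, 9]];
    assert_eq!(parts, expected);
    Ok(())
}
```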