What is the correct way to read a binary file in chunks of a fixed size and store all of those chunks into a Vec?


Question


I'm having trouble opening a file. Most examples read files into a String or read the entire file into a Vec. What I need is to read a file in chunks of a fixed size and store those chunks in an array (Vec) of chunks.


For example, I have a file called my_file that is exactly 64 KB, and I want to read it in 16 KB chunks, so I would end up with a Vec<Vec<u8>> of length 4 where each element is a Vec<u8> of 16 KB (0x4000 bytes).
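The question leaves out the `// ...calculate num_of_chunks` step. A minimal sketch, assuming the count should round up so that a trailing partial chunk is still counted; `num_of_chunks` is a hypothetical helper, and for a real file `len` would come from `file.metadata()?.len()`:

```rust
/// Hypothetical helper: how many fixed-size chunks are needed to cover `len` bytes.
/// Rounds up, so a trailing partial chunk still counts as one chunk.
fn num_of_chunks(len: u64, chunk_size: u64) -> u64 {
    assert!(chunk_size > 0, "chunk size must be non-zero");
    // Ceiling division without the overflow risk of `(len + chunk_size - 1) / chunk_size`.
    len / chunk_size + u64::from(len % chunk_size != 0)
}

fn main() {
    const CHUNK: u64 = 0x4000; // 16 KB
    println!("{}", num_of_chunks(64 * 1024, CHUNK)); // 4 (the 64 KB my_file)
    println!("{}", num_of_chunks(64 * 1024 + 1, CHUNK)); // 5 (one byte spills into a 5th chunk)
    println!("{}", num_of_chunks(0, CHUNK)); // 0
}
```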


After reading the docs and checking other Stack Overflow answers, I came up with something like this:

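use std::io::Read; // brings `read_exact` into scope

let mut file = std::fs::File::open("my_file")?;
// ...calculate num_of_chunks 4 in this case
let mut list_of_chunks = Vec::new();

for _ in 0..num_of_chunks {
    let mut data: [u8; 0x4000] = [0; 0x4000];
    // `read` may return fewer bytes than requested; `read_exact` fills the whole
    // buffer or errors (assumes the file length is a multiple of 0x4000)
    file.read_exact(&mut data[..])?;
    list_of_chunks.push(data.to_vec());
}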
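The loop above originally called `Read::read`, which is allowed to return fewer bytes than the buffer holds; on a local `File` it usually fills the buffer, but nothing guarantees it. A small sketch of the difference from `read_exact`, using two in-memory slices joined with `Read::chain` (`short_read_demo` is a hypothetical name, and the exact count returned by a single `read` reflects the current standard library behaviour of `Chain`, not a documented guarantee):

```rust
use std::io::{self, Read};

/// Returns how many bytes one `read` call produced, plus the buffer `read_exact` filled.
fn short_read_demo() -> io::Result<(usize, [u8; 4])> {
    // Two in-memory pieces glued together with `Read::chain`. `Chain::read`
    // currently returns data from the first piece only until that piece is exhausted.
    let first: &[u8] = &[1, 2];
    let second: &[u8] = &[3, 4];

    let mut buf = [0u8; 4];
    let n = first.chain(second).read(&mut buf)?; // fills only part of the buffer

    // `read_exact` keeps calling `read` until the buffer is full, or returns
    // an `ErrorKind::UnexpectedEof` error if the data runs out first.
    let mut full = [0u8; 4];
    first.chain(second).read_exact(&mut full)?;
    Ok((n, full))
}

fn main() -> io::Result<()> {
    let (n, full) = short_read_demo()?;
    println!("read: {} of 4 bytes, read_exact: {:?}", n, full);
    Ok(())
}
```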


Although this seems to work fine, it looks a bit convoluted. As I read it, the code does the following:

  • For each iteration, create a new array on the stack
  • Read the chunk into the array
  • Copy the contents of the array into a new Vec, then move that Vec into the list_of_chunks Vec.


I'm not sure if it's idiomatic or even possible, but I'd rather have something like this:

  • Create a Vec with num_of_chunks elements, where each element is another Vec of 16 KB.
  • Read each file chunk directly into the corresponding Vec.


That way nothing is copied, and all the memory is allocated before the file is read.
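The approach described above does appear to be possible: `vec![vec![0u8; N]; n]` allocates every chunk up front, and `Read::read_exact` can fill each inner `Vec` in place, so there is no intermediate stack array and no copy. A minimal sketch, assuming the input length is an exact multiple of the chunk size; `read_chunks` and `CHUNK_SIZE` are hypothetical names, and a `Cursor` stands in for `my_file` (a `File` works the same way):

```rust
use std::io::{self, Cursor, Read};

const CHUNK_SIZE: usize = 0x4000; // 16 KB, as in the question

/// Pre-allocates `num_of_chunks` zeroed buffers of CHUNK_SIZE bytes each,
/// then fills every buffer in place with `read_exact` (no intermediate array).
fn read_chunks<R: Read>(mut reader: R, num_of_chunks: usize) -> io::Result<Vec<Vec<u8>>> {
    // `vec![elem; n]` clones `elem`, so every chunk is its own 16 KB allocation.
    let mut chunks = vec![vec![0u8; CHUNK_SIZE]; num_of_chunks];
    for chunk in chunks.iter_mut() {
        // `&mut Vec<u8>` coerces to `&mut [u8]`. Fails with UnexpectedEof if the
        // input is shorter than num_of_chunks * CHUNK_SIZE bytes.
        reader.read_exact(chunk)?;
    }
    Ok(chunks)
}

fn main() -> io::Result<()> {
    // In-memory stand-in for the 64 KB `my_file`: chunk i is filled with byte i.
    let data: Vec<u8> = (0..4 * CHUNK_SIZE).map(|i| (i / CHUNK_SIZE) as u8).collect();
    let chunks = read_chunks(Cursor::new(data), 4)?;
    let firsts: Vec<u8> = chunks.iter().map(|c| c[0]).collect();
    // prints: 4 chunks, first byte of each: [0, 1, 2, 3]
    println!("{} chunks, first byte of each: {:?}", chunks.len(), firsts);
    Ok(())
}
```

The buffers are zero-initialized before being read into, which costs one extra pass over the memory but keeps the code free of `unsafe`.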


Is that approach possible, or is there a better conventional/idiomatic/correct way to do this? I'm also wondering whether Vec is the right type for this, since I won't need the array to grow after reading the file.
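On whether `Vec` is the right element type: if every chunk is exactly 0x4000 bytes, one option (a suggestion, not part of the original answer) is a `Vec` of fixed-size arrays, `Vec<[u8; 0x4000]>`, where each element's length is part of its type. The sketch allocates all chunks in one contiguous block and fills them with `read_exact`; `read_array_chunks` and `CHUNK_SIZE` are hypothetical names:

```rust
use std::io::{self, Cursor, Read};

const CHUNK_SIZE: usize = 0x4000; // 16 KB

/// Reads `num_of_chunks` chunks into a Vec of fixed-size arrays. Each element's
/// length is part of its type, and the Vec is allocated once, at full size.
fn read_array_chunks<R: Read>(mut reader: R, num_of_chunks: usize) -> io::Result<Vec<[u8; CHUNK_SIZE]>> {
    // One contiguous heap allocation of num_of_chunks * CHUNK_SIZE zeroed bytes.
    let mut chunks = vec![[0u8; CHUNK_SIZE]; num_of_chunks];
    for chunk in chunks.iter_mut() {
        // `chunk` is `&mut [u8; CHUNK_SIZE]`, which coerces to `&mut [u8]`.
        reader.read_exact(chunk)?;
    }
    Ok(chunks)
}

fn main() -> io::Result<()> {
    // In-memory stand-in for a file of exactly two chunks (32 KB).
    let data = vec![7u8; 2 * CHUNK_SIZE];
    let chunks = read_array_chunks(Cursor::new(data), 2)?;
    println!("{} chunks of {} bytes", chunks.len(), chunks[0].len());
    Ok(())
}
```

If the outer collection should not grow either, calling `.into_boxed_slice()` on the result yields a `Box<[[u8; CHUNK_SIZE]]>`, which has no `push`.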

Recommended answer


I think the most idiomatic way would be to use an iterator. The code below (freely inspired by M-ou-se's answer):

  • uses a generic type to handle many use cases
  • uses pre-allocated vectors
  • hides side effects
  • avoids copying the data twice
use std::io::{self, Read, Seek, SeekFrom};

struct Chunks<R> {
    read: R,
    size: usize,
    hint: (usize, Option<usize>),
}

impl<R> Chunks<R> {
    pub fn new(read: R, size: usize) -> Self {
        Self {
            read,
            size,
            hint: (0, None),
        }
    }

    pub fn from_seek(mut read: R, size: usize) -> io::Result<Self>
    where
        R: Seek,
    {
        let old_pos = read.seek(SeekFrom::Current(0))?;
        let len = read.seek(SeekFrom::End(0))?;

        let rest = (len - old_pos) as usize; // assumes old_pos <= len; seeking past EOF is allowed and would underflow here
        if rest != 0 {
            read.seek(SeekFrom::Start(old_pos))?;
        }

        let min = rest / size + if rest % size != 0 { 1 } else { 0 };
        Ok(Self {
            read,
            size,
            hint: (min, None), // lower bound only; assumes the file is not truncated while iterating
        })
    }

    // This could be useful if you want to try to recover from an error
    pub fn into_inner(self) -> R {
        self.read
    }
}

impl<R> Iterator for Chunks<R>
where
    R: Read,
{
    type Item = io::Result<Vec<u8>>;

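    fn next(&mut self) -> Option<Self::Item> {
        let mut chunk = Vec::with_capacity(self.size);
        // `take` caps this read at `size` bytes; `read_to_end` loops until the cap or EOF,
        // so every chunk is full except possibly the last one.
        match self
            .read
            .by_ref()
            .take(self.size as u64)
            .read_to_end(&mut chunk)
        {
            Ok(0) => None, // EOF: no more chunks
            Ok(_) => {
                // One chunk consumed: keep the lower bound of `size_hint` accurate.
                self.hint.0 = self.hint.0.saturating_sub(1);
                Some(Ok(chunk))
            }
            Err(e) => Some(Err(e)),
        }
    }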

    fn size_hint(&self) -> (usize, Option<usize>) {
        self.hint
    }
}

trait ReadPlus: Read {
    fn chunks(self, size: usize) -> Chunks<Self>
    where
        Self: Sized,
    {
        Chunks::new(self, size)
    }
}

impl<T: ?Sized> ReadPlus for T where T: Read {}

fn main() -> io::Result<()> {
    let file = std::fs::File::open("src/main.rs")?;
    let iter = Chunks::from_seek(file, 0xFF)?; // 0xFF is only a test value; use the chunk size you need, e.g. 0x4000

    println!("{:?}", iter.size_hint());
    // On a persistent I/O error this iterator may keep returning Err forever,
    // so collect it into a Result, which stops at the first error
    let chunks = iter.collect::<Result<Vec<_>, _>>()?;
    println!("{:?}, {:?}", chunks.len(), chunks.capacity());

    Ok(())
}
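The `main` above exercises `from_seek` but never the `ReadPlus::chunks` extension method. A usage sketch for that path, using a trimmed copy of the answer's `Chunks` and `ReadPlus` (without `from_seek` and `size_hint`) so that it compiles on its own, and an in-memory `Cursor` instead of a file:

```rust
use std::io::{self, Cursor, Read};

// Trimmed copy of the answer's `Chunks` iterator and `ReadPlus` trait.
struct Chunks<R> {
    read: R,
    size: usize,
}

impl<R: Read> Iterator for Chunks<R> {
    type Item = io::Result<Vec<u8>>;

    fn next(&mut self) -> Option<Self::Item> {
        let mut chunk = Vec::with_capacity(self.size);
        // Read at most `size` bytes; fewer only at EOF.
        match self.read.by_ref().take(self.size as u64).read_to_end(&mut chunk) {
            Ok(0) => None, // EOF: no more chunks
            Ok(_) => Some(Ok(chunk)),
            Err(e) => Some(Err(e)),
        }
    }
}

trait ReadPlus: Read {
    fn chunks(self, size: usize) -> Chunks<Self>
    where
        Self: Sized,
    {
        Chunks { read: self, size }
    }
}

impl<T: ?Sized + Read> ReadPlus for T {}

fn main() -> io::Result<()> {
    // 10 bytes read in chunks of 4: two full chunks and a 2-byte remainder.
    let data: Vec<u8> = (0..10).collect();
    let chunks: Vec<Vec<u8>> = Cursor::new(data).chunks(4).collect::<io::Result<_>>()?;
    println!("{:?}", chunks); // [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
    Ok(())
}
```

As the output shows, the last chunk is shorter when the input length is not a multiple of the chunk size, so code that needs exactly 0x4000 bytes per chunk should check `chunk.len()`.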
