Is there a fast way to read alternate bytes in dd


Question

I'm trying to read out every other pair of bytes in a binary file using dd in a loop, but it is unusably slow.

I have a binary file on a BusyBox embedded device containing data in rgb565 format. Each pixel is 2 bytes and I'm trying to read out every other pixel to do very basic image scaling to reduce file size.

The overall size is 640x480 and I've been able to read every other "row" of pixels by looping dd with a 960 byte block size. But doing the same for every other "column" that remains by looping through with a 2 byte block size is ridiculously slow even on my local system.

i=1
while [[ $i -le 307200 ]]
do
        dd bs=2 skip=$((i-1)) seek=$((i-1)) count=1 if=./tmpfile >> ./outfile 2>/dev/null
        let i=i+2
done

While I get the output I expect, this method is unusable.

Is there some less obvious way to have dd quickly copy every other pair of bytes?

Sadly I don't have much control over what gets compiled into BusyBox. I'm open to other possible methods, but a dd/sh solution may be all I can use. For instance, one build has omitted head -c...

I appreciate all the feedback. I will check out each of the various suggestions and check back with results.

Recommended answer

Skipping every other character is trivial for tools like sed or awk as long as you don't need to cope with newlines and null bytes. But BusyBox's support for null bytes in sed and awk is poor enough that I don't think you can cope with them at all. It's possible to deal with newlines, but it's a giant pain because there are 16 different combinations to deal with, depending on whether each position in a 4-byte block is a newline or not.

Since arbitrary binary data is a pain, let's translate to hexadecimal or octal! I'll draw some inspiration from the bin2hex and hex2bin scripts by Stéphane Chazelas. Since we don't care about the intermediate format, I'll use octal, which is a lot simpler to deal with because the final step uses printf, which only supports octal. Stéphane's hex2bin uses awk for the hexadecimal-to-octal conversion; an oct2bin can use sed. So in the end you need sh, od, sed and printf. I don't think you can avoid printf: it's critical to outputting null bytes. While od is essential, most of its options aren't, so it should be possible to tweak this code to support a very stripped-down od with a bit more postprocessing.
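
To make the octal round trip concrete, here is a minimal sketch. The helper name octal_dump is made up for illustration. It dumps three awkward bytes (a letter, a null byte and a newline) as octal text, then feeds the same values back to printf as escapes to recreate the bytes. The exact spacing of od output may differ between od implementations, so treat the layout as an assumption.

```shell
# octal_dump is a hypothetical helper name, not something BusyBox provides.
octal_dump() {
  od -An -v -t o1
}

# Binary to text: 'A', NUL and newline become plain octal numbers.
printf 'A\000\n' | octal_dump

# Text to binary: printf turns the same octal escapes back into bytes,
# so dumping its output again shows the same three values.
printf '\101\000\012' | octal_dump
```

Both commands should print the same three octal values, which is the property the approach relies on: nothing in the text form is a newline or null byte that sed could choke on.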

od -An -v -t o1 -w4 |
sed 's/^ \([0-7]*\) \([0-7]*\).*/printf \\\\\1\\\\\2/' |
sh
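
As a quick sanity check, the same pipeline can be wrapped in a function and run on a tiny input. This is only a sketch: the function name keep_alternate_pixels is invented here, and it assumes an od that accepts -w4 and an input length that is a multiple of 4 bytes.

```shell
# Hypothetical wrapper: keep bytes 1-2 of every 4-byte group read from stdin.
keep_alternate_pixels() {
  od -An -v -t o1 -w4 |                                      # one 4-byte group per line, in octal
  sed 's/^ \([0-7]*\) \([0-7]*\).*/printf \\\\\1\\\\\2/' |   # emit: printf \\NNN\\NNN
  sh                                                         # run the generated printf commands
}

# Eight sample bytes form two groups: "ABCD" and "EFGH".
printf 'ABCDEFGH' | keep_alternate_pixels   # writes ABEF (no trailing newline)
```

On the real data this would be invoked as keep_alternate_pixels <./tmpfile >./outfile. A trailing group shorter than two bytes would not match the sed pattern and would be passed to sh unchanged, so the input should be padded or trimmed first if its size is not a multiple of 4.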

The reason this is so fast compared to your dd-based approach is that BusyBox runs printf in the parent process, whereas dd requires its own process. Forking is slow. If I remember correctly, there's a compilation option which makes BusyBox fork for all utilities. In that case my approach will probably be as slow as yours. Here's an intermediate approach using dd which can't avoid the forks, but at least avoids opening and closing the file every time. It should be a little faster than yours.

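# $1 is the input file; i counts the 4-byte groups (two pixels each).
i=$(($(wc -c <"$1") / 4))
# Open the file once; every dd below shares this file offset.
exec <"$1"
# Copy the first pixel.
dd ibs=2 count=1 conv=notrunc 2>/dev/null
while [ "$i" -gt 1 ]; do
  # Skip one pixel, then copy the next one.
  dd ibs=2 count=1 skip=1 conv=notrunc 2>/dev/null
  i=$((i - 1))
done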
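
The dd loop can be tried on a small known file before pointing it at real image data. The sketch below is an illustration under assumptions: the function name dd_alternate and the scratch path are invented, the redirection is scoped to a brace group instead of exec so the caller's stdin is left alone, and it relies on dd's skip= seeking relative to the current offset of an already open regular file.

```shell
# Hypothetical wrapper around the dd loop; $1 is the input file.
dd_alternate() {
  n=$(($(wc -c <"$1") / 4))                 # number of 4-byte groups
  {
    dd ibs=2 count=1 2>/dev/null            # first kept pixel
    while [ "$n" -gt 1 ]; do
      dd ibs=2 count=1 skip=1 2>/dev/null   # skip one pixel, keep the next
      n=$((n - 1))
    done
  } <"$1"                                   # file opened once for all dd calls
}

sample="${TMPDIR:-/tmp}/alt_sample.$$"      # assumed scratch location
printf 'ABCDEFGHIJKL' >"$sample"
dd_alternate "$sample"                      # writes ABEFIJ (no trailing newline)
rm -f "$sample"
```

Each kept pixel still costs one dd process, so this only removes the repeated open and close, as described above.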
