Binary-mode i/o, width of char, endianness


Question

Hi group,

I'm having some difficulty figuring out the most portable way to read 24
bits from a file. This is related to a Base-64 encoding.

The file is opened in binary mode, and I'm using fread to read three
bytes from it. The question is though, where should fread put this? I
have considered two alternatives, but neither seems like a good idea:

In most cases, the width of a char is 8 bits, so an array of 3 chars
would suffice, but the width of a char is guaranteed to be only *at
least* 8 bits, so the actual number of chars required would be 24 /
CHAR_BIT, rounded up. Since you can't round in a constant integral
expression, 3 chars is a good safe buffer size because it's guaranteed
to be at least 24 bits. However, since I need to be able to divide
those 24 bits into four 6-bit numbers, indices into the char array
become more complicated as the 6-bit numbers do not fall evenly on the
(presumably) 8-bit boundaries that indexes in the array would give me.
If the width of a char is not 8 bits, then knowing which indices to look
at and shift/mask is even more difficult. As such, I thought of the
second option.

The second option is to allocate the input buffer as simply one int
object that is guaranteed to be at least 24 bits wide: the long int,
which even has 8 bytes to spare. fread can safely write 3 bytes of data
into a long int. I only have worries that because a long int is a
multi-byte integer, accessing various parts of it is dangerous due to
endianness considerations, or is endianness only relevant to the
represented *value* of the multi-byte integer as a whole? fread doesn't
care about that: it writes three bytes into the address of the long int,
starting at the lowest-positioned byte, but would the shifting/masking
be portable? For example a multi-byte integer constant 0x1234 has a
most-significant byte of value 0x12, but on a big-endian machine would
be stored on the *lowest* memory address of the space it takes up. As
such, the mask required to leave only the *lowest* 6 bits of a 32-bit
integer could be either 0x3F000000 or 0x0000003F depending on
endianness, right? Or are hexadecimal integer constants always stored
as-is? That is, the lowest byte is positioned last in an integer
constant instead of the least significant byte positioned last? This
seems counter-intuitive.

If neither of these options is good, is there another way?

Thanks in advance,
Thomas
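
On the endianness part of the question, one point holds in standard C: shifts and masks operate on the value of an integer, not on its bytes in memory. So v & 0x3F always selects the six least significant bits, whatever the byte order. Byte order only matters when bytes are copied straight into the object's storage, which is what fread into a long int would do. A minimal sketch that avoids the issue by building the value with shifts, assuming each array element holds one octet; the names pack24 and sextet are hypothetical and not from the thread:

```c
/* Hypothetical helper: combine three octets into one 24-bit value.
   The shifts act on values, so the result is the same on big-endian
   and little-endian machines. */
static unsigned long pack24(const unsigned char b[3])
{
    return ((unsigned long)(b[0] & 0xFFu) << 16)
         | ((unsigned long)(b[1] & 0xFFu) << 8)
         |  (unsigned long)(b[2] & 0xFFu);
}

/* Hypothetical helper: return 6-bit group i, where i = 0 is the most
   significant group and i = 3 the least significant. */
static unsigned sextet(unsigned long v, int i)
{
    return (unsigned)((v >> (18 - 6 * i)) & 0x3Fu);
}
```

With the octets 0x4D, 0x61, 0x6E the packed value is 0x4D616E, and the mask for the lowest group is 0x3F on every platform.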

Recommended answer

T Koster wrote:

> Hi group,
>
> I'm having some difficulty figuring out the most portable way to read 24
> bits from a file. This is related to a Base-64 encoding.
>
> The file is opened in binary mode, and I'm using fread to read three
> bytes from it. The question is though, where should fread put this? I
> have considered two alternatives, but neither seem like a good idea:
>
> In most cases, the width of a char is 8 bits, so an array of 3 chars
> would suffice, but the width of a char is guaranteed to be only *at
> least* 8 bits, so the actual number of chars required would be 24 /
> CHAR_BIT, rounded up. Since you can't round in a constant integral
> expression, 3 chars is a good safe buffer size because it's guaranteed
> to be at least 24 bits.

To store BITS bits, you need at least (BITS + CHAR_BIT - 1) / CHAR_BIT
bytes. If BITS is constant:

#define BITS 24

then:

unsigned char buf[(BITS + CHAR_BIT - 1) / CHAR_BIT] = {0};

is legal.

> However, since I need to be able to divide
> those 24 bits into four 6-bit numbers, indices into the char array
> become more complicated as the 6-bit numbers do not fall evenly on the
> (presumably) 8-bit boundaries that indexes in the array would give me.

So you need to mask and shift. If we assume that each octet of data
is stored in a separate byte, then this isn't as hard as it sounds.

/* 1. get bits 7 through 2 of first octet */
num[0] = (buf[0] & 0xFC) >> 2;
/* 2. get bits 1 and 0 of first octet, and bits 7 through 4 of
second octet */
num[1] = ((buf[0] & 0x03) << 4) | ((buf[1] & 0xF0) >> 4);

etc.
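
The remaining two values follow the same pattern. A minimal sketch, assuming buf holds three octets with the first octet's high bits coming first, as Base-64 does; the function name split_sextets is hypothetical and not from the thread. The two low bits of the first octet are shifted left by 4 so that every result stays within six bits (0 to 63):

```c
/* Hypothetical helper: split three octets into four 6-bit values. */
static void split_sextets(const unsigned char buf[3], unsigned char num[4])
{
    /* bits 7..2 of the first octet */
    num[0] = (unsigned char)((buf[0] & 0xFC) >> 2);
    /* bits 1..0 of the first octet, then bits 7..4 of the second */
    num[1] = (unsigned char)(((buf[0] & 0x03) << 4) | ((buf[1] & 0xF0) >> 4));
    /* bits 3..0 of the second octet, then bits 7..6 of the third */
    num[2] = (unsigned char)(((buf[1] & 0x0F) << 2) | ((buf[2] & 0xC0) >> 6));
    /* bits 5..0 of the third octet */
    num[3] = (unsigned char)(buf[2] & 0x3F);
}
```

For the octets 0x4D 0x61 0x6E ("Man" in ASCII) this yields 19, 22, 5 and 46, which index to "TWFu" in the Base-64 alphabet.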

> If the width of a char is not 8 bits, then knowing which indices to look
> at and shift/mask is even more difficult.

See above if they're spread out, with 8 value bits to each byte
(the remaining bits being unused). If they're packed in, you just
have to be a little clever with CHAR_BIT. Once you start to analyse
this problem, you'll see that it isn't as hard as it sounds.

> As such, I thought of the
> second option.
>
> The second option is to allocate the input buffer as simply one int
> object that is guaranteed to be at least 24 bits wide: the long int,
> which even has 8 bytes to spare.

Well, at least 8 *bits* to spare. :-)

> fread can safely write 3 bytes of data
> into a long int.

Not necessarily. On platforms such as the kind you are worrying about
(CHAR_BIT > 8), long int may well be fewer than four bytes wide!

Consider a platform with 11-bit bytes. On such a platform, long ints
may only occupy 3 bytes. On (perhaps more common) platforms with
16-bit or 32-bit bytes, long int may be only 2 bytes, or even 1 byte.

I would stick to unsigned char for this project. Long ints will
multiply your headaches, divide your attention, add to your
worries, and subtract from your understanding (modulo their
day-to-day uses, obviously).
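
One way to see how these sizes relate on a given implementation is to compute them from CHAR_BIT. A small sketch; the function name storage_bits_of_long is hypothetical. Standard C guarantees CHAR_BIT >= 8, sizeof(char) == 1, and a long of at least 32 bits, but it does not guarantee any particular value of sizeof(long):

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: number of storage bits in a long, counted in
   this implementation's bytes (each CHAR_BIT bits wide). */
static size_t storage_bits_of_long(void)
{
    return sizeof(long) * CHAR_BIT;
}
```

On a machine with 8-bit bytes and a 4-byte long this gives 32. On a machine with 32-bit bytes, sizeof(long) could be 1 and the result would still be at least 32.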


infobahn wrote:
> T Koster wrote:
>> I'm having some difficulty figuring out the most portable way to read 24
>> bits from a file. This is related to a Base-64 encoding.
>>
>> The file is opened in binary mode, and I'm using fread to read three
>> bytes from it. The question is though, where should fread put this? I
>> have considered two alternatives, but neither seem like a good idea:
>>
>> In most cases, the width of a char is 8 bits, so an array of 3 chars
>> would suffice, but the width of a char is guaranteed to be only *at
>> least* 8 bits, so the actual number of chars required would be 24 /
>> CHAR_BIT, rounded up. Since you can't round in a constant integral
>> expression, 3 chars is a good safe buffer size because it's guaranteed
>> to be at least 24 bits.
>
> To store BITS bits, you need at least (BITS + CHAR_BIT - 1) / CHAR_BIT
> bytes. If BITS is constant:
>
> #define BITS 24
>
> then:
>
> unsigned char buf[(BITS + CHAR_BIT - 1) / CHAR_BIT] = {0};
>
> is legal.

Ahh, good idea.

>> However, since I need to be able to divide
>> those 24 bits into four 6-bit numbers, indices into the char array
>> become more complicated as the 6-bit numbers do not fall evenly on the
>> (presumably) 8-bit boundaries that indexes in the array would give me.
>
> So you need to mask and shift. If we assume that each octet of data
> is stored in a separate byte, then this isn't as hard as it sounds.
>
> /* 1. get bits 7 through 2 of first octet */
> num[0] = (buf[0] & 0xFC) >> 2;
> /* 2. get bits 1 and 0 of first octet, and bits 7 through 4 of
> second octet */
> num[1] = ((buf[0] & 0x03) << 4) | ((buf[1] & 0xF0) >> 4);
>
> etc.

>> If the width of a char is not 8 bits, then knowing which indices to look
>> at and shift/mask is even more difficult.
>
> See above if they're spread out, with 8 value bits to each byte
> (the remaining bits being unused). If they're packed in, you just
> have to be a little clever with CHAR_BIT. Once you start to analyse
> this problem, you'll see that it isn't as hard as it sounds.

We seem to be using the term 'byte' with different meanings... see below.

>> As such, I thought of the
>> second option.
>>
>> The second option is to allocate the input buffer as simply one int
>> object that is guaranteed to be at least 24 bits wide: the long int,
>> which even has 8 bytes to spare.
>
> Well, at least 8 *bits* to spare. :-)

Certainly :)

>> fread can safely write 3 bytes of data
>> into a long int.
>
> Not necessarily. On platforms such as the kind you are worrying about
> (CHAR_BIT > 8), long int may well be fewer than four bytes wide!
>
> Consider a platform with 11-bit bytes. On such a platform, long ints
> may only occupy 3 bytes. On (perhaps more common) platforms with
> 16-bit or 32-bit bytes, long int may be only 2 bytes, or even 1 byte.




Hmmm, this appears to be becoming a question of terminology. I thought
that by definition, one byte is eight bits wide. I'm not using the C
type 'char' interchangeably with 'an int that is one _byte_ big'. When I
consider that CHAR_BIT may be greater than 8, I mean exactly that, and
not that a byte of storage on this platform has more than eight bits,
since I thought that was nonsense. That is, a char may occupy more than
one byte of storage, but a byte is still an 8-bit byte. Calling fread
and asking for three bytes implies that 24 bits will be read,
irrespective of platform, correct? As such, a long int, being
guaranteed to have at least 32 bits, is guaranteed to occupy at least
four bytes of storage, which is why I say that fread can safely store
three bytes (24 bits by definition) in a long int. Correct me if I'm
wrong here.

> I would stick to unsigned char for this project. Long ints will
> multiply your headaches, divide your attention, add to your
> worries, and subtract from your understanding (modulo their
> day-to-day uses, obviously).




Thanks,
Thomas.


T Koster wrote:
> infobahn wrote:
>> T Koster wrote:
>>> fread can safely write 3 bytes of data
>>> into a long int.
>>
>> Not necessarily. On platforms such as the kind you are worrying about
>> (CHAR_BIT > 8), long int may well be fewer than four bytes wide!
>>
>> Consider a platform with 11-bit bytes. On such a platform, long ints
>> may only occupy 3 bytes. On (perhaps more common) platforms with
>> 16-bit or 32-bit bytes, long int may be only 2 bytes, or even 1 byte.



> Hmmm, this appears to be becoming a question of terminology. I thought
> that by definition, one byte is eight bits wide.

Not in comp.lang.c, it isn't, because ISO C recognises that it
simply isn't true on some platforms.

> I'm not using the C
> type 'char' interchangeably with 'an int that is one _byte_ big'. When I
> consider that CHAR_BIT may be greater than 8, I mean exactly that, and
> not that a byte of storage on this platform has more than eight bits,
> since I thought that was nonsense.

Many people think that, and many people are wrong. If CHAR_BIT is
greater than 8, it is because bytes are greater than 8 bits wide
for that implementation.

> That is, a char may occupy more than
> one byte of storage, but a byte is still an 8-bit byte.

In C, by definition, a char is exactly one byte in size.
sizeof(char) always yields 1 as its value. But chars can
be wider than 8 bits. If they are, then so are bytes.

> Calling fread
> and asking for three bytes implies that 24 bits will be read,
> irrespective of platform, correct?

Nope. Consider a typical modern DSP. You might get anything from
48 to 96 bits! (Maybe even more, nowadays.)

> As such, a long int, being
> guaranteed to have at least 32 bits, is guaranteed to occupy at least
> four bytes of storage, which is why I say that fread can safely store
> three bytes (24 bits by definition) in a long int. Correct me if I'm
> wrong here.





You can certainly guarantee to get 24 bits into a long int, yes.
But you might not need three bytes to do it in, and if you read
three bytes you might end up with more than you can chew.

(Pun definitely intended.)
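
Putting the advice in this thread together, here is a hedged sketch of reading up to three octets with fread into an unsigned char buffer. It assumes the "spread out" case discussed above, where each char read from the file carries one octet in its low 8 bits; the name read_octets is hypothetical and not from the thread:

```c
#include <stdio.h>

/* Hypothetical helper: read up to three chars from fp and keep the low
   8 bits of each. Returns how many were read (0 to 3). */
static size_t read_octets(FILE *fp, unsigned char out[3])
{
    size_t i;
    size_t n = fread(out, 1, 3, fp);   /* element size 1 means one char */

    for (i = 0; i < n; i++)
        out[i] &= 0xFF;                /* no effect when CHAR_BIT == 8 */
    return n;
}
```

A return value below 3 means end of file or a read error; a Base-64 encoder would handle a short final group with '=' padding.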

