Please someone test this on a Big-Endian System


Problem description


I want to see if this code works the way it should on a Big-Endian
system. Also, if anyone has any ideas on how to determine this at
compile time so that I use the right decoding or encoding functions, I
would greatly appreciate the help.

Thanks,
Ché
#include <iostream>

int main( int argc, char* argv[] )
{
// Default system to little endian
bool isLittleEndian = true;

// Check whether this platform is big-endian or little endian
wchar_t a = L'a';
unsigned char* testChar = reinterpret_cast<unsigned char*>( &a );

// Big Endian should display nothing on output here
std::cout << (unsigned char*) testChar << std::endl;

if( testChar == 0 )
{
isLittleEndian = false;

// Big Endian should display "Big Endian Success" here
std::cout << "Big Endian Success" << std::endl;

return 0;
}

Solution

"ThazKool" <Ch**********@gmail.comschrieb im Newsbeitrag
news:11*********************@j8g2000cwa.googlegrou ps.com...
<quote>
I want to see if this code works the way it should on a Big-Endian
system. Also if anyone has any ideas on how determine this at
compile-time so that I use the right decoding or encoding functions, I
would greatly appreciate the help.

Thanks,
Ché
#include <iostream>

int main( int argc, char* argv[] )
{
// Default system to little endian
bool isLittleEndian = true;

// Check whether this platform is big-endian or little endian
wchar_t a = L'a';
unsigned char* testChar = reinterpret_cast<unsigned char*>( &a );

// Big Endian should display nothing on output here
std::cout << (unsigned char*) testChar << std::endl;

if( testChar == 0 )
{
isLittleEndian = false;

// Big Endian should display "Big Endian Success" here
std::cout << "Big Endian Success" << std::endl;

return 0;
}
</quote>

It might work, but it might not. You are assuming that char and
wchar_t are different types. This may not always be the case. You also assume
that enough high bits of L'a' are zero to make a big-endian system think a
char* pointing to a wchar_t actually points to an empty string. Then you are
using reinterpret_cast in a way that is undefined (or unspecified?)
behaviour (casting between pointers to unrelated types always is). And
finally, a pointer to a local variable will never be 0, so "testChar == 0" will
never be true, no matter which byte order the system is using (if any).

To test for endianness, you should:

1) Test whether CHAR_BIT (in <climits>) is equal to 8. Endianness
is only defined for systems internally using octets. If CHAR_BIT is not
equal to 8, you cannot access octets on that system, at least not in an easy
way.
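Step 1 can even be enforced at compile time; a minimal sketch (static_assert postdates this thread, arriving in C++11, and the message text is my own):

```cpp
#include <climits>

// Compile-time guard for step 1: the endianness tests below only make
// sense when a byte is an octet.
static_assert(CHAR_BIT == 8, "endianness probing here assumes 8-bit bytes");
```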

2) Test whether sizeof(wchar_t) == 2. Endianness is only defined for pairs of
octets. So, if wchar_t is not a pair of octets, you have to think about
something else.

3) Assign a well-known value to a wchar_t variable. (L'a' is not a well-known
value. There is a good chance that it will be 0x0061, but it might be
something completely different.) Use something like 0xFEFF instead. (0xFEFF
is the Unicode byte-order mark, but other values will do, too.) Then get the
values of the two chars (octets) occupying the same space as the variable and
compare them with 0xFE and 0xFF:

wchar_t wc = 0xFEFF;
unsigned char const* cp = reinterpret_cast<unsigned char*>(&wc);
if (cp[0] == 0xFE && cp[1] == 0xFF)
{
// Big-Endian
}
else if (cp[0] == 0xFF && cp[1] == 0xFE)
{
// Little-Endian
}
else
{
// Something completely different
}
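Wrapped up as a self-contained helper, the byte-order-mark probe might look like this (a sketch; the function name is mine, std::uint16_t stands in for wchar_t to guarantee a pair of octets, and std::memcpy replaces the cast so inspecting the object representation is well-defined):

```cpp
#include <cstdint>
#include <cstring>

// Runtime endianness probe based on the 0xFEFF byte-order mark.
bool isBigEndian() {
    std::uint16_t bom = 0xFEFF;
    unsigned char bytes[2];
    std::memcpy(bytes, &bom, sizeof bytes);  // copy out the object representation
    return bytes[0] == 0xFE && bytes[1] == 0xFF;
}
```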

Alas, that code also depends on a cast between pointers to unrelated types.

But why do you need to know the endianness of the system your program runs on?
Usually you only have to convert from one kind of byte order to another when
you are reading from an external source (file, network connection) or
writing to such a destination. And in those situations you can easily convert
between the external format and the format used in a program without knowing
the byte order of the system itself. You only have to know the external byte
order. Then you can convert in a portable way.

To read a Unicode (UCS-2) string, read the string into an array of bytes
(unsigned char will probably be a good choice on most systems, but add some
test that CHAR_BIT is really equal to 8). Then convert pairs of those octets
into values of a type large enough to hold a UCS-2 character:

if (ExternalFormatIsLittleEndian)
{
for (int i = 0; i < BytesRead; i += 2)
internalString[i / 2] = externalString[i] + 256 *
externalString[i + 1];
}
else
{
for (int i = 0; i < BytesRead; i += 2)
internalString[i / 2] = externalString[i] * 256 +
externalString[i + 1];
}
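The two loops can be packaged as one decoding function; a sketch (the name decodeExternal is mine, internalString/externalString become the return value and parameter, and std::uint16_t stands in for "a type large enough"):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Decode a raw byte stream into 16-bit code units, branching only on the
// *external* byte order, never on the host's.
std::vector<std::uint16_t> decodeExternal(const std::vector<unsigned char>& bytes,
                                          bool externalIsLittleEndian) {
    std::vector<std::uint16_t> units;
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
        if (externalIsLittleEndian)
            units.push_back(static_cast<std::uint16_t>(bytes[i] + 256 * bytes[i + 1]));
        else
            units.push_back(static_cast<std::uint16_t>(bytes[i] * 256 + bytes[i + 1]));
    }
    return units;
}
```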

Before you write internal data to an external destination, you must of course
convert your internal representation to the external one, but again you can
do so without knowing the internal byte order. You only have to know how
bytes should be arranged outside your program.
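The writing direction is symmetric; a sketch under the same assumptions (the function name is mine, not from the thread):

```cpp
#include <cstdint>
#include <vector>

// Encode 16-bit units into the external byte order, again without ever
// consulting the host's own endianness.
std::vector<unsigned char> encodeExternal(const std::vector<std::uint16_t>& units,
                                          bool externalIsLittleEndian) {
    std::vector<unsigned char> bytes;
    for (std::uint16_t u : units) {
        unsigned char hi = static_cast<unsigned char>(u / 256);
        unsigned char lo = static_cast<unsigned char>(u % 256);
        if (externalIsLittleEndian) {
            bytes.push_back(lo);
            bytes.push_back(hi);
        } else {
            bytes.push_back(hi);
            bytes.push_back(lo);
        }
    }
    return bytes;
}
```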

HTH
Heinz



Heinz Ozwirk wrote:

"ThazKool" <Ch**********@gmail.comschrieb im Newsbeitrag
news:11*********************@j8g2000cwa.googlegrou ps.com...
<quote>
I want to see if this code works the way it should on a Big-Endian
system. Also if anyone has any ideas on how determine this at
compile-time so that I use the right decoding or encoding functions, I
would greatly appreciate the help.

Thanks,
Ché
#include <iostream>

int main( int argc, char* argv[] )
{
// Default system to little endian
bool isLittleEndian = true;

// Check whether this platform is big-endian or little endian
wchar_t a = L''a'';
unsigned char* testChar = reinterpret_cast<unsigned char*>( &a );

// Big Endian should display nothing on output here
std::cout << (unsigned char*) testChar << std::endl;

if( testChar == 0 )
{
isLittleEndian = false;

// Big Endian should display ''"Big Endian Success" here
std::cout << "Big Endian Success" << std::endl;

return 0;
}
</quote>

If might work, but it might not also do so. You are assuming that char and
wchar_t are different type. This may not always be the case. You also assume
that enough high bits of L''a'' are zero to make a big endian system think a
char* pointing to a wchar_t actually points to an empty string. Then you are
using reinterpret_cast in a way that is undefined (or unspecified?)
behaviour (casting btween pointers to unrelated types always is). And
finally a pointer to a local variable will never be 0, so "testChar==0" will
never be true, no matter which byte order the system is using (if any).

To test for endiness you should

1) Test if CHAR_BITS (or its <climitsequivalent) is equal to 8. Endiness
is only defined for systems internally using octets. If CHAR_BITS is not
equal to 8 you cannot access octets on that system, at least not in an easy
way.

2) Test if sizeof(wchar_t) == 2. Endiness is only defined for pairs of
octets. So, if wchar_t is not a pair of octets, you have to think about
something else.

3) Assign a well known value to a wchar_t variable. (L''a'' is not a well
known value. There are good chances that it will be 0x0061, but it might be
something completly different.) Use something like 0xFEFF instead. (0xFEFF
is the Unicode byte-order-mark, but other values will do, too.) Then get the
value of the two chars (octets) occupying the same space as the variable and
compare them with 0xFE and 0xFF:

wchar_t wc = 0xFEFF;
unsigned char const* cp = reinterpret_cast<unsigned char*>(&wc);
if (cp[0] == 0xFE && cp[1] == 0xFF)
{
// Big-Endian
}
else if (cp[0] == 0xFF && cp[1] == 0xFE)
{
// Little-Endian
}
else
{
// Something completly different
}

Alas, that code also depends on a cast of pointer to unrelated types.

But why do you need to know the endiness of the system your program runs on?
Usually you only have convert form one kind of byte-order to another when
you are reading from an external source (file, network connection) or
writing to such a destination. And in those situation you can easyly convert
between the external format and the format used in a program without knowing
the byte-order of the system itself. You only have to know the external byte
order. Then you can convert in a portable way.

To read a Unicode (UCS-16) string, read the string into an array of bytes
(unisgned char will probably be a god choid on most systems, but add some
test that CHAR_BITS is really equal to 8). The convert pairs of those octets
into values of a type large enough to hold an UCS-16 character:

if (ExternalFormatIsLittleEndian)
{
for (int i = 0; i < BytesRead; i += 2)
internalString[i / 2] = externalString[i] + 256 *
externalString[i + 1];
}
else
{
for (int i = 0; i < BytesRead; i += 2)
internalString[i / 2] = externalString[i] * 256 +
externalString[i + 1];
}

Before you write internal data to an external destination, you must of cause
convert your internal representation to the external one, but again you can
do so without knowing the internal byte order. You only have to know how
bytes should be arranged outside your program.

HTH
Heinz

I really appreciate your help. There was at least one silly mistake, as
I copied and added the code to main without testing. You are
completely correct on some of the issues that I was unaware of. My
desire to do this was born out of uncertainty. I want to make
portable Unicode handling functions that can interface directly with,
say, a person casually typing the C++ const wchar_t* L"Hello World",
without worry.

Thank you for your help.


ThazKool posted:

#include <iostream>

int main( int argc, char* argv[] )
{
// Default system to little endian
bool isLittleEndian = true;

// Check whether this platform is big-endian or little endian
wchar_t a = L'a';
unsigned char* testChar = reinterpret_cast<unsigned char*>( &a );

// Big Endian should display nothing on output here
std::cout << (unsigned char*) testChar << std::endl;


Oh good lord Jesus no!

There are perfectly portable ways of doing this, and this is not one of
them!

Check out some code I posted recently on comp.std.c++

http://groups.google.ie/group/comp.s...4a21366?hl=en&
--

Frederick Gotham


