How cross-platform is Google's Protocol Buffer's handling of floating-point types in practice?


Question

Google's Protocol Buffers allows you to store floats and doubles in messages. I looked through the implementation source code wondering how they managed to do this in a cross-platform manner, and what I stumbled upon was:

inline uint32 WireFormatLite::EncodeFloat(float value) {
  union {float f; uint32 i;};
  f = value;
  return i;
}

inline float WireFormatLite::DecodeFloat(uint32 value) {
  union {float f; uint32 i;};
  i = value;
  return f;
}

inline uint64 WireFormatLite::EncodeDouble(double value) {
  union {double f; uint64 i;};
  f = value;
  return i;
}

inline double WireFormatLite::DecodeDouble(uint64 value) {
  union {double f; uint64 i;};
  i = value;
  return f;
}

Now, an important additional piece of information is that these routines are not the end of the process: their results are post-processed to put the bytes in little-endian order:

inline void WireFormatLite::WriteFloatNoTag(float value,
                                        io::CodedOutputStream* output) {
  output->WriteLittleEndian32(EncodeFloat(value));
}

inline void WireFormatLite::WriteDoubleNoTag(double value,
                                         io::CodedOutputStream* output) {
  output->WriteLittleEndian64(EncodeDouble(value));
}

template <>
inline bool WireFormatLite::ReadPrimitive<float, WireFormatLite::TYPE_FLOAT>(
    io::CodedInputStream* input,
    float* value) {
  uint32 temp;
  if (!input->ReadLittleEndian32(&temp)) return false;
  *value = DecodeFloat(temp);
  return true;
}

template <>
inline bool WireFormatLite::ReadPrimitive<double, WireFormatLite::TYPE_DOUBLE>(
    io::CodedInputStream* input,
    double* value) {
  uint64 temp;
  if (!input->ReadLittleEndian64(&temp)) return false;
  *value = DecodeDouble(temp);
  return true;
}

So my question is: is this really good enough in practice to ensure that the serialization of floats and doubles in C++ will be transportable across platforms?

I am explicitly inserting the words "in practice" in my question because I am aware that in theory one cannot make any assumptions about how floats and doubles are actually formatted in C++, but I don't have a sense of whether this theoretical danger is actually something I should be very worried about in practice.

UPDATE

It now looks to me like the approach PB takes might be broken on SPARC. If I understand this page by Oracle describing the number formats used on SPARC correctly, SPARC uses the opposite endianness to x86 for integers but the same endianness as x86 for floats and doubles. However, PB encodes floats/doubles by first reinterpreting their bits as an integer type of the appropriate size (by means of a union; see the snippets of code quoted in my question above), and then reversing the order of the bytes on platforms with big-endian integers:

void CodedOutputStream::WriteLittleEndian64(uint64 value) {
  uint8 bytes[sizeof(value)];

  bool use_fast = buffer_size_ >= sizeof(value);
  uint8* ptr = use_fast ? buffer_ : bytes;

  WriteLittleEndian64ToArray(value, ptr);

  if (use_fast) {
    Advance(sizeof(value));
  } else {
    WriteRaw(bytes, sizeof(value));
  }
}

inline uint8* CodedOutputStream::WriteLittleEndian64ToArray(uint64 value,
                                                            uint8* target) {
#if defined(PROTOBUF_LITTLE_ENDIAN)
  memcpy(target, &value, sizeof(value));
#else
  uint32 part0 = static_cast<uint32>(value);
  uint32 part1 = static_cast<uint32>(value >> 32);

  target[0] = static_cast<uint8>(part0);
  target[1] = static_cast<uint8>(part0 >>  8);
  target[2] = static_cast<uint8>(part0 >> 16);
  target[3] = static_cast<uint8>(part0 >> 24);
  target[4] = static_cast<uint8>(part1);
  target[5] = static_cast<uint8>(part1 >>  8);
  target[6] = static_cast<uint8>(part1 >> 16);
  target[7] = static_cast<uint8>(part1 >> 24);
#endif
  return target + sizeof(value);
}

This, however, is exactly the wrong thing for it to be doing in the case of floats/doubles on SPARC since the bytes are already in the "correct" order.

So in conclusion, if my understanding is correct, then floating-point numbers are not transportable between SPARC and x86 using PB, because essentially PB assumes that all numbers are stored with the same endianness (relative to other platforms) as the integers on a given platform, which is an incorrect assumption to make on SPARC.

UPDATE 2

As Lyke pointed out, IEEE 64-bit floating-point numbers are stored in big-endian order on SPARC, in contrast to x86. However, only the two 32-bit words are in reverse order, not all 8 of the bytes, and in particular IEEE 32-bit floating-point numbers look like they are stored in the same order as on x86.
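Rather than relying on my reading of the vendor documentation, the assumption PB makes can be tested directly on the machine in question. The following is only a minimal sketch: the helper names are hypothetical (they are not part of PB), and it assumes IEEE-754 `float` and `double` with the usual 4- and 8-byte sizes. It compares the in-memory bytes of a floating-point value against the bytes of the integer holding its IEEE-754 bit pattern.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

// Assumption: IEEE-754 binary32/binary64; the bit patterns below rely on it.
static_assert(std::numeric_limits<double>::is_iec559 && sizeof(double) == 8,
              "sketch assumes IEEE-754 binary64 double");
static_assert(std::numeric_limits<float>::is_iec559 && sizeof(float) == 4,
              "sketch assumes IEEE-754 binary32 float");

// Hypothetical helper: true if a double is laid out in memory with the same
// byte order as a 64-bit integer. The value nearest pi is used because every
// byte of its bit pattern (0x400921FB54442D18) is distinct, so a full byte
// reversal or a swap of the two 32-bit words would both be detected.
inline bool DoubleMatchesIntegerByteOrder() {
  const double value = 3.141592653589793;
  const std::uint64_t pattern = 0x400921FB54442D18ULL;
  return std::memcmp(&value, &pattern, sizeof value) == 0;
}

// Same check for float; 3.1415927f has the bit pattern 0x40490FDB.
inline bool FloatMatchesIntegerByteOrder() {
  const float value = 3.1415927f;
  const std::uint32_t pattern = 0x40490FDBu;
  return std::memcmp(&value, &pattern, sizeof value) == 0;
}
```

If both functions return true on a platform, the union trick followed by an integer byte swap should yield the intended little-endian wire bytes there; if either returns false, that platform is one where the concern raised above would apply.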

Solution

I think it should be fine so long as your target C++ platform uses IEEE-754 and the library handles the endianness properly. Basically, the code you've shown assumes that if you've got the right bits in the right order and an IEEE-754 implementation, you'll get the right value. The endianness is handled by Protocol Buffers, and the IEEE-754-ness is assumed, but that is pretty universal.
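To make those two assumptions explicit, here is a minimal sketch of the same idea outside the library. The names `PutDoubleLE` and `GetDoubleLE` are hypothetical and not part of Protocol Buffers. The sketch refuses to compile on a non-IEEE-754 platform, uses `memcpy` instead of a union to obtain the bits, and builds the little-endian bytes with shifts so that the integer byte order of the host does not matter. Like the code in the question, it still assumes that a `double` and a 64-bit integer share the same byte order in memory on the host.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

// Assumption made explicit: double is IEEE-754 binary64.
static_assert(std::numeric_limits<double>::is_iec559 && sizeof(double) == 8,
              "sketch assumes IEEE-754 binary64 double");

// Hypothetical helper: write a double as 8 bytes in little-endian order.
inline void PutDoubleLE(double value, unsigned char* out) {
  std::uint64_t bits;
  std::memcpy(&bits, &value, sizeof bits);  // copy the object representation
  for (int i = 0; i < 8; ++i) {
    // Shifts operate on the integer's value, so this loop produces the same
    // bytes on little-endian and big-endian hosts.
    out[i] = static_cast<unsigned char>(bits >> (8 * i));
  }
}

// Hypothetical helper: read a double back from 8 little-endian bytes.
inline double GetDoubleLE(const unsigned char* in) {
  std::uint64_t bits = 0;
  for (int i = 0; i < 8; ++i) {
    bits |= static_cast<std::uint64_t>(in[i]) << (8 * i);
  }
  double value;
  std::memcpy(&value, &bits, sizeof value);
  return value;
}
```

Encoding `1.0` this way should give six zero bytes followed by `0xF0` and `0x3F`, since its IEEE-754 bit pattern is `0x3FF0000000000000`.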
