Portable C binary serialization primitives


Question

As far as I know, the C library provides no help in serializing numeric values into a non-text byte stream. Correct me if I'm wrong.

The most standard tool in use is htonl et al from POSIX. These functions have shortcomings:

  • There is no 64-bit support.
  • There is no floating-point support.
  • There are no versions for signed types. When deserializing, the unsigned-to-signed conversion relies on signed integral overflow, which is UB.
  • Their names do not state the size of the datatype.
  • They depend on 8-bit bytes and the presence of exact-size uintN_t.
  • The input types are the same as the output types, instead of referring to a byte stream.
    • This requires the user to perform a pointer typecast, which is possibly unsafe in alignment.
    • Having performed that typecast, the user is likely to attempt to convert and output a structure in its native memory layout, a poor practice which results in unexpected errors.
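To make the list above concrete, here is a minimal sketch of what byte-stream-oriented primitives could look like. The function names (put_u32_be, get_u32_be, put_u64_be, get_u64_be) are hypothetical and not part of any standard or existing library. The sketch assumes big-endian octet order on the wire, writes through unsigned char pointers so no pointer typecast or alignment assumption is needed, and uses the uint_leastN_t types so it does not depend on exact-size types being present.

```c
#include <stdint.h>

/* Hypothetical helper: store the low 32 bits of v as four octets,
   most significant octet first. Each element holds one octet (0..255),
   even if unsigned char is wider than 8 bits on the host. */
void put_u32_be(unsigned char *out, uint_least32_t v)
{
    out[0] = (unsigned char)((v >> 24) & 0xFFu);
    out[1] = (unsigned char)((v >> 16) & 0xFFu);
    out[2] = (unsigned char)((v >> 8) & 0xFFu);
    out[3] = (unsigned char)(v & 0xFFu);
}

/* Hypothetical helper: read four octets, most significant first. */
uint_least32_t get_u32_be(const unsigned char *in)
{
    return ((uint_least32_t)(in[0] & 0xFFu) << 24)
         | ((uint_least32_t)(in[1] & 0xFFu) << 16)
         | ((uint_least32_t)(in[2] & 0xFFu) << 8)
         |  (uint_least32_t)(in[3] & 0xFFu);
}

/* Same idea for 64 bits, which htonl and friends do not cover. */
void put_u64_be(unsigned char *out, uint_least64_t v)
{
    int i;
    for (i = 0; i < 8; i++)
        out[i] = (unsigned char)((v >> (56 - 8 * i)) & 0xFFu);
}

uint_least64_t get_u64_be(const unsigned char *in)
{
    uint_least64_t v = 0;
    int i;
    for (i = 0; i < 8; i++)
        v = (v << 8) | (uint_least64_t)(in[i] & 0xFFu);
    return v;
}
```

Because the functions work on values with shifts and masks rather than on memory layout, the result does not depend on the host's endianness. The caller is responsible for providing a buffer of at least 4 or 8 elements.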

An interface for serializing arbitrary-size char to 8-bit standard bytes would fall in between the C standard, which doesn't really acknowledge 8-bit bytes, and whatever standards (ITU?) set the octet as the fundamental unit of transmission. But the older standards aren't getting revised.

Now that C11 has many optional components, a binary serialization extension could be added alongside things like threads without placing demands on existing implementations.

Would such an extension be useful, or is worrying about non-two's-complement machines just that pointless?
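On the signed-type concern, one possible approach is to fix the wire format as 32-bit two's complement and convert with arithmetic only, so that no out-of-range unsigned-to-signed conversion ever happens. The sketch below is an illustration under that assumption; i32_to_wire and i32_from_wire are hypothetical names.

```c
#include <stdint.h>

/* Map a signed value to its 32-bit two's-complement bit pattern.
   Signed-to-unsigned conversion is defined by the C standard as
   reduction modulo 2^N, so this is portable regardless of how the
   host represents negative numbers. */
uint_least32_t i32_to_wire(int_least32_t v)
{
    return (uint_least32_t)v & UINT32_C(0xFFFFFFFF);
}

/* Map a 32-bit two's-complement bit pattern back to a signed value
   without converting an out-of-range unsigned value to a signed type.
   Caveat: the pattern 0x80000000 means -2^31, which a host that is not
   two's complement may be unable to represent in int_least32_t. */
int_least32_t i32_from_wire(uint_least32_t u)
{
    u &= UINT32_C(0xFFFFFFFF);
    if (u <= UINT32_C(0x7FFFFFFF))
        return (int_least32_t)u;                 /* non-negative: fits */
    /* Negative: value is u - 2^32, computed as -(0xFFFFFFFF - u) - 1.
       The subtraction yields at most 0x7FFFFFFF, so the cast is safe. */
    return -(int_least32_t)(UINT32_C(0xFFFFFFFF) - u) - 1;
}
```

The unsigned pattern returned by i32_to_wire can then be written out octet by octet like any other unsigned value.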

Recommended answer

I've never used them, but I think Google's Protocol Buffers satisfy your requirements.

  • All of these are supported.
  • The generated API is type-safe.
  • Serialization can be done to and from streams.

This tutorial seems like a pretty good introduction, and you can read about the actual binary storage format here.
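As a taste of that binary storage format: Protocol Buffers encode most integers as base-128 "varints", where each octet carries 7 bits of the value, least significant group first, and the high bit signals that more octets follow. The following is a hand-written sketch of that encoding for illustration only. It is not the protobuf-c API, and the names varint_encode and varint_decode are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode the low 64 bits of v as a base-128 varint.
   Returns the number of octets written (1 to 10);
   out must have room for 10 elements. */
size_t varint_encode(unsigned char *out, uint_least64_t v)
{
    size_t n = 0;
    v &= UINT64_C(0xFFFFFFFFFFFFFFFF);
    while (v >= 0x80u) {
        /* Low 7 bits plus the continuation flag. */
        out[n++] = (unsigned char)((v & 0x7Fu) | 0x80u);
        v >>= 7;
    }
    out[n++] = (unsigned char)v;   /* final octet, flag clear */
    return n;
}

/* Decode a varint from at most len octets.
   Returns the number of octets consumed, or 0 if the input ends
   before a terminating octet or runs past 10 octets. */
size_t varint_decode(const unsigned char *in, size_t len, uint_least64_t *v)
{
    uint_least64_t result = 0;
    size_t i;
    for (i = 0; i < len && i < 10; i++) {
        result |= (uint_least64_t)(in[i] & 0x7Fu) << (7 * i);
        if ((in[i] & 0x80u) == 0) {
            *v = result;
            return i + 1;
        }
    }
    return 0;
}
```

For example, the value 300 encodes to the two octets 0xAC 0x02. Like the rest of the format, this is defined in terms of octets and value arithmetic, so it does not depend on the host's byte order.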

From their web page:

What Are Protocol Buffers?

Protocol buffers are Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data – think XML, but smaller, faster, and simpler. You define how you want your data to be structured once, then you can use special generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages – Java, C++, or Python.

There's no official implementation in pure C (only C++), but there are two C ports that might fit your needs:

Protobuf-c at http://code.google.com/p/protobuf-c/

I don't know how they fare in the presence of non-8-bit bytes, but it should be relatively easy to find out.
