Issues saving double as binary in C++


Question

In my simulation code for a particle system, I have a class defined for particles. Each particle has a pos property containing its position, declared as double pos[3]; because there are 3 coordinate components per particle. With the particle objects defined by particles = new Particle[npart]; (as we have npart particles), the y-component of the 2nd particle, for example, would be accessed with double dummycomp = particles[1].pos[1];

Before switching to binary, I saved the particles to a file like this (as text, with a precision of 10 significant digits and one particle per line):

#include <iostream>
#include <fstream>

std::ofstream outfile("testConfig.txt", std::ios::out);
outfile.precision(10);

for (int i = 0; i < npart; i++) {
    outfile << particles[i].pos[0] << " " << particles[i].pos[1] << " " << particles[i].pos[2] << std::endl;
}
outfile.close();

But now, to save space, I am trying to save the configuration as a binary file, and my attempt has been as follows:

std::ofstream outfile("test.bin", std::ios::binary | std::ios::out);

for (int i = 0; i < npart; i++) {
    outfile.write(reinterpret_cast<const char*>(particles[i].pos), std::streamsize(3 * sizeof(double)));
}
outfile.close();

but I get a segmentation fault when I try to run it. My questions are:

  • Am I doing something wrong with reinterpret_cast, or in the argument of streamsize()?
  • Ideally, the saved binary format could also be read from Python. Does my approach (once fixed) allow for that?

Working example of the old (non-binary) saving method:

#include <iostream>
#include <fstream>

using namespace std;
class Particle {

 public:

  double pos[3];

};


int main() {

  const int npart = 2;  // a constant, so the array size below is valid standard C++
  Particle particles[npart];
  // initializing the positions:
  particles[0].pos[0] = -74.04119568;
  particles[0].pos[1] = -44.33692582;
  particles[0].pos[2] = 17.36278231;

  particles[1].pos[0] = 48.16310086;
  particles[1].pos[1] = -65.02325252;
  particles[1].pos[2] = -37.2053818;

  ofstream outfile("testConfig.txt", ios::out);
  outfile.precision(10);

  for (int i=0; i<npart; i++){
    outfile << particles[i].pos[0] << " " << particles[i].pos[1]  << " " << particles[i].pos[2] << endl;
  }
  outfile.close();

  return 0;
}

To save the particle positions as binary instead, substitute the saving portion of the above sample with

  ofstream outfile("test.bin", ios::binary | ios::out);

  for (int i=0; i<npart; i++){
    outfile.write(reinterpret_cast<const char*>(particles[i].pos),streamsize(3*sizeof(double)));
  }
  outfile.close();


Second addendum: reading the binary file with Python

I managed to read the saved binary file in Python using numpy, as follows:

>>> import numpy as np
>>> data = np.fromfile('test.bin', dtype=np.float64)
>>> data
array([-74.04119568, -44.33692582,  17.36278231,  48.16310086,
       -65.02325252, -37.2053818 ])

But given the doubts raised in the comments about the non-portability of the binary format, I am not confident that this way of reading the file in Python will always work. It would be great if someone could shed light on how reliable this approach is.

Recommended answer

The trouble is that a base-10 text representation of a double is lossy if you print too few digits, and 10 significant digits is too few: a double can need up to std::numeric_limits<double>::max_digits10 (17) significant digits to be read back as the same value. Even then the text is only guaranteed to round-trip, not to be the exact value of the double, and that guarantee relies on both the writer and the reader converting with correct rounding.

The other issue is that the C++ standard does not fix the binary representation of a double, so writing the raw bytes is fragile and can easily lead to breakage. In practice most platforms use IEEE 754 binary64, but byte order differs between architectures, and once you change the compiler, the compiler settings, or the architecture you have no guarantee that the bytes mean the same thing.

You can serialize the value to text without loss by using the hexadecimal floating-point format for doubles.

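 // Set both floatfield flags at once; chaining std::fixed and std::scientific leaves only scientific set
 stream.setf(std::ios_base::fixed | std::ios_base::scientific, std::ios_base::floatfield);
 stream << particles[i].pos[0];

 // Since C++11 this can be written more simply as

 stream << std::hexfloat << particles[i].pos[0];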

This has the same effect as "%a" in C's printf(), which prints the value as "Hexadecimal floating point, lowercase". The mantissa is written in hexadecimal and the exponent as a decimal power of two, in a very specific format (for example, 0x1.8p+1 is 3.0). Since the underlying representation is binary, the value can be represented exactly this way, which gives a lossless way of transferring data between systems. Leading and trailing zeros are also dropped, so for a lot of numbers the output is relatively compact.

This format is also supported on the Python side. You should be able to read the value as a string and then convert it to a float using float.fromhex().

See: https://docs.python.org/3/library/stdtypes.html#float.fromhex

But your goal is to save space:

"But now, to save space, I am trying to save the configuration as a binary file."

I would ask: do you really need to save space? Are you running in a low-powered, low-resource environment? If so, saving space can definitely matter; such environments are rare nowadays, but they do exist.

But it seems you are running some form of particle simulation, which does not sound like a low-resource use case. Even if you have terabytes of data, I would still choose a portable, easy-to-read format over binary, preferably one that is not lossy. Storage space is cheap.
