Writing objects to hard disk files in C++


Question

I want to write an instance of a class that contains members of different data types to the hard disk and read it back whenever I need it. I used the code below to do this. The problem is that whenever I save the object, a file is created in the folder, but it is only 1 KB in size. Also, when I open the file from the same function that saves it, I can read the variables in the class, but when I move the read section to another function and open the file from there, the variables cannot be read. How can I fix the problem? Thanks in advance.

Writing to the file:

stream.open("configuration/KMeansModel.bin", std::ios::out | std::ios::binary);
stream.write((char *)& kmeans, sizeof(kmeans));
stream.close();

Reading from the file:

KMeans::KMeans kmeans_(umapFeatureLabel_);
stream_.open("configuration/KMeansModel.bin", std::ios::in, std::ios::binary);
stream_.read((char *)& kmeans_, sizeof(kmeans_));
stream_.close();

Class definition:

class KMeans
{
private:
    int m_K;
    int m_iters;
    int m_dimensions;
    int m_total_features;
    std::vector<Cluster> m_clusters;
    std::unordered_map<std::string, std::string> m_umapFeatureLabel;
    std::unordered_map<int, std::vector<std::vector<long double>>> m_umapClusterFeatureList;

    int getNearestClusterId(Feature feature);

public:
    KMeans::KMeans::KMeans();
    KMeans(std::unordered_map<std::string, std::string>& umapFeatureLabel);
    void run(std::vector<Feature>& allFeatures);
    void predict(Feature feature);
    void updateKMeans(std::vector<Feature>& allNewFeaturesRead);
    std::string getLabelOfFeature(std::string feature);
};

Recommended answer

The bad news:

Your file saving code uses the sizeof operator. Your data structure includes vector and map objects.

For example, as far as sizeof is concerned, a std::vector object occupies the same small, fixed number of bytes regardless of how many elements it holds. On a typical 64-bit implementation that is 24 bytes: bookkeeping such as the element count and capacity, plus a pointer to the actual elements, which live in separately allocated memory.

Say your vector has 100 elements of 8 bytes each, and the elements are stored starting at memory address 424000. The write method will dutifully store into the file a) the number 100 and b) the number 424000, but it will make no attempt to save the contents of memory locations 424000 to 424800. It has no way of knowing that 424000 is a pointer; to write, that is just a number.

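Hence, the file does not contain the information needed to restore the state of the vector. This also accounts for the symptoms in the question: the file stays tiny because only the fixed-size shell of the object is written, and reading back probably appears to work inside the saving function only because the stored pointers still refer to memory that is alive there.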
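The gap between what sizeof reports and what a vector actually holds can be checked directly. The following is a minimal sketch, not part of the original answer; the helper names objectBytes and payloadBytes and the PlainConfig struct are hypothetical, introduced only for illustration.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Number of bytes that stream.write((char*)&v, sizeof(v)) would copy.
// This is a compile-time constant: it cannot depend on v.size().
std::size_t objectBytes(const std::vector<double>& v)
{
    return sizeof(v);
}

// Number of bytes the elements themselves occupy on the heap.
std::size_t payloadBytes(const std::vector<double>& v)
{
    return v.size() * sizeof(double);
}

// Hypothetical struct holding only plain integers, for comparison.
struct PlainConfig {
    int k;
    int iters;
};

// A raw byte copy is only meaningful for trivially copyable types.
static_assert(std::is_trivially_copyable<PlainConfig>::value,
              "plain integers can be copied byte by byte");
static_assert(!std::is_trivially_copyable<std::vector<double>>::value,
              "std::vector owns heap memory, so a byte copy loses its elements");
```

For an empty vector and for a vector of 100 elements, objectBytes returns the same value, while payloadBytes grows with the element count. A static_assert of this kind placed next to a raw write call is one way to have the compiler reject classes such as KMeans.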

As mentioned in the comments above, saving complex pointer-based data structures into simple byte arrays for file storage or network transmission is known as serialization, or marshalling/unmarshalling.

It is a non-obvious subject of its own, in the same way that sorting algorithms or matrix multiplication are. It would probably take you a lot of time to come up with a properly debugged solution of your own, one that also keeps the saving and restoring code consistent with each other, and so on.

Serialization is a non-obvious subject, but it is also an old, well-known one. So instead of painfully coming up with your own solution, you can rely on existing, publicly available code.

In similar fashion, the only situations where you would have to write your own matrix multiplication code are when:

  1. You are doing it purely for fun and/or self-training
  2. You are writing a PhD thesis about matrix multiplication
  3. You are paid to write linear algebra code

Other than these, you would probably rely on existing code, say LAPACK.

Regarding serialization, as hinted by Botje in the comments above, the Boost web site provides a C++ serialization library, along with a suitable tutorial.

I am providing below a small code sample using the Boost library. A simple guinea pig object contains an integer value, a string and a map. Of course, I am shamelessly borrowing from the Boost tutorial.

We need to include a few header files:

#include <map>
#include <string>
#include <fstream>
#include <iostream>

#include <boost/archive/text_oarchive.hpp>
#include <boost/archive/text_iarchive.hpp>
#include <boost/serialization/utility.hpp>
#include <boost/serialization/map.hpp>

The object class, which pretends to store some token geographical info:

class CapitalMap
{
public:
    CapitalMap(const std::string& myName, int myVersion) :
        _name(myName), _version(myVersion)
    {};

    CapitalMap() = default;  // seems required by serialization

    inline void add(const std::string&  country, const std::string& city)
    { _cmap[country] = city; }

    void fdump(std::ostream& fh);

private:
    std::string                         _name;
    int                                 _version;
    std::map<std::string, std::string>  _cmap;

    friend class boost::serialization::access;  // ALLOW FOR FILE ARCHIVAL
    template<class Archive>
    void serialize(Archive& ar, const unsigned int version)
    {
        ar & _name;
        ar & _version; // mind the name conflict with plain "version" argument
        ar & _cmap;
    }
};

A small debugging utility function:

void CapitalMap::fdump(std::ostream&  ofh)    // text dumping utility for debug
{
    ofh << "CapitalMap  name = \"" << _name << "\"  version = " <<
           _version << '\n';
    for (const auto&  pair : _cmap) {
        auto  country = pair.first;  auto  city = pair.second;
        ofh << city << " is the capital of " << country << '\n';
    }
}

Code to create the object, save it on disk, and (implicitly) deallocate it:

void buildAndSaveCapitalMap (const std::string&  archiveName,
                             const std::string&  mapName,
                             int                 version)
{
    CapitalMap  euroCapitals(mapName, version);

    euroCapitals.add("Germany", "Berlin");
    euroCapitals.add("France",  "Paris");
    euroCapitals.add("Spain",   "Madrid");

    euroCapitals.fdump(std::cout);  // just for checking purposes

    // save data to archive file:

    std::ofstream                  ofs(archiveName);
    boost::archive::text_oarchive  oa(ofs);
    oa << euroCapitals;

    // Going out of scope here, in reverse order of construction:
    // the archive object is destroyed, then the ofstream is closed,
    // then the CapitalMap object is destroyed
}

Small main program to create the file and then restore the object state from that file:

int main(int argc, char* argv[])
{
    const std::string archiveName{"capitals.dat"};

    std::cout << std::endl;
    buildAndSaveCapitalMap(archiveName, "EuroCapitals", 42);

    // go restore our CapitalMap object to its original state:

    CapitalMap                     cm;  // object created in its default state
    std::ifstream                  ifs(archiveName);
    boost::archive::text_iarchive  inAr(ifs);

    inAr >> cm;           // read back object ...
    std::cout << std::endl;
    cm.fdump(std::cout);  // check that it's actually back and in good shape ...
    std::cout << std::endl;

    return 0;
}

The problem of keeping the saving and restoring code consistent is brilliantly solved by changing the meaning of operator "&" according to the direction of travel: the same serialize function writes each member when given an output archive and reads it when given an input archive.

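Minor issues along the way:

  1. On a Linux distribution, you need to get the packages: boost, boost-devel, boost-serialization (exact names may vary by distribution)
  2. The object class seems to need a default constructor (here, main creates an empty object before loading into it)
  3. You need to manually include files such as "boost/serialization/map.hpp"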

Program execution:

$ g++  serialw00.cpp  -lboost_serialization  -o ./serialw00.x
$ ./serialw00.x

CapitalMap  name = "EuroCapitals"  version = 42
Paris is the capital of France
Berlin is the capital of Germany
Madrid is the capital of Spain

CapitalMap  name = "EuroCapitals"  version = 42
Paris is the capital of France
Berlin is the capital of Germany
Madrid is the capital of Spain

$ 

More details: SO_q_523872
