I need to read a matrix from a file when the matrix dimensions are not known


Problem description

I have a struct like this:

struct Data {
    int ID;
    double test_sample[2065][1];
    int XX_row;
    int XX_col;
    double **XX;                        //size=[2065][changing]
    double **alpha_new;                 //size=[changing][1]
    int alpha_new_row;
    int alpha_new_col;
    double t3;
    double kernel_par;
} person[20];

I've written this struct for each person (20 people) into 20 files using fwrite:

fwrite(&person, sizeof( struct Data ), 1,Ptr );

Now I have 20 files in binary. Each file includes these variables for one person. All is Ok for now.

Problem: I can't read a file and assign it to a struct, because the dimensions of the XX and alpha_new matrices differ from file to file (in one file [2065][8], in others [2065][12]).

I need to read these variables using fread (or something else) and feed them to the face recognition program. Is there a way to read the variables individually from the file, or should I change the writing method as well?

I don't know how to write all the variables and matrices into one file without using a struct!

I hope I have explained my problem; sorry for my poor English. I'm waiting for your help to finish my final project in C. I am using Visual Studio 2012.

Solution

For such a complex structure, it's a modestly major undertaking. Here's a not-so-short SSCCE (Short, Self-Contained, Complete Example). There are really 3 files slammed into one:

  • stderr.h — declarations of error reporting functions (top 10 lines)
  • serialize.c — the serialization code (just under 300 lines in between)
  • stderr.c — the error reporting functions (bottom 40 lines)

I'm not planning to explain the error reporting functions. They work more or less like printf() as far as formatting arguments goes, but they write to standard error, not standard output, and they include the program name as a prefix, and the error derived from errno. The emalloc() function checks memory allocations, reporting an error and exiting if the allocation fails. This error handling is adequate for simple programs; it is not adequate for complex programs that need to recover if there's a memory problem, saving the work or whatever.

Within the real serialization code, there are 4 groups of functions, plus main() to orchestrate.

  1. Allocation and initialization functions to create and initialize the structures.
  2. Print functions to dump the structures.
  3. Export functions to serialize the data for export.
  4. Import functions to deserialize the data for import.

The print functions allow a human to see the data, and you could save the output to file and compare the export data with the import data to ensure that they're the same.

The code would be simpler if you used a structure to describe all your 2D arrays, such as:

typedef struct Array_2D
{
    double **data;
    size_t   nrows;
    size_t   ncols;
} Array_2D;

You'd then simply embed 3 of these into your struct Data:

struct Data
{
    int       ID;
    double    t3;
    double    kernel_par;
    Array_2D  test_sample;
    Array_2D  XX;
    Array_2D  alpha_new;
};
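
As a rough sketch of what the Array_2D approach buys you, allocation and cleanup can each be written once and reused for all three matrices. The helper names array2d_alloc() and array2d_free() are hypothetical (they are not part of the code further down), and the use of calloc() is just one reasonable choice:

```c
#include <stdlib.h>

typedef struct Array_2D
{
    double **data;
    size_t   nrows;
    size_t   ncols;
} Array_2D;

/* Hypothetical helper: allocate a zero-filled nrows x ncols matrix.
 * Returns 0 on success, -1 if an allocation fails (nothing is leaked). */
static int array2d_alloc(Array_2D *a, size_t nrows, size_t ncols)
{
    a->nrows = 0;
    a->ncols = 0;
    a->data = calloc(nrows ? nrows : 1, sizeof(*a->data));
    if (a->data == NULL)
        return -1;
    for (size_t i = 0; i < nrows; i++)
    {
        a->data[i] = calloc(ncols ? ncols : 1, sizeof(**a->data));
        if (a->data[i] == NULL)
        {
            while (i-- > 0)          /* release the rows already allocated */
                free(a->data[i]);
            free(a->data);
            a->data = NULL;
            return -1;
        }
    }
    a->nrows = nrows;
    a->ncols = ncols;
    return 0;
}

/* Hypothetical helper: release the matrix and reset the descriptor. */
static void array2d_free(Array_2D *a)
{
    for (size_t i = 0; i < a->nrows; i++)
        free(a->data[i]);
    free(a->data);
    a->data = NULL;
    a->nrows = 0;
    a->ncols = 0;
}
```

With helpers like these, freeing a struct Data would reduce to three array2d_free() calls, one per embedded matrix.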

I'm really not clear what the benefit of double test_sample[2065][1]; is compared with double test_sample[2065];. I will observe it makes the code more complex than it would be otherwise. I end up treating it as a normal 1D array of double by using &data->test_sample[0][0] as the starting point.

There's more than one way to do the serialization. I've opted for a 2D array of doubles to be represented by N 1D arrays, and each 1D array is prefixed by a size_t describing the size of the 1D array. That gives some redundancy in the files, which means that there's slightly better error detection. It would be feasible to simply output the two dimensions of a 2D array, followed by rows x cols values. Indeed, at one point, I had the import code assuming that while the export code was using the other technique — this did not make for a happy runtime when the numbers were misunderstood and I was getting debug output and errors like:

test_sample: 2.470328e-323, 1.000000e+00, 2.000000e+00, 3.000000e+00, 4.000000e+00
2D array size 4617315517961601024 x 5 = 4639833516098453504
serialize(46983) malloc: *** mmap(size=45035996273704960) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
./serialize: Out of memory (12: Cannot allocate memory)

That's a lot of memory... and the 2.470328e-323 was a symptom of trouble, too. (So no, I didn't get it all right the first time I ran the code.)
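
For comparison, here is a minimal sketch of the simpler layout mentioned above: the two dimensions, followed by rows x cols values, with no per-row size prefix. It is not the format used by the code below. The names write_matrix(), read_matrix() and the MAX_DIM limit are assumptions made for illustration; the limit exists so that a misread header is rejected before it turns into an enormous allocation like the one in the output above.

```c
#include <stdio.h>
#include <stdlib.h>

/* Assumed sanity limit on either dimension; choose one that suits your data. */
#define MAX_DIM ((size_t)100000)

/* Write nrows and ncols, then each row's ncols doubles. Returns 0 on success. */
static int write_matrix(FILE *fp, double **m, size_t nrows, size_t ncols)
{
    if (fwrite(&nrows, sizeof(nrows), 1, fp) != 1 ||
        fwrite(&ncols, sizeof(ncols), 1, fp) != 1)
        return -1;
    for (size_t i = 0; i < nrows; i++)
    {
        if (fwrite(m[i], sizeof(double), ncols, fp) != ncols)
            return -1;
    }
    return 0;
}

/* Read a matrix written by write_matrix(). Returns NULL on any error. */
static double **read_matrix(FILE *fp, size_t *nrows, size_t *ncols)
{
    size_t r, c;
    if (fread(&r, sizeof(r), 1, fp) != 1 || fread(&c, sizeof(c), 1, fp) != 1)
        return NULL;
    if (r == 0 || c == 0 || r > MAX_DIM || c > MAX_DIM)
        return NULL;                 /* implausible header: refuse to allocate */

    double **m = malloc(r * sizeof(*m));
    if (m == NULL)
        return NULL;
    for (size_t i = 0; i < r; i++)
    {
        m[i] = malloc(c * sizeof(*m[i]));
        if (m[i] == NULL || fread(m[i], sizeof(double), c, fp) != c)
        {
            free(m[i]);              /* free(NULL) is harmless */
            while (i-- > 0)
                free(m[i]);
            free(m);
            return NULL;
        }
    }
    *nrows = r;
    *ncols = c;
    return m;
}
```

The trade-off is the one described above: this layout is smaller and simpler, but it loses the per-row check that catches a reader and writer disagreeing about the format.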

I did most of the testing with SAMPLE_SIZE at 5 and NUM_PERSON at 3.

serialize.c

/* stderr.h */
#ifndef STDERR_H_INCLUDED
#define STDERR_H_INCLUDED

static void err_setarg0(char const *argv0);
static void err_sysexit(char const *fmt, ...);
static void err_syswarn(char const *fmt, ...);

#endif /* STDERR_H_INCLUDED */

#include <stdio.h>
#include <stdlib.h>

enum { SAMPLE_SIZE = 20 }; /* 2065 in original */
enum { NUM_PERSON  = 10 }; /*   20 in original */

struct Data
{
    int ID;
    double test_sample[SAMPLE_SIZE][1]; //Why?
    size_t XX_row;
    size_t XX_col;
    double **XX;                        //size=[SAMPLE_SIZE][changing]
    double **alpha_new;                 //size=[changing][1]
    size_t alpha_new_row;
    size_t alpha_new_col;
    double t3;
    double kernel_par;
} person[NUM_PERSON];

typedef struct Data Data;

static void *emalloc(size_t nbytes)
{
    void *space = malloc(nbytes);
    if (space == 0)
        err_sysexit("Out of memory");
    return space;
}

static void free_data(Data *data)
{
    for (size_t i = 0; i < data->XX_row; i++)
        free(data->XX[i]);
    free(data->XX);

    for (size_t i = 0; i < data->alpha_new_row; i++)
        free(data->alpha_new[i]);
    free(data->alpha_new);

    data->ID = 0;
    data->t3 = 0.0;
    data->kernel_par = 0.0;
    data->XX = 0;
    data->XX_row = 0;
    data->XX_col = 0;
    data->alpha_new = 0;
    data->alpha_new_row = 0;
    data->alpha_new_col = 0;
}

static void free_array(Data *data, size_t nentries)
{
    for (size_t i = 0; i < nentries; i++)
        free_data(&data[i]);
}

static double **alloc_2D_double(size_t rows, size_t cols)
{
    double **data = emalloc(rows * sizeof(*data));
    for (size_t i = 0; i < rows; i++)
    {
        data[i] = emalloc(cols * sizeof(*data[i]));
    }
    return data;
}

static void populate_data(Data *data, size_t entry_num)
{
    /* entry_num serves as 'changing' size */
    data->ID = entry_num;
    data->t3 = entry_num * SAMPLE_SIZE;
    data->kernel_par = (1.0 * SAMPLE_SIZE) / entry_num;

    for (size_t i = 0; i < SAMPLE_SIZE; i++)
        data->test_sample[i][0] = i + entry_num;

    data->XX_row = SAMPLE_SIZE;
    data->XX_col = entry_num;
    data->XX = alloc_2D_double(data->XX_row, data->XX_col);

    for (size_t i = 0; i < data->XX_row; i++)
    {
        for (size_t j = 0; j < data->XX_col; j++)
            data->XX[i][j] = i * data->XX_col + j;
    }

    data->alpha_new_row = entry_num;
    data->alpha_new_col = 1;
    data->alpha_new = alloc_2D_double(data->alpha_new_row, data->alpha_new_col);

    for (size_t i = 0; i < data->alpha_new_row; i++)
    {
        for (size_t j = 0; j < data->alpha_new_col; j++)
            data->alpha_new[i][j] = i * data->alpha_new_col + j;
    }
}

static void populate_array(Data *data, size_t nentries)
{
    for (size_t i = 0; i < nentries; i++)
        populate_data(&data[i], i+1);
}

static void print_1D_double(FILE *fp, char const *tag, double const *values, size_t nvalues)
{
    char const *pad = "";
    fprintf(fp, "%s: ", tag);
    for (size_t i = 0; i < nvalues; i++)
    {
        fprintf(fp, "%s%e", pad, values[i]);
        pad = ", ";
    }
    putc('\n', fp);
}

static void print_2D_double(FILE *fp, char const *tag, double **values, size_t nrows, size_t ncols)
{
    fprintf(fp, "2D array %s[%zu][%zu]\n", tag, nrows, ncols);
    for (size_t i = 0; i < nrows; i++)
    {
        char buffer[32];
        snprintf(buffer, sizeof(buffer), "%s[%zu]", tag, i);
        print_1D_double(fp, buffer, values[i], ncols);
    }
}

static void print_data(FILE *fp, char const *tag, const Data *data)
{
    fprintf(fp, "Data: %s\n", tag);
    fprintf(fp, "ID = %d; t3 = %e; kernel_par = %e\n", data->ID, data->t3, data->kernel_par);
    print_1D_double(fp, "test_sample", &data->test_sample[0][0], sizeof(data->test_sample)/sizeof(data->test_sample[0][0]));
    print_2D_double(fp, "XX", data->XX, data->XX_row, data->XX_col);
    print_2D_double(fp, "Alpha New", data->alpha_new, data->alpha_new_row, data->alpha_new_col);
}

static void print_array(FILE *fp, char const *tag, const Data *data, size_t nentries)
{
    fprintf(fp, "Array: %s\n", tag);
    fprintf(fp, "Size: %zu\n", nentries);
    for (size_t i = 0; i < nentries; i++)
    {
        char buffer[32];
        snprintf(buffer, sizeof(buffer), "Row %zu", i);
        print_data(fp, buffer, &data[i]);
    }
    fprintf(fp, "End Array: %s\n\n", tag);
}

static void set_file_name(char *buffer, size_t buflen, size_t i)
{
    snprintf(buffer, buflen, "exp_data.%.3zu.exp", i);
}

static void export_1D_double(FILE *fp, double *data, size_t ncols)
{
    if (fwrite(&ncols, sizeof(ncols), 1, fp) != 1)
        err_sysexit("Failed to write number of columns");
    if (fwrite(data, sizeof(double), ncols, fp) != ncols)
        err_sysexit("Failed to write array of %zu doubles", ncols);
}

static void export_2D_double(FILE *fp, double **data, size_t nrows, size_t ncols)
{
    if (fwrite(&nrows, sizeof(nrows), 1, fp) != 1)
        err_sysexit("Failed to write number of rows");
    if (fwrite(&ncols, sizeof(ncols), 1, fp) != 1)
        err_sysexit("Failed to write number of columns");
    for (size_t i = 0; i < nrows; i++)
        export_1D_double(fp, data[i], ncols);
}

static void export_int(FILE *fp, int value)
{
    if (fwrite(&value, sizeof(value), 1, fp) != 1)
        err_sysexit("Failed to write int to file");
}

static void export_double(FILE *fp, double value)
{
    if (fwrite(&value, sizeof(value), 1, fp) != 1)
        err_sysexit("Failed to write double to file");
}

static void export_data(FILE *fp, Data *data)
{
    export_int(fp, data->ID);
    export_double(fp, data->t3);
    export_double(fp, data->kernel_par);
    export_1D_double(fp, &data->test_sample[0][0], sizeof(data->test_sample)/sizeof(data->test_sample[0]));
    export_2D_double(fp, data->XX, data->XX_row, data->XX_col);
    export_2D_double(fp, data->alpha_new, data->alpha_new_row, data->alpha_new_col);
}

static void export_array(Data *data, size_t nentries)
{
    for (size_t i = 0; i < nentries; i++)
    {
        char filename[30];
        set_file_name(filename, sizeof(filename), i);
        FILE *fp = fopen(filename, "wb");   /* binary mode: the file holds raw bytes, not text */
        if (fp == 0)
            err_sysexit("Failed to open file %s for writing", filename);
        printf("Export %zu to %s\n", i, filename);
        export_data(fp, &data[i]);
        fclose(fp);
    }
}

static int import_int(FILE *fp)
{
    int value;
    if (fread(&value, sizeof(value), 1, fp) != 1)
        err_sysexit("Failed to read int");
    return value;
}

static double import_double(FILE *fp)
{
    double value;
    if (fread(&value, sizeof(value), 1, fp) != 1)
        err_sysexit("Failed to read double");
    return value;
}

static size_t import_size_t(FILE *fp)
{
    size_t value;
    if (fread(&value, sizeof(value), 1, fp) != 1)
        err_sysexit("Failed to read size_t");
    return value;
}

static void import_1D_double(FILE *fp, double *data, size_t nvalues)
{
    size_t size = import_size_t(fp);
    if (size != nvalues)
        err_sysexit("Size mismatch (wanted %zu, actual %zu)", nvalues, size);
    if (fread(data, sizeof(data[0]), nvalues, fp) != nvalues)
        err_sysexit("Failed to read %zu doubles", nvalues);
}

static void import_2D_double(FILE *fp, double ***data, size_t *nrows, size_t *ncols)
{
    *nrows = import_size_t(fp);
    *ncols = import_size_t(fp);
    *data  = alloc_2D_double(*nrows, *ncols);
    for (size_t i = 0; i < *nrows; i++)
        import_1D_double(fp, (*data)[i], *ncols);
}

static void import_data(FILE *fp, Data *data)
{
    data->ID = import_int(fp);
    data->t3 = import_double(fp);
    data->kernel_par = import_double(fp);

    import_1D_double(fp, &data->test_sample[0][0], sizeof(data->test_sample)/sizeof(data->test_sample[0][0]));
    import_2D_double(fp, &data->XX, &data->XX_row, &data->XX_col);
    import_2D_double(fp, &data->alpha_new, &data->alpha_new_row, &data->alpha_new_col);
}

static void import_array(Data *data, size_t nentries)
{
    for (size_t i = 0; i < nentries; i++)
    {
        char filename[30];
        set_file_name(filename, sizeof(filename), i);
        FILE *fp = fopen(filename, "rb");   /* binary mode, matching the "wb" used for export */
        if (fp == 0)
            err_sysexit("Failed to open file %s for reading", filename);
        printf("Import %zu from %s\n", i, filename);
        import_data(fp, &data[i]);
        fclose(fp);
    }
}

int main(int argc, char **argv)
{
    err_setarg0(argv[0]);
    if (argc != 1)
        err_syswarn("Ignoring %d irrelevant arguments", argc-1);
    populate_array(person, NUM_PERSON);
    print_array(stdout, "Freshly populated", person, NUM_PERSON);
    export_array(person, NUM_PERSON);
    printf("\n\nEXPORT COMPLETE\n\n");
    free_array(person, NUM_PERSON);
    import_array(person, NUM_PERSON);
    printf("\n\nIMPORT COMPLETE\n\n");
    print_array(stdout, "Freshly imported", person, NUM_PERSON);
    free_array(person, NUM_PERSON);
    return(0);
}

/* stderr.c */
/*#include "stderr.h"*/
#include <stdio.h>
#include <stdarg.h>
#include <errno.h>
#include <string.h>
#include <stdlib.h>

static char const *arg0 = "<undefined>";

static void err_setarg0(char const *argv0)
{
    arg0 = argv0;
}

static void err_vsyswarn(char const *fmt, va_list args)
{
    int errnum = errno;
    fprintf(stderr, "%s: ", arg0);
    vfprintf(stderr, fmt, args);
    if (errnum != 0)
        fprintf(stderr, " (%d: %s)", errnum, strerror(errnum));
    putc('\n', stderr);
}

static void err_syswarn(char const *fmt, ...)
{
    va_list args;
    va_start(args, fmt);
    err_vsyswarn(fmt, args);
    va_end(args);
}

static void err_sysexit(char const *fmt, ...)
{
    va_list args;
    va_start(args, fmt);
    err_vsyswarn(fmt, args);
    va_end(args);
    exit(1);
}

When run under valgrind, it was given a clean bill of health with no memory leaked. And it took more than one pass before I could safely say that, too (valgrind showed up a bug that eyeballing the results hadn't spotted, though it was obvious once detected).


Answers to questions in comments

Anyway, here are a couple of problems that occur when executing the code.

The first one is 'snprintf': identifier not found.

The second one is in the line "double **data = emalloc(rows * sizeof(*data));": it says cannot convert from 'void *' to 'double **', and that makes sense because data is a double ** and emalloc returns void *. How can I solve these problems before I start embedding this into my original program?

  1. Don't use a C++ compiler to compile C code
  2. Update to a system with a C99 compiler.

Or, since you are probably on Windows and using MSVC:

  1. Use a cast double **data = (double **)emalloc(rows * sizeof(*data));
  2. Look up _snprintf() and snprintf_s() and so on in MSDN. I find it via Google with 'site:microsoft.com snprintf' (for various spellings of 'snprintf') when I need to know what MSVC does.

In case of emergency, use sprintf(); the size of the buffer is big enough that there shouldn't be any risk of overflow, which is what snprintf() et al protect against.
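
If you would rather keep the truncation protection, one possible workaround is a small wrapper. This is only a sketch: the name portable_snprintf() is made up, and the version test assumes that MSVC gained a standard snprintf() with Visual Studio 2015 (_MSC_VER 1900), with older versions offering only _vsnprintf(), which does not null-terminate the buffer when the output is truncated.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical wrapper. On older MSVC it builds on _vsnprintf() and forces
 * null termination; elsewhere it forwards to the standard vsnprintf().
 * Note the return values differ on truncation: _vsnprintf() returns a
 * negative value, vsnprintf() returns the length that was needed. */
static int portable_snprintf(char *buf, size_t size, char const *fmt, ...)
{
    int n;
    va_list args;
    va_start(args, fmt);
#if defined(_MSC_VER) && _MSC_VER < 1900
    n = _vsnprintf(buf, size, fmt, args);
    if (size > 0)
        buf[size - 1] = '\0';       /* _vsnprintf() may leave this unset */
#else
    n = vsnprintf(buf, size, fmt, args);
#endif
    va_end(args);
    return n;
}
```

The calls in the example would then read portable_snprintf(buffer, sizeof(buffer), ...) in place of snprintf(buffer, sizeof(buffer), ...).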


By the way, in my program there is a function called cernel_matrix(double **M1, double **M2), which takes two 2-dimensional matrices. I pass test_sample and XX to this function, sometimes XX and XX, sometimes test_sample and test_sample; it depends. So I can't make test_sample 1-dimensional; it's just the way the function works. Otherwise I get this error: cannot convert from 'double*' to 'double **'. I hope that explains why test_sample can't be 1-dimensional.

  1. The cernel_matrix() function isn't told how big the matrices are, so I don't know how it can possibly work reliably.
  2. I'm not convinced that passing test_sample to cernel_matrix is safe; a double matrix[][1] value does not convert to double **. So I'm not convinced I understand why test_sample is a matrix like that.

I put together a micro test-case for this:

extern void cernel_matrix(double **M1, double **M2);

extern void m(void);

void m(void)
{
    double **m0;
    double *m1[13];
    double m2[234][1];

    cernel_matrix(m0, m1);
    cernel_matrix(m1, m2);
}

The compiler told me:

x.c: In function ‘m’:
x.c:12:5: warning: passing argument 2 of ‘cernel_matrix’ from incompatible pointer type [enabled by default]
x.c:1:13: note: expected ‘double **’ but argument is of type ‘double (*)[1]’
x.c:11:18: warning: ‘m0’ is used uninitialized in this function [-Wuninitialized]

The 'uninitialized' warning is perfectly valid, but the problem is the other warning and its note. You should be getting something similar from your compiler.
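
If the function really must take double **, one way to pass a double[N][1] matrix legitimately is to build an array of row pointers first. The sketch below uses made-up names: make_row_pointers() is hypothetical, and sum_first_column() merely stands in for a function such as cernel_matrix(). The explicit row-count parameter is an assumption; as noted above, the real function has to learn the size somehow.

```c
#include <stddef.h>

enum { SAMPLE_SIZE = 5 };   /* toy size; 2065 in the original problem */

/* Fill rows[i] with the address of row i of an [n][1] matrix, so that
 * 'rows' can be passed where a double ** is expected. */
static void make_row_pointers(double (*matrix)[1], size_t n, double **rows)
{
    for (size_t i = 0; i < n; i++)
        rows[i] = matrix[i];        /* matrix[i] decays to double * */
}

/* Stand-in for a function that takes double **; it only reads column 0. */
static double sum_first_column(double **m, size_t nrows)
{
    double total = 0.0;
    for (size_t i = 0; i < nrows; i++)
        total += m[i][0];
    return total;
}
```

A caller would declare double *rows[SAMPLE_SIZE], call make_row_pointers(test_sample, SAMPLE_SIZE, rows), and then pass rows to the function; the data itself is not copied.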


I think I understand the idea of it and the functions, but there are still lots of things in the code that I don't understand. I should be able to explain every line, because I have a presentation to my teachers.

When someone else provides you with code because you've not shown anything, you run the risk of not understanding what they do.

Since you need to understand the code to present it to the teachers, you're probably going to need to do some programming exercises. Note that one of the first things I did was cut the problem down to toy size (instead of 2065, I used 5 or 10 or 20). You should do the same. Start with a structure that only contains the fixed size elements — id, t3, kernel_par and test_sample. Make it so that you can initialize and export and import that. You can import into a different variable than the one you export, and then do a comparison of the two variables. You could even omit test_sample in the first version.
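
A possible starting point for that first exercise might look like the sketch below. The type name Fixed and the three function names are invented for the exercise, test_sample is left out as suggested, and the fields are written one at a time so that struct padding never reaches the file.

```c
#include <stdio.h>

/* Only the fixed-size scalar members, for the first version of the exercise. */
typedef struct Fixed
{
    int    id;
    double t3;
    double kernel_par;
} Fixed;

/* Write the fields one by one. Returns 0 on success, -1 on failure. */
static int export_fixed(FILE *fp, Fixed const *f)
{
    if (fwrite(&f->id, sizeof(f->id), 1, fp) != 1 ||
        fwrite(&f->t3, sizeof(f->t3), 1, fp) != 1 ||
        fwrite(&f->kernel_par, sizeof(f->kernel_par), 1, fp) != 1)
        return -1;
    return 0;
}

/* Read the fields back in the same order. Returns 0 on success. */
static int import_fixed(FILE *fp, Fixed *f)
{
    if (fread(&f->id, sizeof(f->id), 1, fp) != 1 ||
        fread(&f->t3, sizeof(f->t3), 1, fp) != 1 ||
        fread(&f->kernel_par, sizeof(f->kernel_par), 1, fp) != 1)
        return -1;
    return 0;
}

/* Compare what was exported with what was imported. */
static int same_fixed(Fixed const *a, Fixed const *b)
{
    return a->id == b->id && a->t3 == b->t3 && a->kernel_par == b->kernel_par;
}
```

Exporting one variable, importing into a second, and checking same_fixed() gives you a pass/fail test before any matrices are involved.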

When you've got that working, then add one of your arrays and its dimension members. Now get that working (with size 4x5 or similar). Then add the other array (it should be trivial). As you do this, you should see what the various functions in the example I gave do, and why they're there. They're all 'necessary' at some level. As I alluded in my comments, it took me several (too many) attempts to get it right. I was compiling with rigorous warning options, but still valgrind was wittering about uninitialized data (as I was about to post). But I eventually spotted an incompletely edited copy'n'paste piece of code.

Note that if you'd posted code that did a sane job of attempting to export the data, and preferably a sane job of attempting to import the data, then that code could have been fixed. Since you posted no code of any worth whatsoever, it made it hard to produce code that addressed your real problem without producing something testable. The code I provided is testable. The testing could be more comprehensive — yes, undoubtedly. But making code testable, and testing it, is an important part of learning to program.

Incidentally, the key point in the export process for variable length data of any type (such as arrays) is to make sure the size of the data (array) is written before the data (array) itself is written. Then the import process knows how much space to allocate before reading the data (array) back in.
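
Reduced to its smallest form, that rule can be sketched as follows for a single run of doubles. The function names export_doubles() and import_doubles() are hypothetical, and the code assumes the file is read on the same kind of machine that wrote it (same size_t width and byte order).

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Write the element count first, then the elements. Returns 0 on success. */
static int export_doubles(FILE *fp, double const *values, size_t count)
{
    if (fwrite(&count, sizeof(count), 1, fp) != 1)
        return -1;
    if (fwrite(values, sizeof(*values), count, fp) != count)
        return -1;
    return 0;
}

/* Read the count, allocate exactly that much, then read the elements.
 * Returns NULL on error; the caller frees the result. */
static double *import_doubles(FILE *fp, size_t *count)
{
    size_t n;
    if (fread(&n, sizeof(n), 1, fp) != 1)
        return NULL;
    if (n == 0 || n > SIZE_MAX / sizeof(double))
        return NULL;                 /* empty or would overflow the size */
    double *values = malloc(n * sizeof(*values));
    if (values == NULL)
        return NULL;
    if (fread(values, sizeof(*values), n, fp) != n)
    {
        free(values);
        return NULL;
    }
    *count = n;
    return values;
}
```

Because the count travels with the data, the reader never needs to know in advance whether a given file holds 8 columns or 12.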
