How to fix GCC compilation error when compiling >2 GB of code?


Question


I have a huge number of functions totaling around 2.8 GB of object code (unfortunately there's no way around it, scientific computing ...)

When I try to link them, I get the (expected) "relocation truncated to fit: R_X86_64_32S" errors, which I hoped to circumvent by specifying the compiler flag -mcmodel=medium. All additional libraries that I link in and have control over are compiled with the -fpic flag.

Still, the error persists, and I assume that some libraries I link to are not compiled with PIC.

Here's the error:

/usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../lib64/crt1.o: In function `_start':
(.text+0x12): relocation truncated to fit: R_X86_64_32S against symbol `__libc_csu_fini' defined in .text section in /usr/lib64/libc_nonshared.a(elf-init.oS)
/usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../lib64/crt1.o: In function `_start':
(.text+0x19): relocation truncated to fit: R_X86_64_32S against symbol `__libc_csu_init' defined in .text section in /usr/lib64/libc_nonshared.a(elf-init.oS)
/usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../lib64/crt1.o: In function `_start':
(.text+0x20): undefined reference to `main'
/usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../lib64/crti.o: In function `call_gmon_start':
(.text+0x7): relocation truncated to fit: R_X86_64_GOTPCREL against undefined symbol `__gmon_start__'
/usr/lib/gcc/x86_64-redhat-linux/4.1.2/crtbegin.o: In function `__do_global_dtors_aux':
crtstuff.c:(.text+0xb): relocation truncated to fit: R_X86_64_PC32 against `.bss'
crtstuff.c:(.text+0x13): relocation truncated to fit: R_X86_64_32 against symbol `__DTOR_END__' defined in .dtors section in /usr/lib/gcc/x86_64-redhat-linux/4.1.2/crtend.o
crtstuff.c:(.text+0x19): relocation truncated to fit: R_X86_64_32S against `.dtors'
crtstuff.c:(.text+0x28): relocation truncated to fit: R_X86_64_PC32 against `.bss'
crtstuff.c:(.text+0x38): relocation truncated to fit: R_X86_64_PC32 against `.bss'
crtstuff.c:(.text+0x3f): relocation truncated to fit: R_X86_64_32S against `.dtors'
crtstuff.c:(.text+0x46): relocation truncated to fit: R_X86_64_PC32 against `.bss'
crtstuff.c:(.text+0x51): additional relocation overflows omitted from the output
collect2: ld returned 1 exit status
make: *** [testsme] Error 1

And system libraries I link against:

-lgfortran -lm -lrt -lpthread

Any clues where to look for the problem?

EDIT:

First of all, thank you for the discussion...

To clarify a bit, I have hundreds of functions (each approx 1 MB in size in separate object files) like this:

double func1(std::tr1::unordered_map<int, double> & csc, 
             std::vector<EvaluationNode::Ptr> & ti, 
             ProcessVars & s)
{
    double sum = 0.0, prefactor, expr;  // sum must start at zero before the += below

    prefactor = +s.ds8*s.ds10*ti[0]->value();
    expr =       ( - 5/243.*(s.x14*s.x15*csc[49300] + 9/10.*s.x14*s.x15*csc[49301] +
           1/10.*s.x14*s.x15*csc[49302] - 3/5.*s.x14*s.x15*csc[49303] -
           27/10.*s.x14*s.x15*csc[49304] + 12/5.*s.x14*s.x15*csc[49305] -
           3/10.*s.x14*s.x15*csc[49306] - 4/5.*s.x14*s.x15*csc[49307] +
           21/10.*s.x14*s.x15*csc[49308] + 1/10.*s.x14*s.x15*csc[49309] -
           s.x14*s.x15*csc[51370] - 9/10.*s.x14*s.x15*csc[51371] -
           1/10.*s.x14*s.x15*csc[51372] + 3/5.*s.x14*s.x15*csc[51373] +
           27/10.*s.x14*s.x15*csc[51374] - 12/5.*s.x14*s.x15*csc[51375] +
           3/10.*s.x14*s.x15*csc[51376] + 4/5.*s.x14*s.x15*csc[51377] -
           21/10.*s.x14*s.x15*csc[51378] - 1/10.*s.x14*s.x15*csc[51379] -
           2*s.x14*s.x15*csc[55100] - 9/5.*s.x14*s.x15*csc[55101] -
           1/5.*s.x14*s.x15*csc[55102] + 6/5.*s.x14*s.x15*csc[55103] +
           27/5.*s.x14*s.x15*csc[55104] - 24/5.*s.x14*s.x15*csc[55105] +
           3/5.*s.x14*s.x15*csc[55106] + 8/5.*s.x14*s.x15*csc[55107] -
           21/5.*s.x14*s.x15*csc[55108] - 1/5.*s.x14*s.x15*csc[55109] -
           2*s.x14*s.x15*csc[55170] - 9/5.*s.x14*s.x15*csc[55171] -
           1/5.*s.x14*s.x15*csc[55172] + 6/5.*s.x14*s.x15*csc[55173] +
           27/5.*s.x14*s.x15*csc[55174] - 24/5.*s.x14*s.x15*csc[55175] +
           // ...
           ;

        sum += prefactor*expr;
    // ...
    return sum;
}

The object s is relatively small and holds the needed constants x14, x15, ..., ds0, ..., etc., while ti just returns a double from an external library. As you can see, csc[] is a precomputed map of values, which is also evaluated in separate object files (again hundreds of them, about 1 MB each) of the following form:

void cscs132(std::tr1::unordered_map<int,double> & csc, ProcessVars & s)
{
    {
    double csc19295 =       + s.ds0*s.ds1*s.ds2 * ( -
           32*s.x12pow2*s.x15*s.x34*s.mbpow2*s.mWpowinv2 -
           32*s.x12pow2*s.x15*s.x35*s.mbpow2*s.mWpowinv2 -
           32*s.x12pow2*s.x15*s.x35*s.x45*s.mWpowinv2 -
           32*s.x12pow2*s.x25*s.x34*s.mbpow2*s.mWpowinv2 -
           32*s.x12pow2*s.x25*s.x35*s.mbpow2*s.mWpowinv2 -
           32*s.x12pow2*s.x25*s.x35*s.x45*s.mWpowinv2 +
           32*s.x12pow2*s.x34*s.mbpow4*s.mWpowinv2 +
           32*s.x12pow2*s.x34*s.x35*s.mbpow2*s.mWpowinv2 +
           32*s.x12pow2*s.x34*s.x45*s.mbpow2*s.mWpowinv2 +
           32*s.x12pow2*s.x35*s.mbpow4*s.mWpowinv2 +
           32*s.x12pow2*s.x35pow2*s.mbpow2*s.mWpowinv2 +
           32*s.x12pow2*s.x35pow2*s.x45*s.mWpowinv2 +
           64*s.x12pow2*s.x35*s.x45*s.mbpow2*s.mWpowinv2 +
           32*s.x12pow2*s.x35*s.x45pow2*s.mWpowinv2 -
           64*s.x12*s.p1p3*s.x15*s.mbpow4*s.mWpowinv2 +
           64*s.x12*s.p1p3*s.x15pow2*s.mbpow2*s.mWpowinv2 +
           96*s.x12*s.p1p3*s.x15*s.x25*s.mbpow2*s.mWpowinv2 -
           64*s.x12*s.p1p3*s.x15*s.x35*s.mbpow2*s.mWpowinv2 -
           64*s.x12*s.p1p3*s.x15*s.x45*s.mbpow2*s.mWpowinv2 -
           32*s.x12*s.p1p3*s.x25*s.mbpow4*s.mWpowinv2 +
           32*s.x12*s.p1p3*s.x25pow2*s.mbpow2*s.mWpowinv2 -
           32*s.x12*s.p1p3*s.x25*s.x35*s.mbpow2*s.mWpowinv2 -
           32*s.x12*s.p1p3*s.x25*s.x45*s.mbpow2*s.mWpowinv2 -
           32*s.x12*s.p1p3*s.x45*s.mbpow2 +
           64*s.x12*s.x14*s.x15pow2*s.x35*s.mWpowinv2 +
           96*s.x12*s.x14*s.x15*s.x25*s.x35*s.mWpowinv2 +
           32*s.x12*s.x14*s.x15*s.x34*s.mbpow2*s.mWpowinv2 -
           32*s.x12*s.x14*s.x15*s.x35*s.mbpow2*s.mWpowinv2 -
           64*s.x12*s.x14*s.x15*s.x35pow2*s.mWpowinv2 -
           32*s.x12*s.x14*s.x15*s.x35*s.x45*s.mWpowinv2 +
           32*s.x12*s.x14*s.x25pow2*s.x35*s.mWpowinv2 +
           32*s.x12*s.x14*s.x25*s.x34*s.mbpow2*s.mWpowinv2 -
           32*s.x12*s.x14*s.x25*s.x35pow2*s.mWpowinv2 -
           // ...
    
       csc.insert(cscMap::value_type(192953, csc19295));
    }

    {
       double csc19296 =      // ... ;

       csc.insert(cscMap::value_type(192956, csc19296));
    }

    // ...
}

That's about it. The final step then just consists of calling all those func[i] and summing up the results.

Concerning the fact that this is a rather special and unusual case: Yes, it is. This is what people have to cope with when trying to do high-precision computations for particle physics.

EDIT2:

I should also add that x12, x13, etc. are not really constants. They are set to specific values, all those functions are run and the result returned, and then a new set of x12, x13, etc. is chosen to produce the next value. And this has to be done 10^5 to 10^6 times...

EDIT3:

Thank you for the suggestions and the discussion so far... I'll try to roll the loops back up during code generation somehow; to be honest, I'm not sure exactly how to do this, but it is the best bet.

BTW, I didn't try to hide behind "this is scientific computing -- no way to optimize".
It's just that the basis for this code is something that comes out of a "black box" to which I have no real access; moreover, the whole thing worked great with simple examples, and I mainly feel overwhelmed by what happens in a real-world application...

EDIT4:

So, I have managed to reduce the code size of the csc definitions by about one fourth by simplifying expressions in a computer algebra system (Mathematica). I now also see a way to reduce it by another order of magnitude or so by applying some other tricks before generating the code (which would bring this part down to about 100 MB), and I hope this idea works.

Now related to your answers:

I'm trying to roll the loops back up again in the funcs, where a CAS won't help much, but I already have some ideas: for instance, sorting the expressions by variables like x12, x13, ..., parsing the cscs with Python and generating tables that relate them to each other. Then I can at least generate these parts as loops. As this seems to be the best solution so far, I'm marking it as the best answer.

However, I'd also like to give credit to VJo. GCC 4.6 indeed works much better, produces smaller code and is faster. Using the large model (-mcmodel=large) works with the code as-is. So technically this is the correct answer, but changing the whole concept is a much better approach.

Thank you all for your suggestions and help. If anyone is interested, I'm going to post the final outcome as soon as I am ready.

REMARKS:

Just some remarks on some of the other answers: the code I'm trying to run does not originate from an expansion of simple functions/algorithms and stupid, unnecessary unrolling. What actually happens is that what we start with are pretty complicated mathematical objects, and bringing them into a numerically computable form generates these expressions. The problem actually lies in the underlying physical theory. The complexity of intermediate expressions scales factorially, which is well known, but when all of this is combined into something physically measurable (an observable), it just boils down to a handful of very small functions that form the basis of the expressions. (There is definitely something "wrong" in this respect with the general and only available ansatz, which is called "perturbation theory".) We try to bring this ansatz to another level, which is no longer feasible analytically and where the basis of needed functions is not known. So we try to brute-force it like this. Not the best way, but hopefully one that helps with our understanding of the physics at hand in the end...

LAST EDIT:

Thanks to all your suggestions, I've managed to reduce the code size considerably, using Mathematica and a modification of the code generator for the funcs somewhat along the lines of the top answer :)

I have simplified the csc functions with Mathematica, bringing it down to 92 MB. This is the irreducible part. The first attempts took forever, but after some optimizations this now runs through in about 10 minutes on a single CPU.

The effect on the funcs was dramatic: The whole code size for them is down to approximately 9 MB, so the code now totals in the 100 MB range. Now it makes sense to turn optimizations on and the execution is quite fast.

Again, thank you all for your suggestions, I've learned a lot.

Solution

So, you already have a program that produces this text:

prefactor = +s.ds8*s.ds10*ti[0]->value();
expr = ( - 5/243.*(s.x14*s.x15*csc[49300] + 9/10.*s.x14*s.x15*csc[49301] +
       1/10.*s.x14*s.x15*csc[49302] - 3/5.*s.x14*s.x15*csc[49303] -...

and

double csc19295 =       + s.ds0*s.ds1*s.ds2 * ( -
       32*s.x12pow2*s.x15*s.x34*s.mbpow2*s.mWpowinv2 -
       32*s.x12pow2*s.x15*s.x35*s.mbpow2*s.mWpowinv2 -
       32*s.x12pow2*s.x15*s.x35*s.x45*s.mWpowinv2 -...

right?

If all your functions have a similar "format" (multiply n numbers m times and add the results - or something similar) then I think you can do this:

  • change the generator program to output offsets instead of strings (i.e. instead of the string "s.ds0" it will produce offsetof(ProcessVars, ds0))
  • create an array of such offsets
  • write an evaluator which accepts the array above and the base addresses of the structure pointers and produces a result

The array+evaluator will represent the same logic as one of your functions, but only the evaluator will be code. The array is "data" and can be either generated at runtime or saved on disk and read in chunks or through a memory-mapped file.

For your particular example in func1, imagine how you would rewrite the function via an evaluator if you had access to the base addresses of s and csc, as well as a vector-like representation of the constants and of the offsets you need to add to the base addresses to reach x14, ds8 and csc[51370].

You need to create a new form of "data" that will describe how to process the actual data you pass to your huge number of functions.
