'Reverse' a collection of C preprocessor macros easily

Question

I have a lot of preprocessor macro definitions, like this:

#define FOO 1
#define BAR 2
#define BAZ 3

In the real application, each definition corresponds to an instruction in an interpreter virtual machine. The macros are also not sequential in numbering to leave space for future instructions; there may be a #define FOO 41, then the next one is #define BAR 64.

I'm now working on a debugger for this virtual machine, and need to effectively 'reverse' these preprocessor macros. In other words, I need a function which takes the number and returns the macro name, e.g. an input of 2 returns "BAR".

Of course, I could create a function using a switch myself:

const char* instruction_by_id(int id) {
    switch (id) {
        case FOO:
            return "FOO";
        case BAR:
            return "BAR";
        case BAZ:
            return "BAZ";
        default:
            return "???";
    }
}

However, this will be a nightmare to maintain, since renaming, removing or adding instructions will require this function to be modified too.

Is there another macro which I can use to create a function like this for me, or is there some other approach? If not, is it possible to create a macro to perform this task?

I'm using gcc 6.3 on Windows 10.

Answer

You have the wrong approach. Read SICP if you have not read it.

> I have a lot of preprocessor macro definitions, like this:
>
> #define FOO 1
> #define BAR 2
> #define BAZ 3

Remember that C or C++ code can be generated, and it is quite easy to instruct your build automation tool to generate some particular C file (with GNU make or ninja you just add some rule or recipe).

For example, you could use a different preprocessor (like GPP or m4), or a script (e.g. in awk, Python or Guile), or write your own program (in C, C++, OCaml, etc.) to generate the header file containing these #define-s. Another script or program (or the same one, invoked differently) could generate the C code of instruction_by_id.

Such basic metaprogramming techniques (generating one or several C files from something higher-level but specific) have been used since at least the 1980s (e.g. with yacc or RPCGEN). The C preprocessor facilitates this with its #include directive, since you can even include lines inside a function body. Actually, the idea that code is data (and proof) and data is code is even older (Church-Turing thesis, Curry-Howard correspondence, Halting problem). The book Gödel, Escher, Bach is very entertaining.
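To make the #include point concrete: even without an external generator, the preprocessor alone can keep each opcode in a single place using the common "X-macro" idiom. This is a swapped-in, preprocessor-only technique, not the generator approach this answer recommends, but it follows the same "state each fact once" idea. OPCODE_LIST is a hypothetical name; the list could equally live in its own file that is #include-d before each expansion.

```c
/* Hypothetical single list of opcodes: one X(name, value) entry each.
   Values are explicit, so gaps such as FOO 41, BAR 64 are fine. */
#define OPCODE_LIST \
    X(FOO, 1)       \
    X(BAR, 2)       \
    X(BAZ, 3)

/* First expansion: one enumeration constant per opcode. */
enum opcode {
#define X(name, value) name = value,
    OPCODE_LIST
#undef X
};

/* Second expansion: the reverse mapping, one case per opcode.
   #name turns the macro argument into a string literal. */
const char *instruction_by_id(int id) {
    switch (id) {
#define X(name, value) case name: return #name;
    OPCODE_LIST
#undef X
    default:
        return "???";
    }
}
```

One trade-off of this sketch: the opcodes become enum constants rather than #define-s, so tests like #ifdef FOO no longer work; keeping real #define-s is one reason to prefer an external generator.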

For example, you could decide to have a text file opcodes.txt (or even an SQLite database) like

# ignore lines starting with a hash sign
FOO 1
BAR 2

and have two small awk or Python scripts (or two tiny specialized C programs), one generating the #define-s (into opcode-defines.h) and another generating the body of instruction_by_id (into opcode-instr.inc). Then you need to adapt your Makefile to generate these, put #include "opcode-defines.h" inside some global header, and have

const char* instruction_by_id(int id) {
    switch (id) {
    /* opcode-instr.inc is generated; each of its lines looks like
       case FOO: return "FOO"; */
#include "opcode-instr.inc"
    default: return "???";
    }
}
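As a minimal sketch of the "tiny specialized C program" variant, assuming exactly the opcodes.txt format shown above (a NAME VALUE pair per line, lines starting with # ignored), the code below turns each line into either a #define for opcode-defines.h or a case line for opcode-instr.inc. The names emit_line, generate and enum mode are hypothetical, chosen only for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Which generated file is being produced. */
enum mode { EMIT_DEFINES, EMIT_CASES };

/* Convert one line of opcodes.txt into one line of C source in out.
   Returns 1 if a line was produced, 0 for blank or '#' comment lines,
   and -1 for a malformed line. */
int emit_line(const char *line, enum mode m, char *out, size_t outsz) {
    char name[64];
    long value;
    /* Skip leading whitespace, then ignore empty lines and comments. */
    const char *p = line + strspn(line, " \t\r\n");
    if (*p == '\0' || *p == '#')
        return 0;
    /* Expect an identifier followed by a number, e.g. "FOO 1". */
    if (sscanf(p, "%63s %ld", name, &value) != 2)
        return -1;
    if (m == EMIT_DEFINES)
        snprintf(out, outsz, "#define %s %ld\n", name, value);
    else
        snprintf(out, outsz, "case %s: return \"%s\";\n", name, name);
    return 1;
}

/* Convert a whole opcodes.txt stream. Returns 0 on success, otherwise
   the 1-based number of the first malformed line. */
long generate(FILE *in, FILE *out, enum mode m) {
    char line[256], buf[256];
    long lineno = 0;
    while (fgets(line, sizeof line, in) != NULL) {
        lineno++;
        int r = emit_line(line, m, buf, sizeof buf);
        if (r < 0)
            return lineno;
        if (r > 0)
            fputs(buf, out);
    }
    return 0;
}
```

A main() would then call generate(stdin, stdout, EMIT_DEFINES) or generate(stdin, stdout, EMIT_CASES) depending on a command-line argument, so one program invoked twice from the Makefile produces both files. Because the generated case lines reuse the macro names, opcode-instr.inc must be included after opcode-defines.h.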

> this will be a nightmare to maintain

Not so with such a metaprogramming approach. You'll just maintain opcodes.txt and the scripts using it, but you express a given "knowledge element" (the relation of FOO to 1) only once (in a single line of opcodes.txt). Of course, you need to document that (at the very least, with comments in your Makefile).

Metaprogramming from some higher-level, declarative formalization is a very powerful paradigm. In France, J. Pitrat pioneered it from the 1960s onward (and, now retired, he writes an interesting blog today). In the US, J. McCarthy and the Lisp community did as well.

For an entertaining presentation, see Liam Proven's FOSDEM 2018 talk, "The circuit less traveled".

Large software projects use this metaprogramming approach quite often. For example, the GCC compiler has about a dozen C++ code generators (in total, they emit more than a million lines of C++).

Another way of looking at such an approach is the idea of domain-specific languages that could be compiled to C. If you use an operating system providing dynamic loading, you can even write a program that emits C code, forks a process to compile it into a plugin, then loads that plugin (on POSIX or Linux, with dlopen). Interestingly, computers are now fast enough to enable such an approach in an interactive application (in some sort of REPL): you can emit a C file of a few thousand lines, compile it into a .so shared object file, and dlopen it, all in a fraction of a second. You could also use JIT-compilation libraries like GCCJIT or LLVM to generate code at runtime, or embed an interpreter (like Lua or Guile) into your program.
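The emit, compile, dlopen cycle described above can be sketched as follows. This is a minimal illustration under loud assumptions: a POSIX system, a C compiler reachable as cc on the PATH, and a writable /tmp; the file paths, compile_and_call and the generated function twice are all hypothetical stand-ins for real emitted code.

```c
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical paths; assumes a writable /tmp. */
#define DEMO_SRC "/tmp/plugin_demo.c"
#define DEMO_SO  "/tmp/plugin_demo.so"

/* Emit a tiny C file, compile it into a shared object, dlopen it and call
   the function it defines. Returns that function's result, or -1 on failure.
   Assumes a POSIX system with a C compiler available as "cc" on PATH. */
int compile_and_call(int arg) {
    FILE *f = fopen(DEMO_SRC, "w");
    if (f == NULL)
        return -1;
    /* The "generated" code: one function, standing in for real emitted C. */
    fputs("int twice(int x) { return 2 * x; }\n", f);
    if (fclose(f) != 0)
        return -1;

    /* system() forks a shell, which runs the compiler as a child process. */
    if (system("cc -shared -fPIC -o " DEMO_SO " " DEMO_SRC) != 0)
        return -1;

    void *handle = dlopen(DEMO_SO, RTLD_NOW);
    if (handle == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return -1;
    }
    /* POSIX requires this void * to function-pointer conversion to work. */
    int (*twice)(int) = (int (*)(int))dlsym(handle, "twice");
    int result = (twice != NULL) ? twice(arg) : -1;
    dlclose(handle);
    return result;
}
```

With those assumptions met, compile_and_call(21) should return 42. On glibc older than 2.34 the program must be linked with -ldl, and this does not apply as-is on Windows, where LoadLibrary plays the role of dlopen.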

BTW, metaprogramming approaches are one of the reasons why most developers (and not only people in the compiler business) should know basic compilation techniques; another reason is that parsing problems are very common. So read the Dragon Book.

Be aware of Greenspun's tenth rule. It is much more than a joke; it is a profound truth about large software.
