Convert/compile regular expressions to C code


Question

I am on a memory-limited system, and boost::regex is too large. What options exist for compiling my regular expressions straight to C/C++, and how many KB of code size should I expect? The goal is to reduce memory and code size as much as possible.

I am looking for under 100 kB of code size and the same in memory usage. Boost regex appears to be approximately 470 kB, which is too large.

Recommended answer

lex (and flex) produce table-driven lexers which are generally pretty small; they go back to the days when a machine with 100 kB would have been considered a supercomputer :) The basic flex code skeleton is tiny (a few kB). The size of the tables depends on how many token types you have and how complicated the regular expressions are, but the tables for a simple flex scanner are typically a few kB as well.
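To give a feel for why table-driven matchers stay small, here is a minimal hand-written sketch of the idea for the regex `[A-Za-z_][A-Za-z0-9_]*`. It is not what flex actually emits (flex's generated tables are compressed and laid out differently); the names `match_identifier`, `next_state`, and `char_class` are made up for this illustration.

```c
#include <stddef.h>

/* Character classes: collapsing 256 byte values into a few classes
   keeps the transition table tiny. */
enum { CC_OTHER, CC_ALPHA, CC_DIGIT, CC_COUNT };

/* DFA states for the regex [A-Za-z_][A-Za-z0-9_]* */
enum { S_START, S_IDENT, S_DEAD, S_COUNT };

/* Transition table: 3 states x 3 classes = 9 bytes of data. */
static const unsigned char next_state[S_COUNT][CC_COUNT] = {
    /*            OTHER   ALPHA    DIGIT   */
    /* START */ { S_DEAD, S_IDENT, S_DEAD  },
    /* IDENT */ { S_DEAD, S_IDENT, S_IDENT },
    /* DEAD  */ { S_DEAD, S_DEAD,  S_DEAD  },
};

/* Map an input byte to its character class. */
static int char_class(unsigned char c)
{
    if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_')
        return CC_ALPHA;
    if (c >= '0' && c <= '9')
        return CC_DIGIT;
    return CC_OTHER;
}

/* Returns the length of the longest identifier at the start of s,
   or 0 if s does not start with an identifier. */
size_t match_identifier(const char *s)
{
    int state = S_START;
    size_t i, last_accept = 0;

    for (i = 0; s[i] != '\0'; i++) {
        state = next_state[state][char_class((unsigned char)s[i])];
        if (state == S_DEAD)
            break;
        if (state == S_IDENT)      /* S_IDENT is the only accepting state */
            last_accept = i + 1;
    }
    return last_accept;
}
```

The driver loop is the same for any regex; only the table changes, which is why the fixed code skeleton can be so small and the total size is dominated by how many states and character classes the expressions need.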

However, if you're not using them to build an interpreter or compiler, they do have a couple of annoying characteristics. First, they insist on doing your input and buffering for you, which is nice if you're always reading from a file but can be less cool if your input is coming from a socket or terminal (or, worse, being preprocessed by some kind of translator). Second, they are designed for an environment where you have a few simple token types and a parser which is responsible for interpreting the sequencing. (Hence yacc or bison.) You could use these tools to parse HTTP, certainly, and you might even find that you've learned some useful new skills.

There is a tool called re2c (i.e. regular expression to C) which you might find a little more comfortable. Unlike lex, it produces customized C code, which is quite a bit bulkier, but arguably runs slightly faster. I don't think it's being actively maintained, but I had quite a lot of success with it some years back. You should be able to find it on SourceForge.
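For contrast with the table-driven style, here is a rough hand-written sketch of what "customized C code" for a regex can look like: each DFA state becomes a position in the code and each transition a branch or `goto`, with no tables at all. This is only an illustration of the direct-coded approach for `[0-9]+(\.[0-9]+)?`, not actual re2c output, and the function name `match_number` is invented for the example.

```c
#include <stddef.h>

/* Direct-coded matcher for [0-9]+(\.[0-9]+)?
   Returns the length of the longest match at the start of s, or 0. */
size_t match_number(const char *s)
{
    const char *p = s;
    const char *accept;

    /* Start state: at least one digit is required. */
    if (*p < '0' || *p > '9')
        return 0;
    p++;

int_part:
    /* Accepting state: one or more integer digits consumed. */
    accept = p;
    if (*p >= '0' && *p <= '9') { p++; goto int_part; }
    if (*p != '.')
        goto done;
    p++;

    /* Saw '.': not accepting until a fraction digit follows. */
    if (*p < '0' || *p > '9')
        goto done;
    p++;

frac_part:
    /* Accepting state: one or more fraction digits consumed. */
    accept = p;
    if (*p >= '0' && *p <= '9') { p++; goto frac_part; }

done:
    /* Fall back to the last accepting position seen. */
    return (size_t)(accept - s);
}
```

With this style the code grows with every state in the automaton, which matches the trade-off described above: more code than a shared table-driven loop, but no table lookups on the hot path.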

Good luck.

