Is there a better (more modern) tool than lex/flex for generating a tokenizer for C++?

Question

I recently added source file parsing to an existing tool that generated output files from complex command line arguments.

The command line arguments got to be so complex that we started allowing them to be supplied as a file that was parsed as if it was a very large command line, but the syntax was still awkward. So I added the ability to parse a source file using a more reasonable syntax.

I used flex 2.5.4 for Windows to generate the tokenizer for this custom source file format, and it worked. But I hated the code: global variables, a weird naming convention, and the C++ code it generated was awful. The existing code generation backend was glued to the output of flex. I don't use yacc or bison.

I'm about to dive back into that code, and I'd like to use a better, more modern tool. Does anyone know of something that:


  • Runs in Windows command prompt (Visual studio integration is ok, but I use make files to build)
  • Generates a proper encapsulated C++ tokenizer. (No global variables)
  • Uses regular expressions for describing the tokenizing rules (compatibility with lex syntax is a plus)
  • Does not force me to use the c-runtime (or fake it) for file reading. (parse from memory)
  • Warns me when my rules force the tokenizer to backtrack (or fixes it automatically)
  • Gives me full control over variable and method names (so I can conform to my existing naming convention)
  • Allows me to link multiple parsers into a single .exe without name collisions
  • Can generate a Unicode (16-bit UCS-2) parser if I want it to
  • Is NOT an integrated tokenizer + parser-generator (I want a lex replacement, not a lex+yacc replacement)

I could probably live with a tool that just generated the tokenizing tables if that was the only thing available.

Recommended answer

Ragel (http://www.complang.org/ragel/) fits most of your requirements.


  • It runs on Windows
  • It doesn't declare the variables, so you can put them inside a class or inside a function as you like.
  • It has nice tools for analyzing regular expressions to see when they would backtrack. (I don't know much about this, since I never use Ragel syntax that would create a backtracking parser.)
  • Variable names can't be changed.
  • Table names are prefixed with the machine name, and they're declared "const static", so you can put more than one in the same file and have more than one with the same name in a single program (as long as they're in different files).
  • You can declare the variables as any integer type, including UChar (or whatever UTF-16 type you prefer). It doesn't automatically handle surrogate pairs, though. It doesn't have special character classes for Unicode either (I think).
  • It only does regular expressions... has no bison/yacc features.
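
Since surrogate pairs are not handled automatically, a UTF-16 tokenizer that cares about characters outside the Basic Multilingual Plane would need to combine them itself. A minimal sketch of that step follows; `decode_utf16` is a hypothetical helper written for illustration, not something Ragel provides, and returning U+FFFD for an unpaired surrogate is just one possible policy.

```cpp
#include <cstddef>

// Hypothetical helper (not part of Ragel): decodes one code point from a
// UTF-16 buffer starting at index i and advances i past the units consumed.
// Assumes i < len on entry. Unpaired surrogates are reported as U+FFFD.
inline char32_t decode_utf16(const char16_t* s, std::size_t len, std::size_t& i) {
    const char32_t unit = s[i++];
    if (unit >= 0xD800 && unit <= 0xDBFF) {                  // high surrogate
        if (i < len && s[i] >= 0xDC00 && s[i] <= 0xDFFF) {   // followed by low surrogate
            const char32_t low = s[i++];
            return 0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00);
        }
        return 0xFFFD;                                       // missing low surrogate
    }
    if (unit >= 0xDC00 && unit <= 0xDFFF) {
        return 0xFFFD;                                       // stray low surrogate
    }
    return unit;                                             // ordinary BMP code unit
}
```

A pure UCS-2 tokenizer, as asked for in the question, can skip this step and treat every 16-bit unit as one character.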

它产生的代码对程序的干扰很小。代码也是令人难以置信的快,和Ragel语法更灵活和可读性比任何我见过的。这是一个坚实的软件。它可以生成表驱动解析器或goto驱动解析器。

The code it generates interferes very little with a program. The code is also incredibly fast, and the Ragel syntax is more flexible and readable than anything I've ever seen. It's a rock solid piece of software. It can generate a table-driven parser or a goto-driven parser.
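
To make "table-driven" and several of the question's requirements concrete, here is a small hand-written sketch. It is not Ragel output, and every name in it (`Tokenizer`, `next`, `step`, `classify`) is hypothetical. It keeps all state inside a class, scans from a memory range rather than a file, and is templated on the code unit type so the same rules can run over `char` or `char16_t`.

```cpp
#include <cstddef>

// Hand-written illustration only; this is not what Ragel generates.
template <typename CharT>
class Tokenizer {
public:
    enum Kind { End, Ident, Number, Other };
    struct Token { Kind kind; const CharT* begin; const CharT* end; };

    // Scans the half-open memory range [begin, end); no file I/O involved.
    Tokenizer(const CharT* begin, const CharT* end) : p_(begin), pe_(end) {}

    Token next() {
        while (p_ != pe_ && classify(*p_) == Space) ++p_;    // skip whitespace
        if (p_ == pe_) return Token{End, p_, p_};

        const CharT* start = p_;
        int state = Start;
        while (p_ != pe_) {                                  // one table lookup per unit
            const int nextState = step(state, classify(*p_));
            if (nextState == Stop) break;
            state = nextState;
            ++p_;
        }
        if (state == Start) {                                // no rule matched:
            ++p_;                                            // emit a single unit
            return Token{Other, start, p_};
        }
        return Token{state == InIdent ? Ident : Number, start, p_};
    }

private:
    enum State { Start, InIdent, InNumber, Stop };
    enum Class { Letter, Digit, Space, Punct, ClassCount };

    static Class classify(CharT c) {
        if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_') return Letter;
        if (c >= '0' && c <= '9') return Digit;
        if (c == ' ' || c == '\t' || c == '\r' || c == '\n') return Space;
        return Punct;
    }

    // Transition table: rows are states, columns are character classes.
    static int step(int state, Class cls) {
        static const int table[3][ClassCount] = {
            /* Start    */ { InIdent, InNumber, Stop, Stop },
            /* InIdent  */ { InIdent, InIdent,  Stop, Stop },
            /* InNumber */ { Stop,    InNumber, Stop, Stop },
        };
        return table[state][cls];
    }

    const CharT* p_;    // current position
    const CharT* pe_;   // end of input
};
```

Because nothing here is global, several such tokenizers can coexist in one program without name collisions. A goto-driven scanner would express the same states as labels and jumps instead of table lookups, typically trading larger code for fewer memory accesses.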
