有限状态机解析器 [英] Finite State Machine parser

查看:203
本文介绍了有限状态机解析器的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想在C ++ 中使用类似FSM的解析器解析自行设计的文件格式(这是一个教授自己的c ++ - the-hard-way -by-doing-something-big-and-difficult 种类的项目:))。我有一个标记化的字符串,换行符表示euh ...行的结束。有关输入示例,请参见此处。所有的评论will和垃圾被过滤掉,所以我有一个std :: string这样:

I would like to parse a self-designed file format with a FSM-like parser in C++ (this is a teach-myself-c++-the-hard-way-by-doing-something-big-and-difficult kind of project :)). I have a tokenized string with newlines signifying the end of a euh... line. See here for an input example. All the comments will and junk is filtered out, so I have a std::string like this:

global \n { \n SOURCE_DIRS src \n HEADER_DIRS include \n SOURCES bitwise.c framing.c \n HEADERS ogg/os_types.h ogg/ogg.h \n } \n ...

语法说明:


  • {}

  • \ n仅在选项/文件列表中很重要,表示列表的结尾。

所以我认为FSM对我的需求/知识来说是简单/可扩展的。至于我可以告诉(并希望我的文件设计是),我不需要并发状态或任何喜欢的东西。一些设计/实现问题:

So I thought that a FSM would be simple/extensible enough for my needs/knowledge. As far as I can tell (and want my file design to be), I don't need concurrent states or anything fancy like that. Some design/implementation questions:


  1. 我应该使用枚举 code> class +我的州的导数?第一个可能更好的小语法,但可能会变得丑陋,第二个是完全相反。我倾向于第一,因为它的简单。 枚举示例类示例。 EDIT:此建议适用于 goto code>,我认为他们在C ++中是邪恶的?

  2. 读取列表时,我需要不忽略 \\\
    。我使用字符串通过 stringstream 的首选方式将忽略 \\\
    。因此,我需要简单的告诉方式(同样!) stringstream ,以便在启用某个状态时不忽略换行符。

  3. 简单 enum 状态足以进行多级解析(范围内的范围 {... {...} ...}
  4. 这里是我想到的草案状态:

    • :读取全局,exe,lib +目标名称...

    • 可以读取SOURCES ...,创建用户变量...

    • 列表:将列表添加到列表中,直到遇到换行符。

  1. Should I use an enum or an abstract class + derivatives for my states? The first is probably better for small syntax, but could get ugly later, and the second is the exact opposite. I'm leaning to the first, for its simplicity. enum example and class example. what about this suggestion for goto, I thought they were evil in C++?
  2. When reading a list, I need to NOT ignore \n. My preferred way of using the string via stringstream, will ignore \n by default. So I need simple way of telling (the same!) stringstream to not ignore newlines when a certain state is enabled.
  3. Will the simple enum states suffice for multi-level parsing (scopes within scopes {...{...}...}) or would that need hacky implementations?
  4. Here's the draft states I have in mind:
    • upper: reads global, exe, lib+ target names...
    • normal: inside a scope, can read SOURCES..., create user variables...
    • list: adds items to a list until a newline is encountered.

每个作用域都有一种条件(例如win32:global {gcc :CFLAGS = ...}),并且需要以完全相同的方式处理evey(即使在 list 状态,每个项目)。

Each scope will have a kind of conditional (e.g. win32:global { gcc:CFLAGS = ... }) and will need to be handled in the exact same fashion eveywhere (even in the list state, per item).

感谢任何输入。

推荐答案

如果你有嵌套范围,是不是正确的方式,你应该看一个上下文自由语法分析器。 LL(1)解析器可以写成一组递归函数,或 LALR(1)解析器可以使用解析器生成器如Bison编写。

If you have nesting scopes, then a Finite State Machine is not the right way to go, and you should look at a Context Free Grammar parser. An LL(1) parser can be written as a set of recursive funcitons, or an LALR(1) parser can be written using a parser generator such as Bison.

如果您将一个堆栈添加到FSM,那么您将进入下推自动机区域。非确定性下推自动机等效于上下文无关语法(尽管确定性下推自动机严格来说不太强大。)LALR (1)解析器生成器实际上在内部生成确定性下推自动机。一个好的编译器设计教科书将涵盖从语法构建下推自动机的确切算法。 (这样,添加堆栈不是hacky。​​)此维基百科文章还介绍了如何构建LR

If you add a stack to an FSM, then you're getting into pushdown automaton territory. A nondeterministic pushdown automaton is equivalent to a context free grammar (though a deterministic pushdown automaton is strictly less powerful.) LALR(1) parser generators actually generate a deterministic pushdown automaton internally. A good compiler design textbook will cover the exact algorithm by which the pushdown automaton is constructed from the grammar. (In this way, adding a stack isn't "hacky".) This Wikipedia article also describes how to construct the LR(1) pushdown automaton from your grammar, but IMO, the article is not as clear as it could be.

如果你的范围嵌套有限,你的语法,但IMO,文章不清楚。 正常列表已嵌套 list s或嵌套正常),则可以使用FSM而不使用堆栈。

If your scopes nest only finitely deep (i.e. you have the upper, normal and list levels but you don't have nested lists or nested normals), then you can use a FSM without a stack.

这篇关于有限状态机解析器的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆