Parsing in Emacs Lisp


Question


I'm writing a parser in Emacs Lisp. It's a parser for text files looking like this:

rule:
  int: 1, 2, 3, ...
  string: and, or, then, when
  text:
  ----------
  Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Pellentesque
  in tellus. In pharetra consequat augue. In congue. Curabitur
  pellentesque iaculis eros. Proin magna odio, posuere sed, commodo nec,
  varius nec, tortor.
  ----------
  more: ...

rule:
  ...

I don't really care about the key (int, string, ...). I want the value. So for the file above int has value "1, 2, 3, ...", string "and, or, then, when" and text "Lorem ..." (excluding the dashes).

I'm thinking about two different solutions, but I don't know which one to use. Should I:

  1. create a simple parser that loops through all lines and for each line matches it with some regex and then group the parts I want out?

  2. do a more sophisticated parser with a lexer and a parser?

Right now the files are quite simple and I guess I don't need to do something as advanced as the second option. But these files may get a bit more complicated, so I want to make it easy to extend.

How would you solve this?

Solution

Are you already familiar with recursive descent parsers? They're relatively easy to write by hand in your favourite programming language, which would include Emacs Lisp. For very simple parsing, you can often get by with looking-at and search-forward. These would also form the basis of any tokenizing routines that would be called by your recursive descent parser, or any other style of parser.
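As a rough illustration of the "very simple parsing" route, here is a minimal sketch that walks the current buffer line by line with looking-at and collects the values from a file shaped like the one in the question. The function name my/collect-values and both regular expressions are assumptions made for this example, not something taken from the question. It uses re-search-forward, the regexp variant of search-forward, to find the closing dashed line of a text block.

```elisp
(require 'subr-x) ; provides `string-trim' on older Emacs versions

;; Hypothetical helper: the name and the regexps are assumptions for this sketch.
(defun my/collect-values ()
  "Return, in order, the values found in the current buffer.
A value is either the text after `key:' on one line, or the text
between two dashed lines."
  (let ((values '()))
    (save-excursion
      (goto-char (point-min))
      (while (not (eobp))
        (cond
         ;; "key: value" on a single line; group 1 is the value.
         ;; Lines such as "rule:" or "text:" have no value and do not match.
         ((looking-at "^[ \t]*[a-z]+:[ \t]*\\(.+\\)$")
          (push (match-string 1) values))
         ;; A dashed line opens a text block that runs to the next dashed line.
         ((looking-at "^[ \t]*-+[ \t]*$")
          (forward-line 1)
          (let ((start (point)))
            (if (re-search-forward "^[ \t]*-+[ \t]*$" nil t)
                (push (string-trim
                       (buffer-substring-no-properties
                        start (match-beginning 0)))
                      values)
              (error "Unterminated text block")))))
        (forward-line 1)))
    (nreverse values)))
```

Called in a buffer holding such a file, it returns a flat list of strings. Grouping the values per rule, or splitting "1, 2, 3" into separate items, would be further steps layered on top of this.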

[11 Feb 2009] I added an example recursive descent parser in Emacs Lisp below. It parses simple arithmetic expressions including addition, subtraction, multiplication, division, exponentiation, and parenthesized sub-expressions. Right now, it assumes all tokens are in the global variable *token*, but if you modify gettok and peektok as necessary you can have them walk through a buffer. Note that the code tests for parentheses as the strings "(" and ")", so a token list with a parenthesized sub-expression must contain them in that form. To use it as is, just try out the following:

(setq *token* '( 3 ^ 5 ^ 7 + 5 * 3 + 7 / 11))
(rdh/expr)
=> (+ (+ (^ 3 (^ 5 7)) (* 5 3)) (/ 7 11))

The parsing code follows.

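;; All tokens live in this global list; the parser consumes it from the front.
(defvar *token* nil)

(defun gettok ()
  "Remove and return the next token, or nil when none are left."
  (and *token* (pop *token*)))

(defun peektok ()
  "Return the next token without consuming it, or nil when none are left."
  (and *token* (car *token*)))

;; expr: one factor followed by any number of "+ factor" or "- factor".
(defun rdh/expr ()
  (rdh/expr-tail (rdh/factor)))

(defun rdh/expr-tail (expr)
  (let ((tok (peektok)))
    (cond ((or (null tok)
               (equal tok ")"))
           expr)
          ((member tok '(+ -))
           (gettok)
           (let ((fac (rdh/factor)))
             ;; Nesting EXPR on the left makes + and - left-associative.
             (rdh/expr-tail (list tok expr fac))))
          (t (error "bad expr")))))

;; factor: one term followed by any number of "* term" or "/ term".
(defun rdh/factor ()
  (rdh/factor-tail (rdh/term)))

(defun rdh/factor-tail (fac)
  (let ((tok (peektok)))
    (cond ((or (null tok)
               (member tok '(")" + -)))
           fac)
          ((member tok '(* /))
           (gettok)
           (let ((term (rdh/term)))
             (rdh/factor-tail (list tok fac term))))
          (t (error "bad factor")))))

;; term: a primary, optionally followed by "^ term".
(defun rdh/term ()
  (let* ((prim (rdh/prim))
         (tok (peektok)))
    (cond ((or (null tok)
               (member tok '(")" + - / *)))
           prim)
          ((equal tok '^)
           (gettok)
           ;; Recursing on the right makes ^ right-associative.
           (list tok prim (rdh/term)))
          (t (error "bad term")))))

;; prim: a number, or an expression wrapped in "(" and ")".
(defun rdh/prim ()
  (let ((tok (gettok)))
    (cond ((numberp tok) tok)
          ((equal tok "(")
           (let* ((expr (rdh/expr))
                  (tok (peektok)))
             (if (not (equal tok ")"))
                 (error "bad parenthesized expr")
               (gettok)
               expr)))
          (t (error "bad prim")))))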
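
The answer mentions that gettok and peektok can be modified to walk through a buffer instead of a list. One possible way to do that is sketched below, assuming tokens are non-negative integers, the operators + - * / ^, and parentheses, separated by optional whitespace. The helper my/read-token is a hypothetical name introduced for this example. It returns numbers as numbers, operators as symbols, and parentheses as the strings "(" and ")", which is the form the parser above compares against.

```elisp
;; Hypothetical helper: reads one token at point and moves point past it.
(defun my/read-token ()
  "Return the next token after point, or nil at end of buffer."
  (skip-chars-forward " \t\n")
  (cond ((eobp) nil)
        ;; Unsigned integer, returned as a number.
        ((looking-at "[0-9]+")
         (goto-char (match-end 0))
         (string-to-number (match-string 0)))
        ;; Parentheses are returned as strings.
        ((looking-at "[()]")
         (goto-char (match-end 0))
         (match-string 0))
        ;; Operators are returned as symbols such as + or ^.
        ((looking-at "[-+*/^]")
         (goto-char (match-end 0))
         (intern (match-string 0)))
        (t (error "Unexpected character: %c" (char-after)))))

;; Buffer-based replacements for the list-based versions.
(defun gettok ()
  "Consume and return the next token in the current buffer."
  (my/read-token))

(defun peektok ()
  "Return the next token in the current buffer without moving point."
  (save-excursion (my/read-token)))
```

With these definitions loaded after the parser, placing point at the start of a buffer containing 3 ^ (5 + 7) and evaluating (rdh/expr) should yield (^ 3 (+ 5 7)).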
