How do we define a grammar for Clojure code using instaparse?

Problem description

I'm a newbie to parsing and wish to analyse some Clojure code. I am hoping that someone can provide an example of how Clojure code can be parsed using instaparse. I just need to handle numbers, symbols, keywords, sexps, vectors and whitespace.

Some examples that I wish to parse:

(+ 1 2 
   (+ 3 4))

{:hello "there"
 :look '(i am 
           indented)}

Solution

Well there are two parts to your question. The first part is parsing the expression

(+ 1 2 
   (+ 3 4))

The second part is transforming the output to the result that you want. To get a good understanding of these principles, I highly recommend Udacity's Programming Languages course. Carin Meier's blog post is also quite helpful.

The best way to understand how the parser works is to break it down into smaller parts. So in the first part we'll just examine some parsing rules, and in the second part we'll build our sexps.

  1. A simple example

    You will first need to write a grammar that tells instaparse how to parse the given expression. We'll start by just parsing the number 1:

    ;; instaparse is aliased as insta in every snippet below
    (require '[instaparse.core :as insta])

    (def parser
        (insta/parser
            "sexp = number
             number = #'[0-9]+'
            "))
    

    sexp is the top-level rule of the grammar. Our grammar states that a sexp consists of exactly one number. The next line defines number with the regular expression [0-9]+, where + is the usual regex quantifier: one or more digits in a row. If we run our parser we get the following parse tree:

    (parser "1")     
    => [:sexp [:number "1"]]
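A quick way to check the grammar, sketched under the assumption that instaparse is on the classpath and aliased as `insta` (the name `number-parser` is only illustrative): a successful parse returns a plain vector, while input the grammar cannot match returns a failure object that `insta/failure?` detects.

```clojure
(require '[instaparse.core :as insta])

;; Same grammar as above, under an illustrative name.
(def number-parser
  (insta/parser
    "sexp = number
     number = #'[0-9]+'"))

(number-parser "42")
;=> [:sexp [:number "42"]]

;; Letters are not covered by the grammar, so the result is a failure object.
(insta/failure? (number-parser "abc"))
;=> true

(insta/failure? (number-parser "42"))
;=> false
```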
    

    Ignoring Parentheses

    We can ignore certain values by wrapping them in angle brackets <> in our grammar. So if we want to parse "(1)" as simply 1, we can write our grammar as:

    (def parser
        (insta/parser
            "sexp = lparen number rparen
             <lparen> = <'('>
             <rparen> = <')'>
             number = #'[0-9]+'
            "))
    

    and if we run the parser again, it will ignore the left and right parenthesis:

    (parser "(1)")
    => [:sexp [:number "1"]]
    

    This will become helpful when we write the grammar for sexp below.
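To see what the angle brackets buy us, here is a small comparison sketch (the names `verbose-parser` and `half-hidden-parser` are made up for illustration). Without any angle brackets, both the rule tags and the matched strings stay in the tree; hiding only the rule name on the left-hand side drops the tag but keeps the string.

```clojure
(require '[instaparse.core :as insta])

;; Nothing hidden: parentheses appear as tagged nodes.
(def verbose-parser
  (insta/parser
    "sexp = lparen number rparen
     lparen = '('
     rparen = ')'
     number = #'[0-9]+'"))

(verbose-parser "(1)")
;=> [:sexp [:lparen "("] [:number "1"] [:rparen ")"]]

;; Only the rule names hidden: the tags go away, the strings remain.
(def half-hidden-parser
  (insta/parser
    "sexp = lparen number rparen
     <lparen> = '('
     <rparen> = ')'
     number = #'[0-9]+'"))

(half-hidden-parser "(1)")
;=> [:sexp "(" [:number "1"] ")"]
```

Hiding both sides, as in the grammar above, removes the parentheses from the tree entirely.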

    Adding Spaces

    Now what happens if we add spaces and run (parser "( 1 )")? Well, we get an error:

    (parser "( 1 )")
    => Parse error at line 1, column 2:
       ( 1 )
        ^
       Expected: 
       #"[0-9]+"
    

    That's because we haven't defined the concept of space in our grammar! So we can add spaces as such:

    (def parser
        (insta/parser
            "sexp = lparen space number space rparen
             <lparen> = <'('>
             <rparen> = <')'>
             number = #'[0-9]+'
             <space>  = <#'[ ]*'> 
            "))
    

    Here the * is the regex *, meaning zero or more occurrences of a space. That means the following examples all return the same result:

    (parser "(1)")         => [:sexp [:number "1"]]
    (parser "( 1 )")       => [:sexp [:number "1"]]
    (parser "(       1 )") => [:sexp [:number "1"]]
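As an alternative to threading a space rule through the grammar by hand, instaparse can insert optional whitespace for you through its `:auto-whitespace :standard` option. This is a minimal sketch rather than what the rest of this answer uses, and `ws-parser` is an illustrative name.

```clojure
(require '[instaparse.core :as insta])

;; No explicit space rule: :auto-whitespace :standard lets whitespace
;; appear before each token and at the end of the input.
(def ws-parser
  (insta/parser
    "sexp = <'('> number <')'>
     number = #'[0-9]+'"
    :auto-whitespace :standard))

(ws-parser "( 1 )")
;=> [:sexp [:number "1"]]
```

The explicit space rule is kept in the remaining steps because it makes it obvious where whitespace is allowed, and it can later be un-hidden if the spacing itself matters.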
    

  2. Building the Sexp

    We're slowly going to build our grammar from the ground up. It might be useful to look at the final product here, just to give an overview of where we're headed.

    So, a sexp contains more than just numbers as defined by our simple grammar. One high-level view of a sexp is as an operation between two parentheses, basically ( operation ). We can write this directly into our grammar.

    (def parser
        (insta/parser
            "sexp = lparen operation rparen
             <lparen> = <'('>
             <rparen> = <')'>
             operation = ???
            "))
    

    As stated above, the angle brackets <> tell instaparse to leave these values out of the parse tree. Now what is an operation? Well, an operation consists of an operator, like +, and some arguments, like the numbers 1 and 2. So we can write our grammar as:

    (def parser
        (insta/parser
            "sexp = lparen operation rparen
             <lparen> = <'('>
             <rparen> = <')'>
             operation = operator args
             operator = '+'
             args = number
             number = #'[0-9]+'
            "))
    

    We stated only one possible operator, +, just to keep things simple. We have also included the number grammar rule from the simple example above. Our grammar, however, is very limited. The only valid sexp it can parse is (+1). That's because we haven't included the concept of spaces, and have stated that args can have only one number. So in this step we will do two things. We will add spaces, and we will state that args can have more than one number.

    (def parser
        (insta/parser
            "sexp = lparen operation rparen
             <lparen> = <'('>
             <rparen> = <')'>
             operation = operator args
             operator = '+'
             args = snumber+
             <snumber> = space number
             <space>  = <#'[ ]*'> 
             number = #'[0-9]+'
            "))
    

    We added spaces by reusing the space rule defined in the simple example. We created a new rule, snumber, defined as a space followed by a number, and added + to snumber in args to state that it must appear at least once but can repeat any number of times. So we can run our parser like so:

    (parser "(+ 1 2)")
    => [:sexp [:operation [:operator "+"] [:args [:number "1"] [:number "2"]]]]
    

    We can make our grammar more robust by having args refer back to sexp. That way we can have a sexp inside our sexp! We can do this by creating ssexp, which puts a space in front of sexp, and then adding ssexp to args. Note that with this rule the nested sexps have to come after the numbers.

    (def parser
        (insta/parser
            "sexp = lparen operation rparen
             <lparen> = <'('>
             <rparen> = <')'>
             operation = operator args
             operator = '+'
             args = snumber+ ssexp* 
             <ssexp>   = space sexp
             <snumber> = space number
             <space>  = <#'[ ]*'> 
             number = #'[0-9]+'
            "))
    

    Now we can run

    (parser "(+ 1 2 (+ 1 2))")
     =>   [:sexp
           [:operation
            [:operator "+"]
            [:args
             [:number "1"]
             [:number "2"]
             [:sexp
              [:operation [:operator "+"] [:args [:number "1"] [:number "2"]]]]]]]
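The same recursive idea can be stretched toward the forms the question asks about (numbers, symbols, keywords, lists, vectors and whitespace). The grammar below is only a sketch under simplifying assumptions: the rule names and regexes are my own, it is deliberately lenient, and it does not cover strings, maps, quote or negative numbers.

```clojure
(require '[instaparse.core :as insta])

(def clj-parser
  (insta/parser
    "program = ws form ws
     <form>  = list | vector | number | keyword | symbol
     list    = <'('> ws (form ws)* <')'>
     vector  = <'['> ws (form ws)* <']'>
     number  = #'[0-9]+'
     keyword = #':[a-zA-Z][a-zA-Z0-9-]*'
     symbol  = #'[a-zA-Z+*/-][a-zA-Z0-9+*/-]*'
     <ws>    = <#'\\s*'>"))

(clj-parser "(+ 1 [2 :a])")
;=> [:program
;    [:list
;     [:symbol "+"]
;     [:number "1"]
;     [:vector [:number "2"] [:keyword ":a"]]]]

;; \s also matches newlines, so indented input works too.
(clj-parser "(+ 1 2\n   (+ 3 4))")
;=> [:program
;    [:list
;     [:symbol "+"]
;     [:number "1"]
;     [:number "2"]
;     [:list [:symbol "+"] [:number "3"] [:number "4"]]]]
```

Because form is a hidden rule, lists and vectors hold their children directly instead of wrapping each one in a :form node.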
    

  3. Transformations

    This step can be done using any number of tools that work on trees, such as enlive, zippers, match, and tree-seq. Instaparse, however, also includes its own useful function called insta/transform. We can build our transformation by mapping the keys in our parse tree to Clojure functions. For example, :number becomes read-string to turn our strings into numbers, and :args becomes vector to collect our arguments.
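Since the parse tree is just nested vectors, the general-purpose tools work on it without any help from instaparse. As a small illustration using only clojure.core (the names `tree` and `number-strings` are made up), tree-seq can walk the tree and pull out every number:

```clojure
;; A parse tree as produced by the parser above, written as a literal.
(def tree
  [:sexp [:operation [:operator "+"] [:args [:number "1"] [:number "2"]]]])

;; Vectors are branches; everything after the tag counts as a child.
(def number-strings
  (->> (tree-seq vector? rest tree)
       (filter #(and (vector? %) (= :number (first %))))
       (map second)))

number-strings
;=> ("1" "2")
```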

    So, we want to transform this:

     [:sexp [:operation [:operator "+"] [:args [:number "1"] [:number "2"]]]]
    

    Into this:

     (identity (apply + (vector (read-string "1") (read-string "2"))))
     => 3
    

    We can do that by defining our transformation options:

    (defn choose-op [op]
     (case op
        "+" +))
    (def transform-options
       {:number read-string
        :args vector
        :operator choose-op
        :operation apply
        :sexp identity
     })
    

    The only tricky thing here was adding the function choose-op. What we want is to pass the function + to apply as an argument, but if we map :operator directly to +, it will be called as a regular function on the operator node's contents. So it will transform our tree to this:

     ... (apply (+ (vector ...
    

    But by using choose-op it will pass + as an argument to apply as such:

     ... (apply + (vector ...
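It can help to watch the tree collapse one mapping at a time. In this sketch insta/transform is applied to a literal tree with progressively larger maps; nodes whose tag has no entry in the map are left in place, with their children already transformed.

```clojure
(require '[instaparse.core :as insta])

(defn choose-op [op]
  (case op
    "+" +))

(def tree
  [:sexp [:operation [:operator "+"] [:args [:number "1"] [:number "2"]]]])

;; Only the numbers are converted.
(insta/transform {:number read-string} tree)
;=> [:sexp [:operation [:operator "+"] [:args 1 2]]]

;; The :args node is replaced by a plain vector of its children.
(insta/transform {:number read-string :args vector} tree)
;=> [:sexp [:operation [:operator "+"] [1 2]]]

;; With :operator and :operation mapped, the operation is evaluated.
(insta/transform {:number read-string
                  :args vector
                  :operator choose-op
                  :operation apply}
                 tree)
;=> [:sexp 3]
```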
    

Conclusion

We can now run our little interpreter by putting the parser and transformer together:

(defn lisp [input]
   (->> (parser input) (insta/transform transform-options)))

(lisp "(+ 1 2)")
   => 3

(lisp "(+ 1 2(+ 3 4))")
   => 10
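As a possible next step, the sketch below adds three more operators and refuses to transform a failed parse. The names `calc-parser`, `calc-options` and `safe-lisp` are illustrative and not part of the original code, and throwing an ex-info on failure is just one way to report the error.

```clojure
(require '[instaparse.core :as insta])

(def calc-parser
  (insta/parser
    "sexp = lparen operation rparen
     <lparen> = <'('>
     <rparen> = <')'>
     operation = operator args
     operator = '+' | '-' | '*' | '/'
     args = snumber+ ssexp*
     <ssexp>   = space sexp
     <snumber> = space number
     <space>  = <#'[ ]*'>
     number = #'[0-9]+'"))

(defn choose-op [op]
  (case op
    "+" +
    "-" -
    "*" *
    "/" /))

(def calc-options
  {:number read-string
   :args vector
   :operator choose-op
   :operation apply
   :sexp identity})

(defn safe-lisp [input]
  (let [tree (calc-parser input)]
    (if (insta/failure? tree)
      ;; Keep the failure object around so the caller can print it.
      (throw (ex-info "Parse error" {:input input :failure tree}))
      (insta/transform calc-options tree))))

(safe-lisp "(* 2 3 (- 10 4))")
;=> 36
```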

You can find the final code used in this tutorial here.

Hopefully, this short introduction is enough to get going on your own projects. You can handle new lines by adding \n to the whitespace rule in your grammar, and you can even choose not to ignore spaces in your parse tree by removing the angle brackets <>. That might be helpful given that you're trying to keep the indentation. Hope this helps; if not, just write a comment!
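For the multi-line example from the question, one way to do this (a sketch, with `nl-parser` as an illustrative name) is to widen the space rule from `[ ]*` to `\s*`, which also matches newlines. Inside a Clojure string the backslash has to be doubled.

```clojure
(require '[instaparse.core :as insta])

(def nl-parser
  (insta/parser
    "sexp = lparen operation rparen
     <lparen> = <'('>
     <rparen> = <')'>
     operation = operator args
     operator = '+'
     args = snumber+ ssexp*
     <ssexp>   = space sexp
     <snumber> = space number
     <space>  = <#'\\s*'>
     number = #'[0-9]+'"))

(nl-parser "(+ 1 2 \n   (+ 3 4))")
;=> [:sexp
;    [:operation
;     [:operator "+"]
;     [:args
;      [:number "1"]
;      [:number "2"]
;      [:sexp
;       [:operation [:operator "+"] [:args [:number "3"] [:number "4"]]]]]]]
```

Writing the rule as `space = #'\\s*'`, without the angle brackets, would instead keep each run of whitespace in the tree as a :space node, which is where the indentation could be recovered from.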
