Unable to encode precedence of block rule over statement rule in tree-sitter


Problem description

I am trying to write a simple grammar that covers both plain statements and statements enclosed in a block. A block is introduced by a special keyword. I have given the block rule a precedence above zero, but tree-sitter still does not match it. It even reports an error, which means the other rules do not match either, yet it still refuses to match the block. Why, and how can I fix it?

Code:

area = pi*r^2;

block {
    r=12;
}

tree-sitter matches the entire sequence block { r=12; as a single statement, even though curly brackets are not allowed in statements. It therefore reports an error, but it still does not match the block rule, although that rule is applicable.

Grammar:

module.exports = grammar({
    name: 'test',

    rules: {
        source_file: $ => seq(
            repeat(choice($.block, $.statement_with_semicolon)),
            optional($.statement_without_semicolon)
        ),

        block: $ => prec(1, seq(
            "block",
            "{",
            repeat( $.statement_with_semicolon ),
            optional( $.statement_without_semicolon),
            "}",
            optional(";")
        )),

        statement_without_semicolon: $ => $.token_chain,

        statement_with_semicolon: $ => seq(
            $.token_chain,
            ";"
        ),

        token_chain: $ => repeat1(
            $.token
        ),

        token: $ => choice(
            $.alphanumeric,
            $.punctuation
        ),

        alphanumeric: $ => /[a-zA-Zα-ωΑ-Ωа-яА-Я0-9]+/,

        punctuation: $ => /[^a-zA-Zα-ωΑ-Ωа-яА-Я0-9"{}\(\)\[\];]+/
    }
});

Output:

>tree-sitter parse example-file
(source_file [0, 0] - [4, 1]
  (statement_with_semicolon [0, 0] - [0, 14]
    (token_chain [0, 0] - [0, 13]
      (token [0, 0] - [0, 4]
        (alphanumeric [0, 0] - [0, 4]))
      (token [0, 4] - [0, 7]
        (punctuation [0, 4] - [0, 7]))
      (token [0, 7] - [0, 9]
        (alphanumeric [0, 7] - [0, 9]))
      (token [0, 9] - [0, 10]
        (punctuation [0, 9] - [0, 10]))
      (token [0, 10] - [0, 11]
        (alphanumeric [0, 10] - [0, 11]))
      (token [0, 11] - [0, 12]
        (punctuation [0, 11] - [0, 12]))
      (token [0, 12] - [0, 13]
        (alphanumeric [0, 12] - [0, 13]))))
  (statement_with_semicolon [0, 14] - [3, 9]
    (token_chain [0, 14] - [3, 8]
      (token [0, 14] - [2, 0]
        (punctuation [0, 14] - [2, 0]))
      (token [2, 0] - [2, 5]
        (alphanumeric [2, 0] - [2, 5]))
      (token [2, 5] - [2, 6]
        (punctuation [2, 5] - [2, 6]))
      (ERROR [2, 6] - [2, 7])
      (token [2, 7] - [3, 4]
        (punctuation [2, 7] - [3, 4]))
      (token [3, 4] - [3, 5]
        (alphanumeric [3, 4] - [3, 5]))
      (token [3, 5] - [3, 6]
        (punctuation [3, 5] - [3, 6]))
      (token [3, 6] - [3, 8]
        (alphanumeric [3, 6] - [3, 8]))))
  (statement_without_semicolon [3, 9] - [4, 0]
    (token_chain [3, 9] - [4, 0]
      (token [3, 9] - [4, 0]
        (punctuation [3, 9] - [4, 0]))))
  (ERROR [4, 0] - [4, 1]))
example-file    0 ms    (ERROR [2, 6] - [2, 7])

Recommended answer

Your issue is that your punctuation regex matches the newline characters \n and \r, which you can see here:

  (statement_with_semicolon [0, 14] - [3, 9]
    (token_chain [0, 14] - [3, 8]
      (punctuation [0, 14] - [2, 0]))

See how it matches the end of the zeroth line and the blank first line? By the time the parser gets to block, it is already inside a statement_with_semicolon, so it treats block as just another token matching alphanumeric. You can fix this immediate issue by changing your punctuation definition to:

punctuation: $ => /[^a-zA-Zα-ωΑ-Ωа-яА-Я0-9"{}\(\)\[\];\n\r]+/
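
As a quick check of the point above, the two character classes can be compared in plain JavaScript, outside tree-sitter. This is only a sketch: it assumes that JavaScript's regex engine treats these character classes the same way tree-sitter's lexer does, and the names `original`, `fixed`, `gap`, and `matchesWhole` are made up for the illustration.

```javascript
// Original and fixed character classes from the grammar, as plain JavaScript regexes.
const original = /[^a-zA-Zα-ωΑ-Ωа-яА-Я0-9"{}\(\)\[\];]+/;
const fixed = /[^a-zA-Zα-ωΑ-Ωа-яА-Я0-9"{}\(\)\[\];\n\r]+/;

// The text between the first ";" and "block" in the example file:
// the end of line 0 plus the blank line 1.
const gap = "\n\n";

// True when the regex matches the whole of `text`, starting at index 0.
function matchesWhole(regex, text) {
  const m = regex.exec(text);
  return m !== null && m.index === 0 && m[0].length === text.length;
}

console.log(matchesWhole(original, gap)); // true: the newlines become a punctuation token
console.log(matchesWhole(fixed, gap));    // false: the newlines are no longer punctuation
console.log(matchesWhole(fixed, "*"));    // true: ordinary operators still match
```

With the original class, the two newlines form a punctuation token that opens a new statement, which is why block ends up inside a token_chain in the parse output.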

However, this likely won't be the last issue of this type you run into, so you might want to rewrite your grammar to be more explicit about the punctuation it accepts, and where. Defining the set of valid operators, for example.
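
One way to read "defining the set of valid operators" is to swap the negated character class for an explicit allow-list. The sketch below is a hand-written tokenizer in plain JavaScript, not tree-sitter code, and it runs in Node without any dependencies. The `OPERATORS` list and the `tokenize` helper are hypothetical, chosen only to cover the example input. In grammar.js the same list could presumably be passed to `choice(...)` in place of the punctuation regex, but that has not been tested against the full grammar.

```javascript
// Hypothetical operator set; extend it to whatever the language really allows.
const OPERATORS = ['=', '+', '-', '*', '/', '^'];

// Same alphanumeric class as in the grammar, anchored to the start of the input.
const ALPHANUMERIC = /^[a-zA-Zα-ωΑ-Ωа-яА-Я0-9]+/;

// Split text into alphanumeric runs and known operators.
// Whitespace (including newlines) is skipped; anything else is flagged.
function tokenize(text) {
  const tokens = [];
  let i = 0;
  while (i < text.length) {
    const rest = text.slice(i);
    const ws = /^\s+/.exec(rest);
    if (ws) {
      i += ws[0].length;
      continue;
    }
    const word = ALPHANUMERIC.exec(rest);
    if (word) {
      tokens.push({ type: 'alphanumeric', text: word[0] });
      i += word[0].length;
      continue;
    }
    const type = OPERATORS.includes(text[i]) ? 'operator' : 'unexpected';
    tokens.push({ type, text: text[i] });
    i += 1;
  }
  return tokens;
}

console.log(tokenize('area = pi*r^2').map(t => t.text).join(' '));
```

Because only listed characters count as operators, whitespace and newlines can never be absorbed into a token by accident, and a stray character such as { shows up as unexpected instead of silently extending a statement.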

This should also answer your other question.
