Perl 6 Grammar doesn't match like I think it should

Question

I'm doing Advent of Code day 9:

You sit for a while and record part of the stream (your puzzle input). The characters represent groups - sequences that begin with { and end with }. Within a group, there are zero or more other things, separated by commas: either another group or garbage. Since groups can contain other groups, a } only closes the most-recently-opened unclosed group - that is, they are nestable. Your puzzle input represents a single, large group which itself contains many smaller ones.

Sometimes, instead of a group, you will find garbage. Garbage begins with < and ends with >. Between those angle brackets, almost any character can appear, including { and }. Within garbage, < has no special meaning.

In a futile attempt to clean up the garbage, some program has canceled some of the characters within it using !: inside garbage, any character that comes after ! should be ignored, including <, >, and even another !.

Of course, this screams out for a Perl 6 Grammar...

grammar Stream
{
    rule TOP { ^ <group> $ }

    rule group { '{' [ <group> || <garbage> ]* % ',' '}' }
    rule garbage { '<' [ <garbchar> | <garbignore> ]* '>' }

    token garbignore { '!' . }
    token garbchar { <-[ !> ]> }
}

This seems to work fine on simple examples, but it goes wrong with two garbchars in a row:

say Stream.parse('{<aa>}');

gives Nil.

Grammar::Tracer is no help:

TOP
|  group
|  |  group
|  |  * FAIL
|  |  garbage
|  |  |  garbchar
|  |  |  * MATCH "a"
|  |  * FAIL
|  * FAIL
* FAIL
Nil

Multiple garbignores aren't a problem:

say Stream.parse('{<!!a!a>}');

gives:

「{<!!a!a>}」
 group => 「{<!!a!a>}」
  garbage => 「<!!a!a>」
   garbignore => 「!!」
   garbchar => 「a」
   garbignore => 「!a」

Any ideas?

Recommended answer

UPD Given that the Advent of Code problem doesn't mention whitespace, you shouldn't be using the rule construct at all. Just switch all the rules to tokens and you should be set. In general, follow Brad's advice: use token unless you know you need a rule (discussed below) or a regex (if you need backtracking).
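As a minimal sketch of that suggestion, here is the grammar from the question with every rule declared as a token instead. Nothing else is changed, and the two say lines are only illustrative checks on the inputs from the question.

```raku
# The question's grammar with each `rule` changed to `token`,
# so whitespace inside the patterns is no longer significant.
grammar Stream
{
    token TOP { ^ <group> $ }

    token group { '{' [ <group> || <garbage> ]* % ',' '}' }
    token garbage { '<' [ <garbchar> | <garbignore> ]* '>' }

    token garbignore { '!' . }
    token garbchar { <-[ !> ]> }
}

say so Stream.parse('{<aa>}');     # True (this input gave Nil with rules)
say so Stream.parse('{<!!a!a>}');  # True (this input already worked)
```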

My original answer below explored why the rules didn't work. I'll leave it in for now.

TL;DR <garbchar> | contains a space. Whitespace that directly follows any atom in a rule indicates a tokenizing break. You can simply remove this inappropriate space, i.e. write <garbchar>| instead (or better still, <.garbchar>| if you don't need to capture the garbage) to get the result you seek.
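The effect of that one space can be seen in isolation with a small sketch. The grammar names WithSpace and WithoutSpace are hypothetical and exist only for this comparison; both reuse the garbage tokens from the question. The comments assume the default ws, which is documented as <!ww> \s*, i.e. it refuses to match between two word characters.

```raku
# Hypothetical grammar names, used only to compare the two spellings.
grammar WithSpace {
    # The space after <garbchar> makes the rule call <.ws> at that point.
    rule  TOP        { '<' [ <garbchar> | <garbignore> ]* '>' }
    token garbignore { '!' . }
    token garbchar   { <-[ !> ]> }
}

grammar WithoutSpace {
    # Same rule, minus the space between <garbchar> and |.
    rule  TOP        { '<' [ <garbchar>| <garbignore> ]* '>' }
    token garbignore { '!' . }
    token garbchar   { <-[ !> ]> }
}

say so WithSpace.parse('<aa>');     # fails, as in the question
say so WithoutSpace.parse('<aa>');  # matches once the space is gone
```

In WithSpace the parse gives up between the two a characters, because the default ws does not accept a position in the middle of a word.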

As your original question allowed, this isn't a bug, it's just that your mental model is off.

Your answer correctly identifies the issue: tokenization.

So what we're left with is your follow-up question, which is about your mental model of tokenization, or at least how Perl 6 tokenizes by default:

why ... my second example ... goes wrong with two garbchars in a row:

'{<aa>}'

Simplifying, the issue is how to tokenize this:

aa

The simple high level answer is that, in parsing vernacular, aa will ordinarily be treated as one token, not two, and, by default, Perl 6 assumes this ordinary definition. This is the issue you're encountering.

You can overrule this ordinary definition to get any tokenizing result you care to achieve. But it's seldom necessary to do so and it certainly isn't in simple cases like this.
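For completeness, one way to overrule the default would be to give the grammar its own ws token, since a rule calls whatever ws the grammar defines. The sketch below keeps the question's rules unchanged and adds a ws without the default's <!ww> check. That particular definition is an assumption made for illustration, not something the puzzle requires, and switching to tokens remains the simpler fix.

```raku
grammar Stream
{
    # Illustrative override: skip optional whitespace but drop the
    # default's <!ww> check, so a tokenizing break is accepted even
    # between two letters.
    token ws { \s* }

    rule TOP { ^ <group> $ }

    rule group { '{' [ <group> || <garbage> ]* % ',' '}' }
    rule garbage { '<' [ <garbchar> | <garbignore> ]* '>' }

    token garbignore { '!' . }
    token garbchar { <-[ !> ]> }
}

say so Stream.parse('{<aa>}');
```

One side effect of this choice is that whitespace inside garbage can be skipped by ws rather than captured as a garbchar.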

I'll provide two redundant paths that I hope might lead folk to the correct mental model:

  • For those who prefer diving straight into nitty-gritty detail, there's my reddit comment.

  • The rest of this SO answer provides a high level discussion that complements the low level explanation in that reddit comment.

    Excerpting from the "Obstacles" section of the Wikipedia page on tokenization, and interleaving the excerpts with P6-specific discussion:

    Typically, tokenization occurs at the word level. However, it is sometimes difficult to define what is meant by a "word". Often a tokenizer relies on simple heuristics, for example:

    • Punctuation and whitespace may or may not be included in the resulting list of tokens.

    In Perl 6 you control what gets included or not in the parse tree using capturing features that are orthogonal to tokenizing.

    • All contiguous strings of alphabetic characters are part of one token; likewise with numbers.

    • Tokens are separated by whitespace characters, such as a space or line break, or by punctuation characters.

    By default, the Perl 6 design embodies an equivalent of these two heuristics.

    The key thing to get is that it's the rule construct that handles a string of tokens, plural. The token construct is used to define a single token per call.
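    A small sketch of that division of labour, using a hypothetical Words grammar that has nothing to do with the puzzle: the rule strings tokens together and allows whitespace between them, while the token matches one word per call.

    ```raku
    # Hypothetical grammar for illustration only.
    grammar Words {
        # A rule handles a string of tokens: the space after <word>
        # allows whitespace between one word and the next.
        rule  TOP  { [ <word> ]+ }
        # A token defines a single token per call.
        token word { \w+ }
    }

    say Words.parse('foo bar baz')<word>.elems;  # 3
    say Words.parse('foobar')<word>.elems;       # 1
    ```

    The second call shows the heuristic quoted above: a contiguous run of letters is one token, which is why aa in the question's input was not split into two.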

    I think I'll end my answer here because it's already getting pretty long. Please use the comments to help us improve this answer. I hope what I've written so far helps.
