Superpower: match a string with the tokenizer only if it begins a line

Question

When tokenizing in Superpower, how can I match a string only if it is the first thing on a line? (Note: this is a different question from this one.)

For example, assume a language with only the following four characters (' ', ':', 'X', 'Y'), each of which is a token. There is also a 'Header' token that captures the regex pattern /^[XY]+:/ (one or more Xs and Ys followed by a colon, only when they start the line).
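To pin down what the Header rule is meant to accept, here is a minimal sketch that checks the same pattern with a plain .NET regex, independent of Superpower. The class and method names (HeaderPatternDemo, ContainsHeader) are made up for illustration, and RegexOptions.Multiline is an assumption about the intent: it makes ^ match at the start of every line rather than only at the start of the input.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HeaderPatternDemo
{
    // Multiline makes ^ match at the start of each line,
    // not only at the very beginning of the input.
    private static readonly Regex Header =
        new Regex(@"^[XY]+:", RegexOptions.Multiline);

    // True when some line of the input starts with one or more
    // X/Y characters immediately followed by a colon.
    public static bool ContainsHeader(string input) => Header.IsMatch(input);

    public static void Main()
    {
        Console.WriteLine(ContainsHeader("X: X"));        // True: header at line start
        Console.WriteLine(ContainsHeader("X Y:"));        // False: a space interrupts the X/Y run
        Console.WriteLine(ContainsHeader(" XY:"));        // False: the run does not start the line
        Console.WriteLine(ContainsHeader("X Y:\nXY: X")); // True: header on the second line
    }
}
```

This only describes which inputs should produce a Header token. It does not tokenize anything, so the question of how to express the same rule inside a Superpower tokenizer remains.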

Here is a quick class for testing (the fourth test case fails):

using System;
using Superpower;
using Superpower.Parsers;
using Superpower.Tokenizers;

public enum Tokens { Space, Colon, Header, X, Y }

public class XYTokenizer
{
    static void Main(string[] args)
    {
        Test("X", Tokens.X);
        Test("XY", Tokens.X, Tokens.Y);
        Test("X Y:", Tokens.X, Tokens.Space, Tokens.Y, Tokens.Colon);
        Test("X: X", Tokens.Header, Tokens.Space, Tokens.X);
    }

    public static readonly Tokenizer<Tokens> tokenizer = new TokenizerBuilder<Tokens>()
        .Match(Character.EqualTo('X'), Tokens.X)
        .Match(Character.EqualTo('Y'), Tokens.Y)
        .Match(Character.EqualTo(':'), Tokens.Colon)
        .Match(Character.EqualTo(' '), Tokens.Space)
        .Build();

    static void Test(string input, params Tokens[] expected)
    {
        var tokens = tokenizer.Tokenize(input);
        var i = 0;
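        foreach (var t in tokens)
        {
            // Guard against the tokenizer producing more tokens than expected.
            if (i >= expected.Length)
            {
                Console.WriteLine("unexpected extra token Tokens." + t.Kind + " for '" + input + "'");
                return;
            }
            if (t.Kind != expected[i])
            {
                Console.WriteLine("tokens[" + i + "] was Tokens." + t.Kind
                    + " not Tokens." + expected[i] + " for '" + input + "'");
                return;
            }
            i++;
        }
        // Guard against the tokenizer producing fewer tokens than expected.
        if (i != expected.Length)
        {
            Console.WriteLine("expected " + expected.Length + " tokens but got " + i + " for '" + input + "'");
            return;
        }
        Console.WriteLine("OK");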
    }
}

Recommended answer

I came up with a custom Tokenizer based on the example found here. I added comments throughout the code so you can follow what's happening.

using System.Collections.Generic;
using System.Linq;
using Superpower;
using Superpower.Model;

public class MyTokenizer : Tokenizer<Tokens>
{
    protected override IEnumerable<Result<Tokens>> Tokenize(TextSpan input)
    {
        Result<char> next = input.ConsumeChar();

        bool checkForHeader = true;

        while (next.HasValue)
        {
            // need to check for a header when starting a new line
            if (checkForHeader)
            {
                var headerStartLocation = next.Location;
                var tokenQueue = new List<Result<Tokens>>();
                while (next.HasValue && (next.Value == 'X' || next.Value == 'Y'))
                {
                    tokenQueue.Add(Result.Value(next.Value == 'X' ? Tokens.X : Tokens.Y, next.Location, next.Remainder));
                    next = next.Remainder.ConsumeChar();
                }

                // only if we had at least one X or one Y
                if (tokenQueue.Any())
                {
                    if (next.HasValue && next.Value == ':')
                    {
                        // this is a header token; we have to return a Result of the start location 
                        // along with the remainder at this location
                        yield return Result.Value(Tokens.Header, headerStartLocation, next.Remainder);
                        next = next.Remainder.ConsumeChar();
                    }
                    else
                    {
                        // this isn't a header; we have to return all the tokens we parsed up to this point
                        foreach (Result<Tokens> tokenResult in tokenQueue)
                        {
                            yield return tokenResult;
                        }
                    }
                }

                if (!next.HasValue)
                    yield break;
            }

            checkForHeader = false;

            if (next.Value == '\r') 
            {
                // skip over the carriage return
                next = next.Remainder.ConsumeChar();
                continue;
            }

            if (next.Value == '\n')
            {
                // line break; check for a header token here
                next = next.Remainder.ConsumeChar();
                checkForHeader = true;
                continue;
            }

            // ordinary single-character tokens
            if (next.Value == 'X')
            {
                yield return Result.Value(Tokens.X, next.Location, next.Remainder);
                next = next.Remainder.ConsumeChar();
            }
            else if (next.Value == 'Y')
            {
                yield return Result.Value(Tokens.Y, next.Location, next.Remainder);
                next = next.Remainder.ConsumeChar();
            }
            else if (next.Value == ':')
            {
                yield return Result.Value(Tokens.Colon, next.Location, next.Remainder);
                next = next.Remainder.ConsumeChar();
            }
            else if (next.Value == ' ')
            {
                yield return Result.Value(Tokens.Space, next.Location, next.Remainder);
                next = next.Remainder.ConsumeChar();
            }
            else
            {
                yield return Result.Empty<Tokens>(next.Location, $"unrecognized `{next.Value}`");
                next = next.Remainder.ConsumeChar(); // Skip the character anyway
            }
        }
    }
}

You can call it like this:

var tokens = new MyTokenizer().Tokenize(input);
