How to test ANTLR translation without adding EOF to every rule


Problem description


I am in the middle of re-writing my translator and I am being much more disciplined about tests this time, since this version is likely to live for more than a few weeks.

Because you can run a visitor starting at any node, you can almost write beautiful small tests like this ...

expect(parse("some test code", "startGrammarRule")).toEqual(new ASTForGrammarRule())

and then write one (or a few) of these for each visitor function.

EXCEPT that the rule you are invoking is a sub rule, and so does not have "EOF" in it, so if my grammar has somewhere in it

numberList: NUMBER ( ',' NUMBER )* ;

... then parse("1,2,3", "numberList") only parses "1" (because it is only an "EOF" which would make the parser hungry enough to consume all the string).

Editing the rule to add EOF is a non-starter. I could, for every rule I write a test for, add a test version of the rule ...

numberList: NUMBER ( ',' NUMBER )* ;
numberList_TEST: numberList EOF ;

... but that is going to make the grammar cluttered and introduce worry that the _TEST rules have to always be maintained scrupulously ...

I want a flag when I create a parser which constructs that faux TEST rule dynamically and then parses from there, or something like that ...

Is there a better way to write tests for my parser that I haven't figured out yet?

Solution

In a Java project, I'm using a custom matcher that checks whether the parsed tokens make up 100% of the token stream, and fails the test if they do not.

You seem to use the TypeScript target, so in TypeScript that could look like this:

T.g4

grammar T;

parse      : numberList EOF;
numberList : NUMBER ( ',' NUMBER )*;

NUMBER : [0-9]+;
ID     : [a-zA-Z]+;
WS     : [ \t\r\n]+ -> channel(HIDDEN);

parserMatchers.ts

import { TLexer } from '../src/parser/TLexer';
import { BailErrorStrategy, CharStreams, CommonTokenStream } from 'antlr4ts';
import { TParser } from '../src/parser/TParser';
import { Lexer } from 'antlr4ts/Lexer';

expect.extend({
  toBeCompletelyParsedBy: (source: string, ruleName: string) => {
    const lexer = new TLexer(CharStreams.fromString(source));
    lexer.removeErrorListeners();
    const tokenStream = new CommonTokenStream(lexer);
    const parser = new TParser(tokenStream);
    parser.removeErrorListeners();
    parser.errorHandler = new BailErrorStrategy();
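    // Invoke the parser rule by name; the cast avoids an implicit-any index error under strict compiler settings
    const context = (parser as any)[ruleName]();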

    // Collect the real tokens: non-HIDDEN and non-EOF tokens
    const realTokens = tokenStream.getTokens().filter((t) =>
      t.channel === Lexer.DEFAULT_TOKEN_CHANNEL && t.type !== Lexer.EOF);

    let indexOfStop = realTokens.indexOf(context.stop);
    let pass = realTokens.length === (indexOfStop + 1);

    let message = () => {

      if (pass) {
        return `Expected '${source}' not to be completely parsed by rule '${ruleName}', but it did.`;
      }

      let offending = realTokens[indexOfStop + 1];

      return `Expected '${source}' to be completely parsed by rule '${ruleName}', but '${offending.text}' ` +
        `(${offending.line}:${offending.charPositionInLine}) was not included!`;
    };

    return { pass, message };
  }
});

declare global {
  namespace jest {
    interface Matchers<R> {
      toBeCompletelyParsedBy(ruleName: string): R
    }
  }
}

export {};

And in your unit tests, you can now do this:

import './parserMatchers';

test('the numberList parser rule', () => {
  expect('3, 4, 5').toBeCompletelyParsedBy('numberList');
  expect('3, 4, 5 FOO').not.toBeCompletelyParsedBy('numberList');
});
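
The check at the heart of the matcher (is the rule's stop token the last real token?) does not depend on ANTLR itself, so it can be pulled out and exercised on its own. The following is a minimal sketch under stated assumptions, not part of the matcher above: `TokenLike`, `firstUnparsedToken`, `DEFAULT_CHANNEL` and `EOF_TYPE` are hypothetical names introduced here for illustration, and the values 0 and -1 mirror ANTLR's default channel and EOF token type.

```typescript
// Hypothetical minimal token shape (an assumption for this sketch);
// antlr4ts tokens expose fields with the same names.
interface TokenLike {
  type: number;
  channel: number;
  text?: string;
}

const DEFAULT_CHANNEL = 0; // same value as Lexer.DEFAULT_TOKEN_CHANNEL
const EOF_TYPE = -1;       // same value as Lexer.EOF

// Returns the first default-channel, non-EOF token that comes after `stop`,
// or undefined when the rule consumed every real token.
function firstUnparsedToken(
  tokens: TokenLike[],
  stop: TokenLike | undefined
): TokenLike | undefined {
  // Keep only the real tokens: on the default channel and not EOF
  const realTokens = tokens.filter(
    (t) => t.channel === DEFAULT_CHANNEL && t.type !== EOF_TYPE
  );
  // When the rule matched nothing there is no stop token, so start from -1
  // and report the very first real token as unparsed.
  const indexOfStop = stop === undefined ? -1 : realTokens.indexOf(stop);
  return realTokens[indexOfStop + 1];
}
```

Inside the matcher, this could be called as `firstUnparsedToken(tokenStream.getTokens(), context.stop)`, with `pass` being true when the result is `undefined`. Keeping the comparison in a plain function like this makes it possible to test the edge cases (empty input, trailing hidden tokens) without generating a parser.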
