ANTLR doesn't give correct output tokens for Scala grammar


Problem description

I am new to Scala and I am trying to parse Scala files using ANTLR and a Scala grammar. I got the grammar from this GitHub link:

https://github.com/antlr/grammars-v4/tree/master/scala

The repository may be moved, so I am pasting the Scala grammar here:

grammar Scala;

literal           : '-'? IntegerLiteral
                | '-'? FloatingPointLiteral
                | BooleanLiteral
                | CharacterLiteral
                | StringLiteral
                | SymbolLiteral
                | 'null' ;

qualId            : Id ('.' Id)* ;

ids               : Id (',' Id)* ;

stableId          : (Id | (Id '.')? 'this') '.' Id
                | (Id '.')? 'super' classQualifier? '.' Id ;

classQualifier    : '[' Id ']' ;

type              : functionArgTypes '=>' type
                | infixType existentialClause? ;

functionArgTypes  : infixType
                | '(' ( paramType (',' paramType )* )? ')' ;

existentialClause : 'forSome' '{' existentialDcl (Semi existentialDcl)* '}';

existentialDcl    : 'type' typeDcl
                | 'val' valDcl;

infixType         : compoundType (Id Nl? compoundType)*;

compoundType      : annotType ('with' annotType)* refinement?
                | refinement;

annotType         : simpleType annotation*;

simpleType        : simpleType typeArgs
                | simpleType '#' Id
                | stableId
                | (stableId | (Id '.')? 'this') '.' 'type'
                | '(' types ')';

typeArgs          : '[' types ']';

types             : type (',' type)*;

refinement        : Nl? '{' refineStat (Semi refineStat)* '}';

refineStat        : dcl
                | 'type' typeDef
                | ;

typePat           : type;

ascription        : ':' infixType
                | ':' annotation+
                | ':' '_' '*';

expr              : (bindings | 'implicit'? Id | '_') '=>' expr
                | expr1 ;

expr1             : 'if' '(' expr ')' Nl* expr (Semi? 'else' expr)?
                | 'while' '(' expr ')' Nl* expr
                | 'try' ('{' block '}' | expr) ('catch' '{' caseClauses '}')? ('finally' expr)?
                | 'do' expr Semi? 'while' '(' expr ')'
                | 'for' ('(' enumerators ')' | '{' enumerators '}') Nl* 'yield'? expr
                | 'throw' expr
                | 'return' expr?
                | (('new' (classTemplate | templateBody)| blockExpr | simpleExpr1 '_'?) '.') Id '=' expr
                | simpleExpr1 argumentExprs '=' expr
                | postfixExpr
                | postfixExpr ascription
                | postfixExpr 'match' '{' caseClauses '}' ;

postfixExpr       : infixExpr (Id Nl?)? ;

infixExpr         : prefixExpr
                | infixExpr Id Nl? infixExpr ;

prefixExpr        : ('-' | '+' | '~' | '!')?
                  ('new' (classTemplate | templateBody)| blockExpr | simpleExpr1 '_'?) ;

simpleExpr1       : literal
                | stableId
                | (Id '.')? 'this'
                | '_'
                | '(' exprs? ')'
                | ('new' (classTemplate | templateBody) | blockExpr ) '.' Id
                | ('new' (classTemplate | templateBody) | blockExpr ) typeArgs
                | simpleExpr1 argumentExprs
      ;

exprs             : expr (',' expr)* ;

argumentExprs     : '(' exprs? ')'
                | '(' (exprs ',')? postfixExpr ':' '_' '*' ')'
                | Nl? blockExpr ;

blockExpr         : '{' caseClauses '}'
                | '{' block '}' ;
block             : blockStat (Semi blockStat)* resultExpr? ;

blockStat         : import_
                | annotation* ('implicit' | 'lazy')? def
                | annotation* localModifier* tmplDef
                | expr1
                | ;

resultExpr        : expr1
                | (bindings | ('implicit'? Id | '_') ':' compoundType) '=>' block ;

enumerators       : generator (Semi generator)* ;

generator         : pattern1 '<-' expr (Semi? guard | Semi pattern1 '=' expr)* ;

caseClauses       : caseClause+ ;

caseClause        : 'case' pattern guard? '=>' block ;

guard             : 'if' postfixExpr ;

pattern           : pattern1 ('|' pattern1 )* ;

pattern1          : Varid ':' typePat
                | '_' ':' typePat
                | pattern2 ;

pattern2          : Varid ('@' pattern3)?
                | pattern3 ;

pattern3          : simplePattern
                | simplePattern (Id Nl? simplePattern)* ;

simplePattern     : '_'
                | Varid
                | literal
                | stableId ('(' patterns ')')?
                | stableId '(' (patterns ',')? (Varid '@')? '_' '*' ')'
                | '(' patterns? ')' ;

patterns          : pattern (',' patterns)*
                | '_' * ;

typeParamClause   : '[' variantTypeParam (',' variantTypeParam)* ']' ;

funTypeParamClause: '[' typeParam (',' typeParam)* ']' ;

variantTypeParam  : annotation? ('+' | '-')? typeParam ;

typeParam         : (Id | '_') typeParamClause? ('>:' type)? ('<:' type)?
                  ('<%' type)* (':' type)* ;

paramClauses      : paramClause* (Nl? '(' 'implicit' params ')')? ;

paramClause       : Nl? '(' params? ')' ;

params            : param (',' param)* ;

param             : annotation* Id (':' paramType)? ('=' expr)? ;

paramType         : type
                | '=>' type
                | type '*';

classParamClauses : classParamClause*
                  (Nl? '(' 'implicit' classParams ')')? ;

classParamClause  : Nl? '(' classParams? ')' ;

classParams       : classParam (',' classParam)* ;

classParam        : annotation* modifier* ('val' | 'var')?
                  Id ':' paramType ('=' expr)? ;

bindings          : '(' binding (',' binding )* ')' ;

binding           : (Id | '_') (':' type)? ;

modifier          : localModifier
                | accessModifier
                | 'override' ;

localModifier     : 'abstract'
                | 'final'
                | 'sealed'
                | 'implicit'
                | 'lazy' ;

accessModifier    : ('private' | 'protected') accessQualifier? ;

accessQualifier   : '[' (Id | 'this') ']' ;

annotation        : '@' simpleType argumentExprs* ;

constrAnnotation  : '@' simpleType argumentExprs ;

templateBody      : Nl? '{' selfType? templateStat (Semi templateStat)* '}' ;

templateStat      : import_
                | (annotation Nl?)* modifier* def
                | (annotation Nl?)* modifier* dcl
                |  expr
                | ;

selfType          : Id (':' type)? '=>'
                | 'this' ':' type '=>' ;

import_           : 'import' importExpr (',' importExpr)* ;

importExpr        : stableId '.' (Id | '_' | importSelectors) ;

importSelectors   : '{' (importSelector ',')* (importSelector | '_') '}' ;

importSelector    : Id ('=>' Id | '=>' '_') ;

dcl               : 'val' valDcl
                | 'var' varDcl
                | 'def' funDcl
                | 'type' Nl* typeDcl ;

valDcl            : ids ':' type ;

varDcl            : ids ':' type ;

funDcl            : funSig (':' type)? ;

funSig            : Id funTypeParamClause? paramClauses ;

typeDcl           : Id typeParamClause? ('>:' type)? ('<:' type)? ;

patVarDef         : 'val' patDef
                | 'var' varDef ;

def               : patVarDef
                | 'def' funDef
                | 'type' Nl* typeDef
                | tmplDef ;

patDef            : pattern2 (',' pattern2)* (':' type)* '=' expr ;

varDef            : patDef
                | ids ':' type '=' '_' ;

funDef            : funSig (':' type)? '=' expr
                | funSig Nl? '{' block '}'
                | 'this' paramClause paramClauses
                  ('=' constrExpr | Nl constrBlock) ;

typeDef           :  Id typeParamClause? '=' type ;

tmplDef           : 'case'? 'class' classDef
                | 'case' 'object' objectDef
                | 'trait' traitDef ;

classDef          : Id typeParamClause? constrAnnotation* accessModifier?
                  classParamClauses classTemplateOpt ;

traitDef          : Id typeParamClause? traitTemplateOpt ;

objectDef         : Id classTemplateOpt ;

classTemplateOpt  : 'extends' classTemplate | ('extends'? templateBody)? ;

traitTemplateOpt  : 'extends' traitTemplate | ('extends'? templateBody)? ;

classTemplate     : earlyDefs? classParents templateBody? ;

traitTemplate     : earlyDefs? traitParents templateBody? ;

classParents      : constr ('with' annotType)* ;

traitParents      : annotType ('with' annotType)* ;

constr            : annotType argumentExprs* ;

earlyDefs         : '{' (earlyDef (Semi earlyDef)*)? '}' 'with' ;

earlyDef          : (annotation Nl?)* modifier* patVarDef ;

constrExpr        : selfInvocation
                | constrBlock ;

constrBlock       : '{' selfInvocation (Semi blockStat)* '}' ;
selfInvocation    : 'this' argumentExprs+ ;

topStatSeq        : topStat (Semi topStat)* ;

topStat           : (annotation Nl?)* modifier* tmplDef
                | import_
                | packaging
                | packageObject
                | ;

packaging         : 'package' qualId Nl? '{' topStatSeq '}' ;

packageObject     : 'package' 'object' objectDef ;

compilationUnit   : ('package' qualId Semi)* topStatSeq ;

// Lexer
BooleanLiteral   :  'true' | 'false';
CharacterLiteral :  '\'' (PrintableChar | CharEscapeSeq) '\'';
StringLiteral    :  '"' StringElement* '"'
               |  '"""' MultiLineChars '"""';
SymbolLiteral    :  '\'' Plainid;
IntegerLiteral   :  (DecimalNumeral | HexNumeral) ('L' | 'l');
FloatingPointLiteral
               :  Digit+ '.' Digit+ ExponentPart? FloatType?
               |  '.' Digit+ ExponentPart? FloatType?
               |  Digit ExponentPart FloatType?
               |  Digit+ ExponentPart? FloatType;
Id               :  Plainid
               |  '`' StringLiteral '`';
Varid            :  Lower Idrest;
Nl               :  '\r'? '\n';
Semi             :  ';' |  Nl+;

Paren            :  '(' | ')' | '[' | ']' | '{' | '}';
Delim            :  '`' | '\'' | '"' | '.' | ';' | ',' ;

Comment          :  '/*' .*?  '*/'
               |  '//' .*? Nl;

// fragments
fragment UnicodeEscape    : '\\' 'u' 'u'? HexDigit HexDigit HexDigit HexDigit ;
fragment WhiteSpace       :  '\u0020' | '\u0009' | '\u000D' | '\u000A';
fragment Opchar           : PrintableChar // printableChar not matched by (whiteSpace | upper | lower |
                        // letter | digit | paren | delim | opchar | Unicode_Sm | Unicode_So)
                        ;
fragment Op               :  Opchar+;
fragment Plainid          :  Upper Idrest
                        |  Varid
                        |  Op;
fragment Idrest           :  (Letter | Digit)* ('_' Op)?;

fragment StringElement    :  '\u0020'| '\u0021'|'\u0023' .. '\u007F'  // (PrintableChar  Except '"')
                        |  CharEscapeSeq;
fragment MultiLineChars   :  ('"'? '"'? .*?)* '"'*;

fragment HexDigit         :  '0' .. '9'  |  'A' .. 'Z'  |  'a' .. 'z' ;
fragment FloatType        :  'F' | 'f' | 'D' | 'd';
fragment Upper            :  'A'  ..  'Z' | '$' | '_';  // and Unicode category Lu
fragment Lower            :  'a' .. 'z'; // and Unicode category Ll
fragment Letter           :  Upper | Lower; // and Unicode categories Lo, Lt, Nl
fragment ExponentPart     :  ('E' | 'e') ('+' | '-')? Digit+;
fragment PrintableChar    : '\u0020' .. '\u007F' ;
fragment CharEscapeSeq    : '\\' ('b' | 't' | 'n' | 'f' | 'r' | '"' | '\'' | '\\');
fragment DecimalNumeral   :  '0' | NonZeroDigit Digit*;
fragment HexNumeral       :  '0' 'x' HexDigit HexDigit+;
fragment Digit            :  '0' | NonZeroDigit;
fragment NonZeroDigit     :  '1' .. '9';

The Scala grammar above is the same as the one on the official Scala website:

http://www.scala-lang.org/files/archive/spec/2.11/13-syntax-summary.html

Now I am trying to generate tokens for a Scala file named scala.scala. The content of that file is:

object HelloWorld {
  def main(args: Array[String]) {
    println("Hello, world!")
  }
}

I run one of the following commands to get the tokens:

grun Scala compilationUnit -tokens scala.scala

or

grun Scala expr -tokens scala.scala

or

grun Scala literal -tokens scala.scala

The output I got is:

[@0,0:18='object HelloWorld {',<68>,1:0]
[@1,19:19='\n',<70>,1:19]
[@2,20:52='  def main(args: Array[String]) {',<68>,2:0]
[@3,53:53='\n',<70>,2:33]
[@4,54:81='    println("Hello, world!")',<68>,3:0]
[@5,82:82='\n',<70>,3:28]
[@6,83:85='  }',<68>,4:0]
[@7,86:86='\n',<70>,4:3]
[@8,87:87='}',<14>,5:0]
[@9,88:88='\n',<70>,5:1]
[@10,89:88='<EOF>',<-1>,6:0]
line 1:19 no viable alternative at input 'object HelloWorld {\n'

The output in tree form is:

(expr object HelloWorld { \n   def main(args: Array[String]) { \n     println("Hello, world!") \n   } \n } \n)

The output in the GUI was shown in a screenshot that is not preserved here.

This output makes no sense. Instead of tokens, it simply gives me whole lines of code (LOC). I tested the same approach with other languages, Java and C, and it works perfectly: I get the correct, expected tokens with the grammars from the following link:

https://github.com/antlr/grammars-v4

Please correct me if I am doing something wrong; I am new to ANTLR and Scala.

By tokens I mean all the keywords, operands, and operators. As far as I understand, a token is never meant to be a whole line of code.

Below is the Scala.tokens file that ANTLR generated from Scala.g4 (the Scala grammar):



T__0=1
T__1=2
T__2=3
T__3=4
T__4=5
T__5=6
T__6=7
T__7=8
T__8=9
T__9=10
T__10=11
T__11=12
T__12=13
T__13=14
T__14=15
T__15=16
T__16=17
T__17=18
T__18=19
T__19=20
T__20=21
T__21=22
T__22=23
T__23=24
T__24=25
T__25=26
T__26=27
T__27=28
T__28=29
T__29=30
T__30=31
T__31=32
T__32=33
T__33=34
T__34=35
T__35=36
T__36=37
T__37=38
T__38=39
T__39=40
T__40=41
T__41=42
T__42=43
T__43=44
T__44=45
T__45=46
T__46=47
T__47=48
T__48=49
T__49=50
T__50=51
T__51=52
T__52=53
T__53=54
T__54=55
T__55=56
T__56=57
T__57=58
T__58=59
T__59=60
T__60=61
BooleanLiteral=62
CharacterLiteral=63
StringLiteral=64
SymbolLiteral=65
IntegerLiteral=66
FloatingPointLiteral=67
Id=68
Varid=69
Nl=70
Semi=71
Paren=72
Delim=73
Comment=74
'-'=1
'null'=2
'.'=3
','=4
'this'=5
'super'=6
'['=7
']'=8
'=>'=9
'('=10
')'=11
'forSome'=12
'{'=13
'}'=14
'type'=15
'val'=16
'with'=17
'#'=18
':'=19
'_'=20
'*'=21
'implicit'=22
'if'=23
'else'=24
'while'=25
'try'=26
'catch'=27
'finally'=28
'do'=29
'for'=30
'yield'=31
'throw'=32
'return'=33
'new'=34
'='=35
'match'=36
'+'=37
'~'=38
'!'=39
'lazy'=40
'<-'=41
'case'=42
'|'=43
'@'=44
'>:'=45
'<:'=46
'<%'=47
'var'=48
'override'=49
'abstract'=50
'final'=51
'sealed'=52
'private'=53
'protected'=54
'import'=55
'def'=56
'class'=57
'object'=58
'trait'=59
'extends'=60
'package'=61

I am sure that these tokens are not correct. Can anyone confirm whether the problem is in the Scala grammar or in ANTLR?
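
As a reading aid for the listings above: each line of a `.tokens` file maps a token name or a quoted literal to a type number, and the `<68>`, `<70>`, and `<14>` in the `-tokens` output are those numbers. The sketch below decodes one output line against a short excerpt copied from the listing. The helper names `parse_tokens_file` and `describe` are made up for illustration and are not part of ANTLR.

```python
import re

# Excerpt copied from the Scala.tokens listing in the question.
SAMPLE_TOKENS = """\
T__13=14
Id=68
Nl=70
'}'=14
'='=35
"""

def parse_tokens_file(text):
    """Return {type number: [names and literals]} for a .tokens file."""
    types = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the LAST '=' so that the literal '=' (written '='=35) survives.
        name, _, number = line.rpartition("=")
        types.setdefault(int(number), []).append(name)
    return types

# One line of `grun ... -tokens` output, for example:
# [@0,0:18='object HelloWorld {',<68>,1:0]
TOKEN_LINE = re.compile(r"\[@(\d+),(-?\d+):(-?\d+)='(.*)',<(-?\d+)>,(\d+):(\d+)\]")

def describe(line, types):
    """Return (text, type number, names) for one token line, or None."""
    m = TOKEN_LINE.match(line)
    if m is None:
        return None
    text, type_no = m.group(4), int(m.group(5))
    fallback = ["EOF"] if type_no == -1 else ["?"]
    return text, type_no, types.get(type_no, fallback)

types = parse_tokens_file(SAMPLE_TOKENS)
print(describe("[@0,0:18='object HelloWorld {',<68>,1:0]", types))
print(describe("[@8,87:87='}',<14>,5:0]", types))
```

Read this way, the first token in the output is a single `Id` whose text is the entire first line, and only the last `}` was recognized as the literal `'}'`.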

Solution

This file seems to parse fine now, so the grammar has probably been fixed.
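
One plausible reading of the output in the question, offered as a sketch and not as a record of how the grammar was actually fixed: in the pasted grammar, `Id` can match `Op`, `Op` is `Opchar+`, and `Opchar` is `PrintableChar`, which spans `'\u0020' .. '\u007F'` and so includes spaces, letters, and brackets. An ANTLR lexer prefers the longest match and, on a tie, the rule defined first, so `Id` wins for every line longer than one character. The toy tokenizer below imitates that selection rule with regular expressions. `RULES` is a simplified stand-in for three of the original rules, and `NARROW` is a hypothetical narrower rule set invented for contrast.

```python
import re

# Stand-ins for rules of the pasted grammar, in definition order.
# Literals used in parser rules (such as '}') come before named lexer rules.
RULES = [
    ("'}'", re.compile(r"\}")),
    ("Id", re.compile(r"[\x20-\x7F]+")),  # Op : Opchar+ with Opchar = PrintableChar
    ("Nl", re.compile(r"\r?\n")),
]

# Hypothetical narrower rules (an assumption, not the repository's grammar).
NARROW = [
    ("'object'", re.compile(r"object")),
    ("'{'", re.compile(r"\{")),
    ("'}'", re.compile(r"\}")),
    ("Id", re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")),
    ("Nl", re.compile(r"\r?\n")),
    ("WS", re.compile(r"[ \t]+")),
]

def tokenize(source, rules=RULES):
    """Longest match wins; on a tie the earlier rule is kept."""
    pos, out = 0, []
    while pos < len(source):
        best = None
        for name, pattern in rules:
            m = pattern.match(source, pos)
            if m and m.end() > pos and (best is None or m.end() > best[1]):
                best = (name, m.end())
        if best is None:
            raise ValueError(f"no rule matches at offset {pos}")
        out.append((best[0], source[pos:best[1]]))
        pos = best[1]
    return out

for name, text in tokenize("object HelloWorld {\n  }\n}\n"):
    print(name, repr(text))
for name, text in tokenize("object HelloWorld {\n", NARROW):
    print(name, repr(text))
```

With `RULES`, each line comes out as one `Id` followed by `Nl`, except the lone `}`, where `'}'` and `Id` tie at length one and the earlier rule wins. That mirrors the `<68>`, `<70>`, `<14>` pattern in the question. With `NARROW`, the same first line splits into a keyword, identifiers, whitespace, and a brace.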
