How to split input according to the grammar


Problem description

We are trying to build a parser for log files generated by a router. We built it successfully and are able to write the lines that are valid under the grammar to a particular file.

But if an input line is not valid according to the grammar, we want to write it to a different file. We tried something, but it does not work properly. Can you suggest a way to do this? If possible, please provide a working example.

This is what we tried.

We are using ANTLR 4.5 and no specific IDE, just a text editor.

Our input (input.txt):

Dec 24 15:38:13 103.199.144.14 firewall,info NAT: src-nat2 srcnat: in:(none) out:ether1-WAN, proto TCP (SYN), 10.20.114.212:59559->86.96.88.147:6882, len 52
Dec 24 15:38:13 103.199.144.14 firewall,info src-nat2: forward: in:<pppoe-PDR242> out:ether1-WAN, proto TCP (SYN), 10.20.124.8:50055->111.111.111.111:80, len 52

The first line is invalid and should not be accepted by the grammar, so it must be written to failure.txt. Instead, it is partially written to success.txt.

The second line is valid and is written correctly to success.txt, as the output file below shows.

The output we are getting (success.txt):

Dec 24 15:38:13, 103.199.144.14, .20.114.212, len, 52, , null
Dec 24 15:38:13, 103.199.144.14, pppoe-PDR242, TCP, 10.20.124.8:50055, 111.111.111.111:80, null

The grammar we are using (sys.g):

grammar sys;

r: IDENT NUM time ip x+ user xout proto xuser ipfull xtra ipfull1 xtra1 (xipfull xtra ipfull2 xtra2 xipfull xtra3)*; 
time: NUM COLN NUM COLN NUM;
ip: NUM DOT NUM DOT NUM DOT NUM ;
ipfull: NUM DOT NUM DOT NUM DOT NUM COLN NUM ;
ipfull1: NUM DOT NUM DOT NUM DOT NUM COLN NUM ;
ipfull2: NUM DOT NUM DOT NUM DOT NUM COLN NUM ;
xipfull: NUM DOT NUM DOT NUM DOT NUM COLN NUM ;

x: (IDENT | COMMA | COLN | BRAC | HYPHN | NUM)+ LTHAN;
user: (IDENT | HYPHN | DOT | NUM)+ ;
xout: GTHAN IDENT+ COLN IDENT+ HYPHN IDENT+ (DOT IDENT)* COMMA IDENT;
proto: IDENT ;
xuser: (IDENT | BRAC | COMMA)+ ;
xtra: HYPHN GTHAN ;
xtra1: COMMA IDENT (BRAC | NUM);
xtra2: BRAC xtra;
xtra3: COMMA IDENT NUM;

IDENT: ('a'..'z' | 'A'..'Z')('a'..'z' | 'A'..'Z' | '0'..'9')* ;
NUM: ('0'..'9')+ ;
LTHAN: '<' ;
GTHAN: '>' ;
COLN: ':';
COMMA: ',';
BRAC: '(' | ')' ;
HYPHN: '-';
DOT: '.';
WS : (' ' | '\t' | '\r' | '\n')+ -> skip ;

Our main class, which uses the parser and lexer generated from the grammar:

import org.antlr.v4.runtime.ANTLRFileStream;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.tree.ParseTree;
import java.io.*;
import org.antlr.v4.runtime.*;

public class SysLogCheck {
    public static void main(String[] args) throws Exception {

        long startTime = System.currentTimeMillis();

        BufferedReader br = new BufferedReader(new FileReader("test123.txt"));
        String s = null;
        //FileWriter out = new FileWriter("abc.txt");
        PrintWriter success = new PrintWriter(new FileWriter("success.csv"));
        PrintWriter failure = new PrintWriter(new FileWriter("failure.csv"));
        while((s=br.readLine())!=null)
        {
            ANTLRInputStream input = new ANTLRInputStream(s);
            sysLexer lexer = new sysLexer(input);
            CommonTokenStream tokens = new CommonTokenStream(lexer);
            sysParser parser = new sysParser(tokens);
            ParseTree tree = parser.r();
            EvalVisitor visitor = new EvalVisitor();
            if((visitor.visit(tree)).equals("failure")) // here visit method of EvalVisitor class returns "failure" then the content should be written 
                                                        //in failure file and else it should be written in success file 
                                                        // but this is not working
            {
                failure.println(s);
            }
            else
            {
                success.println(visitor.visit(tree));
            }
        }
        failure.flush();
        failure.close();
        success.flush();
        success.close();

        long stopTime = System.currentTimeMillis();
        long elapsedTime = stopTime - startTime;

        System.out.println(elapsedTime);
    }
}

Our EvalVisitor (main visitor class) code:

import org.antlr.v4.runtime.tree.ParseTree;
import java.io.*;

public class EvalVisitor extends sysBaseVisitor
{
        class LogEntry {
        String ident1;
        String dayNum;
        String time;
        String ip;
        String ipfull;
        String user;
        String proto;
        String ipfull1;
        String ipfull2;
        String x;

      }


      static LogEntry logEntry;

      @Override
      public Object visit(ParseTree tree) {
        /* Setup logentry used by all visitors (this case, there is only a single visitor...)*/
        logEntry = new LogEntry();

        final Object o = super.visit(tree);

        //our logic to check whether our input contains "<" or not
        if((logEntry.x).contains("<") )
        {
            return logEntry.ident1 +" " + logEntry.dayNum + " " + logEntry.time+ ", " + logEntry.ip+ ", " + logEntry.user+ ", " + logEntry.proto+ ", " + logEntry.ipfull+ ", " + logEntry.ipfull1+ ", " + logEntry.ipfull2;
        }       
            return "failure"; //else return failure
      }

      StringBuilder stringBuilder;



      @Override
      public Object visitR(sysParser.RContext ctx) {
        logEntry.ident1 = ctx.IDENT().getText();
        logEntry.dayNum = ctx.NUM().getText();
        return super.visitR(ctx);
      }

      @Override
      public Object visitTime(sysParser.TimeContext ctx) {
        logEntry.time = ctx.getText();
        return super.visitTime(ctx);
      }

      @Override
      public Object visitIp(sysParser.IpContext ctx) {
        logEntry.ip = ctx.getText();
        return super.visitIp(ctx);
      }

      @Override
      public Object visitIpfull(sysParser.IpfullContext ctx) {
        logEntry.ipfull = ctx.getText();
        return super.visitIpfull(ctx);
      }

      @Override
      public Object visitIpfull1(sysParser.Ipfull1Context ctx) {
        logEntry.ipfull1 = ctx.getText();
        return super.visitIpfull1(ctx);
      }

      @Override
      public Object visitIpfull2(sysParser.Ipfull2Context ctx) {
        logEntry.ipfull2 = ctx.getText();
        return super.visitIpfull2(ctx);
      }

      @Override
      public Object visitXipfull(sysParser.XipfullContext ctx) {
        return super.visitXipfull(ctx);
      }

      @Override
      public Object visitX(sysParser.XContext ctx) {
        logEntry.x = ctx.getText();
        return super.visitX(ctx);
      }

      @Override
      public Object visitUser(sysParser.UserContext ctx) {
        logEntry.user = ctx.getText();
        return super.visitUser(ctx);
      }

      @Override
      public Object visitXuser(sysParser.XuserContext ctx) {
        return super.visitXuser(ctx);
      }

      @Override
      public Object visitXout(sysParser.XoutContext ctx) {
        return super.visitXout(ctx);
      }

      @Override
      public Object visitProto(sysParser.ProtoContext ctx) {
        logEntry.proto = ctx.getText();
        return super.visitProto(ctx);
      }

      @Override
      public Object visitXtra(sysParser.XtraContext ctx) {
        return super.visitXtra(ctx);
      }

      @Override
      public Object visitXtra1(sysParser.Xtra1Context ctx) {
        return super.visitXtra1(ctx);
      }

      @Override
      public Object visitXtra2(sysParser.Xtra2Context ctx) {
        return super.visitXtra2(ctx);
      }

      @Override
      public Object visitXtra3(sysParser.Xtra3Context ctx) {
        return super.visitXtra3(ctx);
      }   

 }

Recommended answer

If all you're trying to do is create a file with data from the lines you consider valid, then ANTLR is probably overkill (I mentioned this in the mailing list thread). I'll assume here that you may want to do more with the parsed results (or that you just really want to use ANTLR for this).

I see that you're already parsing each input line individually.

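It appears that your 'r' parser rule recognizes "invalid" lines as well as valid ones. I'd suggest tightening up the grammar so that it defines exactly what you consider to be a valid line. If your grammar only accepts (i.e. "recognizes") valid lines, then any invalid line will raise a RecognitionException. Note that ANTLR 4's default error strategy reports that error and tries to recover rather than propagating it to your code, so either check parser.getNumberOfSyntaxErrors() after parsing, or install a BailErrorStrategy so that the parser throws on the first syntax error.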

You don't mention what makes line 2 valid and line 1 invalid, so I can't really make a recommendation on how to correct your 'r' rule.

(There's a lot to critique about your grammar, and it suggests that you're trying to learn "just enough" ANTLR to get by. I don't think you're asking for a full critique of your grammar, so I'll skip the details.)

After examining your code, it appears that you just want to identify log lines of a particular type and capture data from those lines. If that's what you're trying to accomplish, then look into Java regular expressions and capture groups. It'll be a lot simpler than using ANTLR (and I'm a pretty big fan of ANTLR).
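As a rough illustration of the regular-expression route, here is a minimal sketch. It assumes that a line counts as valid only when it contains an `in:<user>` field, which is a guess based on the `"<"` check in your visitor, not something you stated. The class name `LogLineSplitter`, the helper `toCsv`, and the pattern itself are made up for this example and would need adjusting to your real log format.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogLineSplitter {
    // Dotted-quad IPv4 address (no range checking; illustrative only).
    private static final String IP = "\\d{1,3}(?:\\.\\d{1,3}){3}";

    // Assumption: a valid line has date, time, router IP, an "in:<user>" field,
    // a protocol after "proto", and a "src:port->dst:port" pair.
    private static final Pattern VALID_LINE = Pattern.compile(
        "^(\\w{3}\\s+\\d{1,2})\\s+(\\d{2}:\\d{2}:\\d{2})\\s+(" + IP + ")\\s+"
        + ".*?\\bin:<([^>]+)>\\s+out:\\S+\\s+proto\\s+(\\w+)"
        + ".*?(" + IP + ":\\d+)->(" + IP + ":\\d+)");

    // Returns the captured fields as one CSV row, or empty if the line does not match.
    static Optional<String> toCsv(String line) {
        Matcher m = VALID_LINE.matcher(line);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(String.join(", ",
            m.group(1) + " " + m.group(2), // date and time
            m.group(3),                    // router IP
            m.group(4),                    // user
            m.group(5),                    // protocol
            m.group(6),                    // source ip:port
            m.group(7)));                  // destination ip:port
    }

    public static void main(String[] args) {
        String[] lines = {
            "Dec 24 15:38:13 103.199.144.14 firewall,info NAT: src-nat2 srcnat: in:(none) out:ether1-WAN, proto TCP (SYN), 10.20.114.212:59559->86.96.88.147:6882, len 52",
            "Dec 24 15:38:13 103.199.144.14 firewall,info src-nat2: forward: in:<pppoe-PDR242> out:ether1-WAN, proto TCP (SYN), 10.20.124.8:50055->111.111.111.111:80, len 52"
        };
        List<String> success = new ArrayList<>();
        List<String> failure = new ArrayList<>();
        for (String line : lines) {
            Optional<String> row = toCsv(line);
            if (row.isPresent()) {
                success.add(row.get());   // would go to the success file
            } else {
                failure.add(line);        // would go to the failure file, unchanged
            }
        }
        System.out.println("success: " + success);
        System.out.println("failure: " + failure);
    }
}
```

The two lists stand in for your success and failure files; swapping them for the `PrintWriter` objects from your main class should be straightforward. A line that does not match is kept verbatim, so nothing is partially written to the success output.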
