Efficiently replacing a string or character from file-input for the ANTLRInputStream (ANTLRStringStream)

Problem description

As I described in "Antlr greedy-option", I have a problem with a language that can contain string literals inside a string literal, such as:

START: "img src="test.jpg""

Bart Kiers mentioned in that thread that it is not possible to write a grammar that solves this problem. I therefore decided to rewrite the input to:

START: "img src='test.jpg'"

before starting the lexer (and parser).

The file input could look like this:

START: "aaa"aaa"
 "aaa"aaaaa"
:END_START

START: "aaa"aaa"
 "aaa"aa
 a
 aa"
:END_START

START: "aaab"bbaaaa"
:END_START

I have a solution, but it is not correct. I have two questions about it (below the code). My code is:

public static void main(String[] args) {

    try{
        FileInputStream fis = new FileInputStream("src/file.txt");
        String preparedCode = preparingCode(fis);

        ANTLRStringStream in = new ANTLRStringStream(preparedCode);

        TestLexer lex = new TestLexer(in);
        CommonTokenStream tokens = new CommonTokenStream(lex);
        TestParser parser = new TestParser(tokens);

        parser.rule();
    }catch(IOException ex){
        ex.printStackTrace();
    } catch (RecognitionException e) {
        System.out.println(e.getMessage());
        System.exit(0);
    }
}

static String preparingCode(FileInputStream input){
    DataInputStream data = new DataInputStream(input);
    StringBuilder oldCode = new StringBuilder();
    StringBuffer newCode = new StringBuffer(oldCode.length());

    Pattern pattern = Pattern.compile("(START:\\s\")(.+)(\"\\n:END_START)");
    String strLine;
    try{
      while ((strLine = data.readLine()) != null)   
          oldCode.append(strLine + "\n");
    }
    catch(IOException ex){
      ex.printStackTrace();
    }

    Matcher matcher = pattern.matcher(oldCode);

    while (matcher.find()) {
      //eliminate quotes inside a string literal
      String stringLiteral = matcher.group(2).replaceAll("\"", "'");

      String replace = matcher.group(1) + stringLiteral + matcher.group(3);
      matcher.appendReplacement(newCode, Matcher.quoteReplacement(replace));
    }
    matcher.appendTail(newCode);

    System.out.println(newCode);

    return newCode.toString();
}


My questions are:

  • Which pattern is the correct one? The string literal may span more than one line, e.g. "aaaa"\n"bbb", but it always ends with a "\n:END_START" line. The result I want is:

START: "aaa'aaa'
 'aaa'aaaaa"
:END_START

START: "aaa'aaa'
 'aaa'aa
 a
 aa"
:END_START

START: "aaab'bbaaaa"
:END_START

I experimented with the pattern flag Pattern.DOTALL:

Pattern pattern = Pattern.compile("(START:\\s\")(.+)(\"\\n:END_START)", Pattern.DOTALL);

But this is not the solution: with DOTALL the greedy .+ also crosses line breaks, so a single match runs from the first START: to the last :END_START.




  • Once I have the correct pattern, is there another, more efficient way to do the replacement?



Solution

Fix for the first question
The pattern needs the non-greedy (reluctant) quantifier .+? together with the pattern flag Pattern.DOTALL:

Pattern pattern = Pattern.compile("(START:\\s\")(.+?)(\"\\n:END_START)", Pattern.DOTALL);
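
To illustrate why the reluctant quantifier matters, here is a minimal sketch that counts how many matches each pattern finds on the sample input from the question. The class name GreedyVsReluctant and the helper countMatches are made up for this illustration; only java.util.regex is used, and no ANTLR classes are involved.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsReluctant {
    // Sample input from the question: three START ... :END_START blocks.
    static final String INPUT =
        "START: \"aaa\"aaa\"\n \"aaa\"aaaaa\"\n:END_START\n\n"
      + "START: \"aaa\"aaa\"\n \"aaa\"aa\n a\n aa\"\n:END_START\n\n"
      + "START: \"aaab\"bbaaaa\"\n:END_START\n";

    // Greedy ".+" versus reluctant ".+?"; otherwise the patterns are identical.
    static final String GREEDY = "(START:\\s\")(.+)(\"\\n:END_START)";
    static final String RELUCTANT = "(START:\\s\")(.+?)(\"\\n:END_START)";

    // Counts the separate matches a regex finds with DOTALL enabled.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex, Pattern.DOTALL).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("greedy:    " + countMatches(GREEDY, INPUT));
        System.out.println("reluctant: " + countMatches(RELUCTANT, INPUT));
    }
}
```

On this input the greedy pattern finds a single match that swallows all three blocks, while the reluctant pattern finds three, one per START block.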

The code:

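import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.antlr.runtime.ANTLRStringStream;
import org.antlr.runtime.CommonTokenStream;
import org.antlr.runtime.RecognitionException;

public static void main(String[] args) {

    try{
        FileInputStream fis = new FileInputStream("src/file.txt");
        // Rewrite the nested quotes before the lexer ever sees the input.
        String preparedCode = preparingCode(fis);

        ANTLRStringStream in = new ANTLRStringStream(preparedCode);

        TestLexer lex = new TestLexer(in);
        CommonTokenStream tokens = new CommonTokenStream(lex);
        TestParser parser = new TestParser(tokens);

        parser.rule();
    }catch(IOException ex){
        ex.printStackTrace();
    } catch (RecognitionException e) {
        System.out.println(e.getMessage());
        System.exit(0);
    }
}

static String preparingCode(FileInputStream input){
    // readLine() strips line terminators, so each line is re-terminated
    // with "\n" to match the "\n:END_START" part of the pattern below.
    // BufferedReader replaces the deprecated DataInputStream.readLine().
    StringBuilder oldCode = new StringBuilder();
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(input))) {
      String strLine;
      while ((strLine = reader.readLine()) != null)
          oldCode.append(strLine).append("\n");
    }
    catch(IOException ex){
      ex.printStackTrace();
    }

    // DOTALL lets '.' cross line breaks; the reluctant ".+?" stops at the
    // first "\"\n:END_START", so every START block is matched on its own.
    Pattern pattern = Pattern.compile("(START:\\s\")(.+?)(\"\\n:END_START)", Pattern.DOTALL);
    Matcher matcher = pattern.matcher(oldCode);
    // Sized only after the input has been read, so the capacity is meaningful.
    StringBuffer newCode = new StringBuffer(oldCode.length());

    while (matcher.find()) {
      //eliminate quotes inside a string literal
      String stringLiteral = matcher.group(2).replace('"', '\'');

      String replace = matcher.group(1) + stringLiteral + matcher.group(3);
      matcher.appendReplacement(newCode, Matcher.quoteReplacement(replace));
    }
    matcher.appendTail(newCode);

    System.out.println(newCode);

    return newCode.toString();
}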

So is there any other way to solve this problem?
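
One possible alternative, which does not come from the original thread, is to skip the regex and scan the text once with String.indexOf. The sketch below assumes that every block opens with exactly START: followed by one space and a double quote, and closes with a double quote, a newline and :END_START. The class QuoteRewriter and its rewrite method are hypothetical names used only for this illustration.

```java
public class QuoteRewriter {
    // Assumed delimiters: exactly one space after "START:" (the regex's \s is looser).
    static final String OPEN = "START: \"";
    static final String CLOSE = "\"\n:END_START";

    // Replaces every double quote between OPEN and CLOSE with a single quote.
    static String rewrite(String src) {
        StringBuilder out = new StringBuilder(src.length());
        int pos = 0;
        while (true) {
            int open = src.indexOf(OPEN, pos);
            if (open < 0) break;                  // no further block
            int bodyStart = open + OPEN.length();
            int close = src.indexOf(CLOSE, bodyStart);
            if (close < 0) break;                 // unterminated block: leave the rest untouched
            out.append(src, pos, bodyStart);      // text before the body, including OPEN
            for (int i = bodyStart; i < close; i++) {
                char c = src.charAt(i);
                out.append(c == '"' ? '\'' : c);  // swap inner quotes only
            }
            out.append(CLOSE);
            pos = close + CLOSE.length();
        }
        out.append(src, pos, src.length());       // trailing text after the last block
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "START: \"aaab\"bbaaaa\"\n:END_START\n";
        System.out.print(rewrite(input));
    }
}
```

The returned string can be passed to ANTLRStringStream in the same way as the result of preparingCode. Whether this is actually faster than the regex depends on the input and would have to be measured.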
