Efficiently replacing a string or character from file-input for the ANTLRInputStream (ANTLRStringStream)
Problem description
As I described in "Antlr greedy-option", I have some problems with a language that can include string literals inside a string literal, such as:
START: "img src="test.jpg""
Mr. Bart Kiers mentioned in my thread that it is not possible to create a grammar that solves my problem. I therefore decided to change the language to:
START: "img src='test.jpg'"
before starting the lexer (and parser).
File input could be:
START: "aaa"aaa"
"aaa"aaaaa"
:END_START
START: "aaa"aaa"
"aaa"aa a aa"
:END_START
START: "aaab"bbaaaa"
:END_START
So I have a solution, but it is not correct. I have two questions about my problem (below the code). My code is:
public static void main(String[] args) {
    try {
        FileInputStream fis = new FileInputStream("src/file.txt");
        String preparedCode = preparingCode(fis);
        ANTLRStringStream in = new ANTLRStringStream(preparedCode);
        TestLexer lex = new TestLexer(in);
        CommonTokenStream tokens = new CommonTokenStream(lex);
        TestParser parser = new TestParser(tokens);
        parser.rule();
    } catch (IOException ex) {
        ex.printStackTrace();
    } catch (RecognitionException e) {
        System.out.println(e.getMessage());
        System.exit(0);
    }
}

static String preparingCode(FileInputStream input) {
    DataInputStream data = new DataInputStream(input);
    StringBuilder oldCode = new StringBuilder();
    StringBuffer newCode = new StringBuffer(oldCode.length());
    Pattern pattern = Pattern.compile("(START:\\s\")(.+)(\"\\n:END_START)");
    String strLine;
    try {
        while ((strLine = data.readLine()) != null)
            oldCode.append(strLine + "\n");
    } catch (IOException ex) {
        ex.printStackTrace();
    }
    Matcher matcher = pattern.matcher(oldCode);
    while (matcher.find()) {
        // eliminate quotes inside a string literal
        String stringLiteral = matcher.group(2).replaceAll("\"", "'");
        String replace = matcher.group(1) + stringLiteral + matcher.group(3);
        matcher.appendReplacement(newCode, Matcher.quoteReplacement(replace));
    }
    matcher.appendTail(newCode);
    System.out.println(newCode);
    return newCode.toString();
}
My questions are:
- Which pattern would be the correct one? It is important that the string literal can span more than one line, e.g. "aaaa"\n"bbb", but it always closes with a "\n:END_START" line. The result I want is:
START: "aaa'aaa'
'aaa'aaaaa"
:END_START
START: "aaa'aaa'
'aaa'aa a aa"
:END_START
START: "aaab'bbaaaa"
:END_START
I experimented with the pattern flag Pattern.DOTALL:
Pattern pattern = Pattern.compile("(START:\\s\")(.+)(\"\\n:END_START)", Pattern.DOTALL);
But this is not the solution: with DOTALL the greedy (.+) also crosses line breaks, so a single match runs from the first START: to the last "\n:END_START and swallows everything in between.
- Even with the correct pattern, is there a more efficient way to solve this?
Fix for the first question
I have to use a non-greedy quantifier together with the Pattern.DOTALL flag:
Pattern pattern = Pattern.compile("(START:\\s\")(.+?)(\"\\n:END_START)", Pattern.DOTALL);
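To make the difference concrete, here is a minimal sketch that counts how often the greedy and the non-greedy pattern match on the sample input from the question. The class name LazyVsGreedy, the helper countMatches, and the exact line breaks of the sample input are assumptions made for illustration; only the two patterns come from the post.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LazyVsGreedy {
    // Sample input from the question; the positions of the line breaks are assumed.
    static final String INPUT =
          "START: \"aaa\"aaa\"\n\"aaa\"aaaaa\"\n:END_START\n"
        + "START: \"aaa\"aaa\"\n\"aaa\"aa a aa\"\n:END_START\n"
        + "START: \"aaab\"bbaaaa\"\n:END_START\n";

    // Counts the matches of the regex in the text, always compiling with DOTALL.
    static int countMatches(String regex, String text) {
        Matcher m = Pattern.compile(regex, Pattern.DOTALL).matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String greedy = "(START:\\s\")(.+)(\"\\n:END_START)";
        String lazy = "(START:\\s\")(.+?)(\"\\n:END_START)";
        // Greedy: one match spanning all three blocks.
        System.out.println("greedy: " + countMatches(greedy, INPUT)); // prints "greedy: 1"
        // Non-greedy: one match per START ... :END_START block.
        System.out.println("lazy:   " + countMatches(lazy, INPUT));   // prints "lazy:   3"
    }
}
```

The greedy variant finds a single match because (.+) first consumes the rest of the input and then backtracks only as far as the last closing delimiter, while (.+?) stops at the nearest one.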
The code:
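import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.antlr.runtime.ANTLRStringStream;
import org.antlr.runtime.CommonTokenStream;
import org.antlr.runtime.RecognitionException;

// The class name is arbitrary; TestLexer and TestParser are generated by ANTLR from the grammar.
public class Main {

    public static void main(String[] args) {
        // try-with-resources closes the file even if parsing fails
        try (FileInputStream fis = new FileInputStream("src/file.txt")) {
            String preparedCode = preparingCode(fis);
            ANTLRStringStream in = new ANTLRStringStream(preparedCode);
            TestLexer lex = new TestLexer(in);
            CommonTokenStream tokens = new CommonTokenStream(lex);
            TestParser parser = new TestParser(tokens);
            parser.rule();
        } catch (IOException ex) {
            ex.printStackTrace();
        } catch (RecognitionException e) {
            System.out.println(e.getMessage());
            System.exit(1); // non-zero exit code signals the parse failure
        }
    }

    static String preparingCode(FileInputStream input) {
        // BufferedReader replaces the deprecated DataInputStream.readLine()
        BufferedReader reader = new BufferedReader(new InputStreamReader(input));
        StringBuilder oldCode = new StringBuilder();
        // non-greedy (.+?) with DOTALL: each match ends at the nearest "\n:END_START
        Pattern pattern = Pattern.compile("(START:\\s\")(.+?)(\"\\n:END_START)", Pattern.DOTALL);
        String strLine;
        try {
            // normalize every line ending to \n, which the pattern relies on
            while ((strLine = reader.readLine()) != null) {
                oldCode.append(strLine).append('\n');
            }
        } catch (IOException ex) {
            ex.printStackTrace();
        }
        // size the output buffer only after the input has been read
        StringBuffer newCode = new StringBuffer(oldCode.length());
        Matcher matcher = pattern.matcher(oldCode);
        while (matcher.find()) {
            System.out.println("++++" + matcher.group(2)); // debug output of the raw literal
            // eliminate quotes inside a string literal
            String stringLiteral = matcher.group(2).replaceAll("\"", "'");
            String replace = matcher.group(1) + stringLiteral + matcher.group(3);
            // quoteReplacement keeps '$' and '\' in the literal from being treated as group references
            matcher.appendReplacement(newCode, Matcher.quoteReplacement(replace));
        }
        matcher.appendTail(newCode);
        System.out.println(newCode);
        return newCode.toString();
    }
}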
So, is there any other way to solve this problem?
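One possible variation, offered only as a sketch and not as an answer from the original thread: on Java 9 or later, Matcher.replaceAll accepts a function, which removes the manual find/appendReplacement/appendTail loop. The class name QuoteFixer and the method fixQuotes are hypothetical; the pattern is the non-greedy one from the post, and the input is assumed to use \n line endings.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteFixer {
    // Same non-greedy DOTALL pattern as in the post, compiled once and reused.
    private static final Pattern LITERAL =
        Pattern.compile("(START:\\s\")(.+?)(\"\\n:END_START)", Pattern.DOTALL);

    // Replaces every double quote inside a START literal with a single quote.
    static String fixQuotes(String code) {
        return LITERAL.matcher(code).replaceAll(m ->
            // The returned string is parsed like a replacement template,
            // so quoteReplacement is still needed for '$' and '\'.
            Matcher.quoteReplacement(
                m.group(1) + m.group(2).replace('"', '\'') + m.group(3)));
    }

    public static void main(String[] args) {
        String input = "START: \"aaab\"bbaaaa\"\n:END_START\n";
        System.out.print(fixQuotes(input));
        // prints:
        // START: "aaab'bbaaaa"
        // :END_START
    }
}
```

The result can be passed to new ANTLRStringStream(...) exactly as before; whether this is measurably faster than the explicit loop is not something this sketch claims.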