How to find all comments in the source code?
Question
There are two styles of comments, C-style and C++-style. How can I recognize them?
/* comments */
// comments
I am free to use any method or third-party library.
Recommended answer
To reliably find all comments in a Java source file, I wouldn't use regex, but a real lexer (aka tokenizer).
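To see why a lone regex is fragile, here is a small illustrative sketch (NaiveRegexDemo and naiveComments are hypothetical names, not from the answer) of a pattern that only knows the two comment forms. Given two lines that contain no comments at all, it still reports two, because the comment markers sit inside string literals:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NaiveRegexDemo {

    // Knows the two comment forms, but nothing about string or char literals.
    private static final Pattern NAIVE = Pattern.compile("//[^\r\n]*|/\\*[\\s\\S]*?\\*/");

    public static List<String> naiveComments(String source) {
        List<String> hits = new ArrayList<>();
        Matcher m = NAIVE.matcher(source);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // Every "//", "/*" and "*/" below sits inside a string literal:
        // this snippet contains no comments at all.
        String source = "String url = \"http://example.com\";\n"
                + "String t = \"/*\"; int x = 1; String u = \"*/\";\n";
        for (String hit : naiveComments(source)) {
            System.out.println("false positive :: " + hit);
        }
    }
}
```

Running it prints two "false positive" lines: one starting at the // inside the URL, and one spanning from the "/*" string to the "*/" string.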
Two popular choices for Java are:
- JFlex: http://jflex.de
- ANTLR: http://www.antlr.org
Contrary to popular belief, ANTLR can also be used to generate just a lexer, without a parser.
Here's a quick ANTLR demo. You need the following files in the same directory:
- antlr-3.2.jar
- JavaCommentLexer.g (the grammar)
- Main.java
- Test.java (a valid (!) Java source file with exotic comments)
JavaCommentLexer.g
lexer grammar JavaCommentLexer;

// Filter mode: rules are tried in order at each position, and characters
// that no rule matches are silently skipped.
options {
  filter=true;
}

SingleLineComment
  :  FSlash FSlash ~('\r' | '\n')*
  ;

// In an ANTLR 3 lexer, '.*' is non-greedy, so this stops at the first closing star-slash.
MultiLineComment
  :  FSlash Star .* Star FSlash
  ;

// String and char literals are matched (then skipped) only so that comment
// markers inside them are not mistaken for real comments.
StringLiteral
  :  DQuote
     ( (EscapedDQuote)=> EscapedDQuote
     | (EscapedBSlash)=> EscapedBSlash
     | Octal
     | Unicode
     | ~('\\' | '"' | '\r' | '\n')
     )*
     DQuote {skip();}
  ;

CharLiteral
  :  SQuote
     ( (EscapedSQuote)=> EscapedSQuote
     | (EscapedBSlash)=> EscapedBSlash
     | Octal
     | Unicode
     | ~('\\' | '\'' | '\r' | '\n')
     )
     SQuote {skip();}
  ;

fragment EscapedDQuote
  :  BSlash DQuote
  ;

fragment EscapedSQuote
  :  BSlash SQuote
  ;

fragment EscapedBSlash
  :  BSlash BSlash
  ;

fragment FSlash
  :  '/' | '\\' ('u002f' | 'u002F')
  ;

fragment Star
  :  '*' | '\\' ('u002a' | 'u002A')
  ;

fragment BSlash
  :  '\\' ('u005c' | 'u005C')?
  ;

fragment DQuote
  :  '"'
  |  '\\u0022'
  ;

fragment SQuote
  :  '\''
  |  '\\u0027'
  ;

fragment Unicode
  :  '\\u' Hex Hex Hex Hex
  ;

fragment Octal
  :  '\\' ('0'..'3' Oct Oct | Oct Oct | Oct)
  ;

fragment Hex
  :  '0'..'9' | 'a'..'f' | 'A'..'F'
  ;

fragment Oct
  :  '0'..'7'
  ;
Main.java
import org.antlr.runtime.*;

public class Main {
    public static void main(String[] args) throws Exception {
        // Lex Test.java with the generated comment lexer (ANTLR 3 runtime).
        JavaCommentLexer lexer = new JavaCommentLexer(new ANTLRFileStream("Test.java"));
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        // Only comment tokens remain of interest: string and char literals were skip()-ed.
        for (Object o : tokens.getTokens()) {
            CommonToken t = (CommonToken) o;
            if (t.getType() == JavaCommentLexer.SingleLineComment) {
                System.out.println("SingleLineComment :: " + t.getText().replace("\n", "\\n"));
            }
            if (t.getType() == JavaCommentLexer.MultiLineComment) {
                System.out.println("MultiLineComment :: " + t.getText().replace("\n", "\\n"));
            }
        }
    }
}
Test.java
\u002f\u002a <- multi line comment start
multi
line
comment // not a single line comment
\u002A/
public class Test {
// single line "not a string"
String s = "\u005C" \242 not // a comment \\\" \u002f \u005C\u005C \u0022;
/*
regular multi line comment
*/
char c = \u0027"'; // the " is not the start of a string
char q1 = '\u005c''; // == '\''
char q2 = '\u005c\u0027'; // == '\''
char q3 = \u0027\u005c\u0027\u0027; // == '\''
char c4 = '\047';
String t = "/*";
\u002f\u002f another single line comment
String u = "*/";
}
Now, to run the demo, do:
bart@hades:~/Programming/ANTLR/Demos/JavaComment$ java -cp antlr-3.2.jar org.antlr.Tool JavaCommentLexer.g
bart@hades:~/Programming/ANTLR/Demos/JavaComment$ javac -cp antlr-3.2.jar *.java
bart@hades:~/Programming/ANTLR/Demos/JavaComment$ java -cp .:antlr-3.2.jar Main
And you'll see the following printed to the console:
MultiLineComment :: \u002f\u002a <- multi line comment start\nmulti\nline\ncomment // not a single line comment\n\u002A/
SingleLineComment :: // single line "not a string"
SingleLineComment :: // a comment \\\" \u002f \u005C\u005C \u0022;
MultiLineComment :: /*\n regular multi line comment\n */
SingleLineComment :: // the " is not the start of a string
SingleLineComment :: // == '\''
SingleLineComment :: // == '\''
SingleLineComment :: // == '\''
SingleLineComment :: \u002f\u002f another single line comment
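If pulling in ANTLR feels heavy, the core idea of the grammar (consume string and char literals whole, so that comment markers inside them are never seen) can also be written by hand. The following is a minimal sketch in plain Java, not part of the original answer; CommentScanner, Comment and scan are hypothetical names, and it assumes Unicode escapes have already been translated and that the input contains no text blocks (triple-quoted strings):

```java
import java.util.ArrayList;
import java.util.List;

public class CommentScanner {

    /** One comment found in the source. */
    public static final class Comment {
        public final boolean singleLine; // true for "//...", false for "/*...*/"
        public final String text;

        Comment(boolean singleLine, String text) {
            this.singleLine = singleLine;
            this.text = text;
        }
    }

    // Assumes Unicode escapes were already translated and that the input has no
    // text blocks; both are outside this sketch.
    public static List<Comment> scan(String src) {
        List<Comment> out = new ArrayList<>();
        int n = src.length();
        int i = 0;
        while (i < n) {
            char c = src.charAt(i);
            char next = (i + 1 < n) ? src.charAt(i + 1) : '\0';
            if (c == '/' && next == '/') {
                // Single-line comment: runs up to (not including) the line break.
                int end = i + 2;
                while (end < n && src.charAt(end) != '\n' && src.charAt(end) != '\r') {
                    end++;
                }
                out.add(new Comment(true, src.substring(i, end)));
                i = end;
            } else if (c == '/' && next == '*') {
                // Multi-line comment: ends at the first "*/" after the opening "/*".
                int close = src.indexOf("*/", i + 2);
                int end = (close < 0) ? n : close + 2; // unterminated: take the rest
                out.add(new Comment(false, src.substring(i, end)));
                i = end;
            } else if (c == '"' || c == '\'') {
                // String or char literal: skip it whole, stepping over backslash
                // escapes so an escaped quote does not end the literal early.
                i++;
                while (i < n && src.charAt(i) != c && src.charAt(i) != '\n') {
                    i += (src.charAt(i) == '\\') ? 2 : 1;
                }
                i++; // past the closing quote
            } else {
                i++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String src = "// one\nString t = \"/*\"; char q = '\"'; /* two */\n";
        for (Comment cm : scan(src)) {
            System.out.println((cm.singleLine ? "SingleLine :: " : "MultiLine :: ") + cm.text);
        }
    }
}
```

Its main should print "SingleLine :: // one" followed by "MultiLine :: /* two */"; the /* inside the string and the double quote inside the char literal are skipped rather than reported.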
EDIT
Of course, you can also build a crude lexer yourself with regex. Note, however, that the following demo does not handle Unicode escapes (such as \u002f) inside source files:
Test2.java
/* <- multi line comment start
multi
line
comment // not a single line comment
*/
public class Test2 {
// single line "not a string"
String s = "\" \242 not // a comment \\\" ";
/*
regular multi line comment
*/
char c = '"'; // the " is not the start of a string
char q1 = '\''; // == '\''
char c4 = '\047';
String t = "/*";
// another single line comment
String u = "*/";
}
Main2.java
import java.util.*;
import java.io.*;
import java.util.regex.*;

public class Main2 {

    // Read the whole file, normalising line endings to '\n'.
    private static String read(File file) throws IOException {
        StringBuilder b = new StringBuilder();
        try (Scanner scan = new Scanner(file)) {
            while (scan.hasNextLine()) {
                b.append(scan.nextLine()).append('\n');
            }
        }
        return b.toString();
    }

    public static void main(String[] args) throws Exception {
        String contents = read(new File("Test2.java"));

        String slComment = "//[^\r\n]*";                 // "//" up to end of line
        String mlComment = "/\\*[\\s\\S]*?\\*/";          // "/*" ... "*/", non-greedy
        String strLit = "\"(?:\\\\.|[^\\\\\"\r\n])*\"";   // string literal with escapes
        String chLit = "'(?:\\\\.|[^\\\\'\r\n])+'";       // char literal with escapes
        String any = "[\\s\\S]";                          // any other single character

        // Alternatives are tried left to right at each position, so a string or char
        // literal is consumed whole before its contents can look like a comment.
        // Only the two comment alternatives are captured (groups 1 and 2).
        Pattern p = Pattern.compile(
            String.format("(%s)|(%s)|%s|%s|%s", slComment, mlComment, strLit, chLit, any)
        );
        Matcher m = p.matcher(contents);
        while (m.find()) {
            String hit = m.group();
            if (m.group(1) != null) {
                System.out.println("SingleLine :: " + hit.replace("\n", "\\n"));
            }
            if (m.group(2) != null) {
                System.out.println("MultiLine :: " + hit.replace("\n", "\\n"));
            }
        }
    }
}
If you run Main2, the following is printed to the console:
MultiLine :: /* <- multi line comment start\nmulti\nline\ncomment // not a single line comment\n*/
SingleLine :: // single line "not a string"
MultiLine :: /*\n regular multi line comment\n */
SingleLine :: // the " is not the start of a string
SingleLine :: // == '\''
SingleLine :: // another single line comment
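The EDIT notes that the regex demo does not handle Unicode escapes. One way to close that gap, sketched here as an assumption rather than as part of the original answer, is to do what javac itself does first: translate Unicode escapes following JLS §3.3, and only then run Main2's regex over the result. UnicodeUnescape and translate are hypothetical names:

```java
public class UnicodeUnescape {

    // Translates Java Unicode escapes (a backslash, one or more 'u' characters,
    // then four hex digits) following the rules of JLS section 3.3:
    //  - a backslash can start an escape only if it is preceded by an even number
    //    of contiguous raw backslashes;
    //  - a character produced by an escape never takes part in another escape.
    // Malformed escapes (which javac would reject) are copied through unchanged.
    public static String translate(String raw) {
        StringBuilder out = new StringBuilder(raw.length());
        int n = raw.length();
        int rawBackslashes = 0; // contiguous raw backslashes just before position i
        int i = 0;
        while (i < n) {
            char c = raw.charAt(i);
            if (c == '\\' && rawBackslashes % 2 == 0 && i + 1 < n && raw.charAt(i + 1) == 'u') {
                int j = i + 1;
                while (j < n && raw.charAt(j) == 'u') {
                    j++; // any number of 'u' characters is allowed
                }
                if (j + 4 <= n && isHex4(raw, j)) {
                    out.append((char) Integer.parseInt(raw.substring(j, j + 4), 16));
                    i = j + 4;
                    rawBackslashes = 0; // the produced char does not count as raw input
                    continue;
                }
            }
            out.append(c);
            rawBackslashes = (c == '\\') ? rawBackslashes + 1 : 0;
            i++;
        }
        return out.toString();
    }

    // True if the four characters starting at 'from' are ASCII hex digits.
    private static boolean isHex4(String s, int from) {
        for (int k = from; k < from + 4; k++) {
            char h = s.charAt(k);
            boolean hex = (h >= '0' && h <= '9') || (h >= 'a' && h <= 'f') || (h >= 'A' && h <= 'F');
            if (!hex) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String raw = "\\u002f\\u002a comment \\u002A/ int x;";
        System.out.println(translate(raw)); // prints: /* comment */ int x;
    }
}
```

Main2 could then, for example, run its pattern over translate(read(new File("Test.java"))) instead of the raw file. The comments it prints would show the translated characters (/* rather than \u002f\u002a), and offsets into the translated text no longer line up with the original file.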