Parsing Strings with JavaCC
Question
I'm trying to think of a good way to parse strings using JavaCC without mistakenly matching it to another token. These strings should be able to have spaces, letters, and numbers.
My identifier and number token are as follows:
<IDENTIFIER: (["a"-"z", "A"-"Z"])+>
<NUMBER: (["0"-"9"])+>
My current string token is:
<STRING: "\"" (<IDENTIFIER> | <NUMBER> | " ")+ "\"">
Ideally, I want to only save the stuff that's inside of the quotes. I have a separate file in which I do the actual saving of variables and values. Should I remove the quotes in there?
I originally had a method in the parser file like this:
variable=<IDENTIFIER> <ASSIGN> <QUOTE> message=<IDENTIFIER> <QUOTE>
{File.saveVariable(variable.image, message.image);}
But, as you might guess, this didn't allow for spaces—or numbers for that matter. For identifiers such as variable names, I only want to allow letters.
So, I'd just like to get some advice on how I could go about capturing string literals. In particular, I'd like to make strings such as:
" hello", "hello ", " hello " and "\nhello", "hello\n", "\nhello\n"
valid in my grammar.
Recommended answer
On reading the first ", the lexer should switch into a STRING_STATE lexical state and leave it again at the next (bonus: unescaped) ".
Like this:
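TOKEN:
{
    // An opening quote switches the lexer into STRING_STATE.
    <QUOTE: "\""> : STRING_STATE
}

<STRING_STATE> MORE:
{
    // Keep the backslash as a prefix of the next token and switch to ESC_STATE.
    "\\" : ESC_STATE
}

<STRING_STATE> TOKEN:
{
    // A closing quote returns to DEFAULT; any other character is a CHAR.
    <ENDQUOTE: <QUOTE>> : DEFAULT
  | <CHAR: ~["\"", "\\"]>
}

<ESC_STATE> TOKEN:
{
    // The character after the backslash completes the escape.
    <CNTRL_ESC: ["\"", "\\", "/", "b", "f", "n", "r", "t"]> : STRING_STATE
}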
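To make the state switching easier to follow, here is a rough hand-written Java simulation of those three lexical states. It is only a sketch and not what JavaCC generates: the class name StringStateSketch, the tokenize method, and the "KIND(image)" output format are made up for illustration, and the DEFAULT state here accepts nothing but an opening quote.

```java
import java.util.ArrayList;
import java.util.List;

/** Hand-written simulation of DEFAULT / STRING_STATE / ESC_STATE (illustrative only). */
class StringStateSketch {
    enum State { DEFAULT, STRING_STATE, ESC_STATE }

    /** Returns token descriptions such as "CHAR(h)" for a quoted string. */
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        State state = State.DEFAULT;
        String pending = ""; // text matched by the MORE rule, prefixed to the next image
        for (char c : input.toCharArray()) {
            switch (state) {
                case DEFAULT:
                    // Assumption: only an opening quote is recognised in this sketch.
                    if (c != '"') {
                        throw new IllegalArgumentException("expected opening quote");
                    }
                    tokens.add("QUOTE");
                    state = State.STRING_STATE;
                    break;
                case STRING_STATE:
                    if (c == '"') {
                        tokens.add("ENDQUOTE");
                        state = State.DEFAULT;
                    } else if (c == '\\') {
                        pending = "\\";          // MORE: emit no token yet
                        state = State.ESC_STATE;
                    } else {
                        tokens.add("CHAR(" + c + ")");
                    }
                    break;
                case ESC_STATE:
                    if ("\"\\/bfnrt".indexOf(c) < 0) {
                        throw new IllegalArgumentException("bad escape: " + c);
                    }
                    tokens.add("CNTRL_ESC(" + pending + c + ")");
                    pending = "";
                    state = State.STRING_STATE;
                    break;
            }
        }
        if (state != State.DEFAULT) {
            throw new IllegalArgumentException("unterminated string");
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Input is the six characters: quote, space, a, backslash, n, quote.
        System.out.println(tokenize("\" a\\n\""));
    }
}
```

Note that the image of CNTRL_ESC is two characters long (the backslash plus the escape letter), because a MORE match is carried over as a prefix of the next token.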
You can use these tokens like this:
/**
 * Match a quoted string.
 */
String string() :
{
    StringBuilder builder = new StringBuilder();
}
{
    <QUOTE> ( getChar(builder) )* <ENDQUOTE>
    {
        // The quotes themselves are never appended, so only the contents are returned.
        return builder.toString();
    }
}

/**
 * Match char inside quoted string.
 */
void getChar(StringBuilder builder) :
{
    Token t;
}
{
    ( t = <CHAR> | t = <CNTRL_ESC> )
    {
        if (t.image.length() < 2)
        {
            // CHAR
            builder.append(t.image.charAt(0));
        }
        else if (t.image.length() < 6)
        {
            // ESC: the image is the backslash kept by MORE plus the escape character
            char c = t.image.charAt(1);
            switch (c)
            {
                case 'b': builder.append((char) 8); break;
                case 'f': builder.append((char) 12); break;
                case 'n': builder.append((char) 10); break;
                case 'r': builder.append((char) 13); break;
                case 't': builder.append((char) 9); break;
                default: builder.append(c);
            }
        }
    }
}
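If you want to check the escape handling without generating a parser, the logic of getChar can be lifted into plain Java, roughly as below. This is a sketch under assumptions: EscapeDecoderSketch, appendDecoded, and decode are hypothetical names, and decode does its own naive splitting of the text between the quotes instead of using JavaCC tokens, so unlike the grammar it does not reject unknown escapes.

```java
/** Plain-Java version of the getChar() action, for testing outside JavaCC (illustrative only). */
class EscapeDecoderSketch {

    /** Appends the character denoted by one CHAR or CNTRL_ESC token image. */
    static void appendDecoded(StringBuilder builder, String image) {
        if (image.length() == 1) {
            // CHAR: a single literal character
            builder.append(image.charAt(0));
            return;
        }
        // CNTRL_ESC: backslash followed by the escape character
        char c = image.charAt(1);
        switch (c) {
            case 'b': builder.append('\b'); break;
            case 'f': builder.append('\f'); break;
            case 'n': builder.append('\n'); break;
            case 'r': builder.append('\r'); break;
            case 't': builder.append('\t'); break;
            default:  builder.append(c);   // \" \\ and \/ stand for themselves
        }
    }

    /** Decodes the text between the quotes (the quotes themselves are not passed in). */
    static String decode(String body) {
        StringBuilder builder = new StringBuilder();
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (c == '\\' && i + 1 < body.length()) {
                // Treat backslash + next character as one escape image.
                appendDecoded(builder, body.substring(i, i + 2));
                i++;
            } else {
                appendDecoded(builder, String.valueOf(c));
            }
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        // The body is backslash, n, h, e, l, l, o; the result starts with a real newline.
        String decoded = decode("\\nhello");
        System.out.println(decoded.length());
    }
}
```

Leading and trailing spaces pass through untouched, so bodies such as " hello " keep their whitespace, and the two-character sequence backslash-n becomes a single newline.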
HTH.