Parsing identifiers except keywords
Question
I am struggling to write an identifier parser that parses an alphanumeric string which is not a keyword. The keywords are all in a table:
struct keywords_t : x3::symbols<x3::unused_type> {
    keywords_t() {
        add("for", x3::unused)
           ("in", x3::unused)
           ("while", x3::unused);
    }
} const keywords;
And the parser for an identifier should be this:
auto const identifier_def =
    x3::lexeme[
        (x3::alpha | '_') >> *(x3::alnum | '_')
    ];
Now I am trying to combine these so that the identifier parser fails when it parses a keyword. I tried it like this:
auto const identifier_def =
    x3::lexeme[
        (x3::alpha | '_') >> *(x3::alnum | '_')
    ] - keywords;
And like this:
auto const identifier_def =
    x3::lexeme[
        (x3::alpha | '_') >> *(x3::alnum | '_') - keywords
    ];
It works on most inputs, but if a string starts with a keyword, like int, whilefoo or forbar, the parser fails to parse it.
How can I get this parser right?
Answer
Your problem is caused by the semantics of the difference operator in Spirit. When you have a - b, Spirit does the following:
- Check whether b matches:
  - if it does, a - b fails and nothing is parsed.
  - if b fails, then it checks whether a matches:
    - if a fails, a - b fails and nothing is parsed.
    - if a succeeds, a - b succeeds and parses whatever a parses.
In your case (unchecked_identifier - keywords), whenever the identifier starts with a keyword, keywords will match and your parser will fail. So you need to replace keywords with something that matches whenever a distinct keyword is passed, but fails whenever the keyword is followed by something else that could continue an identifier. The not predicate (!) can help with that:

    auto const distinct_keyword = x3::lexeme[ keywords >> !(x3::alnum | '_') ];
Full example (running on Coliru):
//#define BOOST_SPIRIT_X3_DEBUG
#include <iostream>
#include <string>
#include <boost/spirit/home/x3.hpp>

namespace parser {
    namespace x3 = boost::spirit::x3;

    struct keywords_t : x3::symbols<x3::unused_type> {
        keywords_t() {
            add("for", x3::unused)
               ("in", x3::unused)
               ("while", x3::unused);
        }
    } const keywords;

    x3::rule<struct identifier_tag, std::string> const identifier("identifier");

    // A keyword only counts if it is not followed by an identifier character.
    auto const distinct_keyword = x3::lexeme[ keywords >> !(x3::alnum | '_') ];
    auto const unchecked_identifier =
        x3::lexeme[ (x3::alpha | x3::char_('_')) >> *(x3::alnum | x3::char_('_')) ];

    auto const identifier_def = unchecked_identifier - distinct_keyword;
    // This should also work:
    // auto const identifier_def = !distinct_keyword >> unchecked_identifier;

    BOOST_SPIRIT_DEFINE(identifier);

    // True only if the whole input is a single identifier.
    bool is_identifier(const std::string& input) {
        auto iter = std::begin(input), end = std::end(input);
        bool result = x3::phrase_parse(iter, end, identifier, x3::space);
        return result && iter == end;
    }
}

int main() {
    std::cout << parser::is_identifier("fortran") << std::endl;
    std::cout << parser::is_identifier("for") << std::endl;
    std::cout << parser::is_identifier("integer") << std::endl;
    std::cout << parser::is_identifier("in") << std::endl;
    std::cout << parser::is_identifier("whileechoyote") << std::endl;
    std::cout << parser::is_identifier("while") << std::endl;
}