Case-insensitive keyword matching

Question

I'm writing a grammar for parsing a computer language, to be used with Parse::Eyapp. This is a Perl package that simplifies writing parsers for context-free languages. It is similar to yacc and other LALR parser generators, but has some useful extensions, such as defining tokens in terms of regular expressions.

The language I want to parse uses keywords to denote sections and describe control flow. It also supports identifiers that serve as placeholders for data. An identifier can never have the same name as a keyword.

Now, here comes the tricky part: I need to separate keywords from identifiers, but the two may look similar, so I need a regular expression pattern that matches a keyword case-insensitively, and nothing else.

The solution I came up with is the following:

  1. Each keyword is identified by a token of the following form: /((?i)keyword)(?!\w)/
    • (?i) applies case-insensitive matching to the rest of the enclosing group
    • (?!\w) rejects the match if a word character (letter, digit, or underscore) follows the keyword
    • the lookahead is zero-width, so those characters are not part of the match
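As a rough illustration of how such a pattern behaves, here is a minimal sketch in Python rather than Perl. The keyword "begin" and the helper keyword_pattern are made up for this example, and the flag is written in the scoped form (?i:...) because recent Python versions reject an inline (?i) that is not at the start of the pattern; in Perl the original form can stay as it is.

```python
import re

def keyword_pattern(keyword: str) -> re.Pattern:
    """Build a pattern equivalent in spirit to /((?i)keyword)(?!\\w)/.

    (?i:...) limits case-insensitive matching to the keyword itself;
    (?!\\w) is a zero-width negative lookahead, so it consumes nothing.
    """
    return re.compile(r"((?i:" + re.escape(keyword) + r"))(?!\w)")

# "begin" is a hypothetical keyword, not taken from the original question.
begin = keyword_pattern("begin")

for text in ("BEGIN x", "Begin", "begin;", "beginning", "begin_x"):
    m = begin.match(text)
    print(f"{text!r:12} -> {m.group(0) if m else None!r}")
```

The first three inputs match in any letter case, while "beginning" and "begin_x" are rejected because a word character follows the keyword, so they remain available to an identifier token.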

The token definitions and the part of the grammar I have written work well so far, but there is still a lot to do. However, that is not my question.

What I want to ask is: am I on the right track here? Are there better, simpler regular expressions for matching those keywords? Should I stop and use a different approach to language parsing altogether?

The idea of using the tokenizer to match whole strings instead of single characters came from the Parse::Eyapp documentation, by the way. I started with a character-by-character grammar first, but that approach wasn't very elegant and seemed to contradict the flexible nature of the parser generator. It was very cumbersome to write, too.
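To make the whole-string tokenizing idea concrete, here is a small hedged sketch in Python. It is not Parse::Eyapp's API; the keyword set, the token names, and the tokenize function are all assumptions made for illustration. Keyword patterns are tried before the identifier pattern, and the (?!\w) lookahead keeps a keyword from matching the front of a longer identifier.

```python
import re

# Hypothetical keyword set; the original question does not list its keywords.
KEYWORDS = ("if", "then", "end")

# Keyword tokens first, each case-insensitive and not followed by a word character.
TOKEN_SPEC = [
    (kw.upper(), re.compile(r"(?i:" + re.escape(kw) + r")(?!\w)"))
    for kw in KEYWORDS
]
# Assumed identifier and number shapes, tried only after every keyword has failed.
TOKEN_SPEC.append(("IDENT", re.compile(r"[A-Za-z_]\w*")))
TOKEN_SPEC.append(("NUMBER", re.compile(r"\d+")))

WHITESPACE = re.compile(r"\s+")

def tokenize(text):
    """Split text into (token_name, lexeme) pairs, skipping whitespace."""
    pos = 0
    tokens = []
    while pos < len(text):
        ws = WHITESPACE.match(text, pos)
        if ws:
            pos = ws.end()
            continue
        for name, pattern in TOKEN_SPEC:
            m = pattern.match(text, pos)
            if m:
                tokens.append((name, m.group(0)))
                pos = m.end()
                break
        else:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
    return tokens

print(tokenize("IF ifdef Then x1 END"))
```

Here "ifdef" comes out as an identifier even though it starts with a keyword, which is the separation the question asks for.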

Recommended answer

If you would like to parse a language, Marpa may be much better suited for you. Here's a tutorial. You could also use regexp grammars.
