Alternatives to Regular Expressions

Question

I have a set of strings with numbers embedded in them. They look something like /cal/long/3/4/145:999 or /pa/metrics/CosmicRay/24:4:bgp:EnergyKurtosis. I'd like an expression parser with the following properties:


  • Easy to use. Given a few examples, someone should be able to form a new expression. I want end users to be able to form new expressions to query this set of strings. Some of the potential users are software engineers, others are testers, and some are scientists.
  • Allows for constraints on numbers. Something like '/cal/long/3/4/143:#>100&<1110' to specify that a string must start with the prefix '/cal/long/3/4/143:', followed by a number in the open interval (100, 1110).
  • Supports '|' and grouping with parentheses. So the expression '/cal/(long|short)/3/4/' would match '/cal/long/3/4/1:2' as well as '/cal/short/3/4/1:2'.
  • Has a Java implementation available or would be easy to implement in Java.

Interesting alternative ideas would be useful. I'm also entertaining the idea of just implementing the subset of regular expressions that I need plus the numerical constraints.

Thanks!

Recommended answer

I'm inclined to agree with Rex M, although your second requirement for numerical constraints complicates things. Unless you only allowed very basic constraints, I'm not aware of a way to succinctly express that in a regular expression. If there is such a way, please disregard the rest of my answer and follow the other suggestions here. :)

You might want to consider a parser generator - things like the classic lex and yacc. I'm not really familiar with the Java choices, but here's a list:

http://java-source.net/open-source/parser-generators

If you're not familiar, the standard approach would be to first create a lexer that turns your strings into tokens. Then you would pass those tokens onto a parser that applies your grammar to them and spits out some kind of result.
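As a rough illustration of the lexing step, here is a minimal hand-written tokenizer for the expression syntax used in the question. The class name `ExpressionLexer`, the token kinds, and the rule that digits only count as numbers after a '#' are all assumptions made for this sketch; a parser generator would normally produce this code from a grammar instead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical lexer for expressions such as "/cal/(long|short)/3/4/"
// or "/cal/long/3/4/143:#>100&<1110". Names and rules are illustrative only.
final class ExpressionLexer {
    enum Kind { LITERAL, LPAREN, RPAREN, PIPE, HASH, GREATER, LESS, AND, NUMBER }

    static final class Token {
        final Kind kind;
        final String text;
        Token(Kind kind, String text) { this.kind = kind; this.text = text; }
        @Override public String toString() { return kind + "(" + text + ")"; }
    }

    static List<Token> tokenize(String expr) {
        List<Token> tokens = new ArrayList<>();
        // After '#', the characters '>', '<', '&' and digits form a numeric
        // constraint; everywhere else they are ordinary literal text.
        boolean inConstraint = false;
        int i = 0;
        while (i < expr.length()) {
            char c = expr.charAt(i);
            Kind single = singleCharKind(c, inConstraint);
            if (single != null) {
                tokens.add(new Token(single, String.valueOf(c)));
                if (single == Kind.HASH) {
                    inConstraint = true;
                } else if (single == Kind.LPAREN || single == Kind.RPAREN || single == Kind.PIPE) {
                    inConstraint = false;
                }
                i++;
                continue;
            }
            int start = i;
            if (inConstraint && Character.isDigit(c)) {
                while (i < expr.length() && Character.isDigit(expr.charAt(i))) i++;
                tokens.add(new Token(Kind.NUMBER, expr.substring(start, i)));
                continue;
            }
            // Anything else starts a literal run and ends any open constraint.
            inConstraint = false;
            while (i < expr.length() && singleCharKind(expr.charAt(i), false) == null) i++;
            tokens.add(new Token(Kind.LITERAL, expr.substring(start, i)));
        }
        return tokens;
    }

    // Returns the token kind for a one-character token, or null if the
    // character is not special in the current mode.
    private static Kind singleCharKind(char c, boolean inConstraint) {
        switch (c) {
            case '(': return Kind.LPAREN;
            case ')': return Kind.RPAREN;
            case '|': return Kind.PIPE;
            case '#': return Kind.HASH;
            case '>': return inConstraint ? Kind.GREATER : null;
            case '<': return inConstraint ? Kind.LESS : null;
            case '&': return inConstraint ? Kind.AND : null;
            default:  return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(tokenize("/cal/(long|short)/3/4/"));
        System.out.println(tokenize("/cal/long/3/4/143:#>100&<1110"));
    }
}
```

A parser would then walk this token list, turning LITERAL, LPAREN, PIPE and RPAREN tokens into regex fragments and the tokens after HASH into a numeric condition.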

In your case, I envision the parser producing a combination of a regular expression and additional conditions. For your numerical constraint example, it might give you the regular expression /cal/long/3/4/143:(\d+) and a constraint on the first capture group (the \d+ portion) requiring that the number lie between 100 and 1110. You'd then apply the RE to your strings to find candidates, and apply the constraint to those candidates to find your matches.
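The regex-plus-constraint idea could look roughly like the sketch below, built on the standard java.util.regex classes. The class name `ConstrainedPattern` and its methods are hypothetical, and the bounds are taken from the question's example and treated as exclusive; this is one possible shape for the parser's output, not a definitive design.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical pairing of a regex with a numeric range check on its
// first capture group. Both bounds are exclusive (an assumption).
final class ConstrainedPattern {
    private final Pattern pattern;
    private final long lowExclusive;
    private final long highExclusive;

    ConstrainedPattern(String regex, long lowExclusive, long highExclusive) {
        this.pattern = Pattern.compile(regex);
        this.lowExclusive = lowExclusive;
        this.highExclusive = highExclusive;
    }

    boolean matches(String candidate) {
        Matcher m = pattern.matcher(candidate);
        if (!m.matches()) {
            return false; // step 1: the regex selects candidates
        }
        long value;
        try {
            value = Long.parseLong(m.group(1));
        } catch (NumberFormatException e) {
            return false; // digit run too large for a long
        }
        // step 2: the numeric constraint filters the candidates
        return value > lowExclusive && value < highExclusive;
    }

    List<String> filter(List<String> candidates) {
        List<String> result = new ArrayList<>();
        for (String s : candidates) {
            if (matches(s)) {
                result.add(s);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ConstrainedPattern p =
            new ConstrainedPattern("/cal/long/3/4/143:(\\d+)", 100, 1110);
        System.out.println(p.matches("/cal/long/3/4/143:999"));
        System.out.println(p.matches("/cal/long/3/4/143:99"));
    }
}
```

Keeping the range check outside the regex avoids encoding numeric comparisons as digit patterns, which becomes unwieldy quickly for arbitrary bounds.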

It's a pretty complicated approach, so hopefully there's a simpler way. I hope that gives you some ideas, at least.
