How to reduce complexity in a regex?


Question

I have a regex that finds all kinds of amounts denoted in dollars, such as $290, USD240, $234.45, 234.5$, 234.6usd:

(\$)[0-9]+\.?([0-9]*)|usd+[0-9]+\.?([0-9]*)|[0-9]+\.?[0-9]*usd|[0-9]+\.?[0-9]*(\$)

This seems to work, but how can I avoid the complexity in my regex?

Recommended answer

It is possible to make the regex a bit shorter by collapsing the currency indicators: instead of "USD amount OR $ amount", you can say "(USD OR $) amount". This results in the following regex:

((\$|usd)[0-9]+\.?([0-9]*))|([0-9]+\.?[0-9]*(\$|usd))

I'm not sure you'll find this less complex, but it is at least easier to read because it is shorter.

The character set [0-9] can also be replaced by \d, the character class that matches any digit, making the regex even shorter. Doing this, the regex looks as follows:

((\$|usd)\d+\.?\d*)|(\d+\.?\d*(\$|usd))
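As a quick sanity check, the shortened pattern can be compared with the pattern from the question on the sample amounts. This is only a sketch in Python: the helper name find_amounts is made up for illustration, re.IGNORECASE is an assumption (without it the lowercase "usd" would not match "USD240"), and re.ASCII is added so that \d means exactly [0-9] in Python 3.

```python
import re

# Pattern from the question and the shortened pattern from this answer.
ORIGINAL = r"(\$)[0-9]+\.?([0-9]*)|usd+[0-9]+\.?([0-9]*)|[0-9]+\.?[0-9]*usd|[0-9]+\.?[0-9]*(\$)"
SHORT = r"((\$|usd)\d+\.?\d*)|(\d+\.?\d*(\$|usd))"

# Assumptions: IGNORECASE so "USD" matches "usd"; ASCII so \d is exactly [0-9].
FLAGS = re.IGNORECASE | re.ASCII


def find_amounts(pattern, text):
    """Return the full text of every match, ignoring capture groups."""
    return [m.group(0) for m in re.finditer(pattern, text, FLAGS)]


sample = "$290, USD240, $234.45, 234.5$, 234.6usd"
print(find_amounts(ORIGINAL, sample))
print(find_amounts(SHORT, sample))
```

On this sample both patterns find the same five amounts. That does not prove they are equivalent on every input (the question's usd+ also accepts "usdd", for example), only that the rewrite behaves the same on the cases listed in the question.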



    Update:

    • According to @Toto (https://stackoverflow.com/users/372239/toto), this regex would be more performant using non-capturing groups. The unnecessary outer capture groups are also removed, as pointed out by @Simon MᶜKenzie (https://stackoverflow.com/users/622391/simon-m%E1%B6%9Ckenzie):

      (?:\$|usd)\d+\.?\d*|\d+\.?\d*(?:\$|usd)
      


    • Amounts like $.0 are not matched by the regex, as @Gangnus pointed out. I updated the regex to fix this:

      ((\$|usd)((\d+\.?\d*)|(\.\d+)))|(((\d+\.?\d*)|(\.\d+))(\$|usd))
      

      Note that I changed \d+\.?\d* into ((\d+\.?\d*)|(\.\d+)). It now matches either one or more digits, optionally followed by a dot, followed by zero or more digits; or a dot followed by one or more digits.

      Without the unnecessary capturing groups, and using non-capturing groups:

      (?:\$|usd)(?:\d+\.?\d*|\.\d+)|(?:\d+\.?\d*|\.\d+)(?:\$|usd)
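The effect of the two updates can be seen in a short Python sketch. The names AMOUNT, CAPTURING and EARLIER are illustrative only, and the re.IGNORECASE and re.ASCII flags are assumptions that are not part of the answer. The final pattern is spread over several lines with re.VERBOSE, which is one more way to make it easier to read without changing what it matches.

```python
import re

FLAGS = re.IGNORECASE | re.ASCII  # assumption: case-insensitive, ASCII digits only

# Final pattern from the answer, laid out with re.VERBOSE for readability.
AMOUNT = re.compile(
    r"""
    (?:\$|usd)              # currency marker first ...
    (?:\d+\.?\d*|\.\d+)     # ... then digits with optional fraction, or .digits
    |
    (?:\d+\.?\d*|\.\d+)     # number first ...
    (?:\$|usd)              # ... then currency marker
    """,
    re.VERBOSE | FLAGS,
)

# Earlier versions from the answer, kept for comparison.
CAPTURING = re.compile(r"((\$|usd)\d+\.?\d*)|(\d+\.?\d*(\$|usd))", FLAGS)
EARLIER = re.compile(r"(?:\$|usd)\d+\.?\d*|\d+\.?\d*(?:\$|usd)", FLAGS)

text = "$290, USD240, $.5, .75usd, 234.5$"

# With no capturing groups, findall returns the matched strings directly.
print(AMOUNT.findall(text))

# With capturing groups, findall returns one tuple of groups per match.
print(CAPTURING.findall("$290"))

# The version before the $.0 fix finds nothing in "$.5".
print(EARLIER.findall("$.5"))
```

Besides any performance difference, dropping the capturing groups changes what functions like findall return, which is often the more noticeable benefit in calling code.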
      

