How to reduce complexity in regex?
Question
I have a regex that finds all amounts of money expressed in dollars, such as $290, USD240, $234.45, 234.5$, 234.6usd:
(\$)[0-9]+\.?([0-9]*)|usd+[0-9]+\.?([0-9]*)|[0-9]+\.?[0-9]*usd|[0-9]+\.?[0-9]*(\$)
This seems to work, but how can I reduce the complexity of my regex?
Recommended answer
It is possible to make the regex a bit shorter by collapsing the currency indicators: you can say "(USD OR $) amount" instead of "USD amount OR $ amount". This results in the following regex:
((\$|usd)[0-9]+\.?([0-9]*))|([0-9]+\.?[0-9]*(\$|usd))
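As a quick sanity check, the sketch below runs the question's regex and the collapsed one over the sample amounts from the question. Python is used only for illustration; the sample string is lowercased because both patterns spell the currency as a literal lowercase usd, and the helper name full_matches is hypothetical.

```python
import re

# Sample amounts from the question, lowercased (assumption: no flags are used,
# so only a lowercase "usd" can match).
text = "$290 usd240 $234.45 234.5$ 234.6usd"

original = r"(\$)[0-9]+\.?([0-9]*)|usd+[0-9]+\.?([0-9]*)|[0-9]+\.?[0-9]*usd|[0-9]+\.?[0-9]*(\$)"
collapsed = r"((\$|usd)[0-9]+\.?([0-9]*))|([0-9]+\.?[0-9]*(\$|usd))"


def full_matches(pattern, s):
    """Return the complete text of every match, ignoring capture groups."""
    return [m.group(0) for m in re.finditer(pattern, s)]


print(full_matches(original, text))
print(full_matches(collapsed, text))
# Both print: ['$290', 'usd240', '$234.45', '234.5$', '234.6usd']
```

On this sample the two patterns agree. They are not strictly identical in general: the original's usd+ also accepts a repeated final d (for example usddd5), which the collapsed form does not.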
I'm not sure whether you'll find this less complex, but at least it's easier to read because it's shorter.
The character set [0-9] can also be replaced by \d, the character class that matches any digit, which makes the regex even shorter. With this change, the regex looks as follows:
((\$|usd)\d+\.?\d*)|(\d+\.?\d*(\$|usd))
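One caveat depends on the regex engine. In Python 3, for example, \d in a str pattern also matches non-ASCII decimal digits unless the re.ASCII flag is set, so the swap is only an exact equivalent on ASCII input. The sketch below illustrates this; the variable and helper names are hypothetical.

```python
import re

bracket_form = r"((\$|usd)[0-9]+\.?([0-9]*))|([0-9]+\.?[0-9]*(\$|usd))"
digit_form = r"((\$|usd)\d+\.?\d*)|(\d+\.?\d*(\$|usd))"

text = "$290 usd240 $234.45 234.5$ 234.6usd"


def full_matches(pattern, s, flags=0):
    """Return the complete text of every match, ignoring capture groups."""
    return [m.group(0) for m in re.finditer(pattern, s, flags)]


# On plain ASCII input the two forms find the same amounts.
same_on_ascii = full_matches(bracket_form, text) == full_matches(digit_form, text)
print(same_on_ascii)  # True

# "$" followed by the Arabic-Indic digits three and four (U+0663, U+0664).
non_ascii_amount = "$\u0663\u0664"
unicode_hit = full_matches(digit_form, non_ascii_amount)            # one match
ascii_hit = full_matches(digit_form, non_ascii_amount, re.ASCII)    # no match
bracket_hit = full_matches(bracket_form, non_ascii_amount)          # no match
print(len(unicode_hit), len(ascii_hit), len(bracket_hit))  # 1 0 0
```

If only ASCII digits should count as amounts, either keep [0-9] or pass re.ASCII in Python; other engines have their own rules for \d.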
Update:
According to @Toto, this regex would be more performant using non-capturing groups (the unnecessary capture group was also removed, as pointed out by @Simon MᶜKenzie):
(?:\$|usd)\d+\.?\d*|\d+\.?\d*(?:\$|usd)
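Apart from performance, which is not measured here, non-capturing groups also change what some APIs return. As an illustration in Python (an assumption, since the question does not name a language), re.findall returns one tuple of groups per match when the pattern contains capture groups, and the plain matched strings when it contains none:

```python
import re

text = "$290 usd240 234.5$"

capturing = r"((\$|usd)\d+\.?\d*)|(\d+\.?\d*(\$|usd))"
non_capturing = r"(?:\$|usd)\d+\.?\d*|\d+\.?\d*(?:\$|usd)"

# With four capture groups, findall yields a 4-tuple per match; groups that
# did not take part in the match are reported as empty strings.
with_groups = re.findall(capturing, text)
print(with_groups)
# [('$290', '$', '', ''), ('usd240', 'usd', '', ''), ('', '', '234.5$', '$')]

# With no capture groups, findall yields the matched text directly.
without_groups = re.findall(non_capturing, text)
print(without_groups)
# ['$290', 'usd240', '234.5$']
```

The second form is usually the more convenient one when only the matched amounts are needed.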
Amounts like $.0 are not matched by the regex, as @Gangnus pointed out. I updated the regex to fix this:
((\$|usd)((\d+\.?\d*)|(\.\d+)))|(((\d+\.?\d*)|(\.\d+))(\$|usd))
Note that I changed \d+\.?\d* into ((\d+\.?\d*)|(\.\d+)): it now matches either one or more digits, optionally followed by a dot, followed by zero or more digits; or a dot followed by one or more digits.
Without unnecessary capturing groups and using non-capturing groups:
(?:\$|usd)(?:\d+\.?\d*|\.\d+)|(?:\d+\.?\d*|\.\d+)(?:\$|usd)
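A minimal sketch of trying this final pattern in Python follows. The re.IGNORECASE flag is an assumption added here so that the uppercase USD240 from the question matches; the pattern itself only spells usd in lowercase. The name AMOUNT and the extra inputs .75usd and 42 are illustrative.

```python
import re

AMOUNT = re.compile(
    r"(?:\$|usd)(?:\d+\.?\d*|\.\d+)|(?:\d+\.?\d*|\.\d+)(?:\$|usd)",
    re.IGNORECASE,  # assumption: treat "USD" and "usd" alike
)

# Amounts from the question, plus "$.0", a leading-dot amount with a trailing
# currency marker, and a bare number that should not match.
text = "$290, USD240, $234.45, 234.5$, 234.6usd, $.0, .75usd, 42"

found = AMOUNT.findall(text)
print(found)
# ['$290', 'USD240', '$234.45', '234.5$', '234.6usd', '$.0', '.75usd']
```

The bare 42 is skipped because both alternatives require a currency marker on one side of the number.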