Regex to parse international floating-point numbers
Problem description
I need a regex to match numeric values that can look like any of the following:
111.111,11
111,111.11
111,111
I then want to separate the integer and decimal portions so I can store the value in a DB with the correct syntax.
I tried ([0-9]{1,3}[,.]?)+([,.][0-9]{2})?
with no success, since it doesn't detect the second part :(
The result should look like:
111.111,11 -> $1 = 111111; $2 = 11
Recommended answer
First answer:
This matches #,###,##0.00:
^[+-]?[0-9]{1,3}(?:\,?[0-9]{3})*(?:\.[0-9]{2})?$
And this matches #.###.##0,00:
^[+-]?[0-9]{1,3}(?:\.?[0-9]{3})*(?:\,[0-9]{2})?$
Joining the two (there are smarter/shorter ways to write it, but it works):
(?:^[+-]?[0-9]{1,3}(?:\,?[0-9]{3})*(?:\.[0-9]{2})?$)
|(?:^[+-]?[0-9]{1,3}(?:\.?[0-9]{3})*(?:\,[0-9]{2})?$)
You can also add a capturing group around the last comma (or dot) to check which one was used.
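As a rough illustration of the joined pattern, here is a minimal Python sketch. The two patterns are copied from the answer as written; the names US_STYLE, EU_STYLE, JOINED and is_number are assumptions made up for this example, not part of the original answer.

```python
import re

# The two patterns from the answer, copied unchanged.
US_STYLE = r"^[+-]?[0-9]{1,3}(?:\,?[0-9]{3})*(?:\.[0-9]{2})?$"  # #,###,##0.00
EU_STYLE = r"^[+-]?[0-9]{1,3}(?:\.?[0-9]{3})*(?:\,[0-9]{2})?$"  # #.###.##0,00

# Joined with alternation, as the answer does.
JOINED = re.compile("(?:" + US_STYLE + ")|(?:" + EU_STYLE + ")")


def is_number(text: str) -> bool:
    """Return True if text matches either of the two number formats."""
    return JOINED.match(text) is not None


# The three inputs from the question, plus one malformed value.
for sample in ["111.111,11", "111,111.11", "111,111", "1.1"]:
    print(sample, is_number(sample))
```

The three inputs from the question are accepted, and 1.1 is rejected because the decimal part must have exactly two digits.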
Second answer:
As Alan M pointed out, my previous solution could fail to reject a value like 11,111111.00
where one comma is missing but the other is present. After some tests I arrived at the following regex, which avoids this problem:
^[+-]?[0-9]{1,3}
(?:(?<comma>\,?)[0-9]{3})?
(?:\k<comma>[0-9]{3})*
(?:\.[0-9]{2})?$
This deserves some explanation:
- ^[+-]?[0-9]{1,3} matches an optional sign and the first 1 to 3 digits;
- (?:(?<comma>\,?)[0-9]{3})? matches an optional comma followed by 3 more digits, and captures the comma (or the absence of one) in a group called "comma";
- (?:\k<comma>[0-9]{3})* matches zero or more repetitions of the same comma choice (if any) followed by 3 digits;
- (?:\.[0-9]{2})?$ matches optional "cents" at the end of the string.
Of course, that only covers #,###,##0.00 (not #.###.##0,00), but you can always join the regexes as I did above.
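The backreference idea can be tried out in Python with a small change of spelling. Python's re module writes named groups as (?P<name>...) and named backreferences as (?P=name); it does not accept the (?<name>...) and \k<name> forms used above. The sketch below is an adaptation under that assumption, and STRICT_US and accepts are hypothetical names.

```python
import re

# Adapted spelling for Python's re module: (?P<comma>...) and (?P=comma).
STRICT_US = re.compile(
    r"""
    ^[+-]?[0-9]{1,3}              # optional sign, then the first 1 to 3 digits
    (?:(?P<comma>,?)[0-9]{3})?    # first 3-digit group; records comma or no comma
    (?:(?P=comma)[0-9]{3})*       # later groups must repeat the same choice
    (?:\.[0-9]{2})?$              # optional two-digit decimal part
    """,
    re.VERBOSE,
)


def accepts(text: str) -> bool:
    """Return True if text is a consistently grouped #,###,##0.00 number."""
    return STRICT_US.match(text) is not None


print(accepts("1,111,111.00"))   # commas everywhere
print(accepts("1111111.00"))     # no commas at all
print(accepts("11,111111.00"))   # mixed: one comma present, one missing
```

The first two values are accepted and the mixed one is rejected, which is the case the first answer let through.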
Final answer:
Now, a complete solution. Indentation and line breaks are there for readability only.
^[+-]?[0-9]{1,3}
(?:
(?:\,[0-9]{3})*
(?:\.[0-9]{2})?
|
(?:\.[0-9]{3})*
(?:\,[0-9]{2})?
|
[0-9]*
(?:[\.\,][0-9]{2})?
)$
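Since the layout is for readability only, the pattern can be kept in that shape in Python by compiling it with re.VERBOSE, which ignores whitespace and allows comments. This is a sketch rather than a definitive implementation: COMPLETE and results are hypothetical names, and the sketch escapes the dot before the two decimal digits in the first branch, because an unescaped dot would also accept input such as 111x11.

```python
import re

COMPLETE = re.compile(
    r"""
    ^[+-]?[0-9]{1,3}
    (?:
        (?:,[0-9]{3})*        # comma as thousands separator
        (?:\.[0-9]{2})?       # dot before two optional decimals
      |
        (?:\.[0-9]{3})*       # dot as thousands separator
        (?:,[0-9]{2})?        # comma before two optional decimals
      |
        [0-9]*                # no thousands separators at all
        (?:[.,][0-9]{2})?     # either separator before the decimals
    )$
    """,
    re.VERBOSE,
)

samples = [
    "111.111,11",
    "111,111.11",
    "111,111",
    "1234567,89",
    "11,111111.00",
    "111x11",
]
results = {s: COMPLETE.match(s) is not None for s in samples}
for sample, ok in results.items():
    print(sample, ok)
```

The first four samples match; 11,111111.00 and 111x11 do not.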
And this variation captures the separators used:
^[+-]?[0-9]{1,3}
(?:
(?:(?<thousand>\,)[0-9]{3})*
(?:(?<decimal>\.)[0-9]{2})?
|
(?:(?<thousand>\.)[0-9]{3})*
(?:(?<decimal>\,)[0-9]{2})?
|
[0-9]*
(?:(?<decimal>[\.\,])[0-9]{2})?
)$
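To get the split the question asks for (111.111,11 -> 111111 and 11), the captured decimal separator can drive a simple string split. The following Python sketch is one possible way to do it, not the answer's own code. Python's re module rejects a group name that is defined twice, so the branches use distinct, made-up names (thousand_a, decimal_a and so on); SEPARATORS and split_number are likewise hypothetical.

```python
import re
from typing import Optional, Tuple

# Same structure as the capturing variant, with one group name per branch.
SEPARATORS = re.compile(
    r"""
    ^[+-]?[0-9]{1,3}
    (?:
        (?:(?P<thousand_a>,)[0-9]{3})*
        (?:(?P<decimal_a>\.)[0-9]{2})?
      |
        (?:(?P<thousand_b>\.)[0-9]{3})*
        (?:(?P<decimal_b>,)[0-9]{2})?
      |
        [0-9]*
        (?:(?P<decimal_c>[.,])[0-9]{2})?
    )$
    """,
    re.VERBOSE,
)


def split_number(text: str) -> Optional[Tuple[str, str]]:
    """Return (integer_part, decimal_part) as digit strings, or None if invalid."""
    match = SEPARATORS.match(text)
    if match is None:
        return None
    # Only the branch that matched can have set its decimal group.
    decimal_sep = (
        match.group("decimal_a")
        or match.group("decimal_b")
        or match.group("decimal_c")
    )
    if decimal_sep:
        integer_part, decimal_part = text.rsplit(decimal_sep, 1)
    else:
        integer_part, decimal_part = text, ""
    # Drop thousands separators; a leading sign stays on the integer part.
    integer_part = integer_part.replace(",", "").replace(".", "")
    return integer_part, decimal_part


print(split_number("111.111,11"))  # ('111111', '11')
print(split_number("111,111.11"))  # ('111111', '11')
print(split_number("111,111"))     # ('111111', '')
```

Note that 111,111 is read as a whole number with a thousands separator because the first branch is tried first, while 111,11 would be read as 111 with decimals 11.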
edit 1: "cents" are now optional; edit 2: text added; edit 3: second solution added; edit 4: complete solution added; edit 5: headings added; edit 6: capturing added; edit 7: last answer split into two versions;