Regex to parse international floating-point numbers

Problem description

I need a regex to match numeric values that can be written as:

111.111,11

111,111.11

111,111

It should also separate the integer and decimal portions, so I can store the value in a DB with the correct syntax.

I tried ([0-9]{1,3}[,.]?)+([,.][0-9]{2})? with no success, since it doesn't detect the second part :(

The result should look like:

111.111,11 -> $1 = 111111; $2 = 11

Recommended answer

First answer:

This matches #,###,##0.00:

^[+-]?[0-9]{1,3}(?:\,?[0-9]{3})*(?:\.[0-9]{2})?$

This matches #.###.##0,00:

^[+-]?[0-9]{1,3}(?:\.?[0-9]{3})*(?:\,[0-9]{2})?$

Joining the two (there are smarter/shorter ways to write it, but it works):

(?:^[+-]?[0-9]{1,3}(?:\,?[0-9]{3})*(?:\.[0-9]{2})?$)
|(?:^[+-]?[0-9]{1,3}(?:\.?[0-9]{3})*(?:\,[0-9]{2})?$)

You can also add a capturing group around the last comma (or dot) to check which one was used.
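As a rough illustration of that last remark, here is a minimal Python sketch. The names JOINED and decimal_separator are made up for this example, and the only change to the two patterns is the capturing group around each decimal separator.

```python
import re

# Hypothetical names (JOINED, decimal_separator), not from the original answer.
# The two regexes above, joined with |, with a capturing group added around
# the decimal separator of each alternative.
JOINED = re.compile(
    r"^[+-]?[0-9]{1,3}(?:,?[0-9]{3})*(?:(\.)[0-9]{2})?$"
    r"|"
    r"^[+-]?[0-9]{1,3}(?:\.?[0-9]{3})*(?:(,)[0-9]{2})?$"
)

def decimal_separator(text):
    """Return '.', ',' or None (no decimals); raise ValueError if no match."""
    m = JOINED.match(text)
    if m is None:
        raise ValueError(f"not a recognised number: {text!r}")
    # At most one of the two groups takes part in a successful match.
    return m.group(1) or m.group(2)
```

Knowing which separator was used tells you which character to treat as the decimal point when converting the string.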

Second answer:

As pointed out by Alan M, my previous solution could fail to reject a value like 11,111111.00, where one comma is present but another is missing. After some tests I reached the following regex, which avoids this problem:

^[+-]?[0-9]{1,3}
(?:(?<comma>\,?)[0-9]{3})?
(?:\k<comma>[0-9]{3})*
(?:\.[0-9]{2})?$

This deserves some explanation:

  • ^[+-]?[0-9]{1,3} matches an optional sign and the first 1 to 3 digits;

  • (?:(?<comma>\,?)[0-9]{3})? matches an optional comma followed by 3 more digits, and captures the comma (or its absence, as an empty string) in a group called 'comma';

  • (?:\k<comma>[0-9]{3})* matches zero or more repetitions of the separator captured before (a comma, or nothing), each followed by 3 digits;

  • (?:\.[0-9]{2})?$ matches optional "cents" at the end of the string.

Of course, that will only cover #,###,##0.00 (not #.###.##0,00), but you can always join the regexes like I did above.
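To see the backreference at work, here is a small Python sketch. It assumes Python's re module, which spells the named group (?P<comma>...) and the backreference (?P=comma) instead of (?<comma>...) and \k<comma>; the name CONSISTENT is invented for the example.

```python
import re

# Hypothetical name CONSISTENT. Same pattern as above, in Python syntax.
# re.VERBOSE lets the pattern span several lines with comments.
CONSISTENT = re.compile(
    r"""
    ^[+-]?[0-9]{1,3}
    (?:(?P<comma>,?)[0-9]{3})?     # first group decides: comma or nothing
    (?:(?P=comma)[0-9]{3})*        # later groups must repeat that choice
    (?:\.[0-9]{2})?$
    """,
    re.VERBOSE,
)

for value in ("1,111,111.00", "1111111.00", "11,111111.00"):
    print(value, bool(CONSISTENT.match(value)))
```

The first two values use their separators consistently and match; the third mixes a comma with an unseparated group and is rejected.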

Final answer:

Now, a complete solution. Indentation and line breaks are there for readability only, so either remove them or enable your regex engine's free-spacing (extended) mode.

^[+-]?[0-9]{1,3}
(?:
    (?:\,[0-9]{3})*
    (?:\.[0-9]{2})?
|
    (?:\.[0-9]{3})*
    (?:\,[0-9]{2})?
|
    [0-9]*
    (?:[\.\,][0-9]{2})?
)$
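A quick way to try the complete pattern is to compile it in Python with re.VERBOSE, which ignores the layout whitespace. This is only a sketch: COMPLETE and is_number are illustrative names, and the decimal point of the first branch is written as an escaped \. so that it matches only a literal dot.

```python
import re

# Hypothetical names COMPLETE and is_number.
# re.VERBOSE makes Python ignore whitespace and # comments in the pattern.
COMPLETE = re.compile(
    r"""
    ^[+-]?[0-9]{1,3}
    (?:
        (?:,[0-9]{3})*  (?:\.[0-9]{2})?     # 1,234,567.89
    |
        (?:\.[0-9]{3})* (?:,[0-9]{2})?      # 1.234.567,89
    |
        [0-9]*          (?:[.,][0-9]{2})?   # 1234567.89 or 1234567,89
    )$
    """,
    re.VERBOSE,
)

def is_number(text):
    """True if text is written in one of the three supported formats."""
    return COMPLETE.match(text) is not None
```

With this, the three sample values from the question are accepted, while 11,111111.00 (inconsistent grouping) and 12.3 (a single decimal digit) are rejected.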

This variation captures the separators used:

^[+-]?[0-9]{1,3}
(?:
    (?:(?<thousand>\,)[0-9]{3})*
    (?:(?<decimal>\.)[0-9]{2})?
|
    (?:(?<thousand>\.)[0-9]{3})*
    (?:(?<decimal>\,)[0-9]{2})?
|
    [0-9]*
    (?:(?<decimal>[\.\,])[0-9]{2})?
)$
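The pattern above reuses the group names thousand and decimal in several branches, which not every engine allows; Python's re module, for one, rejects a repeated group name. The sketch below is therefore an adaptation rather than a translation: it captures the decimal digits under three made-up names (cents_dot, cents_comma, cents_plain) and then derives the integer part, which is what the question asked for. split_number is also a hypothetical name.

```python
import re

# Assumption: group names are unique per branch because Python's re does not
# accept duplicates. All names here are invented for the example.
PATTERN = re.compile(
    r"""
    ^[+-]?[0-9]{1,3}
    (?:
        (?:,[0-9]{3})*   (?:\.(?P<cents_dot>[0-9]{2}))?
    |
        (?:\.[0-9]{3})*  (?:,(?P<cents_comma>[0-9]{2}))?
    |
        [0-9]*           (?:[.,](?P<cents_plain>[0-9]{2}))?
    )$
    """,
    re.VERBOSE,
)

def split_number(text):
    """Return (integer_part, decimal_part) as strings of digits."""
    m = PATTERN.match(text)
    if m is None:
        raise ValueError(f"not a recognised number: {text!r}")
    cents = (m.group("cents_dot") or m.group("cents_comma")
             or m.group("cents_plain"))
    # Everything before the decimal separator is the integer part.
    whole = text[: -(len(cents) + 1)] if cents else text
    # Drop thousands separators; a leading sign, if any, is kept.
    whole = whole.replace(",", "").replace(".", "")
    return whole, cents or ""

print(split_number("111.111,11"))   # ('111111', '11')
```

The two returned strings can then be joined with whatever decimal separator the database expects.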


Edit 1: "cents" are now optional; edit 2: text added; edit 3: second solution added; edit 4: complete solution added; edit 5: headings added; edit 6: capturing added; edit 7: last answer split into two versions.
