codingBat separateThousands using regex (and unit testing how-to)
Problem description
This question is a combination of regex practice and unit testing practice.
I authored this problem, separateThousands, for personal practice:
Given a number as a string, introduce commas to separate thousands. The number may contain an optional minus sign, and an optional decimal part. There will not be any superfluous leading zeroes.
Here's my solution:
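String separateThousands(String s) {
    return s.replaceAll(
        String.format("(?:%s)|(?:%s)",
            // rest: 3 digits past the previous match, with a digit ahead
            "(?<=\\G\\d{3})(?=\\d)",
            // first: after ^, optional minus and 1-3 digits; only triplets ahead
            "(?<=^-?\\d{1,3})(?=(?:\\d{3})+(?!\\d))"
        ),
        ","
    );
}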
The way it works is that it classifies two types of commas: the first, and the rest. In the above regex, the rest subpattern actually appears before the first. Every match is zero-length, and replaceAll replaces it with ",".
The rest subpattern looks behind to check that the previous match is followed by exactly 3 digits, and looks ahead to check that there's a digit. It's a kind of chain reaction, triggered by the previous match.
The first subpattern looks behind for the ^ anchor, followed by an optional minus sign and 1 to 3 digits. The rest of the string from that point must match one or more triplets of digits, followed by a nondigit (which is either $ or \.).
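To see how the two subpatterns divide the work, here is a small sketch that applies each one on its own and then both together. The class name SubpatternDemo and the helper method names are made up for illustration; the regex strings are the ones from the solution above.

```java
public class SubpatternDemo {
    // "rest": a zero-length position exactly 3 digits past the previous match
    // (or past the start of input, if nothing has matched yet), with a digit ahead.
    static final String REST = "(?<=\\G\\d{3})(?=\\d)";

    // "first": a zero-length position after ^, an optional minus and 1-3 digits,
    // where everything up to the next nondigit is one or more digit triplets.
    static final String FIRST = "(?<=^-?\\d{1,3})(?=(?:\\d{3})+(?!\\d))";

    static String firstOnly(String s) {
        return s.replaceAll(FIRST, ",");
    }

    static String restOnly(String s) {
        return s.replaceAll(REST, ",");
    }

    static String both(String s) {
        return s.replaceAll(String.format("(?:%s)|(?:%s)", REST, FIRST), ",");
    }

    public static void main(String[] args) {
        String s = "1234567";
        System.out.println(firstOnly(s)); // 1,234567
        System.out.println(restOnly(s));  // 123,456,7
        System.out.println(both(s));      // 1,234,567
    }
}
```

On its own, first places only the leading comma, and rest simply counts triplets from the left. Combined, first fixes the starting point and rest continues the chain from there.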
My questions for this part are:
- Can this regex be simplified?
- Can it be optimized further?
  - Ordering rest before first is deliberate, since first is only needed once
  - No capturing group
As I've mentioned, I'm the author of this problem, so I'm also the one responsible for coming up with test cases for it. Here they are:
INPUT, OUTPUT
"1000", "1,000"
"-12345", "-12,345"
"-1234567890.1234567890", "-1,234,567,890.1234567890"
"123.456", "123.456"
".666666", ".666666"
"0", "0"
"123456789", "123,456,789"
"1234.5678", "1,234.5678"
"-55555.55555", "-55,555.55555"
"0.123456789", "0.123456789"
"123456.789", "123,456.789"
I haven't had much experience with industrial-strength unit testing, so I'm wondering if others can comment on whether this is good coverage, whether I've missed anything important, etc. (I can always add more tests if there's a scenario I've missed.)
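One way to keep such cases easy to extend is a table-driven check. The following is a minimal sketch in plain Java with no test framework; a parameterized test in a framework such as JUnit would be the more usual choice. The class name SeparateThousandsCheck, the failures() helper, and the EXTRA_CASES entries are illustrative additions, not part of the original problem.

```java
import java.util.ArrayList;
import java.util.List;

public class SeparateThousandsCheck {
    // The solution from the question, unchanged.
    static String separateThousands(String s) {
        return s.replaceAll(
            String.format("(?:%s)|(?:%s)",
                "(?<=\\G\\d{3})(?=\\d)",
                "(?<=^-?\\d{1,3})(?=(?:\\d{3})+(?!\\d))"),
            ",");
    }

    // {input, expected} pairs from the question.
    static final String[][] CASES = {
        {"1000", "1,000"},
        {"-12345", "-12,345"},
        {"-1234567890.1234567890", "-1,234,567,890.1234567890"},
        {"123.456", "123.456"},
        {".666666", ".666666"},
        {"0", "0"},
        {"123456789", "123,456,789"},
        {"1234.5678", "1,234.5678"},
        {"-55555.55555", "-55,555.55555"},
        {"0.123456789", "0.123456789"},
        {"123456.789", "123,456.789"},
    };

    // Hypothetical extra boundary cases (not in the original list):
    // just below and at the 3-digit boundary, with and without a sign.
    static final String[][] EXTRA_CASES = {
        {"999", "999"},
        {"-999", "-999"},
        {"-1000", "-1,000"},
        {"1000000", "1,000,000"},
    };

    // Returns one message per failing case; an empty list means all passed.
    static List<String> failures() {
        List<String> out = new ArrayList<>();
        for (String[][] table : new String[][][] {CASES, EXTRA_CASES}) {
            for (String[] c : table) {
                String actual = separateThousands(c[0]);
                if (!actual.equals(c[1])) {
                    out.add(c[0] + ": expected " + c[1] + " but got " + actual);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> failed = failures();
        failed.forEach(System.out::println);
        System.out.println(failed.isEmpty() ? "all cases pass" : failed.size() + " failed");
    }
}
```

Adding a scenario is then a one-line change to a table, which makes gaps in coverage easier to spot.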
Recommended answer
This works for me:
return s.replaceAll("(\\G-?\\d{1,3})(?=(?:\\d{3})++(?!\\d))", "$1,");
The first time through, \G acts the same as ^, and the lookahead forces \d{1,3} to consume only as many characters as necessary to leave the match position at a three-digit boundary. After that, \d{1,3} consumes the maximum three digits every time, with \G to keep it anchored to the end of the previous match.
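The anchoring behaviour can be observed by listing what group 1 captures on each match. This is only an illustrative sketch: the class name AnchoredMatchDemo and the groups() helper are made up, and the pattern is the one from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnchoredMatchDemo {
    static final Pattern P =
        Pattern.compile("(\\G-?\\d{1,3})(?=(?:\\d{3})++(?!\\d))");

    // Collects what group 1 consumed on each successive match.
    static List<String> groups(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = P.matcher(s);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    static String separateThousands(String s) {
        return P.matcher(s).replaceAll("$1,");
    }

    public static void main(String[] args) {
        // First match takes 1 digit to reach a triplet boundary, the next takes 3.
        System.out.println(groups("1234567"));               // [1, 234]
        System.out.println(separateThousands("1234567"));    // 1,234,567

        // Once a match attempt at \G fails, the chain stops, so the
        // fractional digits are never touched.
        System.out.println(groups("-12345.678"));            // [-12]
        System.out.println(separateThousands("-12345.678")); // -12,345.678
    }
}
```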
As for your unit tests, I would just make it clear in the problem description that the input will always be a valid number, with at most one decimal point.