codingBat separateThousands using regex (and unit testing how-to)

Problem description

This question is a combination of regex practice and unit testing practice.

I authored this problem separateThousands for personal practice:

Given a number as a string, introduce commas to separate thousands. The number may contain an optional minus sign, and an optional decimal part. There will not be any superfluous leading zeroes.
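
A plain loop makes the specification concrete and gives an independent answer to compare the regex against. The sketch below is not the author's solution; the class name SeparateThousandsLoop is hypothetical, and it assumes the input is already a valid number in the format just described.

```java
// Hypothetical regex-free sketch of the separateThousands spec.
// Assumes valid input: optional '-', integer digits (possibly none), optional '.' + digits.
class SeparateThousandsLoop {
    static String separateThousands(String s) {
        int start = s.startsWith("-") ? 1 : 0;        // skip the optional minus sign
        int dot = s.indexOf('.');
        int end = (dot < 0) ? s.length() : dot;       // integer part is s[start, end)
        StringBuilder sb = new StringBuilder(s.substring(0, start));
        for (int i = start; i < end; i++) {
            // Comma before digit i when the digits from i to the end of the integer
            // part form complete triplets, but never before the first digit.
            if (i > start && (end - i) % 3 == 0) {
                sb.append(',');
            }
            sb.append(s.charAt(i));
        }
        sb.append(s.substring(end));                  // decimal part (if any) is copied unchanged
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(separateThousands("-1234567890.1234567890")); // -1,234,567,890.1234567890
    }
}
```

Treating the integer part as the half-open range [start, end) means the minus sign and the decimal part pass through untouched, which mirrors how the regex only ever inserts commas between integer digits.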

Here is my solution:

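String separateThousands(String s) {
  return s.replaceAll(
      String.format("(?:%s)|(?:%s)",
        "(?<=\\G\\d{3})(?=\\d)",                    // "rest": 3 digits since the previous match, more digits ahead
        "(?<=^-?\\d{1,3})(?=(?:\\d{3})+(?!\\d))"   // "first": after the leading 1-3 digits
      ),
      ","   // every match is zero-length, so this just inserts a comma
  );
}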

The way it works is that it classifies commas into two types: the first, and the rest. In the above regex, the rest subpattern actually appears before the first. A match is always zero-length, so replaceAll simply inserts "," at each matched position.
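
To see that every match really is zero-length, the same pattern can be driven through java.util.regex.Matcher and the span of each match recorded. This is only an illustrative probe; the class name ZeroLengthProbe is hypothetical, and the pattern string is the one the solution builds with String.format.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical probe: lists the [start, end) span of every match of the author's pattern.
class ZeroLengthProbe {
    static final Pattern COMMA_POSITIONS = Pattern.compile(String.format("(?:%s)|(?:%s)",
        "(?<=\\G\\d{3})(?=\\d)",                      // "rest"
        "(?<=^-?\\d{1,3})(?=(?:\\d{3})+(?!\\d))"));   // "first"

    static List<String> matchSpans(String s) {
        List<String> spans = new ArrayList<>();
        Matcher m = COMMA_POSITIONS.matcher(s);
        while (m.find()) {
            spans.add("[" + m.start() + "," + m.end() + ")");
        }
        return spans;
    }

    public static void main(String[] args) {
        // Each span is empty (start == end), so replaceAll only inserts text there.
        System.out.println(matchSpans("1234567")); // [[1,1), [4,4)]
    }
}
```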

The rest basically looks behind to check that the previous match ended exactly 3 digits earlier (\G followed by 3 digits), and looks ahead to check that another digit follows. It's a sort of chain-reaction mechanism triggered by the previous match.
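
The chain reaction can be observed by running the rest subpattern on its own. In java.util.regex, \G initially matches at the start of the input, so without a first match to seed it the chain appears to start at index 0 rather than at the correct first-comma position, which is presumably why first is needed. The class name RestOnlyDemo is hypothetical.

```java
// Hypothetical demo: apply only the "rest" subpattern of the solution.
class RestOnlyDemo {
    static String restOnly(String s) {
        // Zero-length match after 3 digits that start exactly where the previous match ended.
        return s.replaceAll("(?<=\\G\\d{3})(?=\\d)", ",");
    }

    public static void main(String[] args) {
        // With no "first" match to seed it, the chain is anchored at index 0.
        System.out.println(restOnly("1234567")); // 123,456,7
    }
}
```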

The first basically looks behind for the ^ anchor, followed by an optional minus sign and between 1 and 3 digits. From that point, the rest of the integer part must consist of one or more complete triplets of digits, followed by something that is not a digit (either the end of the input, $, or the decimal point, \.).
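
Running the first subpattern alone shows that it places exactly one comma, after the leading group, and never touches the decimal part. The class name FirstOnlyDemo is hypothetical.

```java
// Hypothetical demo: apply only the "first" subpattern of the solution.
class FirstOnlyDemo {
    static String firstOnly(String s) {
        // Zero-length match right after '^', an optional '-', and 1-3 digits,
        // provided the remaining integer digits come in complete triplets.
        return s.replaceAll("(?<=^-?\\d{1,3})(?=(?:\\d{3})+(?!\\d))", ",");
    }

    public static void main(String[] args) {
        System.out.println(firstOnly("1234567"));      // 1,234567
        System.out.println(firstOnly("-55555.55555")); // -55,555.55555
    }
}
```

Since the lookbehind is anchored with ^ and is at most four characters long, it can only succeed near the start of the string, which is why first contributes at most one match.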

My questions about this part are:

  • Can this regex be simplified?
  • Can it be optimized further?
    • Ordering rest before first is deliberate, since first is only needed once
    • No capturing groups
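
Because both alternatives only ever produce zero-length matches, swapping them should change at most which one is attempted first at each position, not the result; the ordering is then a performance choice rather than a correctness one. A quick sketch to check this on some of the test inputs (class and constant names are hypothetical):

```java
// Hypothetical check: swapping the two alternatives should not change the output.
class AlternativeOrderCheck {
    static final String REST = "(?<=\\G\\d{3})(?=\\d)";
    static final String FIRST = "(?<=^-?\\d{1,3})(?=(?:\\d{3})+(?!\\d))";

    // The author's ordering: rest tried before first at every position.
    static String restFirst(String s) {
        return s.replaceAll(String.format("(?:%s)|(?:%s)", REST, FIRST), ",");
    }

    // Swapped ordering: first tried before rest.
    static String firstRest(String s) {
        return s.replaceAll(String.format("(?:%s)|(?:%s)", FIRST, REST), ",");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"1000", "123456789", "-1234567890.1234567890"}) {
            System.out.println(restFirst(s) + "  " + firstRest(s));
        }
    }
}
```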

As I've mentioned, I'm the author of this problem, so I'm also the one responsible for coming up with test cases for it. Here they are:

INPUT, OUTPUT
"1000", "1,000"
"-12345", "-12,345"
"-1234567890.1234567890", "-1,234,567,890.1234567890"
"123.456", "123.456"
".666666", ".666666"
"0", "0"
"123456789", "123,456,789"
"1234.5678", "1,234.5678"
"-55555.55555", "-55,555.55555"
"0.123456789", "0.123456789"
"123456.789", "123,456.789"
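
In a real project these pairs would more naturally live in a JUnit 5 @ParameterizedTest fed by @CsvSource, but to keep the sketch dependency-free it runs them from a plain array. The class name SeparateThousandsCases and the failures() helper are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical table-driven check of the author's solution against the listed test cases.
class SeparateThousandsCases {
    // The author's regex solution, unchanged.
    static String separateThousands(String s) {
        return s.replaceAll(
            String.format("(?:%s)|(?:%s)",
                "(?<=\\G\\d{3})(?=\\d)",
                "(?<=^-?\\d{1,3})(?=(?:\\d{3})+(?!\\d))"),
            ",");
    }

    // INPUT, OUTPUT pairs copied from the list above.
    static final String[][] CASES = {
        {"1000", "1,000"},
        {"-12345", "-12,345"},
        {"-1234567890.1234567890", "-1,234,567,890.1234567890"},
        {"123.456", "123.456"},
        {".666666", ".666666"},
        {"0", "0"},
        {"123456789", "123,456,789"},
        {"1234.5678", "1,234.5678"},
        {"-55555.55555", "-55,555.55555"},
        {"0.123456789", "0.123456789"},
        {"123456.789", "123,456.789"},
    };

    // Describes every case whose actual output differs from the expected one.
    static List<String> failures() {
        List<String> failed = new ArrayList<>();
        for (String[] c : CASES) {
            String actual = separateThousands(c[0]);
            if (!actual.equals(c[1])) {
                failed.add(c[0] + ": expected " + c[1] + " but got " + actual);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        List<String> failed = failures();
        System.out.println(failed.isEmpty() ? "all " + CASES.length + " cases pass" : failed);
    }
}
```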

I haven't had much experience with industrial-strength unit testing, so I'm wondering whether others can comment on whether this is good coverage, whether I've missed anything important, etc. (I can always add more tests if there's a scenario I've missed.)
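
One way to probe for missed scenarios is differential testing: compare the regex solution with an independent, regex-free implementation on many randomly generated valid inputs. The generator below encodes an assumption about the valid-input format (optional minus, no superfluous leading zeros, optional decimal part, and occasionally a bare decimal part such as .666666); the class name, the 0 to 15 digit range, and the fixed seed are arbitrary choices.

```java
import java.util.Random;

// Hypothetical randomized differential test: the author's regex vs. a regex-free loop.
class DifferentialCheck {
    // The author's regex solution, unchanged.
    static String viaRegex(String s) {
        return s.replaceAll(
            String.format("(?:%s)|(?:%s)",
                "(?<=\\G\\d{3})(?=\\d)",
                "(?<=^-?\\d{1,3})(?=(?:\\d{3})+(?!\\d))"),
            ",");
    }

    // Independent oracle: comma before each integer digit that starts a full run of triplets.
    static String viaLoop(String s) {
        int start = s.startsWith("-") ? 1 : 0;
        int dot = s.indexOf('.');
        int end = (dot < 0) ? s.length() : dot;
        StringBuilder sb = new StringBuilder(s.substring(0, start));
        for (int i = start; i < end; i++) {
            if (i > start && (end - i) % 3 == 0) sb.append(',');
            sb.append(s.charAt(i));
        }
        return sb.append(s.substring(end)).toString();
    }

    // Random valid input under the assumptions stated above.
    static String randomNumber(Random rnd) {
        StringBuilder sb = new StringBuilder();
        int intDigits = rnd.nextInt(16);                     // 0..15 integer digits
        if (intDigits == 0) {
            sb.append('.');                                  // no integer part, so a decimal part is required
        } else {
            if (rnd.nextBoolean()) sb.append('-');
            // A lone digit may be 0; a longer integer part must not start with 0.
            sb.append(intDigits == 1 ? (char) ('0' + rnd.nextInt(10)) : (char) ('1' + rnd.nextInt(9)));
            for (int i = 1; i < intDigits; i++) sb.append((char) ('0' + rnd.nextInt(10)));
            if (rnd.nextBoolean()) sb.append('.');
        }
        if (sb.charAt(sb.length() - 1) == '.') {
            int fracDigits = 1 + rnd.nextInt(10);
            for (int i = 0; i < fracDigits; i++) sb.append((char) ('0' + rnd.nextInt(10)));
        }
        return sb.toString();
    }

    // Returns how many of the generated inputs produce different outputs.
    static int countMismatches(int trials, long seed) {
        Random rnd = new Random(seed);
        int mismatches = 0;
        for (int t = 0; t < trials; t++) {
            String s = randomNumber(rnd);
            if (!viaRegex(s).equals(viaLoop(s))) {
                System.out.println("mismatch: " + s + " -> " + viaRegex(s) + " vs " + viaLoop(s));
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println(countMismatches(10_000, 42L) + " mismatches");
    }
}
```

A fixed seed keeps any failure reproducible. Note that this only catches disagreements between the two implementations; if both share a misreading of the specification, it will not show up here.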

Recommended answer

This works for me:

// $1 writes back the digits consumed by the match, followed by a comma
return s.replaceAll("(\\G-?\\d{1,3})(?=(?:\\d{3})++(?!\\d))", "$1,");
      

The first time through, \G acts the same as ^, and the lookahead forces \d{1,3} to give back digits until the match position sits on a three-digit boundary (the number of remaining integer digits is a multiple of three). After that, \d{1,3} consumes the maximum of three digits every time, with \G keeping it anchored to the end of the previous match.
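
The behavior described above can be made visible by recording what group 1 captures on each match: the first match takes only the leading group, and every later match takes exactly three digits until the lookahead fails at the decimal point. The class name AnswerRegexProbe is hypothetical; the pattern is the one from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical probe of the answer's pattern: records what each match consumes via group 1.
class AnswerRegexProbe {
    static final Pattern P = Pattern.compile("(\\G-?\\d{1,3})(?=(?:\\d{3})++(?!\\d))");

    static String separateThousands(String s) {
        return P.matcher(s).replaceAll("$1,");    // re-emit the consumed digits, then a comma
    }

    static List<String> consumedGroups(String s) {
        List<String> groups = new ArrayList<>();
        Matcher m = P.matcher(s);
        while (m.find()) {
            groups.add(m.group(1));
        }
        return groups;
    }

    public static void main(String[] args) {
        String s = "-1234567890.1234567890";
        System.out.println(consumedGroups(s));    // [-1, 234, 567]
        System.out.println(separateThousands(s)); // -1,234,567,890.1234567890
    }
}
```

The possessive ++ does not change which strings match here: giving back a triplet would leave at least three digits before the end, so (?!\d) would fail anyway. It only spares the engine that pointless backtracking.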

As for your unit tests, I would just make it clear in the problem description that the input will always be a valid number, with at most one decimal point.
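
That precondition can also be written down as a whole-string check. The pattern below is an assumption about what "valid" means, derived from the problem statement and the test list (optional minus, no superfluous leading zeros, at most one decimal point, and a bare decimal part such as .666666 allowed); it would also accept forms like -.5, which the problem statement does not settle either way. The class name InputFormat is hypothetical.

```java
// Hypothetical validator for the input format described in the problem statement.
class InputFormat {
    // Optional '-', then either an integer part without superfluous leading zeros
    // (optionally followed by '.' and digits), or a bare decimal part like ".666666".
    static final String VALID_NUMBER = "-?(?:(?:0|[1-9]\\d*)(?:\\.\\d+)?|\\.\\d+)";

    static boolean isValid(String s) {
        return s.matches(VALID_NUMBER);   // String.matches requires the whole string to match
    }

    public static void main(String[] args) {
        System.out.println(isValid("-1234567890.1234567890")); // true
        System.out.println(isValid("1.2.3"));                  // false
    }
}
```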
