Regex for fixed width field


Problem description

I need to match a fixed-width field in a file layout with a regular expression. The field is a numeric integer, always has four characters, and lies in the range 0..1331. When the number is smaller than 1000, the string is left-padded with zeros. So all of these examples are valid:

  • 0000
  • 0001
  • 0010
  • 1000
  • 1331

But the following must not be accepted:

  • 1
  • 01
  • 10
  • 100
  • 4759

It would be nice if I could enforce this restriction using only a regex. After playing around a bit, I came up with the expression \0*[0-1331]\. The problem is that it does not restrict the size to four characters. Of course I could do \000[0-9]|00[10-99]|0[100-999]|[1000-1331]\ but I refuse to use something so nasty. Can anyone think of a better way?

Recommended answer

Regular expressions are not the answer to every single problem. My advice would be to do something like:

boolean isValidSomethingOrOther (string):
    if string.length() != 4:
        return false
    for each character in string:
        if not character.isNumeric():
            return false
    if string.toInt() > 1331:
        return false
    return true
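
As a rough illustration, the pseudocode above could be written in Python along the following lines. The names is_valid_field, MAX_VALUE and FIELD_WIDTH are placeholders chosen for this sketch, and the digit check is limited to ASCII 0-9 on the assumption that the file holds plain ASCII text.

```python
import string

MAX_VALUE = 1331  # upper bound of the field, taken from the question
FIELD_WIDTH = 4   # fixed width of the field, taken from the question


def is_valid_field(text: str) -> bool:
    """Return True if text is exactly four ASCII digits with value <= 1331."""
    if len(text) != FIELD_WIDTH:
        return False
    # str.isdigit() also accepts some non-ASCII digits, so compare explicitly.
    if any(ch not in string.digits for ch in text):
        return False
    return int(text) <= MAX_VALUE


if __name__ == "__main__":
    for sample in ["0000", "0001", "1331", "1", "100", "4759"]:
        print(sample, is_valid_field(sample))
```

Keeping the length, digit and range checks as separate steps means each rule can be changed without touching the others.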

If you must use a regex, there's nothing wrong with your solution but I'd probably use the following variant (just based on my understanding of RE engines and how they work):

^(0[0-9]{3}|1[0-2][0-9]{2}|13[0-2][0-9]|133[01])$

  • The first section matches 0000-0999.
  • The second matches 1000-1299.
  • The third matches 1300-1329.
  • The final one matches 1330 and 1331.
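
The four alternatives above can be checked exhaustively with a short Python sketch. It is only an illustration: the names FIELD_RE, UNGROUPED_RE and matches_field are made up here, and re.fullmatch is used in place of explicit ^ and $ anchors. One detail worth noting is that alternation binds more loosely than anchors, so the alternatives need to be grouped for the whole-string requirement to apply to every branch.

```python
import re

# The four alternatives, grouped so the whole-string match covers all of them.
FIELD_RE = re.compile(r"(?:0[0-9]{3}|1[0-2][0-9]{2}|13[0-2][0-9]|133[01])")

# Same alternatives without a group: ^ only guards the first branch and
# $ only guards the last one, so the middle branches can match anywhere.
UNGROUPED_RE = re.compile(r"^0[0-9]{3}|1[0-2][0-9]{2}|13[0-2][0-9]|133[01]$")


def matches_field(text: str) -> bool:
    """Return True if the entire string is a four-digit value in 0000-1331."""
    return FIELD_RE.fullmatch(text) is not None


if __name__ == "__main__":
    # Compare the regex against the numeric rule for every four-digit string.
    for n in range(10000):
        assert matches_field(f"{n:04d}") == (n <= 1331)
    # The ungrouped pattern finds "1000" inside a five-character string.
    print(UNGROUPED_RE.search("91000") is not None)
    print(matches_field("91000"))
```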

Update:

Just on the elegance comment, there are many forms of elegance, of which regexes are one. You can also achieve elegance by abstracting the validation out to a separate function or macro and then calling it from your code:

      if isValidSomethingOrOther(str) ...
      

where SomethingOrOther is a concrete business object. This allows you to change your idea of a valid object easily, even using a regex if you wish, or any other checks you deem appropriate (such as my function above).

This allows you to cater for any changes down the line, such as a requirement that these objects now have to be prime numbers.

I'm sure I could write a "prime-number-less-than-1332" regex. I'm equally sure I wouldn't want to. I'd prefer to code that up as a function (or a lookup table for raw speed), especially since the regex would most likely just look like:

      ^2|3|5|7| ... |1327$
      

anyway.
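
To make the lookup-table idea concrete, here is one possible Python sketch. It assumes the hypothetical prime-number requirement described above and builds the table once with a sieve of Eratosthenes; the names build_prime_table, PRIME_TABLE and is_valid_prime_field are invented for this example.

```python
import string

LIMIT = 1331  # largest value the four-character field may hold


def build_prime_table(limit: int) -> list:
    """Sieve of Eratosthenes: table[n] is True exactly when n is prime."""
    table = [True] * (limit + 1)
    table[0] = table[1] = False
    for n in range(2, int(limit ** 0.5) + 1):
        if table[n]:
            # Cross out every multiple of n, starting at n * n.
            for multiple in range(n * n, limit + 1, n):
                table[multiple] = False
    return table


PRIME_TABLE = build_prime_table(LIMIT)  # built once, reused for every check


def is_valid_prime_field(text: str) -> bool:
    """Return True for four ASCII digits whose value is a prime <= LIMIT."""
    if len(text) != 4 or any(ch not in string.digits for ch in text):
        return False
    value = int(text)
    return value <= LIMIT and PRIME_TABLE[value]


if __name__ == "__main__":
    for sample in ["0002", "0004", "1327", "1331", "7"]:
        print(sample, is_valid_prime_field(sample))
```

Because callers only see the validation function, swapping the range rule for the prime rule would not require changing any calling code.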
