使用正则表达式检查数字可除性 [英] Check number divisibility with regular expressions

查看:90
本文介绍了使用正则表达式检查数字可除性的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

给出一个十进制数字N作为一串数字,我如何检查它是否可以被M 仅使用正则表达式整除,而不会转换为int?

Given a decimal number N as a string of digits, how do I check if it's divisible by M using regular expressions only, without converting to int?

M = 2、4、5、10很明显.对于M = 3,这里有一些有趣的见解: Regex过滤器数可被3整除

M=2, 4, 5, 10 are obvious. For M=3 some interesting insights here: Regex filter numbers divisible by 3

有人可以为M = 7、9、11、13等提供解决方案吗?通用的吗?

Can anyone provide a solution for M=7, 9, 11, 13 etc? A generic one?

测试代码(使用python,但可以使用任何语言):

Testing code (in python, but feel free to use any language):

M = your number, e.g. 2
R = your regexp, e.g., '^[0-9]*[02468]$'

import re
for i in range(1, 2000):
    m = re.match(R, str(i))
    if i % M:
        assert not m, '%d should not match' % i
    else:
        assert m, '%d must match' % i

对于那些好奇的人,下面是M=3的示例(假定引擎具有递归支持):

For those curious, here's an example for M=3 (assumes an engine with recursion support):

^
(
    | [0369]+ (?1)
    | [147] (?1) [258] (?1)
    | [258] (?1) [147] (?1)
    | ( [258] (?1) ) {3}
    | ( [147] (?1) ) {3}
)
$

更新:有关更多讨论和示例,请参见此线程.在那里张贴的表情原来是越野车(70 * N失败),但是如何到达"这一部分很有教育意义.

Upd: for more discussion and examples see this thread. The expression posted there turned out to be buggy (fails on 70*N), but "how to get there" part is very educative.

推荐答案

也许令人惊讶的结果是,这样的正则表达式始终存在.不足为奇的是,它通常没有用.

The perhaps-surprising result is that such a regular expression always exists. The much-less-surprising one is that it's usually not useful.

存在结果来自确定性有限自动机(DFA)和正则表达式之间的对应关系.因此,让我们制作一个DFA.用N表示模数(不必是质数),用B表示数值基,对于普通的十进制数,该数字为10.具有标记为0到N-1的N个状态的DFA.初始状态为0.DFA的符号为数字0到B-1.状态代表输入字符串左前缀的其余部分,除以N时将被解释为整数.边沿表示在右侧添加数字时的状态变化.可以说,这是状态图S(状态,数字)= B *状态+数字(模N).接受状态为0,因为余数为零表示可分割.因此,我们有一个DFA. DFA可以识别的语言与正则表达式可以识别的语言相同,因此存在.因此,尽管这很有趣,但是它并没有帮助,因为它并不能告诉您如何确定表达式.

The existence result comes from the correspondence between deterministic finite automata (DFA) and regular expressions. So let's make a DFA. Denote the modulus by N (it doesn't need to be prime) and denote the numerical base by B, which is 10 for ordinary decimal numbers. The DFA with N states labelled 0 through N-1. The initial state is 0. The symbols of the DFA are the digits 0 through B-1. The states represent the remainder of the left-prefix of the input string, interpreted as an integer, when divided by N. The edges represent the change of state when you add a digit to the right. Arithmetically, this is the state map S(state,digit) = B * state + digit (modulo N). The accepting state is 0, since a zero remainder indicates divisibility. So we have a DFA. The languages recognized by the DFA are the same as that recognized by regular expressions, so one exists. So while this is interesting, it's not helpful, since it doesn't tell you much about how to determine the expression.

如果您要使用通用算法,则可以在运行时轻松构建此类DFA,并通过直接计算填充其状态表.初始化只是运行时间为O(M * N)的一对嵌套循环.机器识别每个输入字符的时间是恒定的.这确实非常快,但是如果您真正需要的话,它就不会使用正则表达式库.

If you want a generic algorithm, it's easy to build such a DFA at run-time and populate its state table by direct computation. Initialization is just a pair of nested loops with run time O(M * N). Recognition with the machine is a constant time per input character. This is perfectly fast, but doesn't use a regexp library, if that's what you really need.

在获取实际的正则表达式时,我们需要查看费马小定理.根据定理,我们知道B ^(N-1)== 1(模N).例如,当N = 7且B = 10时,这意味着6位数字的每个块等效于0 ... 6范围内的某个数字.指数可以小于N-1;通常,这是 Euler拉伸函数的因数N.将块的大小称为D.有N D个数字块的正则表达式,每个表达式代表一个特定的等价类,以N为模.至多,这些表达式的长度为O(B ^ D),这是很大的.对于N = 7,这是一组正则表达式,长度为一百万个字符;我想这会破坏大多数正则表达式库.

In getting toward an actual regular expression, we need to look at Fermat's Little Theorem. From the theorem, we know that B^(N-1) == 1 (modulo N). For example, when N=7 and B=10, what this means is that every block of 6 digits is equivalent to some single digit in the range 0 .. 6 for the purpose of divisibility. The exponent can be smaller than N-1; in general it's a factor of the Euler totient function of N. Call the size of the block D. There are N regular expressions for blocks of D digits, each one representing a particular equivalence class of remainders modulo N. At most, these expressions have length O(B^D), which is large. For N=7 that's a set of regular expressions a million characters long; I would imagine that would break most regexp libraries.

这与示例代码中表达式的工作方式有关;表达式(?1)是等于0(mod 3)的匹配字符串.这适用于N = 3,因为10 ^ 1 == 1(mod 3),这意味着A0B == AB(mod 3).当指数大于1时,这会更复杂,但是原理是相同的. (请注意,示例代码使用的识别器不仅仅是正则表达式,严格来讲.)表达式[0369][147][258]是a中数字0、1和2的正则表达式.模3表达式.概括地说,您将以类似方式使用上述regexp-digits.

This relates to how the expression in the example code works; the expression (?1) is matching strings that are equal to 0 (mod 3). This works with N=3 since 10^1 == 1 (mod 3) which means that A0B == AB (mod 3). This is more complicated when the exponent is greater than 1, but the principle is the same. (Note that the example code uses a recognizer which is more than just regular expressions, strictly speaking.) The expressions [0369], [147], and [258] are the regular expressions for the digits 0, 1, and 2 in a modulo 3 expression. Generalizing, you would use the regexp-digits as above in an analogous manner.

我之所以没有提供代码,是因为(1)写这个代码要比回答这个问题花费更长的时间,并且(2)我真的怀疑它是否可以在任何已知的实现中执行.

I'm not providing code because (1) it would take longer to write than this answer has been, and (2) I really doubt it would execute in any known implementation.

这篇关于使用正则表达式检查数字可除性的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆