使用字符串函数查找周期性字符串 [英] Finding periodic strings using string functions

查看:126
本文介绍了使用字符串函数查找周期性字符串的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在寻找一种方法,使用JavaScript检查字符串是否是周期性的。

I'm looking for a way to check if a string is periodic or not using JavaScript.

要匹配的样本字符串可以是 11223331122333 。然而, 10101 不应该匹配。

Sample string to match can be 11223331122333. Whereas, 10101 should not match.

来自python,我使用了RegEx

Coming from python, I used the RegEx

/(.+?)\1+$/

但它很慢。有没有可以解决这个问题的字符串方法?

But it is quite slow. Are there any string methods that can do the trick?

推荐答案

下面代码的想法是考虑原始字符串可以均匀划分的所有长度的子字符串,并检查是否重复原始字符串。一种简单的方法是检查长度从1到长度的平方根的所有除数。如果除法产生一个整数,它们也是除数,它也是一个互补的除数。例如,对于长度为100的字符串,除数为1,2,4,5,10,并且互补除数为100(子字符串长度无效,因为子字符串只出现一次),50,25,20(和10) ,我们已经找到了。)

The idea of the code below is to consider substrings of all lengths the original string can be divided into evenly, and to check whether they repeat across the original string. A simple method is to check all divisors of the length from 1 to the square root of the length. They are divisors if the division yields an integer, which is also a complementary divisor. E.g., for a string of length 100 the divisors are 1, 2, 4, 5, 10, and the complementary divisors are 100 (not useful as substring length because the substring would appear only once), 50, 25, 20 (and 10, which we already found).

function substr_repeats(str, sublen, subcount)
{
   for (var c = 0; c < sublen; c++) {
      var chr = str.charAt(c);
      for (var s = 1; s < subcount; s++) {
         if (chr != str.charAt(sublen * s + c)) {
            return false;
         }
      }
   }
   return true;
}

function is_periodic(str)
{
   var len = str.length;
   if (len < 2) {
      return false;
   }
   if (substr_repeats(str, 1, len)) {
      return true;
   }
   var sqrt_len = Math.sqrt(len);
   for (var n = 2; n <= sqrt_len; n++) { // n: candidate divisor
      var m = len / n; // m: candidate complementary divisor
      if (Math.floor(m) == m) {
         if (substr_repeats(str, m, n) || n != m && substr_repeats(str, n, m)) {
            return true;
         }
      }
   }
   return false;
}

不幸的是,没有String方法可以比较另一个字符串的子串到位(例如,在C语言中 strncmp(str1,str2 + offset,length))。

Unfortunately there is no String method for comparing to a substring of another string in place (e.g., in C language that would be strncmp(str1, str2 + offset, length)).

假设你的字符串长度为120,并且由长度为6的子字符串组成,重复20次。你也可以把它看作是由子力长度(子串的长度)12重复10次,子能力24次重复5次,子能力30次重复4次,或者是子能力60次重复2次(子力量由20的素数因子给出) (2 * 2 * 5)以不同的组合应用于6)。现在,如果你检查你的字符串是否包含重复的60次sublength,并且检查失败,你也可以确定它不包含任何sublength,这是一个除数(即,素因子的组合)60包括6.换句话说,上述代码进行的许多检查都是多余的。例如,在长度为120的情况下,上面的代码检查(幸运的是大部分时间都很快失败)以下的长度:1,2,3,4,5,6,8,10,12,15,20,24, 30,40,60(按此顺序:1,60,2,40,3,30,4,24,5,20,6,15,8,12,10)。其中,只需要以下内容:24,40,60。这些是2 * 2 * 2 * 3,2 * 2 * 2 * 5,2 * 2 * 3 * 5,即素数组合120( 2 * 2 * 2 * 3 * 5)每个(非重复)素数中的一个取出,或者,如果您愿意,120 / 5,120 / 3,120 / 2。因此,暂时忘记有效的素数因子化不是一项简单的任务,我们可以将重复子串的检查限制为子长度/ p的p子串,其中p是长度的主要因素。以下是最简单的非平凡实现:

Say your string has a length of 120, and consists of a substring of length 6 repeated 20 times. You can look at it also as consisting of a sublength (length of substring) 12 repeated 10 times, sublength 24 repeated 5 times, sublength 30 repeated 4 times, or sublength 60 repeated 2 times (the sublengths are given by the prime factors of 20 (2*2*5) applied in different combinations to 6). Now, if you check whether your string contains a sublength of 60 repeated 2 times, and the check fails, you can also be sure that it won't contain any sublength which is a divisor (i.e., a combination of prime factors) of 60, including 6. In other words, many checks made by the above code are redundant. E.g., in the case of length 120, the above code checks (luckily failing quickly most of the time) the following sublengths: 1, 2, 3, 4, 5, 6, 8, 10, 12, 15, 20, 24, 30, 40, 60 (in this order: 1, 60, 2, 40, 3, 30, 4, 24, 5, 20, 6, 15, 8, 12, 10). Of these, only the following are necessary: 24, 40, 60. These are 2*2*2*3, 2*2*2*5, 2*2*3*5, i.e., the combinations of primes of 120 (2*2*2*3*5) with one of each (nonrepeating) prime taken out, or, if you prefer, 120/5, 120/3, 120/2. So, forgetting for a moment that efficient prime factorization is not a simple task, we can restrict our checks of repeating substrings to p substrings of sublength length/p, where p is a prime factor of length. The following is the simplest nontrivial implementation:

function substr_repeats(str, sublen, subcount) { see above }

function distinct_primes(n)
{
   var primes = n % 2 ? [] : [2];
   while (n % 2 == 0) {
      n /= 2;
   }
   for (var p = 3; p * p <= n; p += 2) {
      if (n % p == 0) {
         primes.push(p);
         n /= p;
         while (n % p == 0) {
            n /= p;
         }
      }
   }
   if (n > 1) {
      primes.push(n);
   }
   return primes;
}

function is_periodic(str)
{
   var len = str.length;
   var primes = distinct_primes(len);
   for (var i = primes.length - 1; i >= 0; i--) {
      var sublen = len / primes[i];
      if (substr_repeats(str, sublen, len / sublen)) {
         return true;
      }
   }
   return false;
}






试用此代码我的Linux PC让我大吃一惊:在Firefox上它比第一个版本要快得多,但在Chromium上它更慢,只有长度为数千的字符串变得更快。最后我发现问题与 distinct_primes()创建并传递给 is_periodic()的数组有关。 。解决方案是通过合并这两个函数来摆脱数组。代码如下,测试结果在 http://jsperf.com/periodic-strings-1 / 5


Trying out this code on my Linux PC I had a surprise: on Firefox it was much faster than the first version, but on Chromium it was slower, becoming faster only for strings with lengths in the thousands. At last I found out that the problem was related to the array that distinct_primes() creates and passes to is_periodic(). The solution was to get rid of the array by merging these two functions. The code is below and the test results are on http://jsperf.com/periodic-strings-1/5

function substr_repeats(str, sublen, subcount) { see at top }

function is_periodic(str)
{
   var len = str.length;
   var n = len;
   if (n % 2 == 0) {
      n /= 2;
      if (substr_repeats(str, n, 2)) {
         return true;
      }
      while (n % 2 == 0) {
         n /= 2;
      }
   }
   for (var p = 3; p * p <= n; p += 2) {
      if (n % p == 0) {
         if (substr_repeats(str, len / p, p)) {
            return true;
         }
         n /= p;
         while (n % p == 0) {
            n /= p;
         }
      }
   }
   if (n > 1) {
      if (substr_repeats(str, len / n, n)) {
         return true;
      }
   }
   return false;
}

请记住 jsperf.org 是绝对的,不同机器的不同实验者将为不同的渠道组合做出贡献。如果要可靠地比较两个JavaScript引擎,则需要编辑实验的新私有版本。

Please remember that the timings collected by jsperf.org are absolute, and that different experimenters with different machines will contribute to different combinations of channels. You need to edit a new private version of the experiment if you want to reliably compare two JavaScript engines.

这篇关于使用字符串函数查找周期性字符串的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆