Find longest repeating strings?

Problem description

I have some HTML/CSS/JavaScript with painfully long class, id, variable, and function names, and other combined strings that get used over and over. I could probably rename or restructure a few of them and cut the text in half.

So I'm looking for a simple algorithm that reports on the longest repeated strings in text. Ideally, it would sort in descending order by length times number of instances, so as to highlight the strings that, if renamed globally, would yield the most savings.

This feels like something I could do painfully in 100 lines of code, for which there's some elegant, 10-line recursive regex. It also sounds like a homework problem, but I assure you it's not.

I work in PHP, but would enjoy seeing something in any language.

NOTE: I'm not looking for HTML/CSS/JavaScript minification per se. I like meaningful text, so I want to do it by hand, and weigh legibility against bloat.

Recommended answer

This will find all repeated strings:

(?=((.+)(?:.*?\2)+))

Use that with preg_match_all and select the longest one.

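// Sort descending by the score stored in $match[0].
function len_cmp($match1, $match2) {
  return $match2[0] - $match1[0];
}

preg_match_all('/(?=((.+)(?:.*?\2)+))/s', $text, $matches, PREG_SET_ORDER);

// Score each match: occurrences of the repeated string times its length.
// Iterate by reference so the score is written back into $matches.
foreach ($matches as &$match) {
  $match[0] = substr_count($match[1], $match[2]) * strlen($match[2]);
}
unset($match); // break the reference before $match is reused below

usort($matches, "len_cmp");

// Print the repeated string, then the span in which it was found.
foreach ($matches as $match) {
  echo "($match[2]) $match[1]\n";
}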
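
To make the two capture groups concrete, here is a minimal sketch that runs the same pattern on a tiny input. The function name `find_repeats` and the sample string `abab` are illustrative assumptions, not part of the original answer.

```php
<?php
// Hypothetical helper, not from the original answer: returns one
// [repeated string, span] pair per position where the lookahead matches.
function find_repeats(string $text): array
{
    $found = [];
    // Same pattern as above; /s lets "." match newlines too.
    if (preg_match_all('/(?=((.+)(?:.*?\2)+))/s', $text, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $match) {
            // $match[2] is the repeated string itself.
            // $match[1] runs from its first occurrence through the last repetition found.
            $found[] = [$match[2], $match[1]];
        }
    }
    return $found;
}

foreach (find_repeats('abab') as [$repeat, $span]) {
    echo "($repeat) $span\n";
}
```

Because the whole pattern sits inside a lookahead, every match is zero-length and the engine retries at each character position. That is why overlapping candidates such as `ab` and `b` are both reported.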

This method could be quite slow though, as there could be a LOT of strings repeating. You could reduce it somewhat by specifying a minimum length, and a minimum number of repetitions in the pattern.

(?=((.{3,})(?:.*?\2){2,}))

This restricts the repeated string to at least three characters, and requires at least three occurrences (the first plus two or more repetitions).
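
As a rough sketch of how the restricted pattern could feed the "length times instances" ranking the question asks for, the helper below builds the pattern from two limits and scores each distinct repeated string once. The name `rank_repeats`, its parameters, and the choice to count occurrences over the whole text (rather than within the matched span) are assumptions made for illustration.

```php
<?php
// Hypothetical helper, not part of the original answer.
function rank_repeats(string $text, int $minLen = 3, int $minExtra = 2): array
{
    // Same shape as the restricted pattern, with the two limits filled in.
    $pattern = '/(?=((.{' . $minLen . ',})(?:.*?\2){' . $minExtra . ',}))/s';

    // preg_match_all returns false on failure, e.g. when the backtrack limit is hit.
    if (preg_match_all($pattern, $text, $matches, PREG_SET_ORDER) === false) {
        return [];
    }

    $rows = [];
    foreach ($matches as $match) {
        $repeat = $match[2];
        if (isset($rows[$repeat])) {
            continue; // this string has already been scored
        }
        // Assumption: count non-overlapping occurrences in the whole text.
        $count = substr_count($text, $repeat);
        $rows[$repeat] = [
            'string' => $repeat,
            'count'  => $count,
            'score'  => $count * strlen($repeat),
        ];
    }

    // Highest potential savings first.
    $rows = array_values($rows);
    usort($rows, fn($a, $b) => $b['score'] <=> $a['score']);
    return $rows;
}

foreach (rank_repeats('abcd-abcd-abcd') as $row) {
    echo "{$row['score']}\t{$row['count']}x\t{$row['string']}\n";
}
```

Raising `$minLen` or `$minExtra` shrinks the candidate set, but the pattern still backtracks heavily, so this is best tried on modest inputs first.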

Edit: Changed to allow characters between the repetitions.
Edit: Changed sorting order to reflect best match.
