循环滚动算法 [英] loop rolling algorithm

查看:278
本文介绍了循环滚动算法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想出了这个词的循环滚动的自己,希望它 不重叠与现有术语。基本上我试图拿出一个 算法寻找环在打印文本。

I have come up with the term loop rolling myself with the hope that it does not overlap with an existing term. Basically I'm trying to come up with an algorithm to find loops in a printed text.

从简单的一些例子,以复杂的

Some examples from simple to complicated

由于:

a a a a a b c d

我想说的:

5x(a) b c d

或算法:

for 1 .. 5
    print a
end
print b
print c
print d

例2

由于:

a b a b a b a b c d

我想说的:

4x(a b) c d

或算法:

for 1 .. 4
    print a
    print b
end
print c
print d

例3

由于:

a b c d b c d b c d b c e

我想说的:

a 3x(b c d) b c e

或算法:

print a
for 1 .. 3
    print b
    print c
    print d
end
print b
print c
print d

这没有提醒,我知道任何算法的我。我觉得像一些 问题可能会模棱两可,但找到的解决方案之一是够我 现在。效率永远是欢迎的,但不是强制性的。我怎样才能做到这一点?

It didn't remind me of any algorithm that I know of. I feel like some of the problems can be ambiguous but finding one of the solutions is enough to me for now. Efficiency is always welcome but not mandatory. How can I do this?

修改

首先,感谢所有的讨论。我已经适应了的 LZW 的算法 从罗塞塔并运行它在我的 输入:

First of all, thanks for all the discussion. I have adapted an LZW algorithm from rosetta and ran it on my input:

abcdbcdbcdbcdef

这给了我:

a
b
c
d
8  => bc
10 => db
9  => cd
11 => bcd
e
f

在那里我有一本字典:

where I have a dictionary of:

a a
c c
b b
e e
d d
f f
8 bc
9 cd
10 db
11 bcd
12 dbc
13 cdb
14 bcde
15 ef
7 ab

这看起来很不错的COM pression但它并不完全是我想要的。我需要 更像是COM pression从我的例子算法的再presentation 这将有:

It looks good for compression but it's not quite what I wanted. What I need is more like compression in the algorithmic representation from my examples which would have:

  • 后续序列(如果一个序列被重复,就没有​​其他 序列之间)
  • 在没有字典,但只有回路
  • irreducable
  • 在最大序列的大小(这将算法的最小化 再presentation)
  • 和假设嵌套的循环被允许(相反的是我之前说 评论)
  • subsequent sequences (if a sequence is repeating, there would be no other sequence in between)
  • no dictionary but only loops
  • irreducable
  • with maximum sequence sizes (which would minimize the algorithmic representation)
  • and let's say nested loops are allowed (contrary to what I said before in the comment)

推荐答案

我开始一个算法,这给最大序列大小。虽然这将不总是最小化算法重新presentation,它可被用作一种近似算法。或者,它可以扩展到最优算法

I start with an algorithm, which gives maximum sequence sizes. Though it would not always minimize the algorithmic representation, it may be used as an approximation algorithm. Or it may be extended to optimal algorithm.

  1. 在开始带的后缀阵列为您的文字:/ /en.wikipedia.org/wiki/LCP_array相对=nofollow> LCP阵列。
  2. 排序LCP数组的索引数组,更大的LCP数组元素的索引是第一位的。这组一起重复相同长度的序列,并允许处理序列在贪婪方式,从最大序列尺寸开始。
  3. 提取后缀数组项,分组由LCP值(组我的意思是有选择的LCP值以及较大的LCP值的所有项的所有项),并在文中的位置进行排序。
  4. 在过滤掉与位置差异的条目不等于LCP。对于剩余的条目,获得长prefixes,等于LCP。这给了所有可能的序列中的文本。
  5. 添加顺序,排序起始位置,以有序的集合(例如,二叉搜索树)。序列相加出场顺序的排序LCP,所以较长序列首先加入。序列仅加入如果它们是独立的,或者如果是完全嵌套在另一个其中之一。相交的时间间隔会被忽略。例如,在 CABA CABA BAB AB 相交 CABA 并因此将其忽略。但在 cababa cababa babab AB 的一个实例被丢弃,2个实例是完全内部的大序列,和2个实例完全在它之外。
  6. 在最后,这个有序集合包含了所有的信息,需要生产的算法再presentation。
  1. Start with constructing Suffix array for your text along with LCP array.
  2. Sort an array of indexes of LCP array, indexes of larger elements of LCP array come first. This groups together repeating sequences of the same length and allows to process sequences in greedy manner, starting from maximum sequence sizes.
  3. Extract suffix array entries, grouped by LCP value (by group I mean all the entries with selected LCP value as well as all entries with larger LCP values), and sort them by position in the text.
  4. Filter out entries with positional difference not equal to LCP. For remaining entries, get prefixes of length, equal to LCP. This gives all possible sequences in the text.
  5. Add sequences, sorted by starting position, to ordered collection (for example, binary search tree). Sequences are added in order of appearance in sorted LCP, so longer sequences are added first. Sequences are added only if they are independent or if one of them is completely nested inside the other one. Intersecting intervals are ignored. For example, in caba caba bab sequence ab intersects with caba and so it is ignored. But in cababa cababa babab one instance of ab is dropped, 2 instances are completely inside larger sequence, and 2 instances are completely outside of it.
  6. At the end, this ordered collection contains all the information, needed to produce the algorithmic representation.

示例:

Text          ababcabab

Suffix array  ab abab ababcabab abcabab b bab babcabab bcabab cabab
LCP array       2    4         2       0 1   3        1      0

Sorted LCP            4 3 2 2 1 1 0 0
Positional difference 5 5 2 2 2 2 - -
Filtered LCP          - - 2 2 - - - -
Filtered prefixes   (ab ab) (ab ab)


素描的一种算法,产生最少的算法再presentation。

先说第4步previous算法。第五步应该进行修改。现在就不可能忽略交叉间隔,所以每一个序列被添加到集合。因为收藏现在包含相交的间隔,最好是实现它的一些先进的数据结构,例如,间隔树

Start with the first 4 steps of previous algorithm. Fifth step should be modified. Now it is not possible to ignore intersecting intervals, so every sequence is added to the collection. Since the collection now contains intersecting intervals, it is better to implement it as some advanced data structure, for example, Interval tree.

然后递归确定算法重新presentation长度为所有序列,包含任何嵌套序列,从最小的国家开始。当每一个序列进行评估,计算出最优的算法再presentation的全部文本。算法处理任何一个序列或整个文本使用动态规划:分配矩阵的列数,等于文本/序列长度和行数,等于算法重新presentation的长度;在做序遍历区间树,更新该矩阵所有序列,可能每个文本位置;当对于一些细胞一个以上的值是可能的,要么选择其中任何一个,或给予preference至更长或更短的子序列

Then recursively determine the length of algorithmic representation for all sequences, that contain any nested sequences, starting from the smallest ones. When every sequence is evaluated, compute optimal algorithmic representation for whole text. Algorithm for processing either a sequence or whole text uses dynamic programming: allocate a matrix with number of columns, equal to text/sequence length and number of rows, equal to the length of algorithmic representation; doing in-order traversal of interval tree, update this matrix with all sequences, possible for each text position; when more than one value for some cell is possible, either choose any of them, or give preference to longer or shorter sub-sequences.

这篇关于循环滚动算法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆