找到一个重复序列号码的序列的末尾 [英] Finding a repeating sequence at the end of a sequence of numbers

查看:159
本文介绍了找到一个重复序列号码的序列的末尾的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我的问题是这样的:我有一个数字的大序列。我知道,后一些点,它变得周期 - 即,有k个编号在序列的开头,然后有M更多的数字,即重复的序列的其余部分。作为一个例子来说明这更加清晰,顺序可能是这样的:1,2,5,3,4,2,1,1,3,2,1,1,3,2,1,1,3 ,...],其中k是5,m是4,并且重复块然后〔2,1,1,3]。正如从这个例子清楚,我可以有重复位的更大的块里面,所以它并不能帮助只是寻找重复的第一个实例。

My problem is this: I have a large sequence of numbers. I know that, after some point, it becomes periodic - that is, there are k numbers at the beginning of the sequence, and then there are m more numbers that repeat for the rest of the sequence. As an example to make this more clear, the sequence might look like this: [1, 2, 5, 3, 4, 2, 1, 1, 3, 2, 1, 1, 3, 2, 1, 1, 3, ...], where k is 5 and m is 4, and the repeating block is then [2, 1, 1, 3]. As is clear from this example, I can have repeating bits inside of the larger block, so it doesn't help to just look for the first instances of repetition.

不过,我不知道什么是k或m是 - 我的目标是把序列[A_1,A_2,...,A_N]作为输入和输出序列[A_1,...,a_k,[ A_第(k + 1),...,A_(K + M)]] - 。基本上通过列出它的大部分作为重复块截断较长序列

However, I do not know what k or m are - my goal is to take the sequence [a_1, a_2, ... , a_n] as an input and output the sequence [a_1, ... , a_k, [a_(k+1), ... , a_(k+m)]] - basically truncating the longer sequence by listing the majority of it as a repeating block.

有没有一种有效的方法来做到这一点问题呢?此外,可能更困难,但更理想的计算 - 是有可能做到这一点,因为我产生问题的顺序,使我不得不产生一个最小量?我看了看这个网站其他类似的问题,但他们似乎都应付不开始的非重复比特序列,并常常将不必担心内部的重复。

Is there an efficient way to do this problem? Also, likely harder but more ideal computationally - is it possible to do this as I generate the sequence in question, so that I have to generate a minimal amount? I've looked at other, similar questions on this site, but they all seem to deal with sequences without the beginning non-repeating bit, and often without having to worry about internal repetition.

如果它有助于/将是有益的,我也可以进入,为什么我在看这一点,我会用它。

If it helps/would be useful, I can also get into why I am looking at this and what I will use it for.

谢谢!

编辑:首先,我应该提到,我如果输入序列重复块完全结束时结束不知道

EDITS: First, I should have mentioned that I do not know if the input sequence ends at exactly the end of a repeated block.

在现实世界中的问题,我在尝试工作正在写一个不错的,封闭形式EX pression的连分数扩展(CFE值)二次无理的(实际上,负CFE)。这是非常简单的生成部分商*这些CFE值以任何准确度 - 然而,在某些时候CFE的二次无理的尾巴变成一个重复的块。我需要的部分商工作,在此重复块。

The real-world problem that I am attempting to work on is writing a nice, closed-form expression for continued fraction expansions (CFEs) of quadratic irrationals (actually, the negative CFE). It is very simple to generate partial quotients* for these CFEs to any degree of accuracy - however, at some point the tail of the CFE for a quadratic irrational becomes a repeating block. I need to work with the partial quotients in this repeating block.

我目前的想法是这样的:也许我可以适应一些算法的建议,从正确的工作与这些序列中的一个工作。或者,也许有东西在,为什么二次无理是周期性的,这将有助于证明我明白他们为什么开始重复,这将帮助我想出一些简单的标准来检查。

My current thoughts are this: perhaps I can adapt some of the algorithms suggested that work from the right to work with one of these sequences. Alternatively, perhaps there is something in the proof of why quadratic irrationals are periodic that will help me see why they begin to repeat, which will help me come up with some easy criteria to check.

*如果我写一个连分数扩展为[A_0,A_1,...],我指的是A_I的作为部分商。

*If I am writing a continued fraction expansion as [a_0, a_1, ...], I refer to the a_i's as partial quotients.

一些背景信息可以在这里找到适合那些有兴趣: http://en.wikipedia.org/wiki / Periodic_continued_fraction

Some background info can be found here for those interested: http://en.wikipedia.org/wiki/Periodic_continued_fraction

推荐答案

您可以使用滚动哈希 达到线性时间复杂度和O(1)空间复杂度(我认为是这样,因为我不相信你可以有两个频率这是不是一个无限重复序列彼此的倍数)。

You can use a rolling hash to achieve linear time complexity and O(1) space complexity (I think this is the case, since I don't believe you can have an infinite repeating sequence with two frequencies which are not multiples of each other).

算法:你只要保持两个滚动哈希其扩展是这样的:

Algorithm: You just keep two rolling hashes which expand like this:

                       _______  _______  _______
                      /       \/       \/       \
...2038975623895769874883301010883301010883301010
                      .        .        .      ||
                      .        .        .    [][]
                      .        .        .  [ ][ ]
                      .        .        .[  ][  ]
                      .        .       [.  ][   ]
                      .        .     [  . ][    ]
                      .        .   [    .][     ]
                      .        . [      ][      ]
                      .        [       ][       ]

请在做这整个序列。第一遍将只检测重复重复2 * N次的n一些价值。但是,这不是我们的目标:我们在第一阶段的目标是检测所有可能的时期,这这样做。因为我们走的顺序执行此过程中,我们也跟踪所有相对黄金时期,我们将稍后需要检查:

Keep on doing this for the entire sequence. The first pass will only detect repetitions repeated 2*n times for some value of n. However that's not our goal: our goal in the first pass is to detect all possible periods, which this does. As we go along the sequence performing this process, we also keep track of all relatively prime periods we will need to later check:

periods = Set(int)
periodsToFurthestReach = Map(int -> int)

for hash1,hash2 in expandedPairOfRollingHashes(sequence):
    L = hash.length
    if hash1==hash2:
        if L is not a multiple of any period:
            periods.add(L)
            periodsToFurthestReach[L] = 2*L
        else L is a multiple of some periods:
            for all periods P for which L is a multiple:
                periodsToFurthestReach[P] = 2*L

在这个过程之后,我们有各个时期的名单,他们已经在多大程度上达到了。我们的答案很可能是一个与最远的范围,但我们检查所有其他时段进行重复(快的,因为我们知道我们正在检查的周期)。如果这是难以计算,我们可以通过修剪远段(其中停止重复),因为我们正在经历的列表,非常喜欢埃拉托色尼的筛子,通过保持我们接下来希望一段重复的优先级队列优化。

After this process, we have a list of all periods and how far they've reached. Our answer is probably the one with the furthest reach, but we check all other periods for repetition (fast because we know the periods we're checking for). If this is computationally difficult, we can optimize by pruning away periods (which stop repeating) as we're going through the list, very much like the sieve of Eratosthenes, by keeping a priority queue of when we next expect a period to repeat.

最后,我们仔细检查的结果,以确保没有哈希冲突(在不可能的,即使有,黑名单和重复)。

At the end, we double-check the result to make sure there was no hash collision (in unlikely even there is, blacklist and repeat).

下面我你的假定目标是最小化非重复的长度,而不是给它们可以被进一步分解的重复元件;您可以修改此算法来找到所有其他COM pressions,如果存在的话。

Here I assumed your goal was to minimize non-repeating-length, and not give a repeating element which can be further factored; you can modify this algorithm to find all other compressions, if they exist.

这篇关于找到一个重复序列号码的序列的末尾的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆