在列表中查找模式 [英] Finding patterns in list
问题描述
我目前正在寻找一种在整数列表中查找模式的方法,但是我要使用的方法当然适用于字符串和其他具有不同元素的列表。现在,让我解释一下我要寻找的内容。
I am currently searching for a way to find patterns in a list of integers, but the method I am going to use would be applicable to strings and other lists with different elements of course. Now let me explain what I am looking for.
我想在整数列表中找到最长的重复模式。例如,
I want to find the longest repeating pattern in a list of integers. For example,
[1, 2, 3, 4, 1, 2, 3]
# This list would give 1, 2, 3
应该丢弃重叠模式。 (不确定)
Overlapping patterns should be discarded. ( Not certain )
[1, 1, 1, 1, 1]
# Should give 1, 1 Not 1, 1, 1, 1
以下是对我没有帮助的内容。
在列表中查找模式(不理解第一个答案的逻辑,很少解释。第二个答案只有在解决之前知道模式才解决问题。)
Finding patterns in a list (Did not understand the logic behind the first answer, very little explanation. And second answer solves the problem only if the pattern is known before solving.)
< a href = https://stackoverflow.com/questions/33928775/finding-integer-pattern-from-a-list>从列表中查找整数模式(给出了模式并希望出现次数。与我的问题不同。)
Finding integer pattern from a list (Pattern is given and number of occurrence is wanted. Different than my question.)
最长的普通子序列问题(大多数人都处理过这个问题,但是这不是我的问题。在寻找模式时,我需要连续的元素。但是
Longest common subsequence problem (Most people dealed with this problem however it is not close to mine. I need consecutive elements while searching for a pattern. However in this, seperate elements also counted as subsequences.)
这是我尝试过的内容。
def pattern(seq):
n = len(seq)
c = defaultdict(int) # Counts of each subsequence
for i in xrange(n):
for j in xrange(i + 1, min(n, n / 2 + i)):
# Used n / 2 because I figured if a pattern is being searched
# It cant be longer that the half of the list.
c[tuple(seq[i:j])] += 1
return c
如您所见,它将找到所有子列表并检查重复。我发现这种方法有些幼稚(效率低下),我需要一种更好的方法。请帮我。
As you see, It finds all the sublists and check for repeats. I found this approach a bit naive(and inefficient) and I am in need of a better way. Please help me. Thanks in advance.
注意1:该列表是预先确定的,但是由于我的算法失败,我只能在冻结之前检查列表的某些部分电脑。因此,我尝试查找的模式可能会比搜索列表的一半长得多,甚至会比搜索列表本身更长,因为我只搜索原始列表的一部分。我正在使用,我可以搜索原始列表的较大部分,因此可以更好地找到模式。 (如果有的话。)
Note1: The list is predetermined but because of my algorithms failure, I can only check some parts of the list before freezing the computer. So the pattern I am trying to find can very well be longer than the half of the search list, It can even be longer than the search list itself because I am searching only a part of the original list.If you present a better method than I am using, I can search a larger part of the original list so I will have a better chance at finding the pattern. (If there is one.)
注意2::如果您想自己进行测试,则这是列表的一部分。似乎确实存在一种模式,但是在使用可靠的代码测试它之前我无法确定。
示例列表
Note2: Here is a part of the list if you want to test it yourself. It really seems like that there is a pattern, but I cannot be sure before I test it with a reliable code. Sample List
注意3:我认为这是一个严重的数据挖掘问题。并会尝试学习如果您进行详细说明。感觉这比LCS更为重要,但是LCS更为流行:D我正在尝试寻找这种方法,就像科学家用来发现DNA模式的方法一样。
Note3: I approach this as a serious problem of data mining. And will try to learn if you make a long explanation. This feels like a much more important problem than LCS, however LCS is much more popular :D This method I am trying to find feels like the methods scientists use to find DNA patterns.
推荐答案
代码
忽略不重叠的要求,这是我使用的代码:
The Code
Ignoring the "no overlapping" requirement, here's the code I used:
def pattern(seq):
storage = {}
for length in range(1,len(seq)/2+1):
valid_strings = {}
for start in range(0,len(seq)-length+1):
valid_strings[start] = tuple(seq[start:start+length])
candidates = set(valid_strings.values())
if len(candidates) != len(valid_strings.values()):
print "Pattern found for " + str(length)
storage = valid_strings
else:
print "No pattern found for " + str(length)
return set(filter(lambda x: storage.values().count(x) > 1, storage.values()))
return storage
使用它,我发现8数据集中长度303的不同模式。该程序也运行得非常快。
Using that, I found 8 distinct patterns of length 303 in your dataset. The program ran pretty fast, too.
define patterns(sequence):
list_of_substrings = {}
for each valid length: ### i.e. lengths from 1 to half the list's length
generate a dictionary my_dict of all sub-lists of size length
if there are repeats:
list_of_substrings = my_dict
else:
return all repeated values in list_of_substrings
return list_of_substrings #### returns {} when there are no patterns
这篇关于在列表中查找模式的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!