用于检查字符串是否由子字符串列表构建的算法 [英] Algorithm for checking if a string was built from a list of substrings
问题描述
将为您提供一个字符串和一个字符串数组。
You are given a string and an array of strings. How to quickly check, if this string can be built by concatenating some of the strings in the array?
这是一个理论上的问题,实用上我不需要它,如何快速检查是否可以通过将数组中的某些字符串串联来构建此字符串?原因。但是我想知道,是否有一些好的算法。
This is a theoretical question, I don't need it for practical reasons. But I would like to know, if there is some good algorithm for this.
EDIT
阅读我注意到的一些答案,这可能是NP完全问题。甚至找到一个字符串子集,它们的长度也一样,因为给定的字符串是一个经典的子集和问题。
EDIT Reading some answer I have noticed, that this is probably NP-Complete problem. Even finding a subset of strings, that will together have same length, as a given string is a classic subset sum problem.
所以我想对此没有简单的答案。
So I guess there is no easy answer to this.
编辑
现在看来,它不是NP-毕竟是完整的问题。这样比较酷:-)
Now it seems, that it is not a NP-Complete problem after all. That's way cooler :-)
编辑
我想出了一个通过一些测试的解决方案:
I have came up with a solution that passes some tests:
def can_build_from_substrings(string, substrings):
prefixes = [True] + [False] * (len(string) - 1)
while True:
old = list(prefixes)
for s in substrings:
for index, is_set in enumerate(prefixes):
if is_set and string[index:].startswith(s):
if string[index:] == s:
return True
prefixes[index + len(s)] = True
if old == prefixes: # nothing has changed in this iteration
return False
我相信时间是 O(n * m ^ 3)
,其中 n
是子字符串
和 m
的长度为 string
。您如何看待?
I believe the time is O(n * m^3)
, where n
is length of substrings
and m
is length of string
. What do you think?
推荐答案
注意:这里我假定您可以多次使用每个子字符串。您可以通过更改定义子问题的方式来概括解决方案以包括此限制。这将对空间以及预期的运行时间产生负面影响,但是问题仍然是多项式。
这是一个动态编程问题。 (还有一个很好的问题!)
This is a dynamic programming problem. (And a great question!)
让我们将 composable(S,W)
定义为true,如果字符串 S
可以使用子字符串列表 W
来写。
Let's define composable(S, W)
to be true if the string S
can be written using the list of substrings W
.
S
是可组合的,并且仅当以下条件:
S
is composable if and only if:
-
S
以W
中的子字符串w
开头。 -
w
之后的S
其余部分也是可组合的。
S
starts with a substringw
inW
.- The remainder of
S
afterw
is also composable.
让我们写一些伪代码:
COMPOSABLE(S, W):
return TRUE if S = "" # Base case
return memo[S] if memo[S]
memo[S] = false
for w in W:
length <- LENGTH(w)
start <- S[1..length]
rest <- S[length+1..-1]
if start = w AND COMPOSABLE(rest, W) :
memo[S] = true # Memoize
return memo[S]
该算法具有O(m * n)运行时间,假设子字符串的长度不相对于字符串本身的线性w / r / t,在这种情况下,运行时间为O(m * n ^ 2)(其中m是子字符串列表的大小,n是所讨论的字符串的长度)。它使用O(n)空间进行记忆。
This algorithm has O(m*n) runtime, assuming the length of the substrings is not linear w/r/t to the string itself, in which case runtime would be O(m*n^2) (where m is the size of the substring list and n is the length of the string in question). It uses O(n) space for memoization.
(注意,伪代码使用O(n ^ 2)空间,但是对记忆键进行哈希处理可以缓解这种情况。)
(N.B. as written the pseudocode uses O(n^2) space, but hashing the memoization keys would alleviate this.)
编辑
这是一个有效的Ruby实现:
Here is a working Ruby implementation:
def composable(str, words)
composable_aux(str, words, {})
end
def composable_aux(str, words, memo)
return true if str == "" # The base case
return memo[str] unless memo[str].nil? # Return the answer if we already know it
memo[str] = false # Assume the answer is `false`
words.each do |word| # For each word in the list:
length = word.length
start = str[0..length-1]
rest = str[length..-1]
# If the test string starts with this word,
# and the remaining part of the test string
# is also composable, the answer is true.
if start == word and composable_aux(rest, words, memo)
memo[str] = true # Mark the answer as true
end
end
memo[str] # Return the answer
end
这篇关于用于检查字符串是否由子字符串列表构建的算法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!