How to find all substrings of a String with start and end indices
Problem description
I've recently written some Scala code which processes a String, finding all its sub-strings and retaining a list of those which are found in a dictionary. The start and end of the sub-strings within the overall string also have to be retained for later use, so the easiest way to do this seemed to be just to use nested for loops, something like this:
for (i <- 0 until word.length)
  for (j <- i until word.length) {
    val sub = word.substring(i, j + 1)
    // look up sub in the dictionary here and add a new match if found
  }
As an exercise, I decided to have a go at doing the same thing in Haskell. It seems straightforward enough without the need for the sub-string indices - I can use something like this approach to get the sub-strings, then call a recursive function to accumulate the matches. But if I want the indices too it seems trickier.
How would I write a function which returns a list containing each continuous sub-string along with its start and end index within the "parent" string?
For example, tokens "blah" would give [("b",0,0), ("bl",0,1), ("bla",0,2), ...]
Update
A great selection of answers and plenty of new things to explore. After messing about a bit, I've gone for the first answer, with Daniel's suggestion to allow the use of [0..].
import Data.List (inits, tails)

data Token = Token String Int Int

-- every non-empty contiguous sub-list, grouped by end position
continuousSubSeqs = filter (not . null) . concatMap tails . inits

tokenize xs = map (\(s, l) -> Token s (head l) (last l)) $ zip s ind
  where s   = continuousSubSeqs xs
        ind = continuousSubSeqs [0..]
This seemed relatively easy to understand, given my limited Haskell knowledge.
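The dictionary lookup mentioned at the start could then be layered on top of these tokens. The following is only a minimal sketch under a few assumptions: `matches` is a hypothetical helper name, the dictionary is a plain list of words checked with `elem` (a set would be the more usual choice for a large dictionary), and `Token` is given `Show` and `Eq` instances for convenience.

```haskell
import Data.List (inits, tails)

-- A matched sub-string with its start and end index in the parent string.
data Token = Token String Int Int
  deriving (Show, Eq)

-- Every non-empty contiguous sub-list of the input.
continuousSubSeqs :: [a] -> [[a]]
continuousSubSeqs = filter (not . null) . concatMap tails . inits

tokenize :: String -> [Token]
tokenize xs = zipWith toToken subs inds
  where
    subs = continuousSubSeqs xs
    -- Infinite index list; zipWith stops as soon as subs runs out.
    inds = continuousSubSeqs [0 ..]
    toToken s l = Token s (head l) (last l)

-- Hypothetical helper (not from the original post): keep only the tokens
-- whose text appears in the dictionary. `elem` on a list is an assumption
-- made to keep the sketch within base.
matches :: [String] -> String -> [Token]
matches dict word = [ t | t@(Token s _ _) <- tokenize word, s `elem` dict ]
```

With this sketch, `matches ["bl", "ah", "lah"] "blah"` evaluates to `[Token "bl" 0 1,Token "lah" 1 3,Token "ah" 2 3]`. Because this version composes `tails` after `inits`, the tokens come out grouped by end index rather than by start index.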
import Data.List
continuousSubSeqs = filter (not . null) . concatMap inits . tails
tokens xs = map (\(s, l) -> (s, head l, last l)) $ zip s ind
  where s   = continuousSubSeqs xs
        ind = continuousSubSeqs [0 .. length xs - 1]
Works like this:
tokens "blah"
[("b",0,0),("bl",0,1),("bla",0,2),("blah",0,3),("l",1,1),("la",1,2),("lah",1,3),("a",2,2),("ah",2,3),("h",3,3)]
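For comparison, the nested Scala loops from the question can also be mirrored fairly literally with a list comprehension. This is an alternative sketch rather than the approach used in the answer above; the name `tokensDirect` is made up for illustration, and slicing with `take` and `drop` is assumed to be acceptable for short words, since it rescans the string for every pair of indices.

```haskell
-- Hypothetical alternative: iterate over the index pairs directly,
-- as the nested for loops do, and slice the word for each pair.
tokensDirect :: String -> [(String, Int, Int)]
tokensDirect word =
  [ (take (j - i + 1) (drop i word), i, j)
  | i <- [0 .. n - 1]   -- start index (outer loop)
  , j <- [i .. n - 1]   -- end index, inclusive (inner loop)
  ]
  where
    n = length word
```

For `"blah"` this produces the same list, in the same order, as `tokens "blah"` above. The indices here are generated explicitly instead of being recovered from a parallel list of index sub-sequences.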