How to find all substrings of a String with start and end indices


Question

I've recently written some Scala code which processes a String, finding all its sub-strings and retaining a list of those which are found in a dictionary. The start and end of the sub-strings within the overall string also have to be retained for later use, so the easiest way to do this seemed to be just to use nested for loops, something like this:

for (i <- 0 until word.length)
  for (j <- i until word.length) {
    val sub = word.substring(i, j + 1)
    // lookup sub in dictionary here and add new match if found
  }

As an exercise, I decided to have a go at doing the same thing in Haskell. It seems straightforward enough without the need for the sub-string indices - I can use something like this approach to get the sub-strings, then call a recursive function to accumulate the matches. But if I want the indices too it seems trickier.

How would I write a function which returns a list containing each continuous sub-string along with its start and end index within the "parent" string?

For example tokens "blah" would give [("b",0,0), ("bl",0,1), ("bla",0,2), ...]
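One way to carry the indices along is to translate the nested Scala loops above directly into a list comprehension. The following is a minimal sketch rather than a definitive answer; the name `substringsWithIndices` is hypothetical, and indices are assumed to be zero-based and inclusive, as in the example.

```haskell
-- Hypothetical helper (the name is illustrative, not from the question):
-- every contiguous, non-empty substring of the input together with its
-- inclusive, zero-based start and end index.
substringsWithIndices :: String -> [(String, Int, Int)]
substringsWithIndices word =
  [ (take (j - i + 1) (drop i word), i, j)  -- characters i..j inclusive
  | i <- [0 .. n - 1]                       -- outer loop: start index
  , j <- [i .. n - 1]                       -- inner loop: end index
  ]
  where
    n = length word
```

For `"blah"` this starts with `("b",0,0)`, `("bl",0,1)`, `("bla",0,2)`. Each `take`/`drop` walks the string again, so this favours clarity over speed.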

Update

A great selection of answers and plenty of new things to explore. After messing about a bit, I've gone for the first answer, with Daniel's suggestion to allow the use of [0..].

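import Data.List (inits, tails)

data Token = Token String Int Int

-- All non-empty contiguous sub-lists, grouped by end position.
-- Each group is finite, so this also works on an infinite list such as [0..].
continuousSubSeqs :: [a] -> [[a]]
continuousSubSeqs = filter (not . null) . concatMap tails . inits

tokenize :: String -> [Token]
tokenize xs = map (\(s, l) -> Token s (head l) (last l)) $ zip s ind
    where s   = continuousSubSeqs xs
          ind = continuousSubSeqs [0..]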

This seemed relatively easy to understand, given my limited Haskell knowledge.
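The order of composition matters once the index list is infinite. The sketch below compares the two orderings side by side; the names `byEnd` and `byStart` are hypothetical labels introduced here for illustration.

```haskell
import Data.List (inits, tails)

-- Hypothetical name: the composition used in the update.
-- Sub-lists are grouped by where they end, so every group is finite.
byEnd :: [a] -> [[a]]
byEnd = filter (not . null) . concatMap tails . inits

-- Hypothetical name: the opposite composition.
-- Sub-lists are grouped by where they start; on an infinite list the
-- first group (everything starting at position 0) never finishes.
byStart :: [a] -> [[a]]
byStart = filter (not . null) . concatMap inits . tails

-- Index runs produced from [0..] by each ordering.
endGrouped :: [[Int]]
endGrouped = take 6 (byEnd [0 ..])
-- [[0],[0,1],[1],[0,1,2],[1,2],[2]]

startGrouped :: [[Int]]
startGrouped = take 5 (byStart [0 ..])
-- [[0],[0,1],[0,1,2],[0,1,2,3],[0,1,2,3,4]]
```

With `byEnd`, the index runs from `[0..]` line up with the substrings of any finite string, so `zip` can simply cut the infinite side short. With `byStart`, every run keeps starting at 0, so a bounded range such as `[0 .. length xs - 1]` is needed instead.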

Solution

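import Data.List (inits, tails)

-- All non-empty contiguous sub-lists, grouped by start position.
continuousSubSeqs :: [a] -> [[a]]
continuousSubSeqs = filter (not . null) . concatMap inits . tails

-- Pair each substring with the first and last element of its index run.
tokens :: String -> [(String, Int, Int)]
tokens xs = map (\(s, l) -> (s, head l, last l)) $ zip s ind
    where s   = continuousSubSeqs xs
          ind = continuousSubSeqs [0 .. length xs - 1]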

Works like this:

tokens "blah"
[("b",0,0),("bl",0,1),("bla",0,2),("blah",0,3),("l",1,1),("la",1,2),("lah",1,3),("a",2,2),("ah",2,3),("h",3,3)]
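The question also asks to keep only the substrings found in a dictionary. As a rough sketch, assuming the dictionary is available as a `Data.Set` of words (the question does not say how it is stored), the token list can simply be filtered. `dictionaryMatches` is a hypothetical name; `continuousSubSeqs` and `tokens` are repeated from the answer so the block is self-contained.

```haskell
import Data.List (inits, tails)
import qualified Data.Set as Set

-- Repeated from the answer: non-empty contiguous sub-lists by start position.
continuousSubSeqs :: [a] -> [[a]]
continuousSubSeqs = filter (not . null) . concatMap inits . tails

-- Repeated from the answer: substrings with inclusive start and end indices.
tokens :: String -> [(String, Int, Int)]
tokens xs = map (\(s, l) -> (s, head l, last l)) $ zip s ind
  where
    s   = continuousSubSeqs xs
    ind = continuousSubSeqs [0 .. length xs - 1]

-- Hypothetical helper: keep only the tokens whose text is in the dictionary.
-- Assumption: the dictionary is a Set of exact, case-sensitive words.
dictionaryMatches :: Set.Set String -> String -> [(String, Int, Int)]
dictionaryMatches dict = filter (\(s, _, _) -> s `Set.member` dict) . tokens
```

For example, `dictionaryMatches (Set.fromList ["la", "ah", "blah"]) "blah"` gives `[("blah",0,3),("la",1,2),("ah",2,3)]`, preserving the order produced by `tokens`.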
