How to calculate possible word subsequences matching a pattern?

Published: 2020/5/6 14:40:37. Tags: matlab word sequence distance


Question

Suppose I have a sequence:

    Seq = 'hello my name'

and a string:

    Str = 'hello hello my friend, my awesome name is John, oh my god!'

I then look for matches of my sequence within the string, which gives me the "word" index of every match for each word of the sequence, stored in a cell array. The first element contains the matches for 'hello', the second the matches for 'my', and the third the matches for 'name'.

    Match = {[1 2];      %'hello' matches
             [3 5 11];   %'my' matches
             [7]}        %'name' matches
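The question does not show how Match was built. As a minimal sketch, assuming words are runs of word characters and matching is exact and case-sensitive, it could be computed as follows (the names seqWords and strWords are hypothetical, introduced only for illustration):

```matlab
Seq = 'hello my name';
Str = 'hello hello my friend, my awesome name is John, oh my god!';

seqWords = strsplit(Seq);                 % words of the pattern, split on whitespace
strWords = regexp(Str, '\w+', 'match');   % words of the string, punctuation dropped

% For each pattern word, find the positions of the identical words in the string.
Match = cellfun(@(w) find(strcmp(strWords, w)), seqWords, ...
                'UniformOutput', false).';
```

With the data above this reproduces the cell array shown in the question. A different tokenization (for example, keeping apostrophes or ignoring case) would need a different regular expression or strcmpi.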

I need code that returns the possible subsequence matches, which in this case are:

    Answer = [1 3 7;     %[hello my name]
              1 5 7;     %[hello my name]
              2 3 7;     %[hello my name]
              2 5 7;]    %[hello my name]

"Answer" should contain every possible ordered sequence. That is why my (word 11) never appears in "Answer": there would have to be a "name" match after position 11.
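Put differently, a row of word positions is a valid match only when the positions strictly increase. A small sketch of that test, using a hypothetical helper named isOrdered:

```matlab
% True when every word position is larger than the one before it.
isOrdered = @(idx) all(diff(idx) > 0);

okRow  = isOrdered([1 3 7]);    % hello(1), my(3), name(7): positions increase
badRow = isOrdered([1 11 7]);   % my(11) comes after name(7): not a valid match
```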

NOTE: The length of "Seq" and the number of matches for each word may vary.

Recommended answer

Since the length of Match may vary, you need to use comma-separated lists, together with ndgrid, to generate all combinations (the approach is similar to that used in this other answer). Then filter out the combinations whose indices are not increasing, using diff and logical indexing:

    cc = cell(1, numel(Match));                          % one cell per ndgrid output
    [cc{end:-1:1}] = ndgrid(Match{end:-1:1});            % comma-separated lists; reversed so the first column varies slowest
    cc = cellfun(@(v) v(:), cc, 'UniformOutput', false); % linearize each grid into a column
    combs = [cc{:}];                                     % concatenate: one combination per row
    ind = all(diff(combs, [], 2) > 0, 2);                % rows whose indices strictly increase (also works for two-word sequences)
    combs = combs(ind, :);                               % keep only the ordered combinations

The desired result is in the variable combs. In your example,

    combs =
         1     3     7
         1     5     7
         2     3     7
         2     5     7
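The ndgrid approach builds every combination before filtering, which can become large when each word has many matches. As an alternative sketch (not part of the original answer), the rows can be grown one word at a time so that out-of-order partial sequences are dropped early. It assumes implicit expansion, available in MATLAB R2016b and later; on older releases bsxfun(@lt, ...) would be needed instead. The variable names next, r and c are illustrative.

```matlab
Match = {[1 2]; [3 5 11]; 7};            % example data from the question

combs = Match{1}(:);                     % partial sequences, one per row
for k = 2:numel(Match)
    next = Match{k}(:).';                % candidate positions for word k, as a row
    % Pair each partial sequence with every candidate that lies after its last index.
    [r, c] = find(combs(:, end) < next); % implicit expansion (R2016b+)
    combs = [combs(r, :), reshape(next(c), [], 1)];
end
combs = sortrows(combs);                 % same row order as the ndgrid version
```

For the example data this gives the same four rows as above. The intermediate matrix only ever holds partial sequences that are still in order, so positions such as my (word 11) are discarded as soon as no later "name" match exists.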
