将字符串组合成对(Matlab) [英] Combining pairs in a string (Matlab)

查看:112
本文介绍了将字符串组合成对(Matlab)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个字符串:

 sup_pairs = 'BA CE DF EF AE FC GD DA CG EA AB BG'

如何将最后一对字符为1对的第二对组合为字符串?并且新字符串必须包含所有字符'A','B','C','D','E','F','G',这些字符都出现在sup_pairs字符串中. 预期输出应为:

How can I combine pairs which have the last character of 1 pair is the first character of the follow pairs into strings? And the new strings must contain all of the character 'A','B','C','D','E','F' , 'G', those characters are appeared in the sup_pairs string. The expected output should be:

 S1 = 'BAEFCGD' % because BA will be followed by AE in sup_pairs string, so we combine BAE, and so on...we continue the rule to generate S1

 S2 = 'DFCEABG'

如果我有AB,BC和BD,则生成的字符串应该都是:ABC和ABD. 如果对中有重复的字符,例如:AB BC CA CE.我们将跳过第二个A,得到ABCE.

If I have AB, BC and BD, the generated strings should be both : ABC and ABD . If there is any repeated character in the pairs like : AB BC CA CE . We will skip the second A , and we get ABCE .

推荐答案

与生活中的所有美好事物一样,这也是一个图形问题.每个字母都是一个节点,每对都是一个边.

This, like all good things in life, is a graph problem. Each letter is a node, and each pair is an edge.

首先,我们必须将成对的字符串转换为数字格式,以便我们可以将字母用作下标.我将使用A=2B=3,...,G=8:

First we must transform your string of pairs into a numeric format so we can use the letters as subscripts. I will use A=2, B=3, ..., G=8:

sup_pairs = 'BA CE DF EF AE FC GD DA CG EA AB BG';
p=strsplit(sup_pairs,' ');
m=cell2mat(p(:));
m=m-'?';
A=sparse(m(:,1),m(:,2),1);

稀疏矩阵A现在是表示我们的对的邻接矩阵(实际上,更像是邻接表).如果您查看A的完整矩阵,则它看起来像这样:

The sparse matrix A is now the adjacency matrix (actually, more like an adjacency list) representing our pairs. If you look at the full matrix of A, it looks like this:

>> full(A)
ans =
   0   0   0   0   0   0   0   0
   0   0   1   0   0   1   0   0
   0   1   0   0   0   0   0   1
   0   0   0   0   0   1   0   1
   0   1   0   0   0   0   1   0
   0   1   0   0   0   0   1   0
   0   0   0   1   0   0   0   0
   0   0   0   0   1   0   0   0

如您所见,边缘BA转换为下标(3,2)等于1.

As you can see, the edge BA, which translates to subscript (3,2) is equal to 1.

现在,您可以使用自己喜欢的深度优先搜索(DFS)实施从所选的起始节点遍历图形.从根到叶节点的每个路径代表一个有效的字符串.然后,您将路径转换回字母序列:

Now you can use your favorite implementation of Depth-first Search (DFS) to perform a traversal of the graph from your starting node of choice. Each path from the root to a leaf node represents a valid string. You then transform the path back into your letter sequence:

treepath=[3,2,6,7,4,8,5];
S1=char(treepath+'?');

Output:
S1 = BAEFCGD


这是DFS的递归实现,可以助您一臂之力.通常在MATLAB中,您不必担心不会达到递归深度的默认限制,但是您会在这里找到NP完整的哈密顿路径.如果您到达递归限制附近的任何地方,计算时间将非常庞大,以至于增加深度将是您的后顾之忧.


Here's a recursive implementation of DFS to get you going. Normally in MATLAB you have to worry about not hitting the default limitation on recursion depth, but you're finding Hamiltonian paths here, which is NP-complete. If you ever get anywhere near the recursion limit, the computation time will be so huge that increasing the depth will be the least of your worries.

function full_paths = dft_all(A, current_path)
   % A - adjacency matrix of graph
   % current_path - initially just the start node (root)
   % full_paths - cell array containing all paths from initial root to a leaf

   n = size(A, 1);   % number of nodes in graph 
   full_paths = cell(1,0);   % return cell array 

   unvisited_mask = ones(1, n);
   unvisited_mask(current_path) = 0;   % mask off already visited nodes (path)
   % multiply mask by array of nodes accessible from last node in path
   unvisited_nodes = find(A(current_path(end), :) .* unvisited_mask);

   % add restriction on length of paths to keep (numel == n)
   if isempty(unvisited_nodes) && (numel(current_path) == n)
      full_paths = {current_path};   % we've found a leaf node
      return;
   end

   % otherwise, still more nodes to search
   for node = unvisited_nodes
      new_path = dft_all(A, [current_path node]);   % add new node and search
      if ~isempty(new_path)   % if this produces a new path...
         full_paths = {full_paths{1,:}, new_path{1,:}};   % add it to output 
      end
   end

end

这是正常的深度优先遍历,除了是第15行中路径长度的附加条件:

This is a normal Depth-first traversal except for the added condition on the length of the path in line 15:

   if isempty(unvisited_nodes) && (numel(current_path) == n)

if条件的前半部分isempty(unvisited_nodes)是标准的.如果仅使用条件的这一部分,则将获得从起始节点到叶子的所有路径,而不管路径长度如何. (因此,将输出单元格数组.)下半部分(numel(current_path) == n)强制设置路径的长度.

The first half of the if condition, isempty(unvisited_nodes) is standard. If you only use this part of the condition you'll get all paths from the start node to a leaf, regardless of path length. (Hence the cell array output.) The second half, (numel(current_path) == n) enforces the length of the path.

我在这里采取了一条捷径,因为n是邻接矩阵中的节点数,在示例情况下为8,而不是7(字母表中的字符数).但是结点1没有出入任何边缘,因为我显然计划使用一种我从来不会告诉您的技巧.不必从每个节点开始运行DFS来获取所有路径,而是可以创建一个虚拟节点(在本例中为节点1),并创建一个从其到所有其他实际节点的边.然后,您只需在节点1上调用DFS一次即可获得所有路径.这是更新的邻接矩阵:

I took a shortcut here because n is the number of nodes in the adjacency matrix, which in the sample case is 8 rather than 7, the number of characters in your alphabet. But there are no edges into or out of node 1 because I was apparently planning on using a trick that I never got around to telling you about. Rather than run DFS starting from each of the nodes to get all of the paths, you can make a dummy node (in this case node 1) and create an edge from it to all of the other real nodes. Then you just call DFS once on node 1 and you get all the paths. Here's the updated adjacency matrix:

A =
   0   1   1   1   1   1   1   1
   0   0   1   0   0   1   0   0
   0   1   0   0   0   0   0   1
   0   0   0   0   0   1   0   1
   0   1   0   0   0   0   1   0
   0   1   0   0   0   0   1   0
   0   0   0   1   0   0   0   0
   0   0   0   0   1   0   0   0

如果不想使用此技巧,可以将条件更改为n-1,或更改邻接矩阵以不包括节点1.请注意,如果确实将节点1留在其中,则需要将其删除.从结果路径.

If you don't want to use this trick, you can change the condition to n-1, or change the adjacency matrix not to include node 1. Note that if you do leave node 1 in, you need to remove it from the resulting paths.

以下是使用更新后的矩阵的函数输出:

Here's the output of the function using the updated matrix:

>> dft_all(A, 1)
ans =
{
  [1,1] =

     1   2   3   8   5   7   4   6

  [1,2] =

     1   3   2   6   7   4   8   5

  [1,3] =

     1   3   8   5   2   6   7   4

  [1,4] =

     1   3   8   5   7   4   6   2

  [1,5] =

     1   4   6   2   3   8   5   7

  [1,6] =

     1   5   7   4   6   2   3   8

  [1,7] =

     1   6   2   3   8   5   7   4

  [1,8] =

     1   6   7   4   8   5   2   3

  [1,9] =

     1   7   4   6   2   3   8   5

  [1,10] =

     1   8   5   7   4   6   2   3

}

这篇关于将字符串组合成对(Matlab)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆