识别给定边的连通图 [英] identifying connected graphs given edges

查看:23
本文介绍了识别给定边的连通图的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我如何对相关的人进行分组,甚至是间接的?具体来说,使用如下数据集的前两列,我如何在 SAS(可能使用 DATA 步或 PROC SQL)中以编程方式导出第三列?有非迭代算法吗?

How do I group people who are related, even indirectly? Concretely, using the first two columns of the data set like below, how do I in SAS (maybe using a DATA step or PROC SQL) programmatically derive the third column? Is there a non-iterative algorithm?

背景:每个人都有多个地址.通过每个地址,每个人都连接到零个或多个人.如果两个人已连接,他们将获得相同的组 ID.如果 A 直接连接到 B,B 连接到 C,则 A、B 和 C 共享一个组.

Background: Each person has multiple addresses. Through each address, each person is connected to zero or more persons. If two people are connected, they get the same group ID. If person A is directly connected to B and B is connected to C, then persons A, B, and C share a group.

<代码>数据人;输入person_id address_id $family_id;数据线;1 一个 12 乙 23 乙 24 C 35 C 35 天 36 天 3;

推荐答案

查找图的所有连通分量的一般方法是 广度优先搜索深度优先搜索一个>.SAS 不是实现此类算法的最佳工具,因为它们需要使用诸如队列之类的数据结构.

The general methods for finding all connected components of a graph are Breadth-First-Search or Depth-First-Search. SAS is not the best tool for implementing such algorithms since they require using such data structures as queues.

仍然可以使用哈希对象来完成.这是BF-search的代码.

Still it can be done with hash objects. Here's the code for BF-search.

data people;
  input person_id address_id $ household_id;
  datalines;
1 A 1
2 B 2
3 B 2
4 C 3
5 C 3
5 D 3
6 D 3
;
run;

创建邻接列表 - 所有具有共同地址的人.和空变量 cluster 稍后将使用组的 ID 填充:

Create adjacency list - all pairs of people with a common address. And empty variable cluster which will be populated later with groups' IDs:

proc sql;
  create table connections as
  select distinct a.person_id as person_id_a, b.person_id as person_id_b, . as cluster
  from people a
  inner join people b
  on a.address_id=b.address_id
;
quit;

BF 搜索本身如下:

data _null_;

为所有唯一的人(图的顶点)声明哈希对象及其迭代器:

Declare hash object and its iterator for all unique people (vertices of the graph):

  if 0 then set Connections;
  dcl hash V(dataset:'Connections', ordered:'y');
  V.defineKey('person_id_a');
  V.defineData('person_id_a','cluster');
  dcl hiter Vi('V');
  V.defineDone();

为所有连接(图的边缘)声明哈希对象:

Declare hash object for all connections (edges of the graph):

  dcl hash E(dataset:'Connections', multidata:'y');
  E.defineKey('person_id_a');
  E.defineData('person_id_a','person_id_b');
  E.defineDone();

为队列声明哈希对象及其迭代器:

Declare hash object and its iterator for the queue:

  dcl hash Q(ordered:'y');
  Q.defineKey('qnum','person_id_a');
  Q.defineData('qnum','person_id_a');
  dcl hiter Qi('Q');
  Q.defineDone();

最外层循环 - 当队列为空时,将没有分配集群的新人作为下一个集群的根:

The outermost loop - for taking a new person without assigned cluster to be a root of the next cluster, when the queue is empty:

  rc1=Vi.first();
  do while(rc1=0);
      if missing(cluster) then do;
          qnum=1; Q.add(); *qnum-number of the person in the queue, to ensure that new     people are added to the end of the queue.;
          n+1; cluster=n;
          V.replace();*assign cluster number to a person;

在下面的两个嵌套循环中,我们将队列中的第一个人出列,并在邻接列表中查找与此人相关的所有人.我们将每个找到的连接"添加到队列的末尾.处理完第一个人后,我们删除他/她并将下一个(现在成为第一个)出队.他们都将在同一个集群中.以此类推,直到队列为空.然后我们为一个新的集群选择一个新的根人.

In the following two nested loops we de-queue the first person in the queue and look for all people connected to this person in adjacency list. Every found 'connection' we add to the end of the queue. When done with the first person, we delete him/her and de-queue the next one (who became the first now). All of them will be in the same cluster. And so on, until the queue is empty. Then we take a new root person for a new cluster.

          rc2=Qi.first();
          do while(rc2=0);
              qnum=qnum+Q.num_items-1;
              rc3=E.find();
              do while(rc3=0);
                   person_id_a=person_id_b;
                   rc4=V.find();
                   if rc4=0 and missing(cluster) then do;
                       qnum+1; Q.add();
                       cluster=n;
                       V.replace();
                   end;
                   rc3=E.find_next();
              end;
              Qi.first();
              Qi.delete();
              Q.remove();
              Qi=_new_ hiter ('Q');
              rc2=Qi.first();
          end;
      end;
      rc1=Vi.next();
  end;

输出具有分配集群的人员列表.

Output list of people with assigned clusters.

  V.output(dataset:'clusters');
run;


proc sort data=clusters; by cluster; run;

这篇关于识别给定边的连通图的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆