Finding all possible paths in a dataset using SAS

Problem description

I want to compare two columns in the dataset shown below:

Pid       cid
1          2
2          3
2          5
3          6
4          8
8          9
9          4

and then produce the following result:

1 2 3 6
1 2 5
2 3 6
2 5
3 6
4 8 9 4
8 9 4
9 4

First we print the first two values, 1 and 2. Then we search for 2 in the first column; if it is present, we print its corresponding value from the second column, which is 3. Next we search for 3 in the first column; if it is present, we print the corresponding value from the second column, which is 6.

How can this be done using SAS?

Recommended answer

The links form a directed graph, and traversing its paths calls for recursion.
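
To make the traversal concrete, here is a minimal sketch in Python (not SAS), intended only to illustrate the logic. It assumes the row-number guard described further down in this answer: a link is followed only if it sits on a later row than the link that led to it. The function name `all_paths` is hypothetical. Unlike the DS2 sample code below, this sketch also keeps two-node paths, which is what the question's expected output shows.

```python
from collections import defaultdict

def all_paths(rows):
    """rows: list of (pid, cid) pairs in dataset order; returns a list of paths."""
    # pid -> list of (row number, cid); one parent may have several children
    children = defaultdict(list)
    for rownum, (pid, cid) in enumerate(rows, start=1):
        children[pid].append((rownum, cid))

    paths = []

    def descend(path, node, from_row):
        # Assumption: only links stored on later rows are followed,
        # which prevents endless cycling (for example 4 -> 8 -> 9 -> 4 -> ...).
        forward = [(r, c) for r, c in children.get(node, []) if r > from_row]
        if not forward:
            paths.append(path)  # reached a leaf: record the finished path
            return
        for r, c in forward:
            descend(path + [c], c, r)

    # every row starts a path of its own
    for rownum, (pid, cid) in enumerate(rows, start=1):
        descend([pid, cid], cid, rownum)
    return paths

question_rows = [(1, 2), (2, 3), (2, 5), (3, 6), (4, 8), (8, 9), (9, 4)]
for p in all_paths(question_rows):
    print(*p)
```

Each recursive call owns its own `path` and `forward` list; that per-call isolation is the property the hash-based SAS approaches discussed below have to work to preserve.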

In a DATA step, the multiple children of a parent can be stored in a hash-of-hashes structure, but recursion in a DATA step is quite awkward: you would have to maintain your own stack and local variables manually in yet another hash.
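
As a rough picture of what maintaining your own stack involves, the sketch below (Python, not SAS) replaces recursion with an explicit stack. Each stack entry carries the values a recursive call would otherwise keep in local variables. The name `all_paths_iterative` and the later-rows-only rule are assumptions made for illustration.

```python
def all_paths_iterative(rows):
    """rows: list of (pid, cid) pairs in dataset order; returns a list of paths."""
    children = {}
    for rownum, (pid, cid) in enumerate(rows, start=1):
        children.setdefault(pid, []).append((rownum, cid))

    paths = []
    for rownum, (pid, cid) in enumerate(rows, start=1):
        # One frame = the "local variables" of one would-be recursive call:
        # the path so far and the row of the link that was followed last.
        stack = [([pid, cid], rownum)]
        while stack:
            path, from_row = stack.pop()
            forward = [(r, c) for r, c in children.get(path[-1], [])
                       if r > from_row]  # assumed guard against cycling
            if not forward:
                paths.append(path)
                continue
            # Push in reverse so the first child is popped (visited) first.
            for r, c in reversed(forward):
                stack.append((path + [c], r))
    return paths

print(all_paths_iterative([(1, 2), (2, 3), (2, 5), (3, 6)]))
```

In a DATA step the same bookkeeping would have to live in a hash keyed by stack depth, which is why the approach is described as awkward.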

In Proc DS2, recursion is far more conventional and straightforward, and the hash package is available. However, the DS2 hash package differs from the DATA step hash object: its data values may only be scalars, so a hash of hashes is ruled out.

The lack of a hash of hashes can be worked around by defining the hash with multidata. Each data item (child) of a key (parent) is then retrieved with this pattern: call find for the first item, then loop while has_next reports another item, calling find_next each time.

Another issue with hashes in DS2 is that they must be global to the data program, and so must any host variables used for keys and data. This makes variable management tricky during recursion: code at scope depth N cannot rely on any global variable that may be changed at scope depth N+1.

Fortunately, an anonymous hash can be created in any scope, and its reference is maintained locally. The key and data variables must still be global, however, so careful attention is still needed.

The anonymous hash is used to store the multidata retrieved for a key. This is necessary because a find in a recursive call would otherwise disturb the has_next / find_next traversal in progress.
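
The hazard can be pictured with a small Python sketch. The class `MultiMap` below is a hypothetical stand-in that mimics only one property of a multidata hash, a single shared cursor; it is not the SAS API, and its `has_next` returns a boolean rather than a return code. Copying the children into a local list before recursing plays the role of the locally referenced anonymous hash.

```python
class MultiMap:
    """Hypothetical multidata container with ONE shared cursor."""

    def __init__(self):
        self._items = {}
        self._key = None
        self._pos = 0

    def add(self, key, value):
        self._items.setdefault(key, []).append(value)

    def find(self, key):
        """Move the shared cursor to the first value of key; None if absent."""
        if key not in self._items:
            return None
        self._key, self._pos = key, 0
        return self._items[key][0]

    def has_next(self):
        return self._pos + 1 < len(self._items[self._key])

    def find_next(self):
        self._pos += 1
        return self._items[self._key][self._pos]


def children_of(links, key):
    """Drain all values of key into a local list before the cursor moves again."""
    first = links.find(key)
    if first is None:
        return []
    found = [first]
    while links.has_next():
        found.append(links.find_next())
    return found


def walk(links, node, path, out):
    kids = children_of(links, node)  # local snapshot, safe from deeper calls
    if not kids:
        out.append(path)
        return
    for kid in kids:
        # The recursive call repositions the shared cursor, but `kids` is local.
        walk(links, kid, path + [kid], out)


links = MultiMap()
for pid, cid in [(1, 2), (2, 3), (2, 5), (3, 6)]:
    links.add(pid, cid)
found_paths = []
walk(links, 1, [1], found_paths)
print(found_paths)
```

Had `walk` recursed from inside the `has_next` loop instead, the nested `find` for child 3 would have moved the cursor off key 2, and child 5 would never have been visited.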

Sample code. It requires a rownum variable to prevent the cycling that would occur if a child were allowed to act as a parent in an earlier row.

data have; rownum + 1;input
Pid       cid;datalines;
1          2
2          3
2          5
3          6
4          8
5          12
6          2
8          9
9          4
12         1
12         2
12         14
13         15
14         20
14         21
14         21
15         1
run;

proc delete data=paths;
proc delete data=rows;

%let trace=;

proc ds2 libs=work;
data _null_ ;
  declare double rownum pid cid id step pathid;
  declare int hIndex;

  declare package hash rows();
  declare package hash links();
  declare package hash path();
  declare package hash paths();

  method leaf(int _rootRow, int _step);
    declare double _idLast _idLeaf;

&trace. put ' ';
&trace. put 'LEAF';
&trace. put ' ';
    * no children, at a leaf -- output path;
    rownum = _rootRow;
    if _step < 2 then return;

    * check if same as last one;

    do step = 0 to _step;
      paths.find();  _idLast = id;
      path.find();   _idLeaf = id;
      if _idLast ne _idLeaf then leave;
    end;

    if _idLast = _idLeaf then return;

    pathid + 1;

    do step = 0 to _step;
      path.find();
      paths.add();
    end;
  end;

  method saveStep(int _step, int _id);
&trace. put 'PATH UPDATE' _step ',' _id '               <-------';
    step = _step;
    id = _id;
    path.replace();
  end;

  method descend(int _rootRow, int _fromRow, int _id, int _step);
    declare package hash h;
    declare double _hIndex;
    declare varchar(20) p;

    if _step > 10 then return;

    p = repeat (' ', _step-1);
&trace. put p 'DESCEND:' _rootRow= _fromRow= _id= _step=;

    * given _id as parent, track in path and descend by child(ren);

    * find links to children;
    pid = _id;
&trace. put p 'PARENT KEY:' pid=;
    if links.find() ne 0 then do;
&trace. put p 'NO KEY';
      saveStep(_step, _id);
      leaf(_rootRow, _step);
      return; 
    end;

    * convert multidata to hash, emulating hash of hash;
    * if not, has_next / find_next multidata traversal would be
    * corrupted by a find in the recursive use of descend;

        * new hash reference in local variable;
        h = _new_ hash ([hindex], [cid rownum], 0,'','ascending');

        hIndex = 1;

&trace. put p 'CHILD' hIndex= cid= rownum=;
        if rownum > _fromRow then h.add();

        do while (links.has_next() = 0);
          hIndex + 1;
          links.find_next();

&trace. put p 'CHILD' hIndex= cid= rownum=;
          if rownum > _fromRow then h.add();
        end;

    if h.num_items = 0 then do;
      * no eligible (forward rowed) children links;
&trace. put p 'NO FORWARD CHILDREN';
      leaf(_rootRow, _step-1);
      return;
    end;

    * update data for path step;
    saveStep (_step, _id);

    * traverse hash that was from multidata;
    * locally instantiated hash is protected from meddling outside current scope;
    * hIndex is local variable;
    do _hIndex = 1 to hIndex;
      hIndex = _hIndex;
      h.find();

&trace. put p 'TRAVERSE:' hIndex= cid= rownum= ;

      descend(_rootRow, rownum, cid, _step+1);
    end;

&trace. put p 'TRAVERSE DONE:' _step=;
  end;

  method init(); 
    declare int index;

    * data keyed by rownum;
    rows.keys([rownum]);
    rows.data([rownum pid cid]);
    rows.ordered('A');
    rows.defineDone();

    * multidata keyed by pid;
    links.keys([pid]);
    links.data([cid rownum]);
    links.multidata('yes');
    links.defineDone();

    * recursively discovered ids of path;
    path.keys([step]);
    path.data([step id]);
    path.ordered('A');
    path.defineDone();

    * paths discovered;
    paths.keys([pathid step]);
    paths.data([pathid step id]);
    paths.ordered('A');
    paths.defineDone();
  end;

  method run();
    set have;
    rows.add();
    links.add();
  end;

  method term();
    declare package hiter rowsiter('rows');
    declare int n;

    do while (rowsiter.next() = 0);
      step = 0;
      saveStep (step, pid);
      descend (rownum, rownum, cid, step+1);
    end;

    paths.output('paths');
    rows.output('rows');
  end;
run;
quit;

proc transpose data=paths prefix=ID_ out=paths_across(drop=_name_);
  by pathid;
  id step;
  var id;
  format id_: 4.;
run;
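
The final transpose turns the long paths table (one row per pathid and step) into one row per path. As a hedged illustration of that reshaping, here is a Python sketch; the function name `pivot_paths` and the sample rows are made up for the example, and the `ID_<step>` column names simply mirror the `prefix=ID_` option above.

```python
def pivot_paths(long_rows):
    """long_rows: iterable of (pathid, step, id) tuples.

    Returns {pathid: {"ID_<step>": id, ...}}, one entry per path.
    """
    wide = {}
    for pathid, step, node_id in long_rows:
        wide.setdefault(pathid, {})[f"ID_{step}"] = node_id
    return wide

# Illustrative rows only, not output taken from the program above.
sample_long = [
    (1, 0, 1), (1, 1, 2), (1, 2, 3), (1, 3, 6),
    (2, 0, 1), (2, 1, 2), (2, 2, 5),
]
sample_wide = pivot_paths(sample_long)
for pathid, columns in sample_wide.items():
    print(pathid, columns)
```

Shorter paths simply have fewer `ID_` columns filled, which corresponds to missing values in the transposed SAS dataset.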
