Conditional Probability Table in SAS

Question

I am working in SAS trying to create a conditional probability table.

The current structure of the table is 5 columns x 10 rows, and the value in each cell is binary.

Current data table:

col1    col2    col3    col4    col5
1   0   1   0   0
0   0   0   1   1
0   0   0   0   0
1   0   0   0   0
1   0   0   0   1
0   1   0   0   0
0   1   0   1   0
1   1   1   1   0
1   0   1   0   1
1   0   1   0   0

I would like to create a table with the conditional probability for every column vs. every other column.

Ideal output:

--- col1    col2    col3    col4    col5
col1    1.0 0.3 1.0 0.3 0.7
col2    0.2 1.0 0.3 0.7 0.0
col3    0.7 0.3 1.0 0.3 0.3
col4    0.2 0.7 0.3 1.0 0.3
col5    0.3 0.0 0.3 0.3 1.0
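
Read against the data, each entry of the ideal output appears to be the probability that the row variable is 1 given that the column variable is 1. A sketch of that definition, where x_{ki} is notation introduced here for the value of col_i in row k and n = 10 is the number of rows:

```latex
P(\mathrm{col}_i = 1 \mid \mathrm{col}_j = 1)
  = \frac{\sum_{k=1}^{n} x_{ki}\, x_{kj}}{\sum_{k=1}^{n} x_{kj}}

% Worked check for row col1, column col2:
% col1 and col2 are both 1 in one row, and col2 is 1 in three rows
P(\mathrm{col}_1 = 1 \mid \mathrm{col}_2 = 1) = \frac{1}{3} \approx 0.3

% The table is not symmetric: col1 is 1 in six rows
P(\mathrm{col}_2 = 1 \mid \mathrm{col}_1 = 1) = \frac{1}{6} \approx 0.2
```

Both values agree with the corresponding cells of the ideal output after rounding to one decimal place.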

This is a much simpler version of the actual problem I am working on (hundreds of rows and millions of columns), so ideally I'd have a solution that adjusts to the size of the table.

I've been working with arrays and DO loops, but haven't been able to get very far.

My current code looks like this (not close to complete):

data ideal_output;
    set binary_table;
    array obs(10,5);
    array output(5,5);
    do i=1 to 5;
        do j=1 to 5;
            do k=1 to 10;
                do l=1 to 10;
        output(m,n) = sum(obs(k,i)*obs(l,j))/sum(obs(k,i));
    end;end;end;end;
run;

Recommended answer

You have the right sort of idea. The tricky part is loading all your variables into the appropriate arrays. If your full dataset is too large to fit into memory, you may need to process one subset of it at a time.

data have;
/*Set length 3 for binary vars to save a bit of memory later*/
length col1-col5 3;
input col1-col5;
cards;
1   0   1   0   0
0   0   0   1   1
0   0   0   0   0
1   0   0   0   0
1   0   0   0   1
0   1   0   0   0
0   1   0   1   0
1   1   1   1   0
1   0   1   0   1
1   0   1   0   0
;
run;

%let NCOLS = 5;
%let NOBS = 10;

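data want;
    /*Pick up the variable attributes from have without reading any rows*/
    if 0 then set have;
    array obs[&NOBS,&NCOLS];
    array p[&NCOLS];
    array col[&NCOLS];

    /*Use a DOW-loop to populate the 2-d array*/
    do _n_ = 1 by 1 until (eof);
        set have end = eof;
        do i = 1 to &NCOLS;
            obs[_n_,i] = col[i];
        end;
    end;

    /*Output row i holds P(col_i = 1 | col_j = 1) in p[j]*/
    do i = 1 to &NCOLS;
        do j = 1 to &NCOLS;
            x = 0; /*rows where col_i and col_j are both 1*/
            y = 0; /*rows where col_j is 1*/
            do k = 1 to &NOBS;
                x + obs[k,i]*obs[k,j];
                y + obs[k,j];
            end;
            /*Leave p[j] missing if col_j is never 1*/
            if y > 0 then p[j] = x / y;
            else p[j] = .;
        end;
        output;
    end;
    keep p1-p&NCOLS;
run;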
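
The same table can also be written as a matrix product. The following is only a sketch, not part of the original answer. It assumes a SAS/IML license is available (the post does not say so) and that every numeric variable in have is one of the binary columns. The names x, cooc, colsum and p are chosen here for illustration.

```sas
/* Sketch only: needs SAS/IML, which the original post does not mention */
proc iml;
    use have;
    read all var _NUM_ into x;   /* x is NOBS x NCOLS, one column per variable */
    close have;

    cooc   = x` * x;             /* cooc[i,j] = number of rows where col_i and col_j are both 1 */
    colsum = x[+, ];             /* 1 x NCOLS row vector of column totals */
    p      = cooc / colsum;      /* each column j of cooc is divided by colsum[j] */

    print p[format=4.2];
quit;
```

Row i, column j of p corresponds to p[j] on output row i of the DATA step version. This sketch does not handle a column that is never 1, and it holds the full NCOLS x NCOLS matrix in memory, so for very wide data it would still have to be applied to one block of columns at a time.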
