Simple random sampling while pulling data from a warehouse (Oracle engine) using PROC SQL in SAS


Problem description

I need to pull a huge amount of data, say 600 to 700 variables from different tables in a data warehouse. In its raw form the dataset will easily reach 150 GB and 79 million rows, but for my analysis I need only about one million rows. How can I pull the data with PROC SQL directly from the warehouse while taking a simple random sample of the rows?

The code below won't work, because RANUNI is a SAS function and Oracle does not support it:

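    proc sql outobs=1000000;
    /* connection options are placeholders */
    connect to oracle (user=<username> password=<pwd> path=<tns-alias>);
    /* Fails: the inner query is sent to Oracle as-is, and Oracle has no RANUNI */
    select * from connection to oracle (
        select * from tbl1 order by ranuni(12345)
    );
    disconnect from oracle;
    quit;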

How would you suggest I do this?

Recommended answer

None of the posted answers or comments solved my problem. They might have worked in general, but we have 87 million rows.

In the end I wanted a solution done in SAS itself. Here is what I did, and it works. Thanks, all!

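    /* SAS/ACCESS Interface to Oracle; path, user and password are placeholders */
    libname dwh oracle path=<tns-alias> user=<username> password=<pwd>;

    proc sql;
    create table sample as
    select <all the variables>,
           ranuni(<any arbitrary seed>) as rand_num  /* one uniform(0,1) draw per row */
    from dwh.<all the tables>
    <bunch of where conditions goes here>;
    quit;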
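The answer's query attaches a RANUNI value to every row, but a sample only appears once that value is compared against a sampling fraction p = target rows / total rows (here roughly 1,000,000 / 87,000,000). The sketch below shows that Bernoulli-style filter on a locally generated table so it runs without a warehouse. The table `work.fake_dwh`, the macro variables `target`, `total` and `p`, and the row counts are all hypothetical stand-ins. With a libref pointing at Oracle, SAS most likely cannot pass RANUNI down to the database, so matching rows are still transferred to SAS before the filter applies. Oracle's own `SAMPLE` clause or `DBMS_RANDOM.VALUE`, used through explicit pass-through, are the database-side counterparts.

```sas
/* Hypothetical stand-in for the warehouse table: 1,000,000 generated rows. */
data work.fake_dwh;
  do id = 1 to 1000000;
    x = mod(id, 97);
    output;
  end;
run;

/* Hypothetical target sample size and population size. */
%let target = 10000;
%let total  = 1000000;
%let p      = %sysevalf(&target / &total);  /* sampling fraction, here 0.01 */

proc sql;
  create table work.sample as
  select *
  from work.fake_dwh
  where ranuni(12345) <= &p;  /* keep each row independently with probability p */
quit;
```

This gives a sample size close to, but not exactly equal to, the target. If an exact count matters, the question's ORDER BY plus OUTOBS= pattern returns exactly that many rows, at the cost of sorting every row first.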
