如何检测数据集中的所有空列并删除它们? [英] How to detect all empty columns in a dataset and deletedrop them?

查看:18
本文介绍了如何检测数据集中的所有空列并删除它们?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

如标题所示,我想删除所有空列变量(所有记录为空或等于null或"),以减少以后执行的时间成本.

As suggested in the title, I'd like to drop all empty columnsvariables(where all records are empty or equal null or ""), so as to reduce time cost in later execution.

详细场景:

我有一个包含 1000 列的 dataset(),其中一些是空的.现在我想创建一个新数据集,我需要在之前数据集的某些条件下添加列.

I have a dataset() with 1000 columns, somelots of which are empty. Now I want to create a new dataset in which I need to add columns under some conditions of previous dataset.

data new;

   set old;

   if oldcol1 ne "" then newcol1='<a>'||strip(oldcol1)||'</a>';

   end;

   if oldcol2 ne "" then newcol2='<a>'||strip(oldcol2)||'</a>';

   end;

   ...

   ...;

   drop oldcol1 oldcol2.....oldcol1000;
   run;

由于以下原因,执行需要相当长的时间:

It takes quite some time to execute given the following reason:

  1. 旧列的数量很大

  1. number of old columns is huge

其实我需要在另一个数据集中做一个循环来设置oldcol之后的数字

in fact I need to do a loop in another dataset to set the number after oldcol

ColNumber

 1

 2

 3

...

1000

所以你可以想象在搜索、查找和设置值方面要执行多少次.

So you can imagine how many times to be executed in terms of searching, finding and setting values.

因此,我能想到的一种减少时间成本的方法是首先删除所有空列.但也非常欢迎任何有关优化算法的意见.

Hence one way I could think of to reduce time cost is drop all empty columns first. But any inputs regarding optimizing the algorithm is highly welcomed as well.

谢谢

推荐答案

这是一个通用宏,您可以使用它来生成源数据集中的空列列表,然后您可以将其传递给 drop 语句.它使用proc格式和proc freq,因此速度相对较快.

Here's a generic macro that you can use to generate a list of the empty columns in the source data set, which you can then pass to a drop statement. It uses proc format and proc freq so it is relatively fast.

%macro findmiss(ds,macvar);
%local noteopt;
%let noteopt=%sysfunc(getoption(notes));
option nonotes;
*ds is the data set to parse for missing values;
*macvar is the macro variable that will store the list of empty columns;
%global &macvar; 
proc format;
  value nmis  .-.z =' ' other='1';
  value $nmis ' '=' ' other='1';
run;
ods listing close;
ods output OneWayFreqs=OneValue(
  where=(frequency=cumfrequency 
  AND CumPercent=100));

proc freq data=&ds;
  table _All_ / Missing ;
  format _numeric_ nmis. 
        _character_ $nmis.;
  run;
ods listing;
data missing(keep=var);
  length var $32.;
  set OneValue end=eof;
    if percent eq 100 AND sum(of F_:) < 1 ;
    var = scan(Table,-1,' ');
run;
proc sql noprint;
  select var into: &macvar separated by " "
  from missing;quit;
option &noteopt.;
%mend;

以下是您可以如何使用它:

Here is how you might use it:

%findmiss(old,droplist); /*generate the list of empty columns */
data new;
  set old(drop=&droplist);
run;

这篇关于如何检测数据集中的所有空列并删除它们?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆