SAS Proc 导入 CSV 和缺失数据 [英] SAS Proc Import CSV and missing data

查看:17
本文介绍了SAS Proc 导入 CSV 和缺失数据的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

所以,我正在尝试在 SAS 中导入一些数据集并加入它们,唯一的问题是加入它们后出现此错误 -

So, I'm trying to import some datasets in SAS and join them, the only problem is that I get this error after joining them -

    proc import datafile='filepath/datasetA.csv'
    out = dataA
    dbms= csv
    replace;
    run;


    proc import datafile='filepathdatasetB.csv'
    out = dataB
    dbms= csv
    replace;
    run;



    /* combine them all into one dataset*/


    data DataC;
    set &dataA. &dataB;

    run;



    ERROR: Variable column_k has been defined as both character and numeric

在我尝试加入的两个数据集中,有问题的列看起来像这样 -

The column in question looks something like this in both of the data sets that I'm trying to join -

+----------+
| column_k |
+----------+
| 0        |
| 1        |
| 5        |
| 4        |
| NA       |
| NA       |
| 4        |
| 3        |
| NA       |
+----------+

基本上,如果可能的话,我想将该列中的 NA 数据导入为缺失"?我需要整个列保持数字,因为我正计划对该列中的数据进行一些数学运算.

Basically, I would like to import the NA data in that column as 'missing', if that's possible? I need the entire column to remain numeric as I'm planning on doing some mathematical stuff with the data in that column further down the line.

感谢您的帮助!

推荐答案

如果您希望继续使用 Proc IMPORT,那么您需要确保列的类型相同.在您的情况下,您知道 column_k 应该是数字,因此 DATA 步骤可以使用 INPUT 函数将字符值转换为数字.

If you wish to continue using Proc IMPORT then you will need to ensure the columns are like-typed. In your case you know column_k should be numeric, so a DATA step can convert the character values to numeric using the INPUT function.

proc import … out = dataA;
proc import … out = dataB;

data dataA;
  set dataA;
  _num = input(column_k, best12.);
  drop column_k;
  rename _num = column_k;
run;

data dataB;
  set dataB;
  _num = input(column_k, best12.);
  drop column_k;
  rename _num = column_k;
run;

data want;
  set dataA dataB;
run;

在较大范围内,列名的数据类型不匹配可能会出现在处理多年导入等场景中.

In a larger scope mismatched data types for a column name can occur in a scenario such as dealing with multi-year imports.

假设旧数据无法重新读取,新数据具有不同的列类型.

Suppose the older data can't be re-read and the newer data has different column type.

对于需要数值的情况,一种方法是使用宏编写源代码,在必要时将指定的变量从字符转换为数字.

For the case of wanting numeric values, one approach is to have macro that writes source code that converts, if necessary, specified variables from character to numeric.

例子:

%enforce_num (perm.loans2015, age amount remaining, out=work.loans2015)
%enforce_num (perm.loans2016, age amount remaining, out=work.loans2016)
%enforce_num (perm.loans2017, age amount remaining, out=work.loans2017)

data loans_3yrs; 
  set work.loans2015-loans2017;
run;

回到你更简单的例子:

proc import … out = dataA;
proc import … out = dataB;

%enforce_num(dataA, column_k)
%enforce_num(dataB, column_k)

data want;
  set dataA dataB;
run;

enforce_num 是什么样的?它必须:

What would the macro enforce_num look like? It would have to:

  • 扫描输入数据集元数据
  • 确定变量是否是指定变量之一并且是字符类型
    • 编写源代码将变量转换为数值
    • 保持原有的变量顺序
    %macro enforce_num(data, vars, out=&data);
    
      /*
       * Arguments:
       *   data - name of input data set
       *   vars - space separated list of variables that must be numeric, convert type if necessary
       *   out  - name of output data set, default same as input data set
       *
       * Output:
       *   - Unchanged data set if data and out are the same and no conversion needed
       *   - Changed data set if some columns in data need conversion to numeric
       *     - replaces data if out is same as data
       *     - replaces out if out is different then data
       *     - the column order of the changed data set will be the same as the original data set
       */
    
      %local dsid index index2 vars varname vartype varnames debug;
    
      %let index2 = 0;  %* number of variables determined to be requiring conversion;
      %let debug = 0;
    
      %if &debug %then %put NOTE: &SYSMACRONAME: data=%superq(data);
    
      %let dsid = %sysfunc(open(&data));
      %if &dsid %then %do;
        %do index = 1 %to %sysfunc(attrn(&dsid, nvars));
          %let varname = %sysfunc(varname(&dsid, &index));
    
          %let varnames = &varnames &varname;
    
          %if %sysfunc(indexw(&varname, &vars)) %then %do;
            %if C = %sysfunc(vartype(&dsid, &index)) %then %do;
              %* Data contains character variable requiring enforcement;
              %let index2 = %eval(&index2+1);
              %local convert&index2;
              %let convert&index2 = &varname;
    
              %let varnames = &varnames ___&index2 ;   %* Variables that will be converted will be named __<#> during conversion;
            %end;
          %end;
        %end;
        %let dsid = %sysfunc(close(&dsid));
      %end;
      %else
        %put %sysfunc(sysmsg());
    
      %*put NOTE: &=vars;
      %*put NOTE: &=varnames;
    
      %if &index2 = 0 %then %do;
        %* No columns need to be converted to numeric, copy to out if necessary;
        %if &data ne &out %then %do;
          data &out;
            set &data;
          run;
        %end;
        %return;
      %end;
    
      %* Some columns need to be converted to numeric;
      %* Ensure the converted column is at the same position (varnum) as in the original data set;
    
      data &out;
        retain &varnames;
    
        set &data;
    
        %do index = 1 %to &index2;
          ___&index = input(&&convert&index,?? best12.);
        %end;
    
        drop
          %do index = 1 %to &index2;
            &&convert&index
          %end;
        ;
    
        rename
          %do index = 1 %to &index2;
            ___&index = &&convert&index
          %end;
        ;
      run;
    
      %put NOTE: ------------------------------------------------;
      %put NOTE: &data has been subjected to numeric enforcement.;
      %put NOTE: ------------------------------------------------;
    %mend enforce_num;
    

    这篇关于SAS Proc 导入 CSV 和缺失数据的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆