SAS Proc导入CSV和丢失的数据 [英] SAS Proc Import CSV and missing data

查看:190
本文介绍了SAS Proc导入CSV和丢失的数据的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

因此,我试图在SAS中导入一些数据集并将它们连接起来,唯一的问题是加入它们后出现此错误-

So, I'm trying to import some datasets in SAS and join them, the only problem is that I get this error after joining them -

    proc import datafile='filepath/datasetA.csv'
    out = dataA
    dbms= csv
    replace;
    run;


    proc import datafile='filepath\datasetB.csv'
    out = dataB
    dbms= csv
    replace;
    run;



    /* combine them all into one dataset*/


    data DataC;
    set &dataA. &dataB;

    run;



    ERROR: Variable column_k has been defined as both character and numeric

有问题的列在我要加入的两个数据集中都看起来像这样-

The column in question looks something like this in both of the data sets that I'm trying to join -

+----------+
| column_k |
+----------+
| 0        |
| 1        |
| 5        |
| 4        |
| NA       |
| NA       |
| 4        |
| 3        |
| NA       |
+----------+

基本上,如果可能的话,我想将该列中的NA数据导入为丢失"?我需要整列保持数字,因为我打算对那列中的数据进行一些数学处理.

Basically, I would like to import the NA data in that column as 'missing', if that's possible? I need the entire column to remain numeric as I'm planning on doing some mathematical stuff with the data in that column further down the line.

感谢您的帮助!

推荐答案

如果您希望继续使用Proc IMPORT,则需要确保这些列的类型相同.在您的情况下,您知道column_k应该是数字,因此DATA步骤可以使用INPUT函数将字符值转换为数字.

If you wish to continue using Proc IMPORT then you will need to ensure the columns are like-typed. In your case you know column_k should be numeric, so a DATA step can convert the character values to numeric using the INPUT function.

proc import … out = dataA;
proc import … out = dataB;

data dataA;
  set dataA;
  _num = input(column_k, best12.);
  drop column_k;
  rename _num = column_k;
run;

data dataB;
  set dataB;
  _num = input(column_k, best12.);
  drop column_k;
  rename _num = column_k;
run;

data want;
  set dataA dataB;
run;

在更大的范围内,例如处理多年期进口的情况下,可能会出现列名数据类型不匹配的情况.

In a larger scope mismatched data types for a column name can occur in a scenario such as dealing with multi-year imports.

假设无法重新读取较旧的数据,而较新的数据具有不同的列类型.

Suppose the older data can't be re-read and the newer data has different column type.

对于需要数字值的情况,一种方法是使用宏来编写源代码,该源代码在必要时将指定的变量从字符转换为数字.

For the case of wanting numeric values, one approach is to have macro that writes source code that converts, if necessary, specified variables from character to numeric.

示例:

%enforce_num (perm.loans2015, age amount remaining, out=work.loans2015)
%enforce_num (perm.loans2016, age amount remaining, out=work.loans2016)
%enforce_num (perm.loans2017, age amount remaining, out=work.loans2017)

data loans_3yrs; 
  set work.loans2015-loans2017;
run;

回到更简单的情况:

proc import … out = dataA;
proc import … out = dataB;

%enforce_num(dataA, column_k)
%enforce_num(dataB, column_k)

data want;
  set dataA dataB;
run;

enforce_num是什么样的?它必须:

What would the macro enforce_num look like? It would have to:

  • 扫描输入数据集的元数据
  • 确定变量是否是指定的变量之一并且是字符类型
    • 编写源代码以将变量转换为数字
    • 保持原始变量顺序
    • scan the input data set meta data
    • determine if a variable is one of those specified and is character type
      • write source code to convert the variable to numeric
      • maintain original variable order
      %macro enforce_num(data, vars, out=&data);
      
        /*
         * Arguments:
         *   data - name of input data set
         *   vars - space separated list of variables that must be numeric, convert type if necessary
         *   out  - name of output data set, default same as input data set
         *
         * Output:
         *   - Unchanged data set if data and out are the same and no conversion needed
         *   - Changed data set if some columns in data need conversion to numeric
         *     - replaces data if out is same as data
         *     - replaces out if out is different then data
         *     - the column order of the changed data set will be the same as the original data set
         */
      
        %local dsid index index2 vars varname vartype varnames debug;
      
        %let index2 = 0;  %* number of variables determined to be requiring conversion;
        %let debug = 0;
      
        %if &debug %then %put NOTE: &SYSMACRONAME: data=%superq(data);
      
        %let dsid = %sysfunc(open(&data));
        %if &dsid %then %do;
          %do index = 1 %to %sysfunc(attrn(&dsid, nvars));
            %let varname = %sysfunc(varname(&dsid, &index));
      
            %let varnames = &varnames &varname;
      
            %if %sysfunc(indexw(&varname, &vars)) %then %do;
              %if C = %sysfunc(vartype(&dsid, &index)) %then %do;
                %* Data contains character variable requiring enforcement;
                %let index2 = %eval(&index2+1);
                %local convert&index2;
                %let convert&index2 = &varname;
      
                %let varnames = &varnames ___&index2 ;   %* Variables that will be converted will be named __<#> during conversion;
              %end;
            %end;
          %end;
          %let dsid = %sysfunc(close(&dsid));
        %end;
        %else
          %put %sysfunc(sysmsg());
      
        %*put NOTE: &=vars;
        %*put NOTE: &=varnames;
      
        %if &index2 = 0 %then %do;
          %* No columns need to be converted to numeric, copy to out if necessary;
          %if &data ne &out %then %do;
            data &out;
              set &data;
            run;
          %end;
          %return;
        %end;
      
        %* Some columns need to be converted to numeric;
        %* Ensure the converted column is at the same position (varnum) as in the original data set;
      
        data &out;
          retain &varnames;
      
          set &data;
      
          %do index = 1 %to &index2;
            ___&index = input(&&convert&index,?? best12.);
          %end;
      
          drop
            %do index = 1 %to &index2;
              &&convert&index
            %end;
          ;
      
          rename
            %do index = 1 %to &index2;
              ___&index = &&convert&index
            %end;
          ;
        run;
      
        %put NOTE: ------------------------------------------------;
        %put NOTE: &data has been subjected to numeric enforcement.;
        %put NOTE: ------------------------------------------------;
      %mend enforce_num;
      

      这篇关于SAS Proc导入CSV和丢失的数据的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆