SAS: Define type when importing .xlsx with PROC IMPORT

Problem description

Question: How do I define the type of variables imported from a .xlsx file when using PROC IMPORT?

What I've tried

I am using SAS v9.4. So far as I'm aware, it is vanilla SAS. I do not have SAS/ACCESS etc.

My data look like this:

ID1        ID2  MONTH   YEAR    QTR VAR1    VAR2
ABC_1234   1    1       2010    1   869     3988
ABC_1235   12   2       2010    1   639     3144
ABC_1236   13   3       2010    2   698     3714
ABC_1237   45   4       2010    2   630     3213

The step I am running is:

proc import out=rawdata
    datafile = "c:\rawdata.xlsx"
        dbms = xlsx replace;

    format ID1 $9. ;
    format ID2 $3. ;
    format MONTH best2. ;
    format YEAR best4. ;
    format QTR best1. ;
    format VAR1 best3. ;
    format VAR2 best4. ;
run;

When I run this step, I get the following log output:

ERROR: You are trying to use the character format $ with the numeric variable ID2 in data set WORK.RAWDATA.

What this seems to tell me is that SAS automatically assigns the variable type. I want to be able to control it manually. I cannot find documentation which explains how to do this. INFORMAT, LENGTH, and INPUT statements do not seem to work for PROC IMPORT.

I am using PROC IMPORT because it has yielded the greatest success with .xlsx files overall. Two possible solutions I can think of are 1) convert .xlsx to .csv and use INFILE in a DATA step and 2) bring the data in as numeric and convert it to character in a later step. I dislike the first solution because it requires me to manually manipulate the data, a potential source of error (such as leading zeros being removed). I dislike the second because it may unintentionally introduce errors (again, such as with leading zeros) and introduces extraneous work.
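For comparison, here is roughly what the first workaround (export to CSV, then read it in a DATA step) looks like. It is a minimal sketch, assuming the CSV keeps the same column order as the sample above; DATALINES stands in for the exported file so the step runs on its own, and the path `c:\rawdata.csv` in the comment is hypothetical. The point is that LENGTH and the `$` modifier on INPUT let you fix each column's type yourself, so a value such as `007` in the CSV would be kept as the text `007`. This does not help if the zeros were already lost when the sheet was saved from Excel.

```sas
/* Read CSV-style data with explicitly declared types.               */
/* DATALINES stands in for a hypothetical exported c:\rawdata.csv;   */
/* with a real file use: infile "c:\rawdata.csv" dsd firstobs=2 truncover; */
data rawdata;
    infile datalines dsd truncover;
    /* LENGTH fixes each variable's type and width before INPUT runs */
    length ID1 $9 ID2 $3 MONTH YEAR QTR VAR1 VAR2 8;
    /* $ reads ID1 and ID2 as character; the rest are numeric */
    input ID1 $ ID2 $ MONTH YEAR QTR VAR1 VAR2;
    datalines;
ABC_1234,1,1,2010,1,869,3988
ABC_1235,12,2,2010,1,639,3144
ABC_1236,13,3,2010,2,698,3714
ABC_1237,45,4,2010,2,630,3213
;
run;
```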

Answer

You can try setting the column type to "Text" in Excel to see whether SAS picks the type up from there. It's worth a shot.

If that doesn't work, you will need a separate DATA step to convert the columns, unless you use PC Files Server or have Excel of the same bitness as SAS installed on the SAS server so it can access the file directly.

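/* Import as-is (SAS picks each column's type), renaming the
   numeric ID2 to _ID2 so the name is free for a character copy */
proc import 
    file = "c:\rawdata.xlsx"
    out=_rawdata(rename=(ID2 = _ID2) )
    dbms = xlsx replace;
run;

data rawdata;
    /* These come before SET; ID2 is not in _rawdata, so its
       first reference here ($3.) makes it a character variable */
    format ID1 $9. ;
    format ID2 $3. ;
    format MONTH best2. ;
    format YEAR best4. ;
    format QTR best1. ;
    format VAR1 best3. ;
    format VAR2 best4. ;

    set _rawdata;

    /* CATS converts the number to left-trimmed text */
    ID2 = cats(_ID2);

    /* Drop every variable whose name starts with _ (here, _ID2) */
    drop _:;
run;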
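One caveat on the question's leading-zero concern: if ID2 is really a fixed-width code whose leading zeros matter (an assumption; none of the sample values start with a zero), CATS cannot bring them back, because the value has already been stored as a number. A sketch that rebuilds zero-padded text with the Zw. format instead, assuming a width of 3 to match `$3.`; the small `_rawdata` step only simulates the PROC IMPORT output so the example runs on its own.

```sas
/* Stand-in for the PROC IMPORT output: ID2 arrived as a number */
data _rawdata;
    input ID1 :$9. _ID2;
    datalines;
ABC_1234 1
ABC_1235 12
ABC_1236 13
ABC_1237 45
;
run;

data rawdata;
    length ID2 $3;          /* declare ID2 as character, width 3 */
    set _rawdata;
    ID2 = put(_ID2, z3.);   /* 1 -> '001', 12 -> '012' */
    drop _:;                /* drop the numeric helper _ID2 */
run;
```

This only works when every code has the same width: once Excel has turned both `01` and `001` into the number 1, no format can tell them apart again.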

If you do have SAS/ACCESS to Excel, you can control these variables' types directly with the DBDSOPTS data set option. For example:

libname myxlsx Excel 'C:\rawdata.xlsx';

data rawdata;
    set myxlsx.'Sheet1$'n(DBDSOPTS="DBTYPE=(ID2='CHAR(3)')");
run;

The problem occurs because the XLSX engine used by PROC IMPORT is internal to SAS and separate from the Excel engine. The Excel engine uses Microsoft Jet or ACE, whereas the XLSX engine uses a proprietary reader that does not offer as much control as Microsoft's. Why this is the case, I have no idea.

When PROC IMPORT runs, SAS tries to guess the type of each column (for .xls files you can influence this with the GUESSINGROWS option). If a column contains only numbers, SAS assumes it is numeric. Unfortunately, without SAS/ACCESS to Excel or PC Files Server installed, you cannot control the variable type directly.