SAS: Define type when importing .xlsx with PROC IMPORT

Problem description

Questions: How do I define the variable type of variables being imported from a .xlsx file when using PROC IMPORT?

My work

I am using SAS v9.4. So far as I'm aware, it is vanilla SAS. I do not have SAS/ACCESS etc.

My data looks like this:

ID1        ID2  MONTH   YEAR    QTR VAR1    VAR2
ABC_1234   1    1       2010    1   869     3988
ABC_1235   12   2       2010    1   639     3144
ABC_1236   13   3       2010    2   698     3714
ABC_1237   45   4       2010    2   630     3213

The program I am running is:

proc import out=rawdata
    datafile = "c:\rawdata.xlsx"
        dbms = xlsx replace;

    format ID1 $9. ;
    format ID2 $3. ;
    format MONTH best2. ;
    format YEAR best4. ;
    format QTR best1. ;
    format VAR1 best3. ;
    format VAR2 best4. ;
run;

When I run this step, I get the following log output:

ERROR: You are trying to use the character format $ with the numeric variable ID2 in data set WORK.RAWDATA.

What this seems to tell me is that SAS automatically assigns the variable type. I want to be able to control it manually. I cannot find documentation which explains how to do this. INFORMAT, LENGTH, and INPUT statements do not seem to work for PROC IMPORT.

I am using PROC IMPORT because it has yielded the greatest success with .xlsx files overall. Two possible solutions I can think of are 1) convert .xlsx to .csv and use INFILE in a DATA step and 2) bring the data in as numeric and convert it to character in a later step. I dislike the first solution because it requires me to manually manipulate the data, a potential source of error (such as leading zeros being removed). I dislike the second because it may unintentionally introduce errors (again, such as with leading zeros) and introduces extraneous work.
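For reference, option 1 could look roughly like the following minimal sketch. It assumes the sheet has already been saved as a comma-separated file; the inline DATALINES (copied from the sample above) stand in for that file so the step runs on its own, and the path `c:\rawdata.csv` in the comment is hypothetical. The LENGTH statement declares ID1 and ID2 as character before INPUT reads them, so no type is guessed.

```sas
/* Sketch of option 1: read CSV-style data in a DATA step so that each */
/* variable's type and length are declared explicitly.                 */
/* DATALINES stands in for the file. For a real (hypothetical) export: */
/*   infile "c:\rawdata.csv" dsd firstobs=2 truncover;                 */
data rawdata_csv;
    length ID1 $9 ID2 $3;            /* both declared as character      */
    infile datalines dsd truncover;  /* DSD: comma-delimited input      */
    input ID1 $ ID2 $ MONTH YEAR QTR VAR1 VAR2;
datalines;
ABC_1234,1,1,2010,1,869,3988
ABC_1235,12,2,2010,1,639,3144
ABC_1236,13,3,2010,2,698,3714
ABC_1237,45,4,2010,2,630,3213
;
run;
```

Reading a column as character keeps whatever text is in the CSV; it cannot bring back leading zeros that were already lost when the file was exported.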

Recommended answer

You can try setting the column type to "Text" in Excel and see whether SAS picks that up. Worth a shot.

If that doesn't work, then unless you use PC Files Server, or have Excel of the same bitness installed on the SAS server for direct access to the file, you will need a separate DATA step to convert the columns.

proc import 
    file = "c:\rawdata.xlsx"
    out=_rawdata(rename=(ID2 = _ID2) )
    dbms = xlsx replace;
run;

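data rawdata;
    /* FORMAT statements placed before SET fix the variable order.      */
    /* ID2 does not exist in _RAWDATA (it was renamed to _ID2), so the  */
    /* $3. format here defines it as a new character variable.          */
    format ID1 $9. ;
    format ID2 $3. ;
    format MONTH best2. ;
    format YEAR best4. ;
    format QTR best1. ;
    format VAR1 best3. ;
    format VAR2 best4. ;

    set _rawdata;

    /* Convert the numeric _ID2 to character, stripping blanks. */
    ID2 = cats(_ID2);

    /* Drop every variable whose name starts with an underscore (_ID2). */
    drop _:;
run;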
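One caveat on `cats(_ID2)`: it writes the number without leading zeros, so a value stored as 1 becomes `1`, not `001`. If ID2 is in fact a fixed-width, zero-padded code (an assumption; the sample data does not show this), the Zw. format can restore the padding. A minimal sketch with hypothetical values:

```sas
/* Hypothetical illustration: CATS versus the Z3. format on numeric IDs. */
data ids;
    input _ID2;
    length ID2_cats ID2_z $3;
    ID2_cats = cats(_ID2);      /* no padding:  "1", "12", "45"     */
    ID2_z    = put(_ID2, z3.);  /* zero-padded: "001", "012", "045" */
datalines;
1
12
45
;
run;
```

The width in `z3.` has to match the real width of the code; it is a guess about the data, not something PROC IMPORT can tell you.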

If you do have SAS/Access to Excel, you can control these variables directly with the DBDSOPTS data set option. For example:

libname myxlsx Excel 'C:\rawdata.xlsx';

data rawdata;
    set myxlsx.'Sheet1$'n(DBDSOPTS="DBTYPE=(ID2='CHAR(3)')");
run;

The problem occurs because the XLSX engine in PROC IMPORT is internal to SAS and separate from the EXCEL engine. The EXCEL engine uses Microsoft Jet or ACE, whereas the XLSX engine uses a proprietary reader that offers less control than Microsoft's. Why this is the case, I have no idea.

When PROC IMPORT runs, SAS tries to guess each column's type (for .xls files you can influence this with the GUESSINGROWS option). If a column contains only numbers, SAS assumes a numeric variable. Unfortunately, without SAS/ACCESS to Excel or PC Files Server installed, you cannot control the variable type directly.