SAS: Define type when importing .xlsx with PROC IMPORT
Problem description
Question: How do I define the type of variables imported from a .xlsx file when using PROC IMPORT?
My work
I am using SAS v9.4. As far as I'm aware, it is vanilla SAS. I do not have SAS/ACCESS etc.
My data looks like this:
ID1       ID2  MONTH  YEAR  QTR  VAR1  VAR2
ABC_1234  1    1      2010  1    869   3988
ABC_1235  12   2      2010  1    639   3144
ABC_1236  13   3      2010  2    698   3714
ABC_1237  45   4      2010  2    630   3213
The program I am running is:
proc import out=rawdata
    datafile = "c:\rawdata.xlsx"
    dbms = xlsx replace;
    format ID1 $9. ;
    format ID2 $3. ;
    format MONTH best2. ;
    format YEAR best4. ;
    format QTR best1. ;
    format VAR1 best3. ;
    format VAR2 best4. ;
run;
When I run this step, I get the following log output:
ERROR: You are trying to use the character format $ with the numeric variable ID2 in data set WORK.RAWDATA.
What this seems to tell me is that SAS assigns the variable type automatically. I want to be able to control it manually, but I cannot find documentation that explains how to do this. INFORMAT, LENGTH, and INPUT statements do not seem to work in PROC IMPORT.
I am using PROC IMPORT because it has given me the most success with .xlsx files overall. Two possible solutions I can think of are 1) convert the .xlsx to .csv and use INFILE in a DATA step, and 2) bring the data in as numeric and convert it to character in a later step. I dislike the first because it requires me to manipulate the data manually, which is a potential source of error (such as leading zeros being removed). I dislike the second because it may unintentionally introduce errors (again, such as with leading zeros) and adds extraneous work.
Recommended answer
You can try setting the column type to "Text" in Excel to see whether SAS picks the type up from that. Worth a shot.
If that doesn't work, you will need a separate data step to convert the columns, unless you use PC Files Server or have Excel of the same bitness installed on the same SAS server for direct access to the file.
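/* Import as-is; ID2 arrives as numeric and is renamed to _ID2 */
proc import
    file = "c:\rawdata.xlsx"
    out=_rawdata(rename=(ID2 = _ID2) )
    dbms = xlsx replace;
run;

data rawdata;
    /* FORMAT statements placed before SET fix the variable order, and define ID2 as character */
    format ID1 $9. ;
    format ID2 $3. ;
    format MONTH best2. ;
    format YEAR best4. ;
    format QTR best1. ;
    format VAR1 best3. ;
    format VAR2 best4. ;
    set _rawdata;
    ID2 = cats(_ID2);   /* numeric to character */
    drop _:;            /* drop the temporary _ID2 */
run;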
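Since the question raises leading zeros, here is a minimal sketch of a variant of the conversion step that pads ID2 with zeros. It assumes the IDs are meant to be exactly three digits wide, which the original post does not state, and it uses a small stand-in data set (built with DATALINES) in place of the imported _rawdata so that it runs on its own.

```sas
/* Stand-in for the PROC IMPORT output: ID2 was read as numeric
   and renamed to _ID2 (illustrative data, not the real file). */
data _rawdata;
    input ID1 :$9. _ID2 MONTH YEAR QTR VAR1 VAR2;
    datalines;
ABC_1234 1 1 2010 1 869 3988
ABC_1235 12 2 2010 1 639 3144
;
run;

data rawdata;
    length ID2 $3;            /* define ID2 as character, length 3 */
    set _rawdata;
    ID2 = put(_ID2, z3.);     /* Zw. format pads with leading zeros */
    drop _:;                  /* drop the temporary numeric _ID2 */
run;
```

Note that this only pads to a fixed width. If Excel stored the IDs as numbers, any original leading zeros are already gone before SAS reads the file, so the width has to come from your knowledge of the data.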
If you do have SAS/ACCESS to Excel, you can control these variables directly with the DBDSOPTS data set option. For example:
libname myxlsx Excel 'C:\rawdata.xlsx';
data rawdata;
    set myxlsx.'Sheet1$'n(DBDSOPTS="DBTYPE=(ID2='CHAR(3)')");
run;
The problem occurs because the xlsx engine in proc import is internal to SAS and is separate from the Excel engine. The Excel engine uses Microsoft Jet or ACE, whereas the xlsx engine uses a proprietary system that does not offer as much control as Microsoft's. Why this is the case, I have no idea.
When proc import runs, SAS tries to guess what type each variable should be (with xls files you have some control over this through the guessingrows option). If it detects only numbers in a column, it assumes a numeric variable. Unfortunately, without SAS/ACCESS to Excel or PC Files Server installed, you cannot control the variable type directly.
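To see which types SAS actually guessed, one option is to query DICTIONARY.COLUMNS after the import. The sketch below is only illustrative: it builds a small stand-in data set named rawdata rather than reading the real workbook, and it assumes the data set lives in the WORK library.

```sas
/* Stand-in for a data set produced by PROC IMPORT (illustrative data) */
data rawdata;
    input ID1 :$9. ID2 MONTH;
    datalines;
ABC_1234 1 1
ABC_1235 12 2
;
run;

/* List each variable with its type ('char' or 'num') and length.
   LIBNAME and MEMNAME values are stored in upper case. */
proc sql;
    select name, type, length
    from dictionary.columns
    where libname = 'WORK' and memname = 'RAWDATA';
quit;
```

Running the same query against the real import output shows at a glance whether a column such as ID2 came in as numeric and needs the conversion step.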