SAS PROC IMPORT Multiple SAV Files: Force SPSS Value Labels to Create Unique SAS Format Names

Question

Sometimes, when I import multiple SAV files into the SAS WORK library, a variable imported later overwrites the display text (i.e., the format) of an earlier imported variable with a similar name.

I've determined that this happens because the later dataset's variable gets a custom format (built from the SPSS Value Labels) whose name is identical to the format name generated for the earlier variable, even though the two variables have different Value Labels definitions in their SAV files.

Is there a way to force SAS not to reuse a format name, for example by having PROC IMPORT check whether the name already exists in the WORK format catalog before auto-naming a new custom format? Or is there another way to prevent this from happening?

Here is my code, along with an example of the variable names and format names involved.

proc import out=Dataset1 datafile="S:\folder\Dataset1.SAV"
dbms=SAV replace; 
run;
proc import out=DatasetA datafile="S:\folder\DatasetA.SAV"
dbms=SAV replace; 
run;

Dataset1 contains the variable Question_1, whose original SPSS Value Labels are 1=Yes 2=No. When this dataset is imported, SAS automatically generates the format name QUESTION. for Question_1. When only Dataset1 is imported, the definition of format QUESTION. corresponds to the SPSS Value Labels for Question_1 in Dataset1.SAV.

DatasetA contains the variable Question_A, with SPSS Value Labels 1=Agree 2=Unsure 3=Disagree. When this dataset is imported after Dataset1, SAS automatically generates the format name QUESTION. for Question_A as well, even though the WORK library already contains a format named QUESTION. This overwrites the definition of QUESTION. that was generated when Dataset1 was imported. Once DatasetA is imported, the definition of format QUESTION. corresponds to the SPSS Value Labels for Question_A in DatasetA.SAV.

As a result, when Dataset1 and DatasetA are both imported, the variables Question_1 and Question_A both have the format name QUESTION. assigned to them, and the definition of QUESTION. in the WORK library corresponds to the SPSS Value Labels in DatasetA.SAV, not Dataset1.SAV. Question_1 therefore displays as 1=Agree 2=Unsure, even though its values actually mean 1=Yes 2=No.

Ideally, I would like these two variables to get distinct custom format names automatically at the import step. Is there any way to make this happen? Alternatively, is there any other way to prevent this type of overwriting?

Thanks.

Recommended answer

The way to prevent the literal overwriting is to point each SPSS file being read at a different format catalog, using the optional FMTLIB= statement.

proc import out=dataset1 replace 
   datafile="S:\folder\Dataset1.SAV" dbms=SAV 
; 
   fmtlib=work.fmtcat1;
run;
proc import out=dataset2 replace 
   datafile="S:\folder\Dataset2.SAV" dbms=SAV 
; 
   fmtlib=work.fmtcat2;
run;
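
Formats stored in a catalog other than WORK.FORMATS or LIBRARY.FORMATS are not found automatically when a dataset is displayed. The following is a minimal sketch, assuming the two catalogs created by the imports above, of how one might inspect them and add them to the format search path:

/* List the format definitions that each import wrote to its own catalog. */
proc format lib=work.fmtcat1 fmtlib;
run;
proc format lib=work.fmtcat2 fmtlib;
run;

/* Make both catalogs visible when formats are resolved.
   Catalogs are searched in the order listed, so a name that exists in
   both catalogs (such as QUESTION) resolves to the copy in work.fmtcat1. */
options fmtsearch=(work.fmtcat1 work.fmtcat2);

Because the search order decides which definition of a duplicated name is used, separate catalogs preserve both definitions but do not by themselves let both datasets display correctly in the same session.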

You can then rename the conflicting formats later (and change the format attached to each variable in the dataset to use the new name).

If the member name and format name are short enough, you should be able to generate a unique new name by concatenating the two, with something added in between to avoid conflicts. Something like the following will rename the formats, change the format names attached to the variables, and rebuild the formats in the WORK.FORMATS catalog.

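%macro sav_import(file,memname);
/* Default the dataset name to the file's base name (the token before the extension). */
%if 0=%length(&memname) %then %let memname=%scan(&file,-2,\./);

/* Import the SAV file, writing its formats to a catalog named after the dataset. */
proc import datafile=%sysfunc(quote(&file)) dbms=sav
  out=&memname replace
; 
  fmtlib=work.&memname ;
run;

/* Dump the imported format definitions to a dataset. */
proc format lib=work.&memname cntlout=formats;
run;

/* Prefix each format name with the dataset name; keep the old name for matching. */
data formats ;
  set formats;
  oldname=fmtname;
  fmtname=catx('_',"&memname",oldname);
run;

/* Find out which format is attached to each variable. */
proc contents data=&memname noprint out=contents;
run;

/* Build a "variable newformat." list for the FORMAT statement.
   Note: character formats (reported with a leading $ by PROC CONTENTS)
   are not matched by this join and would need extra handling. */
proc sql noprint;
  select distinct catx(' ',c.name,cats(f.fmtname,'.'))
    into :fmtlist separated by ' '
  from contents c inner join formats f
  on c.format = f.oldname
  ;
quit;

/* Attach the renamed formats to the variables. */
proc datasets nolist lib=work;
  modify &memname;
    format &fmtlist ;
  run;
quit;

/* Rebuild the renamed formats in the WORK.FORMATS catalog. */
proc format lib=work.formats cntlin=formats;
run;

%mend sav_import;

%sav_import(S:\folder\Dataset1.SAV);
%sav_import(S:\folder\Dataset2.SAV);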
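
One way to check the result is sketched below. It assumes both macro calls above ran successfully and that each file produced a format named QUESTION, as in the question; the names Dataset1_QUESTION and Dataset2_QUESTION follow from that assumption and are illustrative only.

/* Print the renamed format definitions from WORK.FORMATS.
   The two names below are assumed, based on the example in the question. */
proc format lib=work.formats fmtlib;
  select Dataset1_QUESTION Dataset2_QUESTION;
run;

/* Show which format is now attached to each variable. */
proc contents data=work.Dataset1;
run;

If the two listings show different value labels, the definitions no longer collide, and each dataset displays its own labels.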
