Set the labels of a SAS dataset equal to their variable names

Question

I'm working with several rather large datasets that are provided to me as CSV files. When I import one of the files, the data comes in fine, but the file has too many variables for SAS, so it stops reading the variable names and starts assigning them sequential numbers. To keep the variable names from the file, I read it in with the data row starting at 1, so that the first row is not read as variable names:

proc import file="X:\xxx\xxx\xxx\Extract\Live\Live.xlsx" out=raw_names dbms=xlsx replace;
    SHEET="live";
    GETNAMES=no;
    DATAROW=1;
run;

I then run a macro that breaks down the dataset and renames the variables based on the first observation of each variable:

%macro raw_sas_datasets(lib,output,start,end);
    data raw_names2;
        set raw_names;
            if _n_ ne 1 then delete;
            keep A -- E &start. -- &end.;
    run;
    proc transpose data=raw_names2 out=raw_names2;
        var A -- &end.;
    run;
    data raw_names2;
        set raw_names2;
            col1=compress(col1);
    run;
    data raw_values;
        set raw;
            keep A -- E &start. -- &end.;
    run;
    %macro rename(old,new);
        data raw_values;
            set raw_values;
                rename &old.=&new.;
        run;
    %mend rename;
    data _null_;
        set raw_names2;
            call execute('%rename('||_name_||","||col1||")");
    run;
    %macro freq(var);
        proc freq data=raw_values noprint;
           tables &var. / out=&var.;
        run;
    %mend freq;
     data raw_names3;
        set raw_names2;
            if _n_ < 6 then delete;
     run;
    data _null_;
        set raw_names3;
           call execute('%freq('||col1||")");
    run;
    proc sort data=raw_values;
        by StudySubjectID;
    run;
    data &lib..&output.;
        set raw_values;
    run;
%mend raw_sas_datasets;

The problem I'm running into is that the variable names are now all set properly and the data is lined up correctly, but the labels are still the original SAS assigned sequential numbers. Is there any way to set all of the labels equal to the variable names?

Recommended answer

If you just want to remove the variable labels (at which point they default to the variable name), that's easy. From the SAS Documentation:

proc datasets lib=&lib.;
  modify &output.;
  attrib _all_ label=' ';
run;
quit;

I suspect you have a simpler solution than the above, though.

  • The actual renaming step needs to be done differently. Right now it rewrites the entire dataset once per variable, which is a terrible idea when there are a lot of variables. Get all of your rename statements into one data step, or into a PROC DATASETS, or something else. Look up 'list processing SAS' for details on how to do that; on this site or on Google you will find lots of solutions.
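As a rough sketch of the single-step rename, one option is to collect every old=new pair into a macro variable and hand the whole list to PROC DATASETS. This assumes the transposed header table `raw_names2` (with `_name_` and `col1`) and the data table `raw_values` from the question's macro, that both live in WORK, and that every value of `col1` is a valid, unique SAS name; the macro variable `rename_list` is a made-up name for illustration.

```sas
/* Build "old=new" pairs from the transposed header row (raw_names2).
   Assumes col1 holds valid, unique SAS variable names. */
proc sql noprint;
    select catx('=', _name_, col1)
        into :rename_list separated by ' '
        from raw_names2
        where not missing(col1);
quit;

/* Apply every rename in one pass. PROC DATASETS only edits the
   dataset header, so the data itself is not rewritten. */
proc datasets lib=work nolist;
    modify raw_values;
    rename &rename_list.;
    run;
quit;
```

A macro variable can hold at most 65,534 characters, so a very wide file may need the list split into several chunks or generated with CALL EXECUTE into a single PROC DATASETS step instead.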

You can likely get SAS to read in the whole first line. The number of variables isn't the problem; it is probably the length of the line. There's another question on this site from a few months ago that deals with this exact problem; I'll link it if I can find it.
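If line length is indeed the cause, one thing to try is pointing PROC IMPORT at a fileref with a larger record length. This is only a sketch under assumptions: the source really is a delimited text file (the PROC IMPORT in the question reads an .xlsx workbook, where LRECL= does not apply), and the path, the fileref `csvin`, the output name `work.raw` and the LRECL= value are all placeholders.

```sas
/* Hypothetical path. LRECL= raises the maximum record length so that
   a very long header line is less likely to be truncated. */
filename csvin "C:\path\to\live.csv" lrecl=1000000;

proc import datafile=csvin out=work.raw dbms=csv replace;
    getnames=yes;        /* take variable names from the first row */
    guessingrows=1000;   /* scan more rows before picking types    */
run;

filename csvin clear;
```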

My preferred option is not to use PROC IMPORT for CSVs anyway; I would suggest writing a metadata table that stores the variable names and the length/types for the variables, then using that to write import code. A little more work at first, but only has to be done once per study and you guarantee PROC IMPORT isn't making silly decisions for you.
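A minimal sketch of that metadata-driven approach might look like the following. Everything here is illustrative: the table `work.meta`, its columns (`colnum`, `varname`, `vartype`, `varlen`), the sample rows other than `StudySubjectID`, and the file path are assumptions, not part of the original study.

```sas
/* Hypothetical metadata table: one row per CSV column, in file order. */
data work.meta;
    infile datalines dsd truncover;
    length colnum 8 varname $32 vartype $1 varlen 8;
    input colnum varname $ vartype $ varlen;
    datalines;
1,StudySubjectID,C,20
2,VisitDate,C,10
3,Weight,N,8
;
run;

/* Turn the metadata into LENGTH and INPUT fragments,
   e.g. "StudySubjectID $20" and "StudySubjectID $". */
proc sql noprint;
    select catx(' ', varname, cats(ifc(vartype = 'C', '$', ''), varlen)),
           catx(' ', varname, ifc(vartype = 'C', '$', ''))
        into :length_list separated by ' ',
             :input_list  separated by ' '
        from work.meta
        order by colnum;
quit;

/* Read the CSV with the generated statements, skipping the header row.
   The path is a placeholder. */
data work.raw_values;
    infile "C:\path\to\live.csv" dsd truncover firstobs=2 lrecl=1000000;
    length &length_list.;
    input &input_list.;
run;
```

Because the names, types and lengths come from the metadata table, the resulting variables carry no auto-generated labels, and the same table can be reused for each new extract of the study.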
