How to create a truncated permanent database from a larger file in SAS

Problem description

I'm trying to read a comma-delimited .txt file (called 'file.txt' in the code below) into SAS in order to create a permanent database that includes only some of the variables and observations.

Here's a snippet of the .txt file for reference:

SUMLEV,REGION,DIVISION,STATE,NAME,POPESTIMATE2013,POPEST18PLUS2013,PCNT_POPEST18PLUS
10,0,0,0,United States,316128839,242542967,76.7
40,3,6,1,Alabama,4833722,3722241,77
40,4,9,2,Alaska,735132,547000,74.4
40,4,8,4,Arizona,6626624,5009810,75.6
40,3,7,5,Arkansas,2959373,2249507,76

My (abbreviated) code is as follows:

options nocenter nodate ls=72 ps=58;
filename foldr1 'C:\Users\redacted\Desktop\file.txt';
libname foldr2 'C:\Users\redacted\Desktop\Data';
libname foldr3 'C:\Users\redacted\Desktop\Formats';
options fmtsearch=(foldr3.bf_fmts);

proc format library=foldr3.bf_fmts;
[redacted]
run;

data foldr2.file;
infile foldr1 DLM=',' firstobs=2 obs=52;
input STATE $ NAME $ REGION $ POPESTIMATE2013;
PERCENT=POPESTIMATE2013/316128839;
format REGION $regfmt.;
run;

proc print data=foldr2.file;
sum POPESTIMATE2013 PERCENT;
title 'Title';
run;

In my INPUT statement, I list the variables that I want to include in my new truncated database (STATE, NAME, REGION, etc.).

When I print my truncated database, I notice that none of my INPUT variables correspond to the same-named variables in the original file. Instead, my variables print out like this:

  • STATE (1st var listed in INPUT) printed as SUMLEV (1st var listed in .txt file)
  • NAME (2nd var listed in INPUT) printed as REGION (2nd var listed in .txt file)
  • REGION (3rd var listed in INPUT) printed as DIVISION (3rd var listed in .txt file)
  • POPESTIMATE2013 (4th var listed in INPUT) printed as STATE (4th var listed in .txt file)

It seems that SAS is matching my INPUT variables based on order, not on name. So, because I list STATE first in my INPUT statement, SAS prints out the first variable of the original .txt file (i.e., the SUMLEV variable).

Any idea what's wrong with my code? Thanks for your help!

Solution

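Your current code reads the first 4 values from each line of the CSV file and assigns them, in order, to the variables you have listed.

The INPUT statement lists all the columns you want to read in (and where to read them from); it does not search the input file for columns by name. With list input, the first delimited value on a line goes to the first variable, the second value to the second variable, and so on. The header line is skipped by FIRSTOBS=2 and is never used for matching.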
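The sketch below shows the positional assignment directly. It runs the INPUT statement from the question against three of the sample rows. It assumes the rows are supplied inline with DATALINES rather than read from file.txt, and WORK.POSITIONAL is a hypothetical data set name. Each printed row should show STATE=40 and NAME=3. REGION should hold the DIVISION value (6, 9, 8), and POPESTIMATE2013 should hold the STATE code (1, 2, 4).

```sas
/* Sketch: the question's INPUT statement applied to inline sample rows.   */
/* DATALINES replaces file.txt; WORK.POSITIONAL is a hypothetical name.    */
data work.positional;
    infile datalines dlm=',';
    /* List input: fields 1-4 go to these four variables, in this order */
    input STATE $ NAME $ REGION $ POPESTIMATE2013;
    datalines;
40,3,6,1,Alabama,4833722,3722241,77
40,4,9,2,Alaska,735132,547000,74.4
40,4,8,4,Arizona,6626624,5009810,75.6
;
run;

proc print data=work.positional;
run;
```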

The code below should produce the output you want. The keep statement lists the columns that you want in the output.

data foldr2.file;
    infile foldr1 dlm = "," firstobs = 2 obs = 52;
    /* Give NAME a length of 20 instead of the list-input default of 8 */
    informat NAME $20.;
    /* Name each of the columns, in file order; REGION is read as character */
    /* so that the character format $regfmt. can be applied to it          */
    input SUMLEV REGION $ DIVISION STATE NAME $ POPESTIMATE2013 POPEST18PLUS2013 PCNT_POPEST18PLUS;
    /* Keep only the columns you want */
    keep STATE NAME REGION POPESTIMATE2013 PERCENT;
    PERCENT = POPESTIMATE2013/316128839;
    format REGION $regfmt.;
run;
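Because values are assigned by position, the INPUT statement only needs to go as far as the last column you want. List input does not read anything after that on the line. Below is a minimal sketch of that variant. It uses DATALINES in place of the foldr1 file and writes to a hypothetical WORK.STATES data set. The $regfmt. format is left out because its definition is redacted in the question.

```sas
/* Sketch: stop reading after field 6 (POPESTIMATE2013).                */
/* DATALINES replaces foldr1; WORK.STATES is a hypothetical name.       */
data work.states;
    infile datalines dlm=',';
    /* Length 20 so longer state names are not truncated */
    informat NAME $20.;
    /* Fields 1-6 only; fields 7-8 are never read */
    input SUMLEV REGION $ DIVISION STATE NAME $ POPESTIMATE2013;
    PERCENT = POPESTIMATE2013 / 316128839;
    /* SUMLEV and DIVISION are read only to get past them */
    keep STATE NAME REGION POPESTIMATE2013 PERCENT;
    datalines;
10,0,0,0,United States,316128839,242542967,76.7
40,3,6,1,Alabama,4833722,3722241,77
40,4,9,2,Alaska,735132,547000,74.4
;
run;
```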

For a slightly more involved solution see Joe's excellent answer here. Applying this approach to your data will require setting the lengths of your columns in advance and converting character values to numeric.

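data foldr2.file;
    infile foldr1 dlm = "," firstobs = 2 obs = 52;
    /* Declare types and lengths up front; REGION stays character for $regfmt. */
    length STATE 8 NAME $20 REGION $1 POPESTIMATE2013 8;
    /* Load the current line into _INFILE_ without assigning any variables */
    input @;
    /* Pick fields by position with SCAN; convert numeric ones with INPUT */
    STATE = input(scan(_INFILE_, 4, ','), best.);
    NAME = scan(_INFILE_, 5, ',');
    REGION = scan(_INFILE_, 2, ',');
    POPESTIMATE2013 = input(scan(_INFILE_, 6, ','), best.);
    PERCENT = POPESTIMATE2013/316128839;
    format REGION $regfmt.;
run;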

If you are looking to become more familiar with SAS it would be worth your while to take a look at the SAS documentation for reading files.
