从长到宽转置数据 [英] Transpose Data from Long to Wide

查看:40
本文介绍了从长到宽转置数据的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试将 SAS 中的数据从长格式转换为宽格式.我遇到的问题是我有多个要转置的列.我在下面有几个示例数据集来演示我正在尝试做什么.我正在执行此操作的实际数据集将非常大,我认为处理此问题的一种方法可能是转置单个列,然后在最后合并,但我将要执行此操作的数据集正在运行明显更大(数万列),因此这将非常不可行.

I am trying to transpose data in SAS from a long format to a wide format. The problem I'm having is that I have multiple columns that I'm trying to transpose. I have a few example datasets below to demonstrate what I'm trying to do. The actual dataset I'm doing this on is going to be very large, I think one way to handle this could be to tranpose individual columns and then merge at the end, but the dataset I'm going to be doing this on is going to be significantly larger (tens of thousands of columns), so this will be pretty unfeasible.

以下是我开始使用的数据:

Below is the data I'm starting with:

data current_state;
input id $ att_1 $ att_2 $ att_3 $ att_4 $ att_5 $ Dollars;
datalines;
1 d234 d463 d213 d678 d435 50
2 d213 d690 d360 d145 d269 25
3 d409 d231 d463 d690 d609 10
;

以下是我希望转置的结果:

Below is what I would want the outcome of the transpose to be:

data desired_state;
input id $ d145 $ d213 $ d231 $ d234 $ d269 $ d360 $ d409 $ d435 $ d463 $ d609 $ d678 $ d690;
datalines;
1 0 50 0 50 0 0 0 50 0 0 50 0
2 25 25 0 0 25 25 0 0 0 0 0 25
3 0 0 10 0 0 0 10 0 10 10 0 10
;

我尝试了以下操作,但没有给我想要的输出.

I have attempted the following, which isn't giving me the desired output.

proc transpose data=current_state out=test1;
by id;
id att_1 att_2 att_3 att_4 att_5;
var Dollars;
run;

推荐答案

Proc TRANSPOSE 没有将并行透视"转换为单行的语法——多个 id 连接起来成为结果 id列名.

Proc TRANSPOSE does not have syntax for 'parallel pivoting' into a single row -- multiple ids are concatenated to become the resultant id for column name.

使用串行旋转可以实现平行旋转结果 - 首先旋转到更高的矢量布局,然后旋转到宽布局.

A parallel pivot result can be achieved done using serial pivoting - first pivot to a an even taller vector layout, and then to the wide layout.

proc transpose data=current_state out=stage1_vector;
by id Dollars;
var att_1 att_2 att_3 att_4 att_5;
run;

proc transpose data=stage1_vector out=want;
by id;
id col1;
run;

want 中的列顺序基于矢量布局中数据中的 col1 值外观.您可以通过创建一个虚拟记录来强制排序列顺序,其中 col1 值是您想要的出现顺序.

The column order in want is based on the col1 value appearance in the data in the vector layout. You can force a sorted column order by creating a dummy record where the col1 values are in the appearance order you want.

proc transpose data=current_state out=stage1_vector(drop=_name_);
by id Dollars;
var att_1 att_2 att_3 att_4 att_5;
run;

proc sort nodupkey data=stage1_vector(keep=col1) out=ids;
  by col1;
run;

data vector_view / view=vector_view;
  set ids stage1_vector;
run;

proc transpose data=vector_view out=want;
  by id;
  id col1;
run;

应该仔细考虑用零替换缺失值的要求,特别是如果您想用 d* 值(Proc MEANS 等)计算 id 的数量.无论如何,替换可以在转置后步骤中完成.

The requirement for replacing missing values with zeroes should be considered carefully, especially if you want to count the number of ids with a d* value (Proc MEANS etc). Regardless, the replacement can be done in a post-transpose step.

data want;
  set want;
  array zerome d:;

  /* re-purpose _n_ temporarily as a loop index variable
   * that does not have to be explicitly dropped
   */
  do _n_ = 1 to dim(zerome);
    if missing(zerome(_n_)) then zerome(_n_) = 0;
  end;
run;

这篇关于从长到宽转置数据的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆