How to split one column into different columns with dcast without aggregating?

Question

I'm trying to reshape my data using dcast(). I'm working with samples where each sample has 10-30 sample units, and my data must not be aggregated.

My data is in this format:

ID  total
sample_1    1
sample_1    0
sample_1    2
sample_1    1
sample_1    0
sample_1    0
sample_1    2
sample_1    1
sample_1    0
sample_1    2
sample_1    1
sample_1    4
sample_2    2
sample_2    1
sample_2    2
sample_2    0
sample_2    0
sample_2    0
sample_2    1
sample_2    2
sample_2    1
sample_2    4
sample_2    5
sample_2    2
sample_2    1
sample_3    0
sample_3    0
sample_3    1
sample_3    2
sample_3    1
sample_3    0
sample_3    2
sample_3    1
sample_3    4
sample_3    5
sample_3    1
sample_3    1
sample_3    0
sample_3    0
sample_3    1

And I want it to look like this:

sample_1    sample_2    sample_3
1           2           0
0           1           0
2           2           1
1           0           2
0           0           1
0           0           0
2           1           2
1           2           1
0           1           4
2           4           5
1           5           1
4           2           1
            1           0
                        0
                        1

That is, each sample ID becomes its own column.

I've tried several approaches, but R keeps aggregating the data.

Recommended answer

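You can do this with dcast(), but you have to add a row number within each ID. Without it, all values of one ID fall into the same cell of the result, so dcast() has no choice but to aggregate them.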
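To illustrate why the row number matters, here is a minimal sketch, assuming the data.table package is installed and using a small made-up table `DT` rather than the question's data. With only `ID` on the right-hand side, every value of an ID maps to the same cell, so dcast() falls back to counting them with length(); adding a per-ID index gives each value its own cell.

```r
library(data.table)

# Small illustrative data set (hypothetical): three values for "a", two for "b"
DT <- data.table(ID = c("a", "a", "a", "b", "b"),
                 total = c(5L, 7L, 9L, 1L, 2L))

# No per-ID index ("." means "no variable" on the left-hand side):
# several values share one cell, so dcast() aggregates with length()
agg <- dcast(DT, . ~ ID, value.var = "total")
print(agg)

# With a per-ID index every value gets its own cell; the shorter group is padded with NA
wide <- dcast(DT, rowid(ID) ~ ID, value.var = "total")
print(wide)
```

The first call reports that it is defaulting to length() and returns the counts 3 and 2; the second keeps all five values.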

Besides reshape2, the data.table package also implements dcast(), and it has a handy rowid() function that generates unique row ids within each group. With that, we get:
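What rowid() produces can be seen on a short vector. The sketch below (assuming data.table is installed; the vector `ids` is made up for illustration) also shows a base R equivalent built with ave(), which can be handy when data.table is not available.

```r
library(data.table)

ids <- c("sample_1", "sample_1", "sample_2", "sample_1", "sample_2")

# rowid() numbers each element by how many times its value has appeared so far
idx <- rowid(ids)
print(idx)

# Base R equivalent: apply seq_along() to the positions within each group
base_idx <- ave(seq_along(ids), ids, FUN = seq_along)
print(base_idx)
```

Both calls return 1 2 1 3 2: the third "sample_1" gets 3, the second "sample_2" gets 2.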

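library(data.table)
# DF is the question's data (see Data at the end)
# setDT() converts DF to a data.table in place (by reference)
# rowid(ID) numbers the rows 1, 2, 3, ... within each ID, so every value gets its own cell
dcast(setDT(DF), rowid(ID) ~ ID, value.var = "total")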
#    ID sample_1 sample_2 sample_3
# 1:  1        1        2        0
# 2:  2        0        1        0
# 3:  3        2        2        1
# 4:  4        1        0        2
# 5:  5        0        0        1
# 6:  6        0        0        0
# 7:  7        2        1        2
# 8:  8        1        2        1
# 9:  9        0        1        4
#10: 10        2        4        5
#11: 11        1        5        1
#12: 12        4        2        1
#13: 13       NA        1        0
#14: 14       NA       NA        0
#15: 15       NA       NA        1
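If you are using reshape2's dcast() instead, there is no rowid(), but the same per-ID index can be built in base R with ave(). A sketch, assuming reshape2 is installed; the data frame name `dat` and the helper column `row` are our own choices, and `dat` rebuilds the question's data compactly with rep().

```r
library(reshape2)

# The question's data, rebuilt compactly (12, 13 and 15 values per sample)
dat <- data.frame(
  ID = rep(c("sample_1", "sample_2", "sample_3"), times = c(12, 13, 15)),
  total = c(1, 0, 2, 1, 0, 0, 2, 1, 0, 2, 1, 4,
            2, 1, 2, 0, 0, 0, 1, 2, 1, 4, 5, 2, 1,
            0, 0, 1, 2, 1, 0, 2, 1, 4, 5, 1, 1, 0, 0, 1),
  stringsAsFactors = FALSE
)

# Per-ID running index 1, 2, 3, ... (plays the role of data.table's rowid())
dat$row <- ave(seq_along(dat$ID), dat$ID, FUN = seq_along)

# One cell per (row, ID) pair, so nothing is aggregated; shorter samples are padded with NA
wide <- dcast(dat, row ~ ID, value.var = "total")
print(wide)
```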

However, I recommend continuing any data processing in long format and using grouping; that is much easier than working on individual columns. For instance (DF is a data.table at this point because of setDT()):

# count observations by group
DF[, .N, by = ID]
#         ID  N
#1: sample_1 12
#2: sample_2 13
#3: sample_3 15

# compute mean by group
DF[, mean(total), by = ID]
#         ID       V1
#1: sample_1 1.166667
#2: sample_2 1.615385
#3: sample_3 1.266667

# get min and max by group
DF[, .(min = min(total), max = max(total)), by = ID]
#         ID min max
#1: sample_1   0   4
#2: sample_2   0   5
#3: sample_3   0   5

# the same using range()
DF[, as.list(range(total)), by = ID]
#         ID V1 V2
#1: sample_1  0  4
#2: sample_2  0  5
#3: sample_3  0  5
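The same long-format summaries do not strictly require data.table. As a sketch in plain base R (the data frame name `dat` is our own; it rebuilds the question's data compactly), tapply() applies a function to `total` within each `ID`:

```r
# The question's data as a plain data.frame (12, 13 and 15 values per sample)
dat <- data.frame(
  ID = rep(c("sample_1", "sample_2", "sample_3"), times = c(12, 13, 15)),
  total = c(1, 0, 2, 1, 0, 0, 2, 1, 0, 2, 1, 4,
            2, 1, 2, 0, 0, 0, 1, 2, 1, 4, 5, 2, 1,
            0, 0, 1, 2, 1, 0, 2, 1, 4, 5, 1, 1, 0, 0, 1),
  stringsAsFactors = FALSE
)

# Observations per group (comparable to DF[, .N, by = ID])
n_obs <- tapply(dat$total, dat$ID, length)

# Mean per group (comparable to DF[, mean(total), by = ID])
means <- tapply(dat$total, dat$ID, mean)

# Min and max per group, one row per sample
mm <- cbind(min = tapply(dat$total, dat$ID, min),
            max = tapply(dat$total, dat$ID, max))

print(n_obs)
print(means)
print(mm)
```

The results match the data.table output above: 12, 13 and 15 observations, and a maximum of 4, 5 and 5.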

Data

DF <- structure(list(ID = c("sample_1", "sample_1", "sample_1", "sample_1", 
"sample_1", "sample_1", "sample_1", "sample_1", "sample_1", "sample_1", 
"sample_1", "sample_1", "sample_2", "sample_2", "sample_2", "sample_2", 
"sample_2", "sample_2", "sample_2", "sample_2", "sample_2", "sample_2", 
"sample_2", "sample_2", "sample_2", "sample_3", "sample_3", "sample_3", 
"sample_3", "sample_3", "sample_3", "sample_3", "sample_3", "sample_3", 
"sample_3", "sample_3", "sample_3", "sample_3", "sample_3", "sample_3"
), total = c(1L, 0L, 2L, 1L, 0L, 0L, 2L, 1L, 0L, 2L, 1L, 4L, 
2L, 1L, 2L, 0L, 0L, 0L, 1L, 2L, 1L, 4L, 5L, 2L, 1L, 0L, 0L, 1L, 
2L, 1L, 0L, 2L, 1L, 4L, 5L, 1L, 1L, 0L, 0L, 1L)), .Names = c("ID", 
"total"), row.names = c(NA, -40L), class = "data.frame")
