How to integrate properties defined on multiple rows using a data.frame or data.table long-format approach


Problem description

I recently started using the data.table package in R, and I find it very convenient for transforming and aggregating data. One thing I have not figured out is how to transform data that are defined across multiple rows. Do I need to reshape the data.frame/data.table into a wide format first?

Suppose you have the following data table:

library(data.table)

dt <- data.table(group  = c("a", "a", "a", "b", "b", "b"),
                 subg   = c("f1", "f2", "f3", "f1", "f2", "f3"),
                 counts = c(3, 4, 5, 8, 9, 10))

For each group, you want to calculate the relative frequency of each subgroup, e.g. c1/(c1+c2+c3), as well as other properties that are functions of c1, c2 and c3 (where c1, c2 and c3 are the counts associated with f1, f2 and f3).

I can see how to transform the data table into a wide format and then apply the transformation. Is there any way to calculate this directly in the long format (ideally using data.table)?
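For reference, the wide-format route mentioned above might look roughly like the sketch below. It assumes the example table from the question and uses `dcast` from data.table; the names `wide` and `rel_f1` are made up for illustration and do not come from the original post.

```r
library(data.table)

# Same example table as in the question
dt <- data.table(group  = c("a", "a", "a", "b", "b", "b"),
                 subg   = c("f1", "f2", "f3", "f1", "f2", "f3"),
                 counts = c(3, 4, 5, 8, 9, 10))

# Reshape to wide: one row per group, one column per subgroup (f1, f2, f3)
wide <- dcast(dt, group ~ subg, value.var = "counts")

# Relative frequency of f1 within each group: c1 / (c1 + c2 + c3)
# (rel_f1 is a hypothetical column name)
wide[, rel_f1 := f1 / (f1 + f2 + f3)]
```

This works, but every subgroup has to be named explicitly in the formula, which is part of why a long-format solution is attractive.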

In general, the group and the subgroup could each be represented by multiple factors.

Recommended answer

If I understand the OP correctly, you want something like this:

dt[, {bigN = .N; .SD[, .N / bigN, by = subg]}, by = group]

Or, very similarly, using the counts column:

dt[, {counts.sum = sum(counts); .SD[, counts / counts.sum, by = subg]},
     by = group]
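As a possible variation (not part of the original answer), the same per-group quantities can be attached to the long table with `:=`, avoiding the nested `.SD` call. The sketch below assumes the example table from the question; `rel_freq`, `ratios` and `f1_over_f3` are hypothetical names, and the f1/f3 ratio is only an illustrative stand-in for "other properties as a function of c1, c2, c3".

```r
library(data.table)

# Same example table as in the question
dt <- data.table(group  = c("a", "a", "a", "b", "b", "b"),
                 subg   = c("f1", "f2", "f3", "f1", "f2", "f3"),
                 counts = c(3, 4, 5, 8, 9, 10))

# Relative frequency of each subgroup within its group, added by reference
# (rel_freq is a hypothetical column name)
dt[, rel_freq := counts / sum(counts), by = group]

# A property that depends on specific subgroups, computed in long format:
# here the ratio of the f1 count to the f3 count in each group
ratios <- dt[, .(f1_over_f3 = counts[subg == "f1"] / counts[subg == "f3"]),
             by = group]
```

If the group is defined by several columns, they can all be listed in `by`, for example `by = .(col1, col2)` with whatever grouping columns the real table has.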
