Creating a table with individual trials from a frequency table in R (inverse of table function)

Question

I have a frequency table of data in a data.frame in R listing factor levels and counts of successes and failures. I would like to turn it from frequency table into a list of events - i.e. the opposite of the "table" command. Specifically, I would like to turn this:

factor.A factor.B success.count fail.count
-------- -------- ------------- ----------
 0        1        0             2
 1        1        2             1

into this:

factor.A factor.B result 
-------- -------- -------
 0        1        0
 0        1        0
 1        1        1
 1        1        1
 1        1        0

It seems to me that reshape ought to do this, or even some obscure base function that I have not heard of, but I've had no luck. Even repeating individual rows of a data.frame is tricky - how do you pass a variable number of arguments to rbind?

Any hints?

Background: Why? Because it is easier to cross-validate logistic fits to such a data set than to the aggregated binomial data.

I'm analysing my data with a generalised linear model as a binomial regression in R, and I would like to cross-validate to control regularisation, since my purpose is predictive.

However, as far as I can tell, the default cross-validation routines in R are not great for binomial data: they hold out entire rows of the frequency table rather than individual trials. This means that lightly and heavily sampled factor combinations have the same weight in my cost function, which is inappropriate for my data.

Recommended answer

You can try this:

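# example data from the question
df <- data.frame(factor.A = c(0, 1), factor.B = c(1, 1),
                 success.count = c(0, 2), fail.count = c(2, 1))

counts <- df[ , c("success.count", "fail.count")]

# create 'result' vector
# repeat 1s and 0s the number of times given in the respective 'count' column;
# t() makes the counts come out row by row (success, fail, success, fail, ...),
# so they line up with the alternating 1s and 0s
result <- rep(rep(c(1, 0), nrow(df)), c(t(counts)))

# repeat each row in df the number of times given by the sum of 'count' columns
data.frame(df[rep(seq_len(nrow(df)), rowSums(counts)), c("factor.A", "factor.B")], result)

#     factor.A factor.B result
# 1          0        1      0
# 1.1        0        1      0
# 2          1        1      1
# 2.1        1        1      1
# 2.2        1        1      0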
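
If you need this more than once, the same idea can be wrapped in a small function. The sketch below is only an illustration: `expand_trials` and its arguments are hypothetical names, not part of base R, and the counts in `freq` are made up so that the two rows have different success and failure counts. It ends with a round-trip check that re-aggregates the expanded trials and compares them with the original counts.

```r
# Hypothetical helper (not from the original answer): expand a frequency
# table into one row per trial. The count column names are arguments.
expand_trials <- function(freq, success = "success.count", fail = "fail.count") {
  counts <- as.matrix(freq[, c(success, fail)])
  keep   <- setdiff(names(freq), c(success, fail))
  # row i is repeated (successes + failures) times
  idx    <- rep(seq_len(nrow(freq)), rowSums(counts))
  # within each row: 1 repeated 'success' times, then 0 repeated 'fail' times
  result <- rep(rep(c(1, 0), nrow(freq)), c(t(counts)))
  out <- data.frame(freq[idx, keep, drop = FALSE], result = result)
  rownames(out) <- NULL
  out
}

# Made-up counts (an assumption for illustration, not the question's data)
freq <- data.frame(factor.A = c(0, 1), factor.B = c(1, 1),
                   success.count = c(3, 1), fail.count = c(0, 2))
trials <- expand_trials(freq)

# Round trip: re-aggregate the trials and compare with the original counts
back <- aggregate(result ~ factor.A + factor.B, data = trials,
                  FUN = function(x) c(success = sum(x == 1), fail = sum(x == 0)))
stopifnot(all(back$result[, "success"] == freq$success.count),
          all(back$result[, "fail"] == freq$fail.count))
```

The round-trip check is worth keeping around: `rep()` will not complain if the counts are paired with the wrong 1s and 0s, as long as the total length matches, so a mistake in the ordering would otherwise go unnoticed.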
