Expanding a Frequency Table Where the Variable Names are the Values

Question

I am working with a dataframe where each observation is linked to a specific ID, and I have a set of variables that define the "values" as if I had a factor variable. However, the value in the "cell" is the frequency. Here is a simplified version:

ID  1  2  3
A   2  3  2
B   1  4  1

I would like to get two vectors that expand the frequencies so that I can calculate an interpolated median for each ID. That is, I'd like something of the form:

A  B
1  1
1  2
2  2
2  2
2  2
3  3
3

The psych package has a function interp.median that could then take each vector and return the interpolated median for each ID that I would like to include as a new variable in the original dataframe. I checked out the vcdExtra package which could maybe do this with its expand.dft function, but I'm not sure exactly how it would work.

Any help would be greatly appreciated!

To refine a bit more, interp.median would work best if the final result was a data frame, with NAs padded at the end. That is, something of the form:

A  B
1  1
1  2
2  2
2  2
2  2
3  3
3  NA

Recommended answer

If dat is the dataset:

  # Expand each row: repeat the column position by the count in that cell
  lst <- by(dat[,-1], dat[,1], function(x) rep(seq_along(x), x))
  lst
  #dat[, 1]: A
  #[1] 1 1 2 2 2 3 3
  #------------------------------------------------------------
  #dat[, 1]: B
  #[1] 1 2 2 2 2 3

  # Pad the shorter vectors with NA so they fit in one data frame
  indx <- max(sapply(lst, length))
  dat2 <- do.call(data.frame, lapply(lst, function(x) c(x, rep(NA, indx - length(x)))))
  dat2
  #  A  B
  #1 1  1
  #2 1  2
  #3 2  2
  #4 2  2
  #5 2  2
  #6 3  3
  #7 3 NA

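  # Alternative with split(); unlist() turns each one-row data frame into a plain vector of counts
  lst2 <- lapply(split(dat[,-1], dat$ID), function(x) rep(seq_along(unlist(x)), unlist(x)))

  # Same NA padding as above, using the longest expanded vector in lst2
  indx <- max(sapply(lst2, length))
  do.call(data.frame, lapply(lst2, function(x) c(x, rep(NA, indx - length(x)))))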
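
The question also asks for the per-ID result to be added back to the original data frame, which the code above does not show. A minimal sketch of that last step follows. It assumes the column names are numeric and uses them as the values (the code above uses column positions, which is the same thing for this example), and it keeps the expanded vectors in a list so no NA padding is needed. The column names median and interp_median are made up for illustration, and the interp.median call only runs if the psych package is installed.

```r
# Same example data as in the Data section of this answer
dat <- data.frame(ID = c("A", "B"),
                  `1` = c(2L, 1L), `2` = 3:4, `3` = c(2L, 1L),
                  check.names = FALSE, stringsAsFactors = FALSE)

# Assumption: the column names are the (numeric) values, the cells are counts
vals <- as.numeric(names(dat)[-1])

# One expanded vector per ID, kept as a list (no NA padding required)
expanded <- lapply(split(dat[, -1], dat$ID),
                   function(x) rep(vals, unlist(x)))

# Ordinary median per ID with base R, matched back to the rows by ID
med <- sapply(expanded, median)
dat$median <- unname(med[dat$ID])          # hypothetical column name

# Interpolated median per ID, only when psych is available
if (requireNamespace("psych", quietly = TRUE)) {
  imed <- sapply(expanded, psych::interp.median)
  dat$interp_median <- unname(imed[dat$ID])  # hypothetical column name
}
```

Indexing the named result by dat$ID rather than relying on row order keeps the values aligned even if the IDs in dat are not sorted.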

Data

  dat <- structure(list(ID = c("A", "B"), `1` = c(2L, 1L), `2` = 3:4,
                        `3` = c(2L, 1L)),
                   .Names = c("ID", "1", "2", "3"),
                   class = "data.frame", row.names = c(NA, -2L))
