How to assign number of repeats to dataframe based on elements of an identifying vector in R?


Problem description

I have a data frame in which each individual is assigned a text ID that concatenates a place name with a personal ID (see the data below). Ultimately, I need to transform the data set from "long" to "wide" (e.g., using reshape) so that each individual occupies exactly one row. To do that, I need to assign a "time" variable that reshape can use to identify time-varying covariates, etc. I have (probably bad) code that does this for individuals who appear up to two times, but I need to be able to identify up to 18 repeated occurrences. The code below works fine if I leave out the line preceded by the hash, but then it only identifies up to two repeats. If I include that line (which would seem necessary for individuals repeated more than twice), R chokes, giving the following error (presumably because the first individual is repeated only twice):

Error in if (data$uid[i] == data$uid[i - 2]) { : 
  argument is of length zero

Can anyone help?

place <- rep("ny",10)
pid <- c(1,1,2,2,2,3,4,4,5,5)
uid <- paste(place, pid, sep = "")
time <- rep(0, 10)
# build the data frame directly so that time stays numeric and uid stays character
data <- data.frame(uid = uid, time = time, stringsAsFactors = FALSE)

#bad code
data$time[1] <- 1 #need to set first so that loop doesn't go to a row that doesn't exist (i.e., row 0)
for (i in 2:NROW(data)){
    data$time[i] <- 1 #set first occurrence to 1
    if (data$uid[i] == data$uid[i-1]) {data$time[i] <- 2} #set second occurrence to 2, etc.
    #if (data$uid[i] == data$uid[i-2]) {data$time[i] <- 3}
    i <- i+1
}
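A note on the error itself: the loop starts at i = 2, so the commented-out line would read data$uid[i - 2], which is data$uid[0] on the first pass. Indexing with 0 returns an empty vector in R, and if() cannot test an empty comparison. The short sketch below reproduces that in isolation on a plain vector; the helper lag2_match is a hypothetical name introduced only for illustration, and the guard it shows is one possible workaround rather than part of the original code.

```r
uid <- c("ny1", "ny1", "ny2", "ny2", "ny2")

# At i = 2 the commented-out comparison would read uid[i - 2], i.e. uid[0].
i <- 2
length(uid[i - 2])            # 0: indexing with 0 returns an empty vector
length(uid[i] == uid[i - 2])  # 0: the comparison is empty too, so if() has nothing to test

# Hypothetical guard (an assumption, not the asker's code):
# only look two rows back when that row actually exists.
lag2_match <- function(x, i) i > 2 && x[i] == x[i - 2]

lag2_match(uid, 2)  # FALSE: there is no row 0 to compare with
lag2_match(uid, 5)  # TRUE: rows 3 and 5 share the same uid
```

Even with such a guard, the approach needs one extra line per additional repeat, so it does not scale well to 18 occurrences.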


Recommended answer

After trying the above solutions on large data sets, I decided to write my own loop for this. It was very time-consuming and still required breaking the data into 50k-element vectors, but it did work in the end:

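data$repeats <- 1 # assumed initialisation (not shown in the original): every row starts as a first occurrence
system.time(
  for (i in 2:length(data$uid)) {
    # same uid as the previous row: continue the count (assumes rows with the same uid are adjacent)
    if (data$uid[i] == data$uid[i-1]) data$repeats[i] <- data$repeats[i-1] + 1
    # print progress every 1000 rows to keep track of how far the loop has gotten
    if ((i %% 1000) == 0) print(i)
  }
)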
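For comparison, here is a minimal sketch of a vectorized alternative in base R, using the example data from the question. It relies on ave() with seq_along to number the occurrences within each uid, and then passes the result to reshape() as the time variable. This is not the method used in the answer above and has not been timed on large data; the value column is a made-up covariate added only so that reshape() has something to spread.

```r
# Example data from the question
place <- rep("ny", 10)
pid   <- c(1, 1, 2, 2, 2, 3, 4, 4, 5, 5)
data  <- data.frame(uid = paste0(place, pid), stringsAsFactors = FALSE)

# Number the occurrences of each uid: 1, 2, 3, ... within each group.
# ave() applies FUN to the row positions belonging to each uid and
# returns the results in the original row order.
data$time <- ave(seq_along(data$uid), data$uid, FUN = seq_along)
data$time
# 1 2 1 2 3 1 1 2 1 2

# Hypothetical time-varying covariate, for illustration only
data$value <- seq_len(nrow(data)) * 10

# Long to wide: one row per uid, one value.<time> column per occurrence
wide <- reshape(data, idvar = "uid", timevar = "time",
                v.names = "value", direction = "wide")
wide
```

Unlike the loop, this does not require rows with the same uid to be adjacent, and it handles any number of repeats without extra code.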

Thanks to everyone for the help.
