R - Conditional lagging - How to lag a certain amount of cells until a condition is met?


Problem description

I have been trying to solve this for weeks, but can't seem to get it.

I have the following data frame:

    post_id user_id
1    post-1   user1
2    post-2   user2
3 comment-1   user1
4 comment-2   user3
5 comment-3   user4
6    post-3   user2
7 comment-4   user2
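
For anyone who wants to reproduce the problem, the table above can be rebuilt roughly as follows. This is a minimal sketch: it assumes both columns are plain character vectors, which the printed output does not confirm.

```r
# Rebuild the example data frame shown above.
# Assumption: both columns are character (not factor).
df <- data.frame(
  post_id = c("post-1", "post-2", "comment-1", "comment-2",
              "comment-3", "post-3", "comment-4"),
  user_id = c("user1", "user2", "user1", "user3",
              "user4", "user2", "user2"),
  stringsAsFactors = FALSE
)
df
```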

I want to create a new variable, parent_id. For each observation, it should perform the following steps:


  1. Check whether post_id is a post or a comment.
  2. If post_id is a post, then parent_id should equal the earliest post_id in the whole data frame.
  3. If post_id is the first post, then parent_id should equal NA.
  4. If post_id is a comment, then parent_id should equal the first post_id it encounters when looking back up the data frame.

The output should look like this:

    post_id user_id parent_id_man
1    post-1   user1            NA
2    post-2   user2        post-1
3 comment-1   user1        post-2
4 comment-2   user3        post-2
5 comment-3   user4        post-2
6    post-3   user2        post-1
7 comment-4   user2        post-3

I tried the following:

library(dplyr)
library(tidyr)

# Prepare data: split post_id into its type ("post"/"comment") and its number
df <- df %>% separate(post_id, into = c("type", "number"), sep = "-", remove = FALSE)
df$number <- as.numeric(df$number)
df <- df %>% mutate(comment_number = ifelse(type == "comment", number, 99999))
df <- df %>% mutate(post_number = ifelse(type == "post", number, 99999))

# Create parent_id column: posts get the earliest post, comments get a placeholder 0
df <- df %>% mutate(parent_id = ifelse(type == "post", paste("post-", min(post_number), sep = ""), 0))
# Note: this stores the string "NA", not a missing value
df <- df %>% mutate(parent_id = ifelse(parent_id == post_id, "NA", parent_id))
df <- df %>% select(-comment_number, -post_number)

With that code I am able to perform steps 1, 2 and 3, but step 4 is beyond me. I get the feeling that some kind of conditional lagging should be able to solve it, but I can't work out how to do it.

Any ideas would be very much appreciated!

Recommended answer

Building on your solution,

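# Row positions of posts and comments
x <- which(df$type == 'post')
z <- which(df$type == 'comment')
# findInterval() is vectorized: for each comment row it gives the index in x of the nearest post above it
df$parent_id[df$parent_id == 0] <- df$post_id[x[findInterval(z, x)]]
df$parent_id
#[1] "NA"     "post-1" "post-2" "post-2" "post-2" "post-1" "post-3"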
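
The same idea can also be sketched without the intermediate columns from the question, using base R only. This is not the answerer's code: the helper names (is_post, post_rows, idx) are made up for illustration, and it assumes that the "earliest" post is simply the first post row in the data frame and that a real missing value is wanted rather than the string "NA".

```r
# Example data (same rows as in the question)
df <- data.frame(
  post_id = c("post-1", "post-2", "comment-1", "comment-2",
              "comment-3", "post-3", "comment-4"),
  user_id = c("user1", "user2", "user1", "user3",
              "user4", "user2", "user2"),
  stringsAsFactors = FALSE
)

is_post   <- grepl("^post-", df$post_id)  # TRUE for posts, FALSE for comments
post_rows <- which(is_post)               # row positions of posts (sorted)

# For every row: how many posts sit at or above it, i.e. the index
# into post_rows of the nearest post at or above that row
idx <- findInterval(seq_len(nrow(df)), post_rows)

parent_id <- rep(NA_character_, nrow(df))

# Comments take the nearest post above them; a comment with no post
# above it (idx == 0) is left as NA
is_comment <- !is_post & idx > 0
parent_id[is_comment] <- df$post_id[post_rows[idx[is_comment]]]

# Posts take the first post in the data frame (assumed to be the earliest),
# except that first post itself, which stays NA
parent_id[is_post] <- df$post_id[post_rows[1]]
parent_id[post_rows[1]] <- NA_character_

df$parent_id <- parent_id
df
```

With its default arguments, findInterval(v, x) returns for each element of v the number of elements of the sorted vector x that are less than or equal to it, which is why it works as a "last post seen so far" lookup here.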
