Comparing the value of a row with all previous rows in data.table

Question

I have a dataset of firms and the product categories they are involved in. The dataset looks like this:

library(data.table)
df <- data.table(year = c(1979,1979,1980,1980,1980,1981,1981,1982,1982,1982,1982),
                 category = c("A","A","B","C","A","D","C","F","F","A","B"))

I want to create a new variable as follows: if a firm enters a category that it has not been engaged in during any previous year (entries from the same year do not count), that entry is labeled "NEW"; otherwise it is labeled "OLD".

The desired outcome is:

    year   category   Newness
 1: 1979        A     NEW
 2: 1979        A     NEW
 3: 1980        B     NEW
 4: 1980        C     NEW
 5: 1980        A     OLD
 6: 1981        D     NEW
 7: 1981        C     OLD
 8: 1982        F     NEW
 9: 1982        F     NEW
10: 1982        A     OLD
11: 1982        B     OLD

I'm inclined to use data.table because I have over 1.5 million observations, and I want to be able to apply the solution per firm by grouping on firm IDs.

Any help would be greatly appreciated. Thank you in advance.

Recommended answer

For each category, we can label the rows from the first year in which it appears as "NEW" and the rows from every later year as "OLD".

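library(data.table)
# Within each category, match(year, unique(year)) is 1 for the first year seen
# and > 1 for later years; adding 1 to that logical indexes c("NEW", "OLD").
df[, Newness := c("NEW", "OLD")[(match(year, unique(year)) > 1) + 1], category]
df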

#    year category Newness
# 1: 1979        A     NEW
# 2: 1979        A     NEW
# 3: 1980        B     NEW
# 4: 1980        C     NEW
# 5: 1980        A     OLD
# 6: 1981        D     NEW
# 7: 1981        C     OLD
# 8: 1982        F     NEW
# 9: 1982        F     NEW
#10: 1982        A     OLD
#11: 1982        B     OLD
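
To make the indexing expression easier to follow, here is a small sketch that evaluates it step by step on the years of a single category. The vector `year` and the intermediate names `pos`, `is_later`, and `label` are made up for illustration. Note that `unique()` keeps values in order of first appearance, so the expression assumes the rows are already sorted by year within each category, as they are in the example data.

```r
# Years for one category, in row order (assumed sorted by year).
year <- c(1979, 1979, 1980, 1982)

# Position of each year among the distinct years: the first year seen maps to 1.
pos <- match(year, unique(year))

# TRUE for every row that belongs to a year after the first one.
is_later <- pos > 1

# FALSE + 1 = 1 picks "NEW", TRUE + 1 = 2 picks "OLD".
label <- c("NEW", "OLD")[is_later + 1]

print(pos)    # 1 1 2 3
print(label)  # "NEW" "NEW" "OLD" "OLD"
```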

Similarly, in dplyr this can be written as:

library(dplyr)
df %>%
  group_by(category) %>%
  mutate(Newness =  c("NEW", "OLD")[(match(year, unique(year)) > 1) + 1])
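
The question also mentions repeating this per firm. One possible data.table sketch is shown below, assuming a firm ID column named `firm`; that column and the sample values are hypothetical and do not appear in the original data. It compares each year against `min(year)` within the group rather than using `match()`, so it does not depend on the rows being sorted.

```r
library(data.table)

# Hypothetical data: `firm` is an assumed ID column, not part of the original example.
dt <- data.table(
  firm     = c(1, 1, 1, 2, 2, 2),
  year     = c(1979, 1980, 1980, 1979, 1979, 1981),
  category = c("A", "A", "B", "B", "B", "B")
)

# Within each firm/category pair, rows from the earliest year are "NEW"
# and rows from any later year are "OLD".
dt[, Newness := c("NEW", "OLD")[(year > min(year)) + 1], by = .(firm, category)]
dt
```

Grouping by both `firm` and `category` means a category counts as new the first year a given firm enters it, regardless of what other firms have done.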
