Comparing value of a certain row with all previous rows in data.table
Question
I have a dataset of firms and the product categories they are involved in. The dataset looks like this:
library(data.table)
df <- data.table(year = c(1979,1979,1980,1980,1980,1981,1981,1982,1982,1982,1982),
                 category = c("A","A","B","C","A","D","C","F","F","A","B"))
I want to create a new variable as follows: if a firm enters a category that it has not been engaged in during any previous year (the same year does not count), the entry is labeled "NEW"; otherwise it is labeled "OLD".
The desired outcome is:
year category Newness
1: 1979 A NEW
2: 1979 A NEW
3: 1980 B NEW
4: 1980 C NEW
5: 1980 A OLD
6: 1981 D NEW
7: 1981 C OLD
8: 1982 F NEW
9: 1982 F NEW
10: 1982 A OLD
11: 1982 B OLD
I'm inclined to use data.table because I have over 1.5 million observations, and I want to be able to apply the solution grouped by firm ID.
Any help would be greatly appreciated. Thank you in advance.
Recommended answer
We can label the first year in which each category appears as "NEW" and every later year as "OLD". This relies on the rows being ordered by year, as they are in the example.
library(data.table)
df[, Newness := c("NEW", "OLD")[(match(year, unique(year)) > 1) + 1], category]
df
# year category Newness
# 1: 1979 A NEW
# 2: 1979 A NEW
# 3: 1980 B NEW
# 4: 1980 C NEW
# 5: 1980 A OLD
# 6: 1981 D NEW
# 7: 1981 C OLD
# 8: 1982 F NEW
# 9: 1982 F NEW
#10: 1982 A OLD
#11: 1982 B OLD
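The question also asks about grouping by firm ID, which the example data does not include. The following is a minimal sketch under that assumption: the `firm` column and the toy values in `dt` are hypothetical, not from the original post. It also swaps in a different test, `year == min(year)`, which does not depend on the rows being sorted by year. `fifelse()` is available in data.table 1.12.4 and later.

```r
library(data.table)

# Hypothetical data: `firm` is an assumed identifier column, not in the original.
dt <- data.table(
  firm     = c(1, 1, 1, 1, 2, 2, 2),
  year     = c(1979, 1980, 1980, 1981, 1980, 1981, 1981),
  category = c("A", "A", "B", "B", "A", "A", "C")
)

# Within each firm/category pair, rows from the earliest year are "NEW";
# rows from any later year are "OLD".
dt[, Newness := fifelse(year == min(year), "NEW", "OLD"), by = .(firm, category)]
dt
```

Grouping by both columns means a category that is old for one firm can still be "NEW" for another, which seems to be what the question intends.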
Similarly, in dplyr this can be written as:
library(dplyr)
df %>%
group_by(category) %>%
mutate(Newness = c("NEW", "OLD")[(match(year, unique(year)) > 1) + 1])
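Both versions use the same indexing expression inside each group. As an illustration, here it is unpacked step by step in base R for the years of category "A" from the example data (the variable names `idx` and `pos` are introduced here only for explanation):

```r
# Years for category "A" in the example data, in row order.
year <- c(1979, 1979, 1980, 1982)

# Position of each year among the distinct years, in order of first appearance.
idx <- match(year, unique(year))
idx
# [1] 1 1 2 3

# TRUE for every year after the first; adding 1 turns FALSE/TRUE into 1/2.
pos <- (idx > 1) + 1
pos
# [1] 1 1 2 2

# Use those positions to pick from c("NEW", "OLD").
c("NEW", "OLD")[pos]
# [1] "NEW" "NEW" "OLD" "OLD"
```

Because `unique()` keeps values in order of first appearance, the first row of each group must belong to its earliest year for this to give the intended labels.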