Replacing duplicate values in an R data frame except in the first row
Question
How can I replace duplicated values in a specific column of an R data frame with NA, keeping the first row, within each ID? For example:
x <- data.frame(id=c("p1","p1","p1","p2","p2"),date=c("d1","d1","d1","d2","d2"))
should become:
x2 <- data.frame(id=c("p1","p1","p1","p2","p2"),date=c("d1",NA,NA,"d2",NA))
I need to keep the structure of multiple rows per id; I just don't want each date value to appear more than once.
Thanks!
Answer
Option 1: A base R method is to use ave() to replace the duplicated date values with NA within each id group.
# Within each id group, keep the first occurrence of each date and set repeats to NA
x$date <- ave(
  x$date,
  x$id,
  FUN = function(a) replace(a, duplicated(a), NA)
)
which gives the updated x:
id date
1 p1 d1
2 p1 <NA>
3 p1 <NA>
4 p2 d2
5 p2 <NA>
The method above also works when a group contains several distinct date values: each value is kept at its first occurrence and later repeats become NA. If you only want to keep the first value of each group (for your example data both give the same result), you could use the following, which may be faster.
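# Keep only the first row of each id group; every later row becomes NA
# (replace(a, -1, NA) also leaves single-row groups unchanged)
x$date <- ave(
  x$date,
  x$id,
  FUN = function(a) replace(a, -1, NA)
)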
This keeps the first value in each group and replaces all the rest with NA, including any later distinct dates. It's not clear which behavior you want, since your example data only has one distinct date per id group.
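To see where the two ave() variants differ, here is a small sketch on hypothetical data (the data frame y and its values are made up for illustration): p1 has two distinct dates and p3 has a single row.

```r
# Hypothetical example data: p1 has two distinct dates, p3 has a single row
y <- data.frame(
  id   = c("p1", "p1", "p1", "p1", "p3"),
  date = c("d1", "d1", "d3", "d3", "d5"),
  stringsAsFactors = FALSE
)

# Variant 1: within each id, blank out repeats of each distinct date
keep_distinct <- ave(y$date, y$id,
                     FUN = function(a) replace(a, duplicated(a), NA))

# Variant 2: within each id, keep only the first row
keep_first <- ave(y$date, y$id,
                  FUN = function(a) replace(a, -1, NA))

print(keep_distinct)
print(keep_first)
```

The first variant keeps d3 because it is a new value within p1; the second keeps only the first row of each id. Both leave the single-row group p3 unchanged.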
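Option 2: An alternative using the data.table package. Putting duplicated() in j together with by = id restricts the check to the rows of each id (an expression in i would be evaluated on the whole table, not per group). The replacement NA is logical, so replace() coerces it to the column's own type and date keeps its type.
library(data.table)
# Convert x to a data.table by reference; within each id, set repeated dates to NA
setDT(x)[, date := replace(date, duplicated(date), NA), by = id]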
which gives
id date
1: p1 d1
2: p1 NA
3: p1 NA
4: p2 d2
5: p2 NA
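Whether repeats are detected within each id or across the whole date column only matters when the same date appears under more than one id. As a minimal base-R sketch (the data frame z is hypothetical, and this swaps in duplicated() on the id and date columns together instead of data.table), the two checks compare like this:

```r
# Hypothetical data: "d1" appears under both p1 and p2
z <- data.frame(
  id   = c("p1", "p1", "p2", "p2"),
  date = c("d1", "d1", "d1", "d1"),
  stringsAsFactors = FALSE
)

# Checking date alone ignores id, so the first p2 row is flagged too
dup_global <- duplicated(z$date)

# Checking the (id, date) pair flags a row only if that pair already appeared,
# so the first row of every id keeps its date
dup_within_id <- duplicated(z[c("id", "date")])

z$date[dup_within_id] <- NA
print(z)
```

Because duplicated() on a data frame compares whole rows, passing only the id and date columns keeps the first row of each (id, date) pair, which matches the per-id behavior of Option 1.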