Remove duplicate values across a few columns but keep rows
Question
I have a dataframe that looks like this:
dat <- data.frame(id  = 1:6,
                  z_1 = c(100, 290, 38, 129, 0, 290),
                  z_2 = c(20, 0, 0, 0, 0, 290),
                  z_3 = c(0, 0, 38, 0, 0, 98),
                  z_4 = c(0, 0, 38, 127, 38, 78),
                  z_5 = c(23, 0, 25, 0, 0, 98),
                  z_6 = c(100, 0, 25, 127, 0, 9))
dat
  id z_1 z_2 z_3 z_4 z_5 z_6
1  1 100  20   0   0  23 100
2  2 290   0   0   0   0   0
3  3  38   0  38  38  25  25
4  4 129   0   0 127   0 127
5  5   0   0   0  38   0   0
6  6 290 290  98  78  98   9
I want to remove duplicate values among the z_x columns within each row, replacing any duplicates with either 0 or NA, but leaving the rows and columns intact (i.e. not dropping any). The 0s here do not count as duplicates; they are missing values. Duplicate values within a column are fine. My ideal output would look like this:
  id z_1 z_2 z_3 z_4 z_5 z_6
1  1 100  20   0   0  23   0
2  2 290   0   0   0   0   0
3  3  38   0   0   0  25   0
4  4 129   0   0 127   0   0
5  5   0   0   0  38   0   0
6  6 290   0  98  78   0   9
I don't really care what order the values appear in within the z_x columns, so it's fine if they get moved around. Is there an efficient way to do this, preferably in a tidyverse way? I know I can pivot longer and drop duplicate rows, but my dataset is very large and I'm looking for a way to do this without pivoting.
Recommended answer
Base R way using apply:
cols <- grep('z_\\d+', names(dat))
dat[cols] <- t(apply(dat[cols], 1, function(x) replace(x, duplicated(x), 0)))
#  id z_1 z_2 z_3 z_4 z_5 z_6
#1  1 100  20   0   0  23   0
#2  2 290   0   0   0   0   0
#3  3  38   0   0   0  25   0
#4  4 129   0   0 127   0   0
#5  5   0   0   0  38   0   0
#6  6 290   0  98  78   0   9
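The question also allows NA as the replacement. With the code above, swapping 0 for NA would also turn repeated zeros into NA, because duplicated() treats them as duplicates like any other value. A minimal sketch of one way around that, assuming zeros should stay as they are; the helper name blank_dups and the copy dat_na are made up for illustration and are not part of the original answer:

```r
# Same example data as in the question
dat <- data.frame(id  = 1:6,
                  z_1 = c(100, 290, 38, 129, 0, 290),
                  z_2 = c(20, 0, 0, 0, 0, 290),
                  z_3 = c(0, 0, 38, 0, 0, 98),
                  z_4 = c(0, 0, 38, 127, 38, 78),
                  z_5 = c(23, 0, 25, 0, 0, 98),
                  z_6 = c(100, 0, 25, 127, 0, 9))

cols <- grep('z_\\d+', names(dat))

# Hypothetical helper: mark repeated non-zero values as NA,
# leaving every 0 untouched (0 means "missing" in this data)
blank_dups <- function(x) replace(x, duplicated(x) & x != 0, NA)

# Work on a copy so the original data frame is kept for comparison
dat_na <- dat
dat_na[cols] <- t(apply(dat[cols], 1, blank_dups))
dat_na
```

Only the first occurrence of each non-zero value in a row survives; the zeros stay in place, so they remain distinguishable from the values that were blanked out.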
A tidyverse way without reshaping can be done using pmap:
library(tidyverse)
dat %>%
  mutate(result = pmap(select(., matches('z_\\d+')), ~{
    x <- c(...)
    replace(x, duplicated(x), 0)
  })) %>%
  select(id, result) %>%
  unnest_wider(result)
Since tests performed by @thelatemail suggest that reshaping is a better option than handling the data row-wise, you might want to consider it:
dat %>%
  pivot_longer(cols = matches('z_\\d+')) %>%
  group_by(id) %>%
  mutate(value = replace(value, duplicated(value), 0)) %>%
  pivot_wider()