Delete Redundant Columns in R
Question
I have something like this:
date pgm in.x logs out.y
20130514 na 12 j1 12
20131204 z2 03 j1 03
20130516 a01 04 j0 04
20130628 z1 05 j2 05
I noticed that the in and out values are always the same, so I want to delete the out.y column. I have other columns like this as well, so I want to be able to detect any .y columns that match .x columns and delete them after I do the merge.
Recommended answer
If we assume that every redundant column should be removed, then
no_duplicate <- data_set[!duplicated(as.list(data_set))]
will do the trick.
as.list converts the data.frame to a list of its columns, and duplicated returns a logical vector that is TRUE for every column whose values are all identical to those of a previously seen column. Negating that vector and indexing with it keeps only the first copy of each column.
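To make the behaviour concrete, here is a minimal sketch that rebuilds the table from the question as a toy data frame. Storing every column except date as character strings is an assumption made for illustration; the comparison is exact, so a numeric 12 and a character "12" would not count as duplicates.

```r
# Toy data modeled on the question; the column types are an assumption.
data_set <- data.frame(
  date  = c(20130514, 20131204, 20130516, 20130628),
  pgm   = c("na", "z2", "a01", "z1"),
  in.x  = c("12", "03", "04", "05"),
  logs  = c("j1", "j1", "j0", "j2"),
  out.y = c("12", "03", "04", "05"),
  stringsAsFactors = FALSE
)

# One logical flag per column: TRUE when the column repeats an earlier one.
is_dup <- duplicated(as.list(data_set))
print(is_dup)

# Keep only the columns that were not flagged.
no_duplicate <- data_set[!is_dup]
print(names(no_duplicate))
```

Only out.y is flagged here, because it is the only column whose values all match an earlier column (in.x).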
This does not compare .x and .y columns directly, but it has the effect of retaining one copy of each duplicated column, which I assume is the main goal. On the other hand, it will also remove any .x column that duplicates another .x column.
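If dropping duplicated .x columns is not acceptable, one possible variant (not part of the original answer) is to compare each .y column against the .x columns explicitly. The sketch below uses a hypothetical merged frame and hypothetical column names, and relies on identical for an exact column-by-column comparison.

```r
# Hypothetical merged frame: b.x and a.y both repeat a.x, c.y is unique.
merged <- data.frame(
  id  = c("r1", "r2", "r3"),
  a.x = c(10, 20, 30),
  b.x = c(10, 20, 30),
  a.y = c(10, 20, 30),
  c.y = c(7, 8, 9),
  stringsAsFactors = FALSE
)

# Blanket approach from the answer: drops every repeated column, including b.x.
blanket <- merged[!duplicated(as.list(merged))]

# Targeted variant: flag a .y column only if it is identical to some .x column.
x_cols <- grep("\\.x$", names(merged), value = TRUE)
y_cols <- grep("\\.y$", names(merged), value = TRUE)
is_redundant <- vapply(
  y_cols,
  function(y) any(vapply(x_cols, function(x) identical(merged[[x]], merged[[y]]), logical(1))),
  logical(1)
)
targeted <- merged[setdiff(names(merged), y_cols[is_redundant])]

print(names(blanket))   # id, a.x, c.y
print(names(targeted))  # id, a.x, b.x, c.y
```

The targeted version leaves b.x in place and removes only a.y, at the cost of a pairwise comparison between the two sets of columns.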
If we want to retain all .x columns, even those that are duplicates of each other, a good solution might be to filter before the merge. Assuming you have data_x and data_y, which will be merged by the column "identifier":
# Keep "identifier" plus every data_y column that does not already appear in data_x
data_y_nonredundant <- data_y[!(as.list(data_y) %in% as.list(data_x) & names(data_y) != "identifier")]
data <- merge(data_x, data_y_nonredundant, by = "identifier")
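As a rough illustration of the filter-before-merge idea, the sketch below uses two small made-up frames; the column names inval, outval, pgm and logs are hypothetical. Note that columns are compared position by position, so this assumes both frames list their rows in the same order.

```r
# Made-up inputs: outval in data_y repeats inval in data_x.
data_x <- data.frame(
  identifier = c("a", "b", "c"),
  inval      = c("12", "03", "04"),
  pgm        = c("na", "z2", "a01"),
  stringsAsFactors = FALSE
)
data_y <- data.frame(
  identifier = c("a", "b", "c"),
  outval     = c("12", "03", "04"),
  logs       = c("j1", "j1", "j0"),
  stringsAsFactors = FALSE
)

# TRUE for columns to keep: the key column, plus anything not already in data_x.
keep <- !(as.list(data_y) %in% as.list(data_x) & names(data_y) != "identifier")
data_y_nonredundant <- data_y[keep]

merged <- merge(data_x, data_y_nonredundant, by = "identifier")
print(names(merged))
```

Because outval is dropped before the merge, no redundant column needs to be removed from the merged result afterwards. If the two frames were sorted differently, a redundant column would go undetected, so sorting both by the key first may be worthwhile.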