Delete Redundant Columns in R

Problem description

I have something like this:

date        pgm      in.x     logs       out.y
20130514    na       12       j1         12
20131204    z2       03       j1         03
20130516    a01      04       j0         04
20130628    z1       05       j2         05

I noticed that the in.x and out.y values are always the same, so I want to delete the out.y column. I also have other columns like this, so I want to be able to detect any .y column that matches a .x column and delete it after I do the merge.

Recommended answer

If we assume that all redundant columns should be removed, then

no_duplicate <- data_set[!duplicated(as.list(data_set))]

will do the trick.

as.list converts the data.frame to a list of its columns, and duplicated returns a logical vector with one element per column that is TRUE when every value in that column duplicates an earlier column. Negating that vector and using it as a column index keeps only the first copy of each column.
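To make the mechanics concrete, here is a minimal sketch on a small data frame modeled on the table in the question. The values are made up for illustration, and the intermediate name is_dup is not part of the original answer.

```r
# Toy data modeled on the question's table; the values are illustrative only.
data_set <- data.frame(
  date  = c(20130514, 20131204, 20130516, 20130628),
  in.x  = c("12", "03", "04", "05"),
  logs  = c("j1", "j1", "j0", "j2"),
  out.y = c("12", "03", "04", "05"),
  stringsAsFactors = FALSE
)

# One logical flag per column: TRUE when an earlier column holds the same values.
is_dup <- duplicated(as.list(data_set))
print(is_dup)               # FALSE FALSE FALSE TRUE

# Keep only the columns that were not flagged.
no_duplicate <- data_set[!is_dup]
print(names(no_duplicate))  # "date" "in.x" "logs"
```

Because out.y comes after in.x, it is out.y that gets flagged and dropped. If the column order were reversed, in.x would be the one removed.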

This does not directly compare .x and .y columns, but it has the effect of retaining one copy of each duplicated column, which I assume is the main goal. On the other hand, it will also remove any .x column that is a duplicate of another .x column.
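If only .y columns should ever be dropped after the merge, one possible alternative is to compare each .y column against the .x columns explicitly. The sketch below is not part of the original answer: the helper name drop_y_matching_x and the extra other.y column are hypothetical, and it assumes the merged columns carry the default .x and .y suffixes.

```r
# Hypothetical helper: drop every ".y" column that is identical to some ".x" column.
drop_y_matching_x <- function(df) {
  x_cols <- grep("\\.x$", names(df), value = TRUE)
  y_cols <- grep("\\.y$", names(df), value = TRUE)

  # A .y column is redundant when it is identical to at least one .x column.
  redundant <- vapply(
    y_cols,
    function(y) {
      any(vapply(x_cols, function(x) identical(df[[x]], df[[y]]), logical(1)))
    },
    logical(1)
  )

  # Keep every column except the redundant .y ones, preserving column order.
  df[setdiff(names(df), y_cols[redundant])]
}

# Illustrative merged data; other.y is made up to show a .y column that survives.
merged <- data.frame(
  date    = c(20130514, 20131204, 20130516, 20130628),
  in.x    = c("12", "03", "04", "05"),
  logs    = c("j1", "j1", "j0", "j2"),
  out.y   = c("12", "03", "04", "05"),
  other.y = c("a", "b", "c", "d"),
  stringsAsFactors = FALSE
)

cleaned <- drop_y_matching_x(merged)
print(names(cleaned))  # "date" "in.x" "logs" "other.y"
```

Note that identical is strict about type, so an integer column and a double column holding the same numbers would not be treated as a match here.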

If we want to retain all .x columns, even those that are duplicates of each other, a good solution might be to filter before the merge. Assuming you have data_x and data_y, which will be merged by the column "identifier":

data_y_nonredundant <- data_y[!(as.list(data_y) %in% as.list(data_x) & names(data_y)!="identifier")]
data <- merge(data_x, data_y_nonredundant, by=c("identifier"))
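As a worked illustration of the pre-merge filter, the sketch below builds two small tables and applies the same expression. The column names in_val and out_val and all values are made up. It also assumes that both tables list their rows in the same order, because the comparison looks at whole columns before any matching on "identifier" happens.

```r
# Illustrative inputs; only "identifier" is a name taken from the answer.
data_x <- data.frame(
  identifier = c(1, 2, 3),
  in_val     = c("12", "03", "04"),
  stringsAsFactors = FALSE
)
data_y <- data.frame(
  identifier = c(1, 2, 3),
  out_val    = c("12", "03", "04"),  # same values as data_x$in_val
  logs       = c("j1", "j1", "j0"),
  stringsAsFactors = FALSE
)

# TRUE for each data_y column that also appears in data_x, except the merge key.
redundant <- as.list(data_y) %in% as.list(data_x) & names(data_y) != "identifier"
print(redundant)    # FALSE TRUE FALSE

data_y_nonredundant <- data_y[!redundant]
data <- merge(data_x, data_y_nonredundant, by = "identifier")
print(names(data))  # "identifier" "in_val" "logs"
```

Since the redundant column is removed before merging, no .x or .y suffixes are generated for it and every column of data_x is kept.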
