Delete duplicate columns?

Question

I am collating multiple Excel files into one using data frames. There are duplicate columns in the files. Is it possible to merge only the unique columns?

Here is my code:

library(rJava)
library(XLConnect)

data.files = list.files(pattern = "*.xls")

# Read the first file
df = readWorksheetFromFile(file=data.files[1], sheet=1, check.names=F)

# Loop through the remaining files and merge them to the existing data frame
for (file in data.files[-1]) {
    newFile = readWorksheetFromFile(file=file, sheet=1, check.names=F)
    df = merge(df, newFile, all = TRUE, check.names=F)
}

Recommended answer

First of all, if you apply merge correctly, there shouldn't be any duplicated columns, provided that the duplicated columns also have exactly the same name in the Excel files. Since you use merge, there must be at least one column in the Excel files that has exactly the same name and contains the values used to merge them.
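
To illustrate the point about names, here is a minimal sketch with two made-up data frames (`a` and `b` are hypothetical stand-ins for two worksheets). By default, merge joins on every column name the two data frames share, so a column present under the same name in both appears only once in the result.

a <- data.frame(id = 1:3, x = c(10, 20, 30))
b <- data.frame(id = 2:4, x = c(20, 30, 40), y = c("b", "c", "d"))

# No 'by' given: merge uses the shared names "id" and "x" as keys
merged <- merge(a, b, all = TRUE)

# "x" is not duplicated because it has the same name in both inputs
names(merged)

If the same data sat under different names in the two files (say `x` and `x2`), merge would keep both columns, which is the case the value-based check below addresses.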

So I reckon you want to check whether the resulting data frame contains columns that duplicate each other by value. For this, you could use the following:

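keepUnique <- function(x){
  # All pairs of column names, one pair per column of the matrix
  combs <- combn(names(x),2)

  # For each pair, TRUE if the two columns hold identical values
  dups <- mapply(identical,
                 x[combs[1,]],
                 x[combs[2,]])

  # Drop the second column of every identical pair
  drop <- combs[2,][dups]
  x[ !names(x) %in% drop ]
}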

Which gives:

> mydf <- cbind(iris,iris[,3])[1:5,]
> mydf
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species iris[, 3]
1          5.1         3.5          1.4         0.2  setosa       1.4
2          4.9         3.0          1.4         0.2  setosa       1.4
3          4.7         3.2          1.3         0.2  setosa       1.3
4          4.6         3.1          1.5         0.2  setosa       1.5
5          5.0         3.6          1.4         0.2  setosa       1.4
> keepUnique(mydf)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa

You can use this after reading in a file, i.e. add the line

newFile <- keepUnique(newFile)

to your own code.
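
As a rough sketch of how this could fit into the loop, the following uses small in-memory data frames in place of the Excel sheets, so it runs without XLConnect. The names `sheets` and `dropDupCols` are made up for illustration. It also swaps in a shorter base R idiom, `duplicated(as.list(x))`, instead of keepUnique; it keeps the first of any columns with identical values, but check it against your own data before relying on it.

# Hypothetical helper: keep only the first of any columns with identical values
dropDupCols <- function(x) x[!duplicated(as.list(x))]

# Stand-ins for the worksheets read from the Excel files (assumed data)
sheets <- list(
  data.frame(id = c("r1", "r2", "r3"), a = c(1, 2, 3), a_copy = c(1, 2, 3)),
  data.frame(id = c("r1", "r2", "r3"), b = c("x", "y", "z"))
)

# Same structure as the loop in the question, with the extra clean-up step
df <- dropDupCols(sheets[[1]])
for (newFile in sheets[-1]) {
  newFile <- dropDupCols(newFile)
  df <- merge(df, newFile, all = TRUE)
}

names(df)

Here `a_copy` is dropped before merging because its values match `a`, and the merge itself joins on the shared `id` column.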
