Merging data.frames causes a match.names error

Problem description

I need to merge many data.frames. Below is sample code that reproduces an error. It looks like a bug.

This code works well:

df1 <- data.frame(v=1:10, v2=rev(1:10))
df2 <- data.frame(vv=1:8, v2=rev(5:12))
df12 <- merge(x=df1, y=df2, by.x=1, by.y=1, all=TRUE, suffixes=c(".x", ".y"))
df3 <- data.frame(w=2:6, v2=3:7)
df123 <- merge(x=df12, y=df3, by.x=1, by.y=1, all=TRUE, suffixes=c(".x", ".y"))
df4 <- data.frame(x=1:6, v2=1:6)
df1234 <- merge(x=df123, y=df4, by.x=1, by.y=1, all=TRUE, suffixes=c(".x", ".y"))

This code produces an error on the last line: Error in match.names(clabs, names(xi)) : names do not match previous names. The only change is that nrow(df4) > nrow(df123).

df1 <- data.frame(v=1:10, v2=rev(1:10))
df2 <- data.frame(vv=1:8, v2=rev(5:12))
df12 <- merge(x=df1, y=df2, by.x=1, by.y=1, all=TRUE, suffixes=c(".x", ".y"))
df3 <- data.frame(w=2:6, v2=3:7)
df123 <- merge(x=df12, y=df3, by.x=1, by.y=1, all=TRUE, suffixes=c(".x", ".y"))
df4 <- data.frame(x=1:16, v2=1:16)
df1234 <- merge(x=df123, y=df4, by.x=1, by.y=1, all=TRUE, suffixes=c(".x", ".y"))

Let's look at the column names of df123:

names(df123)
[1] "v"    "v2.x" "v2.y" "v2" 

Then change the last name to an arbitrary one:

names(df123)[4] <- "v3"

Now this line of code works correctly:

df1234 <- merge(x=df123, y=df4, by.x=1, by.y=1, all=TRUE, suffixes=c(".x", ".y"))
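The manual rename above can be generalized for the original goal of merging many data.frames. A minimal sketch, assuming every data.frame is keyed by its first column: the hypothetical helper merge_many() (not part of base R) gives each key the same name and tags every other column with the position of its data.frame, so merge() never has to add suffixes and, as long as names are unique within each data.frame, no duplicate names can appear.

```r
# Hypothetical helper, not part of base R: merge a list of data.frames on
# their first column, after renaming columns so that no two of them clash.
merge_many <- function(dfs, key = "id") {
  renamed <- lapply(seq_along(dfs), function(i) {
    df <- dfs[[i]]
    # Give every key column the same name so merge(by = key) can align them.
    names(df)[1] <- key
    # Tag the other columns with the position of their data.frame,
    # e.g. "v2" in the 3rd data.frame becomes "v2_3".
    names(df)[-1] <- paste0(names(df)[-1], "_", i)
    df
  })
  # Full outer join, one data.frame at a time.
  Reduce(function(a, b) merge(a, b, by = key, all = TRUE), renamed)
}

# The data from the question, including the 16-row df4.
df1 <- data.frame(v = 1:10, v2 = rev(1:10))
df2 <- data.frame(vv = 1:8, v2 = rev(5:12))
df3 <- data.frame(w = 2:6, v2 = 3:7)
df4 <- data.frame(x = 1:16, v2 = 1:16)

res <- merge_many(list(df1, df2, df3, df4))
names(res)  # "id" "v2_1" "v2_2" "v2_3" "v2_4"
nrow(res)   # 16
```

The numeric tags are only a convenient choice for the sketch; any tagging scheme works as long as it keeps the non-key names distinct across all data.frames.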

Is it a bug? I used R 2.13.1 on Windows 7. If you need any other information, I'll add it to the question.

Solution

This is definitely a bug. I tested it in R 2.14.1 on Windows 7, but I doubt the operating system matters. I recreated a smaller test case of the bug here:

# Create data.
df1=data.frame(rbind(c(1,10,12,NA)))
df2=data.frame(rbind(c(11,11)))

# Works fine.
merge(df1,df2,by=1,all=T)

#   X1 X2.x X3 X4 X2.y
# 1  1   10 12 NA   NA
# 2 11   NA NA NA   11

# Change the names of the columns.
names(df1)= c('v','v2.x','v2.y','v2')
names(df2)= c('x','v2')

# Same data fails!
merge(df1,df2,by=1,all=T)

# Error in match.names(clabs, names(xi)) : 
#   names do not match previous names
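The two runs use the same values; only the column names change, and the failing names are ones that make merge()'s suffixing collide with names df1 already has. A rough sketch of that check, assuming by.x and by.y are column positions and the default suffixes: the hypothetical merge_name_clashes() (not a base R function) applies the rule that merge() uses for non-key columns found in both inputs and lists the names that would then appear twice. Here both "v2.x" and "v2.y" would be duplicated.

```r
# Hypothetical helper, not part of base R. It mimics how merge() renames
# non-key columns that occur in both x and y (suffixes[1] on the x side,
# suffixes[2] on the y side) and returns the names that would then be
# duplicated. by.x and by.y are assumed to be column positions.
merge_name_clashes <- function(x, y, by.x = 1, by.y = 1,
                               suffixes = c(".x", ".y")) {
  nm.x <- names(x)[-by.x]
  nm.y <- names(y)[-by.y]
  # Decide which names are shared before renaming anything.
  common.x <- nm.x %in% nm.y
  common.y <- nm.y %in% nm.x
  nm.x[common.x] <- paste0(nm.x[common.x], suffixes[1])
  nm.y[common.y] <- paste0(nm.y[common.y], suffixes[2])
  result <- c(names(x)[by.x], nm.x, nm.y)
  unique(result[duplicated(result)])
}

# The smaller example from this answer.
df1 <- data.frame(rbind(c(1, 10, 12, NA)))
df2 <- data.frame(rbind(c(11, 11)))
names(df1) <- c("v", "v2.x", "v2.y", "v2")
names(df2) <- c("x", "v2")
clash_before <- merge_name_clashes(df1, df2)
clash_before  # "v2.x" "v2.y"

# The workaround from the question: rename the last column first.
names(df1)[4] <- "v3"
clash_after <- merge_name_clashes(df1, df2)
clash_after   # character(0)
```

Because it only looks at names, the check also flags the first snippet in the question, where the same duplicated names arise even though no error is raised.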

The error occurs in the "merge.data.frame" method, on this line:

x <- rbind(x, ya)

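The problem is that x and ya don't have the same column names. ya holds the nyy rows that exist only in y, padded with NA for x's columns, so this code path only runs when all = TRUE and y has key values that x lacks (df4's keys 11 to 16 in the question, key 11 in the smaller example). The mismatch is introduced two lines earlier, on this line:

ya <- cbind(ya, x[rep.int(NA_integer_, nyy), nm.x, drop = FALSE])

nm.x is the vector of names c("v2.x","v2.y","v2.x"), and x is a data.frame with two columns named "v2.x": because df2 also has a column "v2", merge() appended the suffix ".x" to df1's "v2", which collides with the "v2.x" that df1 already had. Interestingly, selecting these columns from the data.frame appears to rename one of them!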

names(x)
[1] "v"    "v2.x" "v2.y" "v2.x"
nm.x
[1] "v2.x" "v2.y" "v2.x"
x[,nm.x]
  v2.x v2.y v2.x.1
1   10   12     10
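Putting the two observations together: ya picks up the name "v2.x.1", which x does not have, and rbind() on data.frames pairs columns by name rather than by position, so it stops. A small sketch with stand-ins for merge()'s internal x and ya, using the column names from this answer (the values are made up for illustration):

```r
# Stand-ins for merge()'s internal x and ya. data.frame() would
# de-duplicate the names, so they are set afterwards with setNames().
x  <- setNames(data.frame(1, 10, 12, NA), c("v", "v2.x", "v2.y", "v2.x"))
ya <- setNames(data.frame(11, NA, NA, NA), c("v", "v2.x", "v2.y", "v2.x.1"))

# rbind() matches data.frame columns by name, and "v2.x.1" has no partner.
msg <- tryCatch(rbind(x, ya), error = function(e) conditionMessage(e))
print(msg)  # "names do not match previous names"

# The same names in a different order are fine: columns are matched by name.
ok <- rbind(data.frame(a = 1, b = 2), data.frame(b = 3, a = 4))
print(ok$a)  # 1 4
```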

I tried to work around this by selecting the columns by position instead of by name, but the resulting name is still changed (although the values are now the ones you want)!

x[,c(2,3,4)]
  v v2.x v2.y v2.x.1
1 1   10   12   NA
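Both renamings come from the data.frame extraction method [.data.frame itself: when the selected columns carry duplicated names, it passes them through make.unique(). Selecting by name also shows why the values were wrong, since both "v2.x" look-ups hit the first such column. A small sketch reproducing this outside merge(), with a stand-in for the internal x (values made up for illustration):

```r
# A stand-in for merge()'s internal x, with the duplicated name "v2.x".
x <- setNames(data.frame(1, 10, 12, NA), c("v", "v2.x", "v2.y", "v2.x"))
nm.x <- c("v2.x", "v2.y", "v2.x")

# By name: both "v2.x" look-ups return column 2, and the duplicated
# names of the result are made unique.
by_name <- x[, nm.x]
names(by_name)  # "v2.x" "v2.y" "v2.x.1"
by_name[[3]]    # 10, a copy of column 2 rather than column 4

# By position: the right columns, but the same renaming.
by_pos <- x[, 2:4]
names(by_pos)   # "v2.x" "v2.y" "v2.x.1"
by_pos[[3]]     # NA, the original column 4

# make.unique() on its own produces the same names.
make.unique(nm.x)  # "v2.x" "v2.y" "v2.x.1"
```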

I have posted this as a bug.
