Why does rbindlist not respect column names?
I just discovered this bug, only to find that some people are calling it a "feature". This makes rbindlist NOT like do.call("rbind", l), as rbind WILL respect column names. Further, there is no mention of this entirely unexpected behavior in the documentation. Is this really intentional?
Code example:
> library(data.table)
> DT1 <- data.table(a=1, b=2)
> DT2 <- data.table(b=3, a=4)
> DT1
   a b
1: 1 2
> DT2
   b a
1: 3 4
I would expect that rbind'ing these would produce columns with a = 1,4; b = 2,3. I do get that with rbind.data.table and rbind.data.frame, though rbind.data.table produces warnings.
> rbind(DT1, DT2)
   a b
1: 1 2
2: 4 3
Warning message:
In data.table::.rbind.data.table(...) :
  Argument 2 has names in a different order. Columns will be bound by name for consistency with base. You can drop names (by using an unnamed list) and the columns will then be joined by position, or set use.names=FALSE. Alternatively, explicitly setting use.names to TRUE will remove this warning.
> rbind(as.data.frame(DT1), as.data.frame(DT2))
  a b
1 1 2
2 4 3
> do.call('rbind', list(DT1, DT2))
   a b
1: 1 2
2: 4 3
Warning message:
In data.table::.rbind.data.table(...) :
  Argument 2 has names in a different order. Columns will be bound by name for consistency with base. You can drop names (by using an unnamed list) and the columns will then be joined by position, or set use.names=FALSE. Alternatively, explicitly setting use.names to TRUE will remove this warning.
rbindlist, however, is happy to silently corrupt the data:
> rbindlist(list(DT1, DT2))
   a b
1: 1 2
2: 3 4
This feature is now implemented in commit 1266 of v1.9.3. From NEWS:
  o 'rbindlist' gains 'use.names' and 'fill' arguments and is now implemented
    entirely in C. Closes #5249
    -> use.names by default is FALSE for backwards compatibility (doesn't bind by
       names by default)
    -> rbind(...) now just calls rbindlist() internally, except that 'use.names'
       is TRUE by default, for compatibility with base (and backwards compatibility).
    -> fill by default is FALSE. If fill is TRUE, use.names has to be TRUE.
    -> At least one item of the input list has to have non-null column names.
    -> Duplicate columns are bound in the order of occurrence, like base.
    -> Attributes that might exist in individual items would be lost in the bound result.
    -> Columns are coerced to the highest SEXPTYPE, if they are different, if/when possible.
    -> And incredibly fast ;).
    -> Documentation updated in much detail. Closes DR #5158.
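
The coercion rule in the NEWS entry ("coerced to the highest SEXPTYPE") can be illustrated with a small sketch. The table and column names below are made up for illustration, and the behaviour shown assumes a data.table version that includes this change:

```r
library(data.table)

# Three single-column tables whose column v has a different type in each.
ints  <- data.table(v = 1:2)          # integer
reals <- data.table(v = c(2.5, 3.5))  # double
chars <- data.table(v = "x")          # character

# Mixing integer and double should give a double column;
# mixing integer and character should give a character column.
num_result <- rbindlist(list(ints, reals))
chr_result <- rbindlist(list(ints, chars))

print(class(num_result$v))
print(class(chr_result$v))
```

The practical consequence is that one stray character column in a long input list can turn the whole bound column into character, so it may be worth checking column classes before binding.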
With this, you can set use.names=TRUE to bind by names. It's set to FALSE by default for backwards compatibility. Alternatively, you can use rbind(...), where use.names is TRUE by default, again for backwards compatibility.
See this post for more examples and this post for benchmarks.
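
Applied to the two tables from the question, the difference between the two settings might look like the sketch below. use.names is passed explicitly in both calls, because the default is an assumption that can vary between data.table versions:

```r
library(data.table)

DT1 <- data.table(a = 1, b = 2)
DT2 <- data.table(b = 3, a = 4)

# Bind by position: DT2's first column (b) lands under a,
# which is the behaviour the question complains about.
by_position <- rbindlist(list(DT1, DT2), use.names = FALSE)

# Bind by name: columns are matched on their names, as rbind() does.
by_name <- rbindlist(list(DT1, DT2), use.names = TRUE)

print(by_position)
print(by_name)
```

Stating use.names in every call keeps the result independent of whichever default the installed version happens to use.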
Examples:
1) Just set use.names=TRUE
DT1 <- data.table(x=1, y=2)
DT2 <- data.table(y=1, x=2)
rbindlist(list(DT1,DT2), use.names=TRUE, fill=FALSE)
#    x y
# 1: 1 2
# 2: 2 1
DT1 <- data.table(x=1, y=2)
DT2 <- data.table(z=2, y=1)
# DT1 has no z and DT2 has no x, so binding by name errors unless fill=TRUE
rbindlist(list(DT1, DT2), use.names=TRUE, fill=FALSE)
# Error in rbindlist(list(DT1, DT2), use.names = TRUE, fill = FALSE) :
# Answer requires 3 columns whereas one or more item(s) in the input
# list has only 2 columns. ...
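
One way to anticipate this error is to compare the column names before binding. The helper below, missing_cols, is a hypothetical name introduced only for this sketch and is not part of data.table:

```r
library(data.table)

DT1 <- data.table(x = 1, y = 2)
DT2 <- data.table(z = 2, y = 1)

# Hypothetical helper: for each table, list the columns it lacks
# relative to the union of all column names in the input list.
missing_cols <- function(tables) {
  all_names <- unique(unlist(lapply(tables, names)))
  lapply(tables, function(dt) setdiff(all_names, names(dt)))
}

gaps <- missing_cols(list(DT1, DT2))

# Only ask for filling when at least one table is missing a column.
needs_fill <- any(lengths(gaps) > 0)
combined <- rbindlist(list(DT1, DT2), use.names = TRUE, fill = needs_fill)

print(gaps)
print(combined)
```

Here gaps reports which columns would be filled with NA, so a mismatch can be reviewed instead of being discovered through the error.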
2) Also binds duplicate column names in the order of occurrence:
DT1 <- data.table(x=1, x=2, y=10, y=20, y=30)
DT2 <- data.table(y=-10, x=-2, y=-20, x=-1, y=-30)
rbindlist(list(DT1,DT2), use.names=TRUE)
#     x  x   y   y   y
# 1:  1  2  10  20  30
# 2: -2 -1 -10 -20 -30
3) Use fill=TRUE if you want to bind by names and fill missing columns:
DT1 <- data.table(x=1, y=2)
DT2 <- data.table(y=2, z=-1)
rbindlist(list(DT1, DT2), fill=TRUE)
#     x y  z
# 1:  1 2 NA
# 2: NA 2 -1
HTH