data.table and table unexpected behavior
Question
The data comes from another question I was playing around with:
dt <- data.table(user=c(rep(3, 5), rep(4, 5)),
country=c(rep(1,4),rep(2,6)),
event=1:10, key="user")
# user country event
#1: 3 1 1
#2: 3 1 2
#3: 3 1 3
#4: 3 1 4
#5: 3 2 5
#6: 4 2 6
#7: 4 2 7
#8: 4 2 8
#9: 4 2 9
#10: 4 2 10
And here's the surprising behavior:
dt[user == 3, as.data.frame(table(country))]
# country Freq
#1 1 4
#2 2 1
dt[user == 4, as.data.frame(table(country))]
# country Freq
#1 2 5
dt[, as.data.frame(table(country)), by = user]
# user country Freq
#1: 3 1 4
#2: 3 2 1
#3: 4 1 5
# ^^^ - why is this 1 instead of 2?!
Thanks mnel and Victor K. The natural follow-up is - shouldn't it be 2, i.e. is this a bug? I expected
dt[, blah, by = user]
to return the same result as
rbind(dt[user == 3, blah], dt[user == 4, blah])
Is that expectation incorrect?
Recommended answer
The idiomatic data.table approach is to use .N:
dt[ , .N, by = list(user, country)]
This will be far quicker than calling table() within each group, and it will also retain country as the same class as in the original, rather than turning it into a factor as as.data.frame(table(country)) does.
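To make the class point concrete, here is a minimal sketch that rebuilds dt from the question and compares the two routes. The names counts and tab are introduced here for illustration only. The last two lines show that, for user 4, the factor returned by the table() route has the label "2" but the underlying integer code 1. That may be related to the 1 in the grouped output above, but this is an assumption and is not confirmed by the answer.

```r
library(data.table)

# Same data as in the question
dt <- data.table(user = c(rep(3, 5), rep(4, 5)),
                 country = c(rep(1, 4), rep(2, 6)),
                 event = 1:10, key = "user")

# .N is the number of rows in each (user, country) group
counts <- dt[, .N, by = list(user, country)]
print(counts)

# country keeps the class it has in dt
class(counts$country)

# The table() route stores country as a factor
tab <- dt[user == 4, as.data.frame(table(country))]
class(tab$country)

# Label versus underlying integer code of that factor
as.character(tab$country)
as.integer(tab$country)
```

If the table-style column name Freq is needed, the count can be named in j, for example dt[, list(Freq = .N), by = list(user, country)].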