With data.table, is .SD[which.max(Var1)] the fastest way to find the max of a group?

Views: 99  Posted: 2017/3/12 13:04:33  Tags: r data.table


Problem description


If needed I can put together a dataset, but my question is somewhat general.

# For each DnB.Name, keep the whole row with the largest EE
accts <- accts[, .SD[which.max(EE)], by = DnB.Name]

I've got a DT of about 350k rows, and some of the DnB.Name values (Dun & Bradstreet company names) are duplicated with differing employee counts (EE). I only care about the highest count for each name and can disregard the rest.

Anyway, data.table is usually lightning quick, so I figure I must be doing something wrong. Is there a faster way?
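The question offers to put together a dataset; here is a minimal hypothetical one of the shape described (the company names, counts and the extra `Site` column are invented for illustration), run through the one-liner from the question. It shows that `.SD[which.max(EE)]` keeps the whole row holding each name's largest EE, not just the maximum value.

```r
library(data.table)

# Hypothetical toy data: DnB.Name repeats with different employee counts (EE).
# The Site column is made up, only to show that whole rows are kept.
accts <- data.table(
  DnB.Name = c("Acme", "Acme", "Globex", "Initech", "Initech", "Initech"),
  EE       = c(10L, 250L, 40L, 5L, 80L, 30L),
  Site     = c("A1", "A2", "G1", "I1", "I2", "I3")
)

# The approach from the question: per name, subset .SD to the row where EE is largest.
# Groups come back in the order the names first appear in accts.
best <- accts[, .SD[which.max(EE)], by = DnB.Name]
print(best)
```

Because `which.max` returns the first position of the maximum, ties within a name resolve to the earliest row.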

Solution

Sort by EE in descending order, then take the first row of each group with a keyed self-join:

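 ordered <- accts[order(-EE)]                  # sort by EE, descending
 setkey(ordered, DnB.Name)                     # key on the name (required for the J() join); the stable sort keeps EE descending within each name
 ordered[J(unique(DnB.Name)), mult = "first"]  # self-join: the first row per name is the one with the largest EE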
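A minimal sketch of the sort-then-self-join approach on a hypothetical toy table (names, counts and the `Site` column are invented). Tracing it by hand: after `order(-EE)` and the stable re-sort done by `setkey`, each name's rows sit in descending EE order, so `mult = "first"` picks the largest-EE row for every name.

```r
library(data.table)

# Hypothetical toy data in the shape described in the question
accts <- data.table(
  DnB.Name = c("Acme", "Acme", "Globex", "Initech", "Initech", "Initech"),
  EE       = c(10L, 250L, 40L, 5L, 80L, 30L),
  Site     = c("A1", "A2", "G1", "I1", "I2", "I3")
)

ordered <- accts[order(-EE)]   # largest EE first overall
setkey(ordered, DnB.Name)      # stable sort by name: EE stays descending inside each name

# Join ordered to the list of distinct names, keeping only the first match per name
best <- ordered[J(unique(DnB.Name)), mult = "first"]
print(best)
```

Note that the result comes back in key (alphabetical) order of `DnB.Name`, not in the order the names first appear in `accts`.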

For reference, see this post on Cross Validated: http://stats.stackexchange.com/questions/7884/fast-ways-in-r-to-get-the-first-row-of-a-data-frame-grouped-by-an-identifier

EDIT: even faster, but weird syntax:

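# .I is the row number within accts: per DnB.Name, keep the row number of the max EE
# (returned in column V1), then subset accts by those row numbers
accts[accts[, .I[which.max(EE)], by = DnB.Name]$V1]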
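The one-liner is easier to follow split into its two steps. A sketch on hypothetical toy data (invented names, counts and `Site` column): the inner query returns one row number per name, and the outer query subsets `accts` by them. Presumably it is faster because the grouped `j` returns a single integer per group instead of building a `.SD` subset for every group; treat that as an explanation, not a measurement.

```r
library(data.table)

# Hypothetical toy data in the shape described in the question
accts <- data.table(
  DnB.Name = c("Acme", "Acme", "Globex", "Initech", "Initech", "Initech"),
  EE       = c(10L, 250L, 40L, 5L, 80L, 30L),
  Site     = c("A1", "A2", "G1", "I1", "I2", "I3")
)

# Step 1: per name, the accts row number holding the largest EE.
# The unnamed j result gets the default column name V1.
idx <- accts[, .I[which.max(EE)], by = DnB.Name]
print(idx)

# Step 2: plain row subset by those row numbers (same as the one-liner)
best <- accts[idx$V1]
print(best)
```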

For reference, see this similar question: "Subset by group with data.table".
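The speed claims are easy to check on your own machine. The sketch below times the three approaches on synthetic data of roughly the size mentioned in the question (350k rows); the data, the number of distinct names and the seed are assumptions, and timings depend on hardware and data.table version, so no numbers are quoted here.

```r
library(data.table)

set.seed(42)
n <- 350000L
# Hypothetical synthetic data: about 50k distinct names with random employee counts
accts <- data.table(
  DnB.Name = sprintf("Company %06d", sample.int(50000L, n, replace = TRUE)),
  EE       = sample.int(5000L, n, replace = TRUE)
)

# 1. .SD subset per group (the question)
t_sd <- system.time(r_sd <- accts[, .SD[which.max(EE)], by = DnB.Name])

# 2. Sort, key, then keyed self-join taking the first row per name
t_join <- system.time({
  ordered <- accts[order(-EE)]
  setkey(ordered, DnB.Name)
  r_join <- ordered[J(unique(DnB.Name)), mult = "first"]
})

# 3. Row numbers via .I, then one row subset
t_idx <- system.time(r_idx <- accts[accts[, .I[which.max(EE)], by = DnB.Name]$V1])

print(round(c(sd   = t_sd[["elapsed"]],
              join = t_join[["elapsed"]],
              idx  = t_idx[["elapsed"]]), 3))

# Sanity check: same (name, max EE) pairs, compared after sorting by name,
# since the keyed join returns names sorted while `by` keeps first-appearance order
same <- function(a, b) {
  a <- a[order(DnB.Name)]
  b <- b[order(DnB.Name)]
  identical(a$DnB.Name, b$DnB.Name) && identical(a$EE, b$EE)
}
stopifnot(same(r_sd, r_join), same(r_sd, r_idx))
```

Only the name and EE columns are compared, so ties on EE within a name cannot cause a spurious mismatch.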

