When is plyr better than data.table?
Question
"Better" here can mean faster, easier to read, or shorter in syntax. It could also mean that the command is not even doable in data.table.
I don't use plyr a lot and would like to know if there are cases when I should. Because I don't use it much, the only example I can come up with is rbind.fill, which to my knowledge doesn't have a data.table analog. In every other example I've seen of something being done in both plyr and data.table, the latter was faster and easier to read or more compact.
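For readers unfamiliar with rbind.fill, here is a minimal sketch of what it does, assuming plyr is installed. The data frames df1 and df2 are hypothetical and exist only for illustration:

```r
library(plyr)

# Two hypothetical data frames with only partly overlapping columns.
df1 <- data.frame(id = 1:2, x = c(10, 20))
df2 <- data.frame(id = 3:4, y = c("a", "b"))

# rbind.fill stacks the rows and fills columns missing from
# either input with NA, where base rbind() would raise an error.
combined <- rbind.fill(df1, df2)
print(combined)
```

The result has four rows and the columns id, x and y, with NA wherever an input lacked a column.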
Recommended answer
Purpose.
Here is a brief summary of each package, from the packages themselves:
The plyr package is a set of clean and consistent tools that implement the split-apply-combine pattern in R. This is an extremely common pattern in data analysis: you solve a complex problem by breaking it down into small pieces, doing something to each piece and then combining the results back together again.
and
data.table ... offers fast subset, fast grouping, fast update, fast ordered joins and list columns in a short and flexible syntax, for faster development. It is inspired by A[B] syntax in R where A is a matrix and B is a 2-column matrix.
Where they overlap is in the "fast grouping", which plyr also does by splitting data.frames, operating on the pieces, and recombining them into a single data.frame. data.table has many other features that make operations on data.frame-like structures fast; plyr has features that apply the split-apply-combine paradigm to other data structures such as lists and arrays (both as inputs and outputs).
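The overlap and the difference can be sketched as follows, assuming both packages are installed. The data and the column names g and v are made up for illustration, and this is one possible way to write each call, not the only one:

```r
library(plyr)
library(data.table)

# Hypothetical toy data: a grouping column g and a value column v.
df <- data.frame(g = c("a", "a", "b", "b"), v = c(1, 2, 3, 5))

# Overlap: grouped summary of a data.frame.
# plyr splits df by g, summarises each piece, and recombines.
by_plyr <- ddply(df, "g", summarise, mean_v = mean(v))

# data.table computes the same grouped mean with its [i, j, by] syntax.
dt <- as.data.table(df)
by_dt <- dt[, list(mean_v = mean(v)), by = g]

# Beyond data.frames: plyr applies the same pattern to other structures.
pieces <- list(a = 1:3, b = 1:5)
lens <- laply(pieces, length)   # list in, array (here a vector) out
stats <- ldply(pieces, function(x) data.frame(n = length(x), total = sum(x)))  # list in, data.frame out
```

The first letter of each plyr function names the input type and the second the output type (a for array, d for data.frame, l for list), which is how one family of functions covers all of these combinations.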
So, really, they are two different tools that happen to share a small area of overlap in the problems they address. Each does much more than that, and if you want or need that additional functionality, that is the package to use.