When is plyr better than data.table?

Question

"Better" here can mean faster, easier to read, or shorter in syntax. It can also mean that the operation is not doable in data.table at all.

I don't use plyr much and would like to know whether there are cases where I should. Because I rarely use it, the only example I can come up with is rbind.fill, which to my knowledge has no data.table analog. In every other example I've seen of something being done in both plyr and data.table, the data.table version was faster and easier to read or more compact.
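For readers unfamiliar with rbind.fill, here is a minimal sketch of what it does, using made-up data frames. The second half rests on an assumption about your installed version: as far as I know, newer data.table releases accept fill = TRUE in rbindlist, which would cover the same case, so check your version's documentation before relying on it.

```r
library(plyr)
library(data.table)

# Two made-up data frames whose columns only partly overlap.
df1 <- data.frame(a = 1:2, b = c("x", "y"))
df2 <- data.frame(a = 3L, c = TRUE)

# plyr: stack the rows and pad columns missing from either input with NA.
filled_plyr <- rbind.fill(df1, df2)

# data.table (assumes a release where rbindlist supports fill = TRUE):
# same idea, columns are matched by name and gaps become NA.
filled_dt <- rbindlist(list(df1, df2), fill = TRUE)

print(filled_plyr)
print(filled_dt)
```

Both results should have three rows and the columns a, b and c.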

Recommended answer

Purpose.

Here is a brief summary of each package, taken from the packages themselves:


The plyr package is a set of clean and consistent tools that implement the split-apply-combine pattern in R. This is an extremely common pattern in data analysis: you solve a complex problem by breaking it down into small pieces, doing something to each piece and then combining the results back together again.


data.table ... offers fast subset, fast grouping, fast update, fast ordered joins and list columns in a short and flexible syntax, for faster development. It is inspired by A[B] syntax in R where A is a matrix and B is a 2-column matrix.
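To make the data.table description concrete, here is a rough sketch of the base R A[B] indexing it mentions, followed by a subset, a join and an update on a small table. The table contents and the column names id, v and w are invented for illustration.

```r
library(data.table)

# Base R: indexing matrix A by a 2-column matrix B picks one cell per row of B.
A <- matrix(1:6, nrow = 2)        # 2 x 3 matrix, filled column by column
B <- cbind(c(1, 2), c(3, 1))      # (row, column) pairs: (1, 3) and (2, 1)
cells <- A[B]                     # the elements A[1, 3] and A[2, 1]

# data.table: a made-up table keyed by id.
X <- data.table(id = c("a", "b", "c"), v = c(10, 20, 30), key = "id")
Y <- data.table(id = c("c", "a"))

sub <- X[v > 15]                  # subset: rows where v exceeds 15
joined <- X[Y, on = "id"]         # join: rows of X matching Y, in Y's row order
X[, w := v * 2]                   # update: add column w by reference, no copy

print(cells)
print(joined)
print(X)
```

The X[Y] form is where the analogy with A[B] comes from: the table inside the brackets selects rows of the table outside them.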

Where they overlap is in the "fast grouping", which plyr also does by splitting data.frames, operating on the pieces, and recombining them into a single data.frame. data.table has many other features that make operations on data.frame-like structures fast; plyr has features that apply the split-apply-combine paradigm to other data structures, such as lists and arrays, as both inputs and outputs.
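As an illustration of that overlap and of where plyr goes beyond it, the sketch below computes the same grouped mean with both packages, then uses plyr on a list and on a matrix. All data and the column names g, x, mean_x, n and total are made up for the example.

```r
library(plyr)
library(data.table)

df <- data.frame(g = c("a", "a", "b"), x = c(1, 3, 10))

# The overlap: one grouped summary, written with each package.
by_plyr <- ddply(df, "g", summarise, mean_x = mean(x))
by_dt <- as.data.table(df)[, .(mean_x = mean(x)), by = g]

# plyr beyond data frames: list in, data frame out (one row per list element).
pieces <- list(first = 1:3, second = 4:10)
sizes <- ldply(pieces, function(p) data.frame(n = length(p), total = sum(p)))

# plyr beyond data frames: array in, array out (here, a sum over each row).
m <- matrix(1:6, nrow = 2)
row_sums <- aaply(m, 1, sum)

print(by_plyr)
print(by_dt)
print(sizes)
print(row_sums)
```

In plyr's naming scheme the first letter of a function is the input type and the second is the output type (d for data frame, l for list, a for array), which is why ddply, ldply and aaply all appear here.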

So, really, they are two different tools with a small area of overlap where they address the same problem. Each does much more than that, and if you want or need that additional functionality, that is the package to use.
