What is the advantage of PyTables?


Question

I have recently started learning about PyTables and found it very interesting. My question is:

  • What are the basic advantages of PyTables over database(s) when it comes to huge datasets?
  • What is the basic purpose of this package (I can do the same sort of structuring in NumPy and Pandas, so what's the big deal with PyTables)?
  • Is it really helpful in analysis of big datasets? Can anyone elaborate with the help of any example and comparisons?

Thank you all.

Solution

What are the basic advantages of PyTables over database(s) when it comes to huge datasets?

Effectively, it is a database. Of course, it's a hierarchical database rather than a 1-level key-value database like dbm (which is obviously much less flexible) or a relational database like sqlite3 (which is more powerful, but more complicated).
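
To make the "hierarchical database" point concrete, here is a minimal sketch, assuming PyTables 3.x: groups behave like directories, and tables and arrays are the leaves hanging off them. The file layout, the `Reading` row class, and the `build_hierarchy` helper are hypothetical, chosen only for illustration.

```python
import os
import tempfile

import tables


class Reading(tables.IsDescription):
    """Row layout for a hypothetical sensor-readings table."""
    sensor_id = tables.Int32Col(pos=0)      # pos fixes the column order
    temperature = tables.Float64Col(pos=1)


def build_hierarchy(path):
    """Create a small group/table tree in an HDF5 file and return its node paths."""
    with tables.open_file(path, mode="w", title="Hypothetical experiment") as h5:
        # Groups play the role of directories; tables and arrays are the leaves.
        run1 = h5.create_group("/", "run1", "First run")
        run2 = h5.create_group("/", "run2", "Second run")
        for group in (run1, run2):
            table = h5.create_table(group, "readings", Reading, "Sensor readings")
            table.append([(1, 20.5), (2, 21.0)])  # rows as tuples in column order
            table.flush()
        # walk_nodes visits the whole tree, much like os.walk over a directory.
        return [node._v_pathname for node in h5.walk_nodes("/")]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for p in build_hierarchy(os.path.join(tmp, "demo.h5")):
            print(p)
```

The returned paths should include `/`, `/run1`, `/run2`, and the two `readings` tables; a relational database would need extra tables or naming conventions to express the same nesting.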

But the main advantage over a non-numerics-specific database is exactly the same as the advantage of, say, a numpy ndarray over a plain Python list. It's optimized for performing lots of vectorized numeric operations, so if that's what you're doing with it, it's going to take less time and space.
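
As a rough illustration of the vectorized-operations point, the sketch below (assuming PyTables 3.x, which depends on numexpr) compares an in-kernel query, `Table.read_where`, with a row-by-row Python loop over the same table. The `Reading` class, the `hot_readings` helper, and the synthetic data are hypothetical. Both paths should return the same count, but the in-kernel query evaluates the condition over whole blocks of rows instead of creating one Python object per row.

```python
import os
import tempfile

import numpy as np
import tables


class Reading(tables.IsDescription):
    """Hypothetical row layout: one temperature reading per sensor."""
    sensor_id = tables.Int32Col(pos=0)
    temperature = tables.Float64Col(pos=1)


def hot_readings(path, n_rows=100_000, threshold=30.0):
    """Return (count via in-kernel query, count via a plain Python loop)."""
    rng = np.random.default_rng(0)
    data = np.empty(n_rows, dtype=[("sensor_id", "i4"), ("temperature", "f8")])
    data["sensor_id"] = rng.integers(0, 10, n_rows)
    data["temperature"] = rng.normal(25.0, 5.0, n_rows)

    with tables.open_file(path, mode="w") as h5:
        table = h5.create_table("/", "readings", Reading, expectedrows=n_rows)
        table.append(data)  # a structured NumPy array is appended in one call
        table.flush()

        # In-kernel query: column names resolve to table columns, extra
        # variables come from condvars; only matching rows are copied out.
        hits = table.read_where("(temperature > threshold) & (sensor_id == 3)",
                                condvars={"threshold": threshold})

        # Equivalent row-at-a-time Python loop, for comparison.
        slow = sum(1 for row in table.iterrows()
                   if row["temperature"] > threshold and row["sensor_id"] == 3)
        return len(hits), slow


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(hot_readings(os.path.join(tmp, "readings.h5")))
```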

What is the basic purpose of this package

Quoting from the first line of the front page (or, if you prefer, the first line of the FAQ):

PyTables is a package for managing hierarchical datasets and designed to efficiently and easily cope with extremely large amounts of data.

There's also a page listing the MainFeatures, linked near the top of the front page.

(I can do the same sort of structuring in NumPy and Pandas, so what's the big deal with PyTables?)

Really? You can handle 64GB of data in numpy or pandas on a machine with only 16GB of RAM? Or a 32-bit machine?

No, you can't. Unless you split your data up into a bunch of separate sets that you load, process, and save as needed, but that's going to be much more complicated, and much slower.
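
A minimal sketch of what this looks like with PyTables instead of hand-managed files, assuming PyTables 3.x: the data lives in a single compressed, extendable array (`EArray`) that is written one chunk at a time and later reduced by reading one slice at a time. The sizes here are deliberately small, and the `write_in_chunks` and `chunked_mean` helpers are hypothetical; the point is that neither function ever holds the whole dataset in RAM.

```python
import os
import tempfile

import numpy as np
import tables


def write_in_chunks(path, n_chunks=20, chunk_len=50_000):
    """Append random data chunk by chunk; only one chunk is ever held in RAM."""
    filters = tables.Filters(complevel=5, complib="zlib")  # transparent compression
    rng = np.random.default_rng(42)
    with tables.open_file(path, mode="w") as h5:
        # shape=(0,) marks the first dimension as the extendable one.
        arr = h5.create_earray("/", "values", atom=tables.Float64Atom(),
                               shape=(0,), filters=filters,
                               expectedrows=n_chunks * chunk_len)
        for _ in range(n_chunks):
            arr.append(rng.random(chunk_len))


def chunked_mean(path, chunk_len=50_000):
    """Compute the mean of an on-disk array by reading one slice at a time."""
    total, count = 0.0, 0
    with tables.open_file(path, mode="r") as h5:
        arr = h5.root.values
        for start in range(0, arr.nrows, chunk_len):
            block = arr[start:start + chunk_len]  # only this slice is loaded
            total += float(block.sum())
            count += block.shape[0]
    return total / count


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "values.h5")
        write_in_chunks(path)
        print(chunked_mean(path))
```

Slicing a PyTables array (`arr[start:stop]`) returns an ordinary NumPy array, so each block can be processed with normal NumPy code.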

It's like asking why you need numpy when you can do the same thing with just regular Python lists and iterators. Pure Python is great when you have an array of 8 floats, but not when you have a 10000x10000 array of them. And numpy is great when you have a couple of 10000x10000 arrays, but not when you have a dozen interconnected arrays ranging up to 20GB in size.
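
For the "dozen interconnected arrays" case, PyTables also provides `tables.Expr`, which evaluates a numexpr-style expression over on-disk arrays block by block and can write the result straight into another on-disk array. The sketch below assumes PyTables 3.x; the array names and the `combine_on_disk` helper are hypothetical, and the inputs are generated in memory only to keep the example short.

```python
import os
import tempfile

import numpy as np
import tables


def combine_on_disk(path, n=1_000_000):
    """Evaluate 2*a + b*c over on-disk arrays, writing the result to disk."""
    rng = np.random.default_rng(7)
    with tables.open_file(path, mode="w") as h5:
        a = h5.create_array("/", "a", obj=rng.random(n))
        b = h5.create_array("/", "b", obj=rng.random(n))
        c = h5.create_array("/", "c", obj=rng.random(n))
        out = h5.create_carray("/", "result", atom=tables.Float64Atom(), shape=(n,))

        # tables.Expr walks the operands block by block (via numexpr) and
        # writes each block of the result into `out` on disk.
        expr = tables.Expr("2 * a + b * c", uservars={"a": a, "b": b, "c": c})
        expr.set_output(out)
        expr.eval()

        # Cross-check against plain NumPy, which loads everything into RAM.
        expected = 2 * a[:] + b[:] * c[:]
        return bool(np.allclose(out[:], expected))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(combine_on_disk(os.path.join(tmp, "combined.h5")))
```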

Is it really helpful in analysis of big datasets?

Yes.

Can anyone elaborate with the help of any example…

Yes. Rather than copying all of the examples here, why don't you just look at the simple examples on the front page of the docs, the slew of examples in the source tree, the links to real-world use cases two clicks from the front page of the docs, etc.?

If you want to convince yourself of the usefulness of PyTables, take any of the examples and scale it up to 32GB worth of data, then try to figure out how you'd do the exact same thing in numpy or pandas.

