numpy.ndarray与pandas.DataFrame [英] numpy.ndarray vs pandas.DataFrame

查看:187
本文介绍了numpy.ndarray与pandas.DataFrame的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我需要对程序中选择包含统计数据帧的数据结构的基础做出战略决策.

I need to make a strategic decision about choice of the basis for data structure holding statistical data frames in my program.

我在一张大桌子上存储了数十万条记录.每个字段将具有不同的类型,包括短字符串.我将对实时快速执行的数据进行多元回归分析和处理.我还需要使用相对流行并且得到良好支持的东西.

I store hundreds of thousands of records in one big table. Each field would be of a different type, including short strings. I'd perform multiple regression analysis and manipulations on the data that need to be done quick, in real time. I also need to use something, that is relatively popular and well supported.

我了解以下参赛者:

那是最基本的事情.不幸的是,它不支持字符串.而且无论如何我都需要使用numpy作为其统计部分,因此这一点毫无疑问.

That is the most basic thing to do. Unfortunately it doesn't support strings. And I need to use numpy anyway for its statistical part, so this one is out of question.

ndarray能够在每一列中保存不同类型的数组(例如np.dtype([('name', np.str_, 16), ('grades', np.float64, (2,))])).看来是天生的赢家,但是...

The ndarray has ability to hold arrays of different types in each column (e.g. np.dtype([('name', np.str_, 16), ('grades', np.float64, (2,))])). It seems a natural winner, but...

这是建立在统计用途的基础上的,但是它足够有效吗?

This one is built with statistical use in mind, but is it efficient enough?

我读到,pandas.DataFrame不再是基于 > (尽管它共享相同的接口).任何人都可以阐明它吗?还是可能有更好的数据结构?

I read, that the pandas.DataFrame is no longer based on the numpy.ndarray (although it shares the same interface). Can anyone shed some light on it? Or maybe there is an even better data structure out there?

推荐答案

pandas.DataFrame很棒,并且可以与许多numpy很好地交互. DataFrame的大部分内容都是用Cython编写的,并且经过了优化.我怀疑Pandas API的易用性和丰富性会大大超过通过在numpy上滚动自己的接口所能获得的任何潜在好处.

pandas.DataFrame is awesome, and interacts very well with much of numpy. Much of the DataFrame is written in Cython and is quite optimized. I suspect the ease of use and the richness of the Pandas API will greatly outweigh any potential benefit you could obtain by rolling your own interfaces around numpy.

这篇关于numpy.ndarray与pandas.DataFrame的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆