pandas efficient dataframe set row


Problem description


First I have the following empty DataFrame preallocated:

df=DataFrame(columns=range(10000),index=range(1000))

Then I want to update the df row by row (efficiently) with a length-10000 numpy array as data. My problem is: I don't even have an idea what method of DataFrame I should use to accomplish this task.

Thank you!

Solution

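Here are three methods, timed on a smaller frame of only 100 columns and 1000 rows (the session assumes `import numpy as np` and `from pandas import DataFrame`).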

In [5]: row = np.random.randn(100)

Row-wise assignment

In [6]: def method1():
   ...:     df = DataFrame(columns=range(100),index=range(1000))
   ...:     for i in range(len(df)):
   ...:         df.iloc[i] = row
   ...:     return df
   ...: 

Build up the arrays in a list, create the frame all at once

In [9]: def method2():
   ...:     return DataFrame([ row for i in range(1000) ])
   ...: 

Column-wise assignment (with transposes at both ends)

In [13]: def method3():
   ....:     df = DataFrame(columns=range(100),index=range(1000)).T
   ....:     for i in range(1000):
   ....:         df[i] = row
   ....:     return df.T
   ....: 

All three produce the same output frame

In [22]: (method2() == method1()).all().all()
Out[22]: True

In [23]: (method2() == method3()).all().all()
Out[23]: True


In [8]: %timeit method1()
1 loops, best of 3: 1.76 s per loop

In [10]: %timeit method2()
1000 loops, best of 3: 7.79 ms per loop

In [14]: %timeit method3()
1 loops, best of 3: 1.33 s per loop
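
For anyone who wants to rerun the comparison outside IPython, here is a minimal sketch using the standard-library `timeit` module under Python 3. The function names `rowwise` and `build_once` and the reduced row count are my own choices for illustration, not part of the original session, and the absolute numbers will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

row = np.random.randn(100)


def rowwise(n_rows=200):
    """Preallocate an empty frame, then assign one row at a time (like method1)."""
    df = pd.DataFrame(columns=range(100), index=range(n_rows))
    for i in range(n_rows):
        df.iloc[i] = row
    return df


def build_once(n_rows=200):
    """Collect the rows in a list, then create the frame once (like method2)."""
    return pd.DataFrame([row for _ in range(n_rows)])


# Total time for 3 calls of each function, in seconds.
t_rowwise = timeit.timeit(rowwise, number=3)
t_once = timeit.timeit(build_once, number=3)
print(f"row-wise: {t_rowwise:.4f} s, build once: {t_once:.4f} s")
```

The row count is kept small here so the script finishes quickly; raise `n_rows` to 1000 to match the session above.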

It is clear that building up a list and then creating the frame all at once is roughly two orders of magnitude faster than any form of assignment (7.79 ms versus 1.33 s to 1.76 s in the timings above). Each assignment involves copying, whereas building the frame in one step copies the data only once.
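
Applied to the original question, where each row arrives as a NumPy array, the same idea might look like the sketch below. `make_row` is a hypothetical stand-in for whatever produces the data. Variant B (filling a preallocated NumPy buffer and wrapping it once) is a variation I am adding, not something the answer measured.

```python
import numpy as np
import pandas as pd

n_rows, n_cols = 1000, 100  # same sizes as the benchmark above


def make_row(i):
    """Hypothetical data source: returns the length-n_cols array for row i."""
    return np.full(n_cols, float(i))


# Variant A: collect the rows in a list, build the frame once (method2).
rows = [make_row(i) for i in range(n_rows)]
df_list = pd.DataFrame(rows)

# Variant B (assumption, not timed in the answer): fill a preallocated
# NumPy array row by row, then construct the frame a single time.
buf = np.empty((n_rows, n_cols))
for i in range(n_rows):
    buf[i] = make_row(i)
df_buf = pd.DataFrame(buf)
```

In both variants, pandas is only involved once, at construction time; the per-row work happens in a plain list or NumPy array.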
