pandas efficient dataframe set row
Problem description
First I have the following empty DataFrame preallocated:
df=DataFrame(columns=range(10000),index=range(1000))
Then I want to update the df row by row (efficiently), using a length-10000 NumPy array as the data for each row. My problem is that I don't even know which DataFrame method I should use to accomplish this task.
Thank you!
Here are 3 methods, using only 100 columns and 1000 rows
In [5]: row = np.random.randn(100)
Row-wise assignment
In [6]: def method1():
   ...:     df = DataFrame(columns=range(100), index=range(1000))
   ...:     for i in range(len(df)):
   ...:         df.iloc[i] = row
   ...:     return df
   ...:
Build up the arrays in a list, create the frame all at once
In [9]: def method2():
   ...:     return DataFrame([row for i in range(1000)])
   ...:
Columnwise assignment (with transposes at both ends)
In [13]: def method3():
   ....:     df = DataFrame(columns=range(100), index=range(1000)).T
   ....:     for i in range(1000):
   ....:         df[i] = row
   ....:     return df.T
   ....:
These all have the same output frame
In [22]: (method2() == method1()).all().all()
Out[22]: True
In [23]: (method2() == method3()).all().all()
Out[23]: True
In [8]: %timeit method1()
1 loops, best of 3: 1.76 s per loop
In [10]: %timeit method2()
1000 loops, best of 3: 7.79 ms per loop
In [14]: %timeit method3()
1 loops, best of 3: 1.33 s per loop
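For readers without an IPython session, the comparison can be approximated with the standard-library `timeit` module. This is only a sketch: the function names `assign_rows` and `build_once` and the reduced row count are assumptions made for illustration, and the absolute numbers will differ from the timings above depending on the machine and the pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Smaller than the original run so the demo finishes quickly (an assumption).
N_ROWS, N_COLS = 200, 100
row = np.random.randn(N_COLS)


def assign_rows():
    # Row-wise assignment into a preallocated empty frame, as in method1.
    df = pd.DataFrame(columns=range(N_COLS), index=range(N_ROWS))
    for i in range(len(df)):
        df.iloc[i] = row
    return df


def build_once():
    # Collect the rows in a list and construct the frame once, as in method2.
    return pd.DataFrame([row for _ in range(N_ROWS)])


# Take the best of three single runs for each approach.
t_assign = min(timeit.repeat(assign_rows, number=1, repeat=3))
t_build = min(timeit.repeat(build_once, number=1, repeat=3))

print(f"row-wise assignment: {t_assign:.4f} s")
print(f"build once:          {t_build:.4f} s")
```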
Building up a list and then creating the frame all at once is clearly much faster than any form of repeated assignment: about 8 ms versus 1.3 to 1.8 s in the timings above, roughly two orders of magnitude. Each assignment involves copying, whereas building the frame in one step copies the data only once.
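Applied to the original question, the same idea might look like the sketch below. The helper `make_row` is hypothetical and stands in for however each row is actually produced. Option B, filling a preallocated NumPy array and wrapping it in a frame at the end, is a variation that the answer above does not benchmark. The sizes follow the 100-column benchmark; the question's 10000 columns would work the same way.

```python
import numpy as np
import pandas as pd

N_ROWS, N_COLS = 1000, 100


def make_row(i):
    # Hypothetical stand-in for the real per-row data source.
    return np.full(N_COLS, float(i))


# Option A: collect the rows in a list, then build the frame once.
rows = [make_row(i) for i in range(N_ROWS)]
df_a = pd.DataFrame(rows)

# Option B (not in the answer above): fill a preallocated NumPy buffer
# row by row, then wrap it in a frame once at the end.
buf = np.empty((N_ROWS, N_COLS))
for i in range(N_ROWS):
    buf[i] = make_row(i)
df_b = pd.DataFrame(buf)
```

Either way the DataFrame is constructed a single time, so the per-row cost is a list append or a plain NumPy row write rather than a pandas assignment.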