What is the Big O Complexity of Reversing the Order of Columns in a Pandas DataFrame?
Question
So let's say I have a DataFrame in pandas with m rows and n columns. Let's also say that I want to reverse the order of the columns, which can be done with the following code:
df_reversed = df[df.columns[::-1]]
What is the Big O complexity of this operation? I'm assuming this would depend on the number of columns, but would it also depend on the number of rows?
Recommended answer
I don't know how Pandas implements this, but I did test it empirically. I ran the following code (in a Jupyter notebook) to test the speed of the operation:
import pandas as pd

def get_dummy_df(n):
    # Build a 3-column frame with 2*n rows
    return pd.DataFrame({'a': [1, 2] * n, 'b': [4, 5] * n, 'c': [7, 8] * n})

# %timeit is an IPython/Jupyter magic; run this in a notebook
df = get_dummy_df(100)
print(df.shape)
%timeit df_r = df[df.columns[::-1]]

df = get_dummy_df(1000)
print(df.shape)
%timeit df_r = df[df.columns[::-1]]

df = get_dummy_df(10000)
print(df.shape)
%timeit df_r = df[df.columns[::-1]]

df = get_dummy_df(100000)
print(df.shape)
%timeit df_r = df[df.columns[::-1]]

df = get_dummy_df(1000000)
print(df.shape)
%timeit df_r = df[df.columns[::-1]]

df = get_dummy_df(10000000)
print(df.shape)
%timeit df_r = df[df.columns[::-1]]
The output was:
(200, 3)
1000 loops, best of 3: 419 µs per loop
(2000, 3)
1000 loops, best of 3: 425 µs per loop
(20000, 3)
1000 loops, best of 3: 498 µs per loop
(200000, 3)
100 loops, best of 3: 2.66 ms per loop
(2000000, 3)
10 loops, best of 3: 25.2 ms per loop
(20000000, 3)
1 loop, best of 3: 207 ms per loop
As you can see, in the first three cases the fixed overhead of the operation dominates (400-500 µs), but from the fourth case onward the time grows roughly in proportion to the number of rows, increasing by about an order of magnitude each time the row count grows tenfold.
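To put a number on "roughly proportional", here is a minimal sketch that fits a straight line to the three largest measurements on a log-log scale. The row counts and timings are copied from the output above; the use of NumPy's `polyfit` is my own choice for the illustration, not part of the original test. A slope close to 1 is what linear growth in m would look like.

```python
import numpy as np

# Row counts and timings (in seconds) copied from the three largest runs above.
rows = np.array([200_000, 2_000_000, 20_000_000], dtype=float)
seconds = np.array([2.66e-3, 25.2e-3, 207e-3])

# Fit log10(time) = slope * log10(rows) + intercept.
# The slope is the empirical exponent of the growth in the number of rows.
slope, intercept = np.polyfit(np.log10(rows), np.log10(seconds), 1)
print(f"empirical exponent in m: {slope:.2f}")
```

The fitted exponent comes out slightly below 1, which is consistent with linear growth in m plus a fixed overhead that still matters a little for the smallest of the three runs.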
So, assuming the time is also proportional to n, it seems that we are dealing with O(m*n). Note that the test above only varied m; n was fixed at 3.
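The dependence on n was assumed rather than measured, so here is a rough sketch of how one could check it: hold the number of rows fixed and vary the number of columns. The helper names `make_frame` and `time_reverse` and the chosen sizes are hypothetical, and the frame is built from a single NumPy array rather than from a dict as in the test above, so absolute timings will not be comparable. It uses the standard `timeit` module so it also runs outside a notebook.

```python
import timeit

import numpy as np
import pandas as pd


def make_frame(m, n):
    # Hypothetical helper: m rows and n integer columns named c0 .. c{n-1}.
    data = np.ones((m, n), dtype=np.int64)
    return pd.DataFrame(data, columns=[f"c{i}" for i in range(n)])


def time_reverse(df, repeats=5):
    # Best wall-clock time (in seconds) over `repeats` single reversals.
    return min(timeit.repeat(lambda: df[df.columns[::-1]], number=1, repeat=repeats))


m = 20_000  # fixed number of rows (assumed size, adjust to your machine)
for n in (3, 30, 300):
    df = make_frame(m, n)
    print(n, df.shape, f"{time_reverse(df) * 1e3:.2f} ms")
```

If the O(m*n) guess holds, the times should grow roughly tenfold between successive values of n once the frame is large enough for the fixed overhead to stop dominating. Results will depend on the pandas version and on how the data is laid out internally.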