What is the Big O Complexity of Reversing the Order of Columns in a Pandas DataFrame?

Question

Say I have a pandas DataFrame with m rows and n columns, and I want to reverse the order of its columns, which can be done with the following code:

df_reversed = df[df.columns[::-1]]

What is the Big O complexity of this operation? I assume it depends on the number of columns, but does it also depend on the number of rows?
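For concreteness, here is a small sketch of what the expression in the question does. The toy frame and its values are made up for illustration, and `df.iloc[:, ::-1]` is shown only as another common spelling of the same reversal, not as something the question uses:

```python
import pandas as pd

# Toy frame (illustrative values only)
df = pd.DataFrame({"a": [1, 2], "b": [4, 5], "c": [7, 8]})

# The expression from the question: index with the reversed column labels
df_reversed = df[df.columns[::-1]]
print(list(df_reversed.columns))

# Positional alternative: slice all rows, columns in reverse order
df_alt = df.iloc[:, ::-1]
print(df_reversed.equals(df_alt))
```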

Recommended answer

I don't know how pandas implements this, but I did test it empirically. I ran the following code (in a Jupyter notebook) to measure the speed of the operation:

import pandas as pd

def get_dummy_df(n):
    # Three integer columns, each with 2*n rows
    return pd.DataFrame({'a': [1, 2] * n, 'b': [4, 5] * n, 'c': [7, 8] * n})

# %timeit is an IPython/Jupyter magic; run this in a notebook cell
df = get_dummy_df(100)
print(df.shape)
%timeit df_r = df[df.columns[::-1]]

df = get_dummy_df(1000)
print(df.shape)
%timeit df_r = df[df.columns[::-1]]

df = get_dummy_df(10000)
print(df.shape)
%timeit df_r = df[df.columns[::-1]]

df = get_dummy_df(100000)
print(df.shape)
%timeit df_r = df[df.columns[::-1]]

df = get_dummy_df(1000000)
print(df.shape)
%timeit df_r = df[df.columns[::-1]]

df = get_dummy_df(10000000)
print(df.shape)
%timeit df_r = df[df.columns[::-1]]

The output was:

(200, 3)
1000 loops, best of 3: 419 µs per loop
(2000, 3)
1000 loops, best of 3: 425 µs per loop
(20000, 3)
1000 loops, best of 3: 498 µs per loop
(200000, 3)
100 loops, best of 3: 2.66 ms per loop
(2000000, 3)
10 loops, best of 3: 25.2 ms per loop
(20000000, 3)
1 loop, best of 3: 207 ms per loop

As you can see, in the first 3 cases the fixed overhead of the operation accounts for most of the time (400-500 µs). From the 4th case onward, the time becomes roughly proportional to the amount of data, growing by about an order of magnitude each time the row count grows tenfold.
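The growth factors can be read off the reported timings directly. A short sketch that computes the ratio between consecutive measurements, using only the numbers listed in the output (converted to microseconds):

```python
# Row counts and best-of-3 timings copied from the output, in microseconds
rows = [200, 2_000, 20_000, 200_000, 2_000_000, 20_000_000]
times_us = [419, 425, 498, 2_660, 25_200, 207_000]

# Ratio of each timing to the previous one; rows grow 10x at every step
ratios = [t1 / t0 for t0, t1 in zip(times_us, times_us[1:])]

for m0, m1, r in zip(rows, rows[1:], ratios):
    print(f"{m0:>10,} -> {m1:>10,} rows: time x{r:.2f}")
```

The first two ratios stay close to 1 (about 1.01 and 1.17), while the last two are about 9.47 and 8.21, which is consistent with roughly linear growth in m once the fixed overhead stops dominating.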

So, assuming the time must also scale with n (which this test did not vary, since every frame had 3 columns), it seems that we are dealing with O(m*n).
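Since the run held n fixed at 3, the dependence on the number of columns is an assumption rather than a measurement. A minimal sketch for checking both dimensions outside a notebook, using the standard-library `timeit` module; `make_df` and `time_reverse` are hypothetical helper names, the sizes are arbitrary, and the frames here are built from a single integer array rather than from lists as in the test above:

```python
import timeit

import numpy as np
import pandas as pd


def make_df(m, n):
    """Build an m-by-n integer DataFrame (hypothetical helper)."""
    data = np.arange(m * n).reshape(m, n)
    return pd.DataFrame(data, columns=[f"c{i}" for i in range(n)])


def time_reverse(df, repeat=3, number=10):
    """Best per-call time, in seconds, of reversing the column order."""
    timer = timeit.Timer(lambda: df[df.columns[::-1]])
    return min(timer.repeat(repeat=repeat, number=number)) / number


# Vary the number of rows with the column count fixed
for m in (1_000, 10_000, 100_000):
    t = time_reverse(make_df(m, 3))
    print(f"m={m:>7,} n=  3: {t * 1e6:10.1f} µs")

# Vary the number of columns with the row count fixed
for n in (3, 30, 300):
    t = time_reverse(make_df(10_000, n))
    print(f"m= 10,000 n={n:>3}: {t * 1e6:10.1f} µs")
```

If the time grows roughly linearly along both loops, that would support the O(m*n) estimate. The absolute numbers will depend on the machine and the pandas version.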
