Quick way to find all permutations of a pandas DataFrame that preserves a sort?

Question

I have a DataFrame, and I'd like to find all the permutations of it that fulfill a simple ascending sort on one of the columns. (There are many ties.) For example, given the following DataFrame:

import pandas as pd

df = pd.DataFrame({'name': ["Abe", "Bob", "Chris", "David", "Evan"],
                   'age': [28, 20, 21, 22, 21]})

I'd be looking to sort by age and obtain the orders ["Bob", "Chris", "Evan", "David", "Abe"] and ["Bob", "Evan", "Chris", "David", "Abe"].

I'm new to python (and to pandas) and curious if there is a simple way to do this that I don't see.

Thanks!
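One way to pin down exactly which orderings count as valid is a brute-force reference: try every ordering of the rows and keep those whose `age` column is non-decreasing. The helper name `brute_force_orders` is hypothetical, and because it inspects all n! orderings this is only a sanity check for tiny frames like the one above, not a practical solution.

```python
from itertools import permutations

import pandas as pd

df = pd.DataFrame({'name': ["Abe", "Bob", "Chris", "David", "Evan"],
                   'age': [28, 20, 21, 22, 21]})


def brute_force_orders(frame, by, label):
    """Hypothetical helper: every row ordering whose `by` column is non-decreasing.

    Inspects all len(frame)! orderings, so it is only usable on tiny frames.
    """
    results = []
    for order in permutations(frame.index):
        values = frame.loc[list(order), by].tolist()
        # Keep the ordering only if each value is <= the value after it.
        if all(a <= b for a, b in zip(values, values[1:])):
            results.append(frame.loc[list(order), label].tolist())
    return results


print(brute_force_orders(df, "age", "name"))
# [['Bob', 'Chris', 'Evan', 'David', 'Abe'], ['Bob', 'Evan', 'Chris', 'David', 'Abe']]
```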

Answer

Since you're grouping by age, let's do that: return all the permutations for each group and then take their product (using the product and permutations functions from itertools):
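Because rows can only move within their own group of tied ages, the number of valid orderings is the product of the factorials of the group sizes. A short sketch (the helper name `count_tie_orders` is hypothetical) to check that number before enumerating anything:

```python
import math

import pandas as pd

df = pd.DataFrame({'name': ["Abe", "Bob", "Chris", "David", "Evan"],
                   'age': [28, 20, 21, 22, 21]})


def count_tie_orders(frame, by):
    """Hypothetical helper: number of orderings that keep `by` sorted ascending."""
    # Each group of tied values can be permuted independently of the others,
    # so the total is the product of size! over all groups.
    sizes = frame.groupby(by).size().tolist()  # plain Python ints
    return math.prod(math.factorial(n) for n in sizes)


print(count_tie_orders(df, "age"))  # 1! * 2! * 1! * 1! = 2
```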

In [10]: from itertools import permutations, product

In [11]: age = df.groupby("age")

If we look at the permutations of a single group:

In [12]: age.get_group(21)
Out[12]:
   age   name
2   21  Chris
4   21   Evan

In [13]: list(permutations(age.get_group(21).index))
Out[13]: [(2, 4), (4, 2)]

In [14]: [df.loc[list(p)] for p in permutations(age.get_group(21).index)]
Out[14]:
[   age   name
 2   21  Chris
 4   21   Evan,    age   name
 4   21   Evan
 2   21  Chris]

We can do this on the entire DataFrame by returning just the index for each group. This assumes that the index is unique; if it's not, call reset_index before doing this (you may also be able to do something slightly lower level):
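To see why a unique index matters, here is a small hypothetical frame with a repeated label. `.loc` returns every row carrying a given label, so permuting labels no longer permutes individual rows; `reset_index(drop=True)` replaces the labels with `0..n-1`, which are unique.

```python
import pandas as pd

# Hypothetical frame whose index labels are NOT unique (label 0 appears twice).
dup = pd.DataFrame({'name': ["Abe", "Bob", "Chris"],
                    'age': [28, 20, 21]}, index=[0, 0, 1])

# With duplicate labels, .loc returns every row with that label,
# so one "position" in a permutation of labels can expand to several rows.
print(len(dup.loc[[0]]))  # 2

# Replace the labels with 0..n-1 so each label identifies exactly one row.
fixed = dup.reset_index(drop=True)
print(fixed.index.is_unique)  # True
print(len(fixed.loc[[0]]))  # 1
```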

In [21]: [list(permutations(grp.index)) for (name, grp) in age]
Out[21]: [[(1,)], [(2, 4), (4, 2)], [(3,)], [(0,)]]

In [22]: list(product(*[(permutations(grp.index)) for (name, grp) in age]))
Out[22]: [((1,), (2, 4), (3,), (0,)), ((1,), (4, 2), (3,), (0,))]

We can glue these together with sum:

In [23]: [sum(tups, ()) for tups in product(*[(permutations(grp.index)) for (name, grp) in age])]
Out[23]: [(1, 2, 4, 3, 0), (1, 4, 2, 3, 0)]
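`sum(tups, ())` copies the accumulated tuple at every addition, which becomes quadratic when there are many groups. A sketch of the same flattening step using `itertools.chain.from_iterable` instead, which walks the per-group tuples once (this is a swapped-in alternative, not what the answer above uses):

```python
from itertools import chain, permutations, product

import pandas as pd

df = pd.DataFrame({'name': ["Abe", "Bob", "Chris", "David", "Evan"],
                   'age': [28, 20, 21, 22, 21]})
age = df.groupby("age")

# For each combination of per-group permutations, flatten the tuples
# into one ordering of index labels.
orders = [tuple(chain.from_iterable(tups))
          for tups in product(*[permutations(grp.index) for _, grp in age])]
print(orders)  # [(1, 2, 4, 3, 0), (1, 4, 2, 3, 0)]
```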

If you make these into lists, you can pass them to loc (which gets you the desired result):

In [24]: [df.loc[list(sum(tups, ()))] for tups in product(*[list(permutations(grp.index)) for (name, grp) in age])]
Out[24]:
[   age   name
 1   20    Bob
 2   21  Chris
 4   21   Evan
 3   22  David
 0   28    Abe,    age   name
 1   20    Bob
 4   21   Evan
 2   21  Chris
 3   22  David
 0   28    Abe]
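Since the number of orderings can grow very quickly, it may help to yield the reordered frames lazily instead of building the whole list. The sketch below wraps the steps above in a generator; `iter_tie_orders` is a hypothetical name, it assumes a unique index, and it relies on `groupby` sorting the group keys (the default `sort=True`). Note that `itertools.product` still materializes each group's permutations up front; only the combined orderings are produced one at a time.

```python
from itertools import chain, permutations, product

import pandas as pd

df = pd.DataFrame({'name': ["Abe", "Bob", "Chris", "David", "Evan"],
                   'age': [28, 20, 21, 22, 21]})


def iter_tie_orders(frame, by):
    """Hypothetical helper: yield each reordering of `frame` that keeps `by` ascending.

    Assumes a unique index (call reset_index(drop=True) first otherwise).
    """
    # groupby sorts keys by default, so groups come out in ascending order.
    per_group = [permutations(grp.index) for _, grp in frame.groupby(by)]
    for tups in product(*per_group):
        yield frame.loc[list(chain.from_iterable(tups))]


for ordering in iter_tie_orders(df, "age"):
    print(ordering["name"].tolist())
# ['Bob', 'Chris', 'Evan', 'David', 'Abe']
# ['Bob', 'Evan', 'Chris', 'David', 'Abe']
```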

And just the name column, as a list for each ordering:

In [25]: [list(df.loc[list(sum(tups, ())), "name"]) for tups in product(*[(permutations(grp.index)) for (name, grp) in age])]
Out[25]:
[['Bob', 'Chris', 'Evan', 'David', 'Abe'],
 ['Bob', 'Evan', 'Chris', 'David', 'Abe']]


Note: It may be faster to use a numpy permutation matrix and pd.tools.util.cartesian_product (a helper from older pandas releases). I suspect the difference is small, and I wouldn't explore this unless this approach was unusably slow (it's potentially going to be slow anyway, because the number of orderings is the product of the factorials of the group sizes, which can be very large).
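A sketch of the NumPy direction mentioned in the note: build a table of permutations for each group, then combine the tables with a repeat-and-tile Cartesian product. This is swapped in for `pd.tools.util.cartesian_product`, which is not available under that name in recent pandas, and the helper names `cartesian_choices` and `tie_order_matrix` are hypothetical. It works on row positions from `GroupBy.indices`, so it does not need a unique index, but it allocates the entire result array at once, so it is worth checking the number of orderings first.

```python
import math
from itertools import permutations

import numpy as np
import pandas as pd

df = pd.DataFrame({'name': ["Abe", "Bob", "Chris", "David", "Evan"],
                   'age': [28, 20, 21, 22, 21]})


def cartesian_choices(lengths):
    """Hypothetical helper: for each group, which of its permutations each output row uses.

    The first group varies slowest, matching itertools.product's order.
    """
    total = math.prod(lengths)
    cols = []
    repeat = total
    for n in lengths:
        repeat //= n  # consecutive output rows sharing the same choice for this group
        block = np.repeat(np.arange(n), repeat)
        cols.append(np.tile(block, total // (n * repeat)))
    return cols


def tie_order_matrix(frame, by):
    """Hypothetical helper: (n_orders, n_rows) array of row positions, one ordering per row."""
    # GroupBy.indices maps each group key to the positions of its rows;
    # sorting the items puts the groups in ascending order of `by`.
    groups = sorted(frame.groupby(by).indices.items())
    tables = [np.array(list(permutations(pos))) for _, pos in groups]
    choices = cartesian_choices([len(t) for t in tables])
    # Place the chosen permutation of each group side by side.
    return np.hstack([t[c] for t, c in zip(tables, choices)])


orders = tie_order_matrix(df, "age")
names = df["name"].to_numpy()
print([names[row].tolist() for row in orders])
# [['Bob', 'Chris', 'Evan', 'David', 'Abe'], ['Bob', 'Evan', 'Chris', 'David', 'Abe']]
```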
