Shuffling/permuting a DataFrame in pandas

Views: 37 | Published: 2021/12/3 9:20:22 | Tags: python numpy pandas


Question

What's a simple and efficient way to shuffle a DataFrame in pandas, by rows or by columns? That is, how can I write a function shuffle(df, n, axis=0) that takes a DataFrame, a number of shuffles n, and an axis (axis=0 is rows, axis=1 is columns) and returns a copy of the DataFrame that has been shuffled n times?

Edit: The key is to do this without destroying the row/column labels of the DataFrame. If you just shuffle df.index, that loses all that information. I want the resulting df to be the same as the original, except with the order of rows or the order of columns changed.
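For reference, reordering whole rows or whole columns while keeping the labels attached can be sketched as follows. This is only an illustration of the label-preserving requirement, not part of the original question; the names `rng`, `rows_shuffled`, and `cols_shuffled` and the fixed seed are assumptions made for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': range(10), 'B': range(10)})

# Seed chosen only so the sketch is repeatable (an assumption, not a requirement).
rng = np.random.default_rng(0)

# Reorder whole rows: each row keeps its index label and its A/B pairing.
rows_shuffled = df.iloc[rng.permutation(len(df))]

# Reorder whole columns: each column keeps its name and its values.
cols_shuffled = df.iloc[:, rng.permutation(df.shape[1])]
```

Because positional indexing with `iloc` moves labels together with the data, the original `df` is left unchanged and no label information is lost.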

Edit2: My question was unclear. When I say shuffle the rows, I mean shuffle each row independently. So if you have two columns a and b, I want each row shuffled on its own, so that you don't have the same associations between a and b as you would if you just re-ordered each row as a whole. Something like:

for 1...n:
  for each col in df: shuffle column
return new_df

But hopefully more efficient than naive looping. This does not work for me:

import numpy as np
import pandas

def shuffle(df, n, axis=0):
    shuffled_df = df.copy()
    for k in range(n):
        # np.random.shuffle(...) runs immediately and returns None,
        # so apply() is handed None instead of a function
        shuffled_df.apply(np.random.shuffle(shuffled_df.values), axis=axis)
    return shuffled_df

df = pandas.DataFrame({'A':range(10), 'B':range(10)})
shuffle(df, 5)


Recommended answer

In [16]: def shuffle(df, n=1, axis=0):
    ...:     df = df.copy()  # work on a copy so the caller's frame is untouched
    ...:     for _ in range(n):
    ...:         df.apply(np.random.shuffle, axis=axis)  # shuffle each column (axis=0) or row (axis=1)
    ...:     return df
    ...:

In [17]: df = pd.DataFrame({'A':range(10), 'B':range(10)})

In [18]: shuffle(df)
Out[18]: 
   A  B
0  8  5
1  1  7
2  7  3
3  6  2
4  3  4
5  0  1
6  9  0
7  4  6
8  2  8
9  5  9
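The answer above depends on `apply()` passing objects that `np.random.shuffle` can mutate in place, which is not guaranteed across pandas versions. A minimal alternative sketch that shuffles the underlying array directly is shown below. It assumes NumPy 1.20 or later (for `Generator.permuted`) and columns that share one dtype; the helper name `shuffle_independent` and its `seed` parameter are hypothetical and do not come from the original answer.

```python
import numpy as np
import pandas as pd

def shuffle_independent(df, n=1, axis=0, seed=None):
    """Return a copy of df with each column (axis=0) or each row (axis=1)
    shuffled independently, n times. Index and column labels stay in place.

    Hypothetical helper: assumes all columns share one dtype.
    """
    rng = np.random.default_rng(seed)
    values = df.to_numpy(copy=True)  # copy, so the input frame is never modified
    for _ in range(n):
        # Generator.permuted shuffles every 1-D slice along `axis` on its own.
        values = rng.permuted(values, axis=axis)
    return pd.DataFrame(values, index=df.index, columns=df.columns)

df = pd.DataFrame({'A': range(10), 'B': range(10)})
shuffled = shuffle_independent(df, n=5, seed=0)
```

Here `shuffled` has the same index and column labels as `df`, but the values within column A and within column B have been permuted separately, so the original A/B pairings are generally broken. Repeating a uniform shuffle n times gives the same distribution as shuffling once, so `n` is kept only to match the signature asked for in the question.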

